Spark Connector / SPARK-341

Gracefully handle Decimals with larger precision than the schema

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Unknown
    • Affects Version/s: None
    • Component/s: Reads

      We are facing a very strange issue with PySpark reads on MongoDB.

      We are reading a collection that contains over 1 billion records; each record has 101 columns, 96 of which are of type Decimal128. Sometimes the PySpark read returns null values for some of those 96 columns. The columns that are not of type Decimal128 are not affected; they always return the correct values.
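
      For reference, here is a minimal sketch of the kind of read we are doing, plus a per-column null check on the Decimal columns. The connection URI, database, and collection names are placeholders, and the option names assume the v10 connector ("mongodb" source); older connector versions use different option keys.

          from pyspark.sql import SparkSession
          from pyspark.sql import functions as F
          from pyspark.sql.types import DecimalType

          # Placeholder URI/namespace; the real job reads the +1B record collection.
          spark = (
              SparkSession.builder
              .appName("decimal128-read-check")
              .config("spark.mongodb.read.connection.uri", "mongodb://host:27017/")
              .getOrCreate()
          )

          df = (
              spark.read.format("mongodb")
              .option("database", "mydb")
              .option("collection", "mycoll")
              .load()
          )

          # Count nulls per Decimal column to spot the runs where values come back null.
          decimal_cols = [f.name for f in df.schema.fields if isinstance(f.dataType, DecimalType)]
          df.select(
              [F.count(F.when(F.col(c).isNull(), 1)).alias(c) for c in decimal_cols]
          ).show(truncate=False)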

      There is nothing in the PySpark logs that looks like a warning or an error.

      Roughly speaking, if we run the read 5 times in PySpark, 1 run returns correct results and 4 do not.
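
      Related to the issue title, here is a minimal sketch of pinning the read schema explicitly instead of relying on sampling-based schema inference; the field names and the DecimalType precision/scale are hypothetical, not our actual schema.

          from pyspark.sql import SparkSession
          from pyspark.sql.types import StructType, StructField, StringType, DecimalType

          spark = SparkSession.builder.getOrCreate()

          # Hypothetical layout: an id column plus 96 Decimal columns pinned to an
          # explicit precision/scale (illustrative values only).
          explicit_schema = StructType(
              [StructField("_id", StringType(), True)]
              + [StructField(f"amount_{i}", DecimalType(38, 10), True) for i in range(96)]
          )

          df = (
              spark.read.format("mongodb")
              .option("database", "mydb")        # placeholder namespace
              .option("collection", "mycoll")
              .schema(explicit_schema)           # supplies the schema, skipping inference
              .load()
          )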

      When we do the same read with pymongo it is fine, and with the mongo shell it is also fine.

      We ran .validate() on the collection, and it reports that everything is fine.
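
      For comparison, a minimal sketch of the pymongo read and the collection validation (placeholder URI and namespace; the validate call below is the server's validate command, which is what db.collection.validate() runs in the shell).

          from pymongo import MongoClient

          client = MongoClient("mongodb://host:27017/")   # placeholder URI
          db = client["mydb"]                             # placeholder namespace
          coll = db["mycoll"]

          # The same documents read through pymongo return their Decimal128 values.
          for doc in coll.find().limit(5):
              print(doc)

          # Collection validation reports no problems.
          result = db.command("validate", "mycoll")
          print(result["valid"], result.get("warnings", []), result.get("errors", []))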

        Attachments:
          1. chk_ok.log (35 kB)
          2. chk_nok.log (34 kB)

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Thierry Turpin (thierry@turpin.be)
            Votes: 0
            Watchers: 1
