Gracefully handle Decimals with larger precision than the schema


    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Unknown
    • Affects Version/s: None
    • Component/s: Reads

      We are facing a very strange issue with PySpark reads from MongoDB.

      We are reading a collection that contains more than 1 billion records; each record has 101 columns, 96 of which are of type Decimal128. Sometimes a PySpark read returns null values for some of those 96 columns. The columns that are not of type Decimal128 are never affected; they always return the correct values.
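      Roughly what the read and the null check look like, as a minimal sketch (Spark Connector 10.x option names; the URI, database and collection names are placeholders, not our real ones):

          from pyspark.sql import SparkSession
          from pyspark.sql import functions as F

          spark = SparkSession.builder.getOrCreate()

          # Read the collection with the MongoDB Spark Connector (10.x-style options).
          df = (spark.read.format("mongodb")
                .option("connection.uri", "mongodb://host:27017")  # placeholder URI
                .option("database", "mydb")                        # placeholder database
                .option("collection", "mycoll")                    # placeholder collection
                .load())

          # Count nulls per column: on a "not ok" run the Decimal128 columns show
          # unexpected nulls, while the non-Decimal128 columns stay fully populated.
          null_counts = df.select(
              [F.count(F.when(F.col(c).isNull(), 1)).alias(c) for c in df.columns]
          )
          null_counts.show(truncate=False)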

      There is nothing in the PySpark logs that looks like a warning or an error.

      Say we run the read 5 times in PySpark: typically 1 run returns correct values and the other 4 do not.
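      One plausible explanation, which is only an assumption on our side: the connector infers the schema from a sample of documents, so a given run can end up with a DecimalType whose precision/scale is smaller than some stored values, matching the summary of this issue (decimals with larger precision than the schema). A workaround sketch that passes an explicit schema instead of relying on inference; the column names and the (38, 10) precision/scale below are placeholders and must be wide enough for the values actually stored:

          from pyspark.sql import SparkSession
          from pyspark.sql.types import StructType, StructField, StringType, DecimalType

          spark = SparkSession.builder.getOrCreate()

          # Hypothetical schema: "_id" plus 96 decimal columns (the remaining
          # non-decimal columns are omitted here for brevity).
          explicit_schema = StructType(
              [StructField("_id", StringType(), True)]
              + [StructField(f"amount_{i}", DecimalType(38, 10), True) for i in range(96)]
          )

          df = (spark.read.format("mongodb")
                .option("connection.uri", "mongodb://host:27017")  # placeholder URI
                .option("database", "mydb")
                .option("collection", "mycoll")
                .schema(explicit_schema)  # bypass sampling-based schema inference
                .load())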

      When we run the same read with pymongo, or from the mongo shell, the results are always correct.
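      For comparison, roughly how we check with pymongo (placeholder connection details):

          from pymongo import MongoClient

          client = MongoClient("mongodb://host:27017")  # placeholder URI
          coll = client["mydb"]["mycoll"]               # placeholder database/collection

          # The same documents read directly with pymongo always contain the decimal
          # values (returned as bson.decimal128.Decimal128), never None.
          for doc in coll.find().limit(5):
              print(doc)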

      We ran .validate() on the collection and it reports that everything is fine.
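      The validate check as run from pymongo (placeholder names):

          from pymongo import MongoClient

          client = MongoClient("mongodb://host:27017")  # placeholder URI
          db = client["mydb"]                           # placeholder database name

          # Run the server-side validate command on the collection; in our case the
          # result reports valid: true and no errors.
          result = db.command("validate", "mycoll")     # placeholder collection name
          print(result.get("valid"), result.get("errors", []), result.get("warnings", []))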

        1. chk_nok.log (34 kB)
        2. chk_ok.log (35 kB)

              Assignee: Ross Lawley
              Reporter: thierry turpin
              Votes: 0
              Watchers: 1
