- Type: Bug
- Resolution: Won't Fix
- Priority: Unknown
- None
- Affects Version/s: None
- Component/s: Reads
We are facing a very strange issue with PySpark reads from MongoDB.
We read a collection that contains more than 1 billion records; each record has 101 columns, 96 of which are of type Decimal128. Sometimes the PySpark read returns null values for some of those 96 Decimal128 columns. The columns that are not Decimal128 are never affected; they always return the correct values.
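For context, the read itself is nothing special. A minimal sketch of the kind of job we run is below, assuming the MongoDB Spark Connector 10.x read API; the connection URI, database, collection, and column names are placeholders rather than our real ones.

```python
from pyspark.sql import SparkSession

# Placeholder URI, database, and collection; the real collection holds
# roughly 1B documents with 96 Decimal128 columns per document.
spark = (
    SparkSession.builder
    .appName("mongo-decimal128-read")
    .config("spark.mongodb.read.connection.uri", "mongodb://host:27017")
    .getOrCreate()
)

df = (
    spark.read.format("mongodb")
    .option("database", "mydb")
    .option("collection", "mycollection")
    .load()
)

# Intermittently, Decimal128 columns come back as null here even though
# the stored documents contain valid values.
df.select("amount_col_1", "amount_col_2").show(5)
```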
There is nothing in the PySpark logs that looks like a warning or an error.
If we run the same read 5 times in PySpark, roughly 1 run returns correct values and the other 4 runs return nulls in some Decimal128 columns.
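One way to see the difference between a good run and a bad run is to count nulls per column after each read; a hypothetical check along these lines (column name again a placeholder):

```python
from pyspark.sql import functions as F

# On a "bad" run this count is non-zero for some Decimal128 columns,
# on a "good" run it is zero for all of them.
null_count = df.filter(F.col("amount_col_1").isNull()).count()
print(f"null values in amount_col_1 for this run: {null_count}")
```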
When we do the same read with PyMongo the results are correct, and with the mongo shell they are also correct.
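The equivalent read straight through PyMongo always returns proper Decimal128 values, never null; a minimal sketch, with placeholder URI, namespace, and field name:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://host:27017")
coll = client["mydb"]["mycollection"]

# Reading the same documents directly via PyMongo always yields valid
# Decimal128 values for these fields.
for doc in coll.find({}, {"amount_col_1": 1}).limit(5):
    print(doc.get("amount_col_1"))
```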
We ran .validate() on the collection, and it reports that everything is fine.
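For reference, the validation can be run from Python as well (equivalent to db.mycollection.validate() in the shell; names are placeholders), and in our case it reports valid: true with no errors:

```python
# Run the validate command against the collection via PyMongo.
result = client["mydb"].command("validate", "mycollection")
print(result["valid"], result.get("warnings"), result.get("errors"))
```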