Spark Connector / SPARK-341

Gracefully handle Decimals with larger precision than the schema

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Unknown
    • Fix Version/s: None
    • Affects Version/s: None
    • Component/s: Reads
    • Labels:

      We are facing a very strange issue with PySpark reads from MongoDB.

      When reading a collection that contains more than 1 billion records, where each record has 101 columns and 96 of them are of type Decimal128, PySpark sometimes returns null values for some of those 96 columns. The columns that are not of type Decimal128 are not affected; they always return the correct values.

      There is nothing in the PySpark logs that looks like a warning or an error.

      If we run the same read five times in PySpark, roughly one run is correct and four are not.
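
      For illustration only, the sketch below shows the kind of repeated-read check described above. The connection URI, database, and collection names are placeholders, and the option names assume the 10.x MongoDB Spark connector; it reads the collection several times and counts nulls per decimal column.

          from pyspark.sql import SparkSession, functions as F
          from pyspark.sql.types import DecimalType

          spark = SparkSession.builder.appName("decimal128-null-check").getOrCreate()

          def read_collection():
              # Option names as in the 10.x MongoDB Spark connector;
              # URI / database / collection are placeholders.
              return (spark.read.format("mongodb")
                      .option("connection.uri", "mongodb://host:27017")
                      .option("database", "mydb")
                      .option("collection", "mycoll")
                      .load())

          # Decimal128 fields are inferred as Spark DecimalType columns.
          decimal_cols = [f.name for f in read_collection().schema.fields
                          if isinstance(f.dataType, DecimalType)]

          # Repeat the read and count nulls per decimal column; on the bad
          # runs some of these counts are unexpectedly non-zero.
          for i in range(5):
              row = (read_collection()
                     .select([F.count(F.when(F.col(c).isNull(), 1)).alias(c)
                              for c in decimal_cols])
                     .first())
              print(i, {c: row[c] for c in decimal_cols})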

      When we run the same read with PyMongo the values are correct, and the same is true in the mongo shell.
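
      A sketch of the PyMongo cross-check (again with placeholder names): fetch a few documents directly and confirm that the Decimal128 values are actually present on the server.

          from bson.decimal128 import Decimal128
          from pymongo import MongoClient

          client = MongoClient("mongodb://host:27017")   # placeholder URI
          coll = client["mydb"]["mycoll"]                # placeholder names

          # The Decimal128 fields that come back as null in Spark are
          # populated when read directly with PyMongo.
          for doc in coll.find({}).limit(5):
              decimals = {k: v for k, v in doc.items()
                          if isinstance(v, Decimal128)}
              print(doc["_id"], len(decimals), "Decimal128 fields populated")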

      We ran .validate() on the collection, and it reports that everything is fine.
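
      Given the ticket title, one reading of the behaviour is that values whose precision exceeds the inferred DecimalType come back as null. A possible mitigation, sketched below with hypothetical column names (not a confirmed fix), is to supply an explicit read schema that uses the widest precision Spark supports, or to read the affected fields as strings.

          from pyspark.sql import SparkSession
          from pyspark.sql.types import (DecimalType, StringType,
                                         StructField, StructType)

          spark = SparkSession.builder.getOrCreate()

          # Hypothetical subset of the 101-column schema; the real one has
          # 96 Decimal128 fields declared the same way.
          schema = StructType([
              StructField("_id", StringType()),
              StructField("amount", DecimalType(38, 10)),  # 38 is Spark's maximum precision
              # StructField("price", StringType()),        # alternative: read as string
          ])

          df = (spark.read.format("mongodb")               # 10.x connector option names
                .option("connection.uri", "mongodb://host:27017")
                .option("database", "mydb")
                .option("collection", "mycoll")
                .schema(schema)
                .load())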

      Attachments:
        1. chk_nok.log (34 kB)
        2. chk_ok.log (35 kB)

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            thierry@turpin.be thierry turpin
            Votes:
            0
            Watchers:
            1

              Created:
              Updated:
              Resolved: