Type: Bug
Resolution: Fixed
Priority: Unknown
Affects Version/s: None
Component/s: Schemas
None
Not Needed

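Discovered in ARROW-158: our auto schema detection can yield a different table than an explicit schema when values are missing. With the documents below, the inferred table contains null for the first (empty) array, while the table built with an explicit list_(string()) schema keeps it as an empty list: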
from pyarrow import list_, string
from pymongoarrow.api import Schema, find_arrow_all

def test_auto_schema_missing_values(self):
    docs = [
        {"a": []},
        {"a": ["str"]},
        {"a": []},
    ]
    self.coll.delete_many({})
    self.coll.insert_many(docs)
    # Let PyMongoArrow infer the schema from the data.
    actual = find_arrow_all(self.coll, {}, projection={"_id": 0})
    # Same query, but with the schema given explicitly.
    expected = find_arrow_all(
        self.coll, {}, projection={"_id": 0}, schema=Schema({"a": list_(string())})
    )
    self.assertEqual(actual.schema, expected.schema)
    self.assertEqual(actual, expected)
Output:
>       self.assertEqual(actual, expected)
E       AssertionError: pyarr[19 chars]em: string>
E         child 0, item: string
E       ----
E       a: [[null,["str"],[]]] != pyarr[19 chars]em: string>
E         child 0, item: string
E       ----
E       a: [[[],["str"],[]]]

test/test_arrow.py:488: AssertionError
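To make the symptom concrete, here is a minimal pure-Python model of column inference. It is a hypothetical sketch, not PyMongoArrow's actual code: naive_infer and two_pass_infer are illustrative names, Python lists stand in for Arrow list values, and None stands in for null. The naive version emits null for an empty array seen before any element type is known, which reproduces the [null, ["str"], []] result above; the two-pass version determines the element type first and keeps empty arrays as empty lists, matching the explicit-schema result.

```python
from typing import Any, Dict, List, Optional, Tuple


def naive_infer(docs: List[Dict[str, Any]], field: str) -> Tuple[Optional[type], List[Any]]:
    """Hypothetical single-pass inference that reproduces the reported symptom."""
    elem_type: Optional[type] = None
    column: List[Any] = []
    for doc in docs:
        value = doc.get(field)
        # Learn the element type from the first non-empty list.
        if isinstance(value, list) and value and elem_type is None:
            elem_type = type(value[0])
        if isinstance(value, list) and not value and elem_type is None:
            # Element type not known yet: this naive version emits null (the bug).
            column.append(None)
        else:
            column.append(value)
    return elem_type, column


def two_pass_infer(docs: List[Dict[str, Any]], field: str) -> Tuple[Optional[type], List[Any]]:
    """Hypothetical fix: find the element type first, then copy values unchanged."""
    # Pass 1: element type comes from the first non-empty list, if any.
    elem_type = next(
        (type(v[0]) for v in (d.get(field) for d in docs) if isinstance(v, list) and v),
        None,
    )
    # Pass 2: an empty list stays an empty list; only an absent field becomes null.
    column = [list(v) if isinstance(v, list) else v for v in (d.get(field) for d in docs)]
    return elem_type, column
```

With the documents from the test, naive_infer(docs, "a") returns (str, [None, ["str"], []]) and two_pass_infer(docs, "a") returns (str, [[], ["str"], []]). The point of the two-pass shape is that an empty array is valid for any element type, so it never needs to be downgraded to null while the type is still unknown.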
- causes
  INTPYTHON-418 Arrow regression: ValueError: Schema and number of arrays unequal (Needs Triage)
- has to be finished together with
  INTPYTHON-179 Use custom Builder class for efficient nested extension type schema generation (Closed)
  INTPYTHON-230 Improper handling of documents with empty embedded arrays (Closed)
- is related to
  INTPYTHON-250 Data Loss in PyMongoArrow when working with large volume of data (Closed)
- related to
  INTPYTHON-158 Fix handling of Document Type Extraction for Auto Schemas (Closed)