Discovered in ARROW-158: our automatic schema detection can yield a different table than an explicit schema when values are missing:
def test_auto_schema_missing_values(self):
    docs = [
        {"a": []},
        {"a": ["str"]},
        {"a": []},
    ]
    self.coll.delete_many({})
    self.coll.insert_many(docs)
    actual = find_arrow_all(self.coll, {}, projection={"_id": 0})
    expected = find_arrow_all(
        self.coll,
        {},
        projection={"_id": 0},
        schema=Schema({"a": list_(string())}),
    )
    self.assertEqual(actual.schema, expected.schema)
    self.assertEqual(actual, expected)
Output:
>       self.assertEqual(actual, expected)
E       AssertionError: pyarr[19 chars]em: string>
E         child 0, item: string
E       ----
E       a: [[null,["str"],[]]] != pyarr[19 chars]em: string>
E         child 0, item: string
E       ----
E       a: [[[],["str"],[]]]

test/test_arrow.py:488: AssertionError
- related to ARROW-158 Fix handling of Document Type Extraction for Auto Schemas (Closed)