Auto schema detection can yield a different table on missing values


    • Type: Bug
    • Resolution: Fixed
    • Priority: Unknown
    • pymongoarrow-1.6
    • Affects Version/s: None
    • Component/s: Schemas

      Discovered in ARROW-158: our automatic schema detection can yield a different table than an explicit schema does when values are missing:

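          from pyarrow import list_, string
          from pymongoarrow.api import Schema, find_arrow_all

          def test_auto_schema_missing_values(self):
              # Only the middle document reveals the element type of "a".
              docs = [
                  {"a": []},
                  {"a": ["str"]},
                  {"a": []},
              ]
              self.coll.delete_many({})
              self.coll.insert_many(docs)
              # Read once with an inferred schema and once with an explicit one.
              actual = find_arrow_all(self.coll, {}, projection={"_id": 0})
              expected = find_arrow_all(
                  self.coll,
                  {},
                  projection={"_id": 0},
                  schema=Schema({"a": list_(string())}),
              )
              # The schemas match, but the table contents do not.
              self.assertEqual(actual.schema, expected.schema)
              self.assertEqual(actual, expected)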
      

      Output:

      >       self.assertEqual(actual, expected)
      E       AssertionError: pyarr[19 chars]em: string>
      E         child 0, item: string
      E       ----
      E       a: [[null,["str"],[]]] != pyarr[19 chars]em: string>
      E         child 0, item: string
      E       ----
      E       a: [[[],["str"],[]]]
      
      test/test_arrow.py:488: AssertionError
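
      To narrow the failure down to a single row, here is a minimal sketch in plain Python. It uses only the column values printed in the assertion message; `mismatched_rows` is a hypothetical helper written for illustration and is not part of pymongoarrow or pyarrow.

```python
# Column "a" as printed in the assertion message.
actual = [None, ["str"], []]      # read with the auto-detected schema
expected = [[], ["str"], []]      # read with Schema({"a": list_(string())})


def mismatched_rows(left, right):
    """Return (index, left_value, right_value) for every row that differs."""
    return [
        (i, lv, rv)
        for i, (lv, rv) in enumerate(zip(left, right))
        if lv != rv
    ]


diff = mismatched_rows(actual, expected)
print(diff)  # [(0, None, [])]
```

      Only row 0 differs: the empty list in the first document comes back as null under auto detection, while the empty list in the third document is preserved. One possible reading, which the report itself does not confirm, is that the element type of "a" was not yet known when the first document was processed.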
      

              Assignee: Steve Silvester
              Reporter: Shane Harvey
