pymongoarrow / ARROW-165

Auto schema detection can yield a different table on missing values

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Unknown
    • 1.6
    • Affects Version/s: None
    • Component/s: Schemas
    • None

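      As discovered in ARROW-158, our auto schema detection can yield a different table when values are missing. In the test below, the first document's empty list comes back as null under auto-detection, while the same query with an explicit list<string> schema returns an empty list: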

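          from pyarrow import list_, string
          from pymongoarrow.api import Schema, find_arrow_all

          def test_auto_schema_missing_values(self):
              docs = [
                  {"a": []},
                  {"a": ["str"]},
                  {"a": []},
              ]
              self.coll.delete_many({})
              self.coll.insert_many(docs)
              # Schema inferred automatically from the documents.
              actual = find_arrow_all(self.coll, {}, projection={"_id": 0})
              # Same query with an explicit list<string> schema for "a".
              expected = find_arrow_all(
                  self.coll, {}, projection={"_id": 0}, schema=Schema({"a": list_(string())})
              )
              self.assertEqual(actual.schema, expected.schema)
              # Fails: with auto-detection the first [] comes back as null.
              self.assertEqual(actual, expected)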
      

      Output:

      >       self.assertEqual(actual, expected)
      E       AssertionError: pyarr[19 chars]em: string>
      E         child 0, item: string
      E       ----
      E       a: [[null,["str"],[]]] != pyarr[19 chars]em: string>
      E         child 0, item: string
      E       ----
      E       a: [[[],["str"],[]]]
      
      test/test_arrow.py:488: AssertionError
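One way to picture this failure, offered only as a hypothesis consistent with the output above (the first `[]` comes back as null while the last `[]` survives), is a single-pass builder that cannot fix a list column's element type until it has seen at least one element. The function `build_list_column` below is a hypothetical, stdlib-only toy model, not pymongoarrow's actual inference code; its `element_type` argument stands in for an explicit schema.

```python
from typing import Any, List, Optional, Tuple


def build_list_column(
    values: List[List[Any]], element_type: Optional[type] = None
) -> Tuple[Optional[type], List[Optional[List[Any]]]]:
    """Hypothetical single-pass builder for one list-valued field.

    NOT pymongoarrow's code: a toy model in which a list column's element
    type is fixed by the first element seen, and any list encountered while
    the type is still unknown is recorded as null. Passing element_type
    plays the role of an explicit schema.
    """
    column: List[Optional[List[Any]]] = []
    for value in values:
        if element_type is None and value:
            # First non-empty list: adopt the type of its first element.
            element_type = type(value[0])
        if element_type is None:
            # Type still unknown (only empty lists so far): emit null.
            column.append(None)
        else:
            column.append(list(value))
    return element_type, column


docs_a = [[], ["str"], []]  # values of field "a" from the test documents

# Auto-detection model: the first [] arrives before any element type is known.
_, inferred = build_list_column(docs_a)
print(inferred)  # [None, ['str'], []]

# Explicit-schema model: element type known up front, so every [] is kept.
_, explicit = build_list_column(docs_a, element_type=str)
print(explicit)  # [[], ['str'], []]
```

Under this model, the explicit `Schema({"a": list_(string())})` in the test avoids the problem because the element type is known before the first document is read, which matches the `expected` table.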
      

            Assignee: Unassigned
            Reporter: Shane Harvey (shane.harvey@mongodb.com)
            Votes: 0
            Watchers: 1

              Created:
              Updated: