pymongoarrow / ARROW-214

Support Binary Arrow Data

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • 1.6
    • Affects Version/s: None
    • Component/s: None
    • None
    • Python Drivers

      PyMongoArrow's treatment of the "binary" data type is unclear. Binary data is handled as a PyArrow ExtensionType, and the closest built-in PyArrow equivalent is FixedSizeBinaryType. This ticket is to add support for pyarrow.binary() and pyarrow.large_binary(). The following session illustrates the inconsistency.

      In [12]: pa.binary()
      Out[12]: DataType(binary)
      In [13]: pa.binary(12)
      Out[13]: FixedSizeBinaryType(fixed_size_binary[12])
      In [14]: pa.large_binary()
      Out[14]: DataType(large_binary)
      In [15]: from pymongoarrow.types import BinaryType
      In [16]: BinaryType(10)
      Out[16]: BinaryType(DataType(binary))
      

      More concretely, the following attempt to write a pyarrow.Table with DataType(binary) fails.

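      import pyarrow as pa
      from pymongoarrow.api import write
      from pymongo import MongoClient
      
      # Connect to a MongoDB instance on the default host and port.
      coll = MongoClient().db.coll
      # Schema with a single variable-length binary column.
      aschema = pa.schema([("Binary", pa.binary())])
      table_in = pa.Table.from_pydict({"Binary": [b"1", b"one"]}, schema=aschema)
      # Schema validation inside write() rejects the "binary" type.
      write(coll, table_in)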
      

      The call fails with the following traceback:

        File "/Users/casey.clements/src/mongo-arrow/bindings/python/pymongoarrow/api.py", line 432, in write
          _validate_schema(tabular.schema.types)
        File "/Users/casey.clements/src/mongo-arrow/bindings/python/pymongoarrow/types.py", line 324, in _validate_schema
          raise ValueError(msg)
      ValueError: Unsupported data type "binary" in schema
      

            Assignee:
            casey.clements@mongodb.com Casey Clements
            Reporter:
            casey.clements@mongodb.com Casey Clements
            Votes:
            0
            Watchers:
            2
