Python Integrations / INTPYTHON-210

Add support for large_list and large_string PyArrow DataTypes

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Unknown
    • Fix Version/s: 1.3
    • Affects Version/s: None
    • Component/s: None
    • None
    • Python Drivers

      Context

      During development of ARROW-197 (adding Polars support), we noticed that `polars.DataFrame.to_arrow()` returned a field of type `large_list<item: large_string>`.
      We found this when calling `_validate_schema`: the call failed because its type "checkers" did not include `pyarrow.types.is_large_list()`.

      Definition of done

      Only a few changes turned out to be needed. Most importantly, no new Builders are required: the existing ListBuilder and StringBuilder can be reused, because they delegate most of the work to pyarrow. The original table's schema is then used to ensure round-trip integrity.

      To finish, we will add tests covering the new types.

      Pitfalls

      We should discuss good corner cases, particularly for the 'large' types.

            Assignee:
            casey.clements@mongodb.com Casey Clements
            Reporter:
            casey.clements@mongodb.com Casey Clements
            Votes:
            0
            Watchers:
            1
