Python Integrations / INTPYTHON-155

Ensure compatibility with Pandas 2.0

    • Type: Task
    • Resolution: Fixed
    • Priority: Unknown
    • 1.0
    • Affects Version/s: None
    • Component/s: None
    • None

      Pandas just released 2.0.0rc0, and we should ensure we are compatible with it. I ran the tests against the ARROW-15 branch and got the following errors:

      ======================================================= FAILURES =======================================================
      ____________________________________________ TestExplicitPandasApi.test_csv ____________________________________________
      
      self = <test.test_pandas.TestExplicitPandasApi testMethod=test_csv>
      
          def test_csv(self):
              # Pandas csv does not support nested data.
              # cf https://github.com/pandas-dev/pandas/issues/40652
              _, data = self._create_data()
              for name in data.columns.to_list():
                  if isinstance(data[name].dtype, PandasBSONDtype):
                      data = data.drop(labels=[name], axis=1)
      
              with tempfile.NamedTemporaryFile(suffix=".csv") as f:
                  f.close()
                  # May give RuntimeWarning due to the nulls.
                  with warnings.catch_warnings():
                      warnings.simplefilter("ignore", RuntimeWarning)
                      data.to_csv(f.name, index=False, na_rep="")
                  out = pd.read_csv(f.name)
      >           self._assert_frames_equal(data, out)
      
      test/test_pandas.py:315:
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
      test/test_pandas.py:108: in _assert_frames_equal
          pd.testing.assert_series_equal(in_col, out_col)
      pandas/_libs/testing.pyx:52: in pandas._libs.testing.assert_almost_equal
          ???
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
      
      >   ???
      E   AssertionError: Series are different
      E
      E   Series values are different (33.33333 %)
      E   [index]: [0, 1, 2]
      E   [left]:  [a0, a1, None]
      E   [right]: [a0, a1, nan]
      E   At positional index 2, first diff: None != nan
      
      pandas/_libs/testing.pyx:172: AssertionError
      __________________________________________ TestSetitem.test_setitem_2d_values __________________________________________
      
      self = <test.pandas_types.test_binary.TestSetitem object at 0x14c9dc820>
      data = <PandasBinaryArray>
      [ Binary(b'0.02177712209590621', 10),  Binary(b'0.41848933357903795', 10),
        Binary(b'0.1320307731...    nan,
        Binary(b'0.41125169736048484', 10),     Binary(b'0.59626778121896', 10)]
      Length: 100, dtype: bson_Binary[10]
      
          def test_setitem_2d_values(self, data):
              # GH50085
              original = data.copy()
              df = pd.DataFrame({"a": data, "b": data})
              df.loc[[0, 1], :] = df.loc[[1, 0], :].values
      >       assert (df.loc[0, :] == original[1]).all()
      E       AssertionError
      
      ../../../.venvs/mongo-arrow/lib/python3.10/site-packages/pandas/tests/extension/base/setitem.py:427: AssertionError
      =================================================== warnings summary ===================================================
      test/pandas_types/test_binary.py::TestSetitem::test_setitem_2d_values
        /Users/steve.silvester/workspace/mongo-arrow/bindings/python/pymongoarrow/pandas_types.py:150: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
          return self.data == other
      
      -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
      =============================================== short test summary info ================================================
      FAILED test/test_pandas.py::TestExplicitPandasApi::test_csv - AssertionError: Series are different
      FAILED test/pandas_types/test_binary.py::TestSetitem::test_setitem_2d_values - AssertionError
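
The first failure (`None != nan` in `test_csv`) can be illustrated in isolation. The following is a minimal sketch, assuming the mismatch comes from the CSV round trip itself: a missing value written with `na_rep=""` is read back as `nan`, while the original object column holds `None`. The helper `normalize_missing` is hypothetical and is not part of pymongoarrow or its test suite; it only shows one way the two frames could be compared without depending on which missing-value marker is used.

```python
import io

import pandas as pd


def normalize_missing(series):
    """Hypothetical helper: map every missing value (None or nan) to None."""
    values = [None if pd.isna(v) else v for v in series]
    return pd.Series(values, index=series.index, name=series.name, dtype=object)


# An object column holding a Python None, like the [left] side in the report.
data = pd.DataFrame({"a": pd.Series(["a0", "a1", None], dtype=object)})

# Round-trip through CSV in memory; the empty field is read back as nan.
buf = io.StringIO()
data.to_csv(buf, index=False, na_rep="")
buf.seek(0)
out = pd.read_csv(buf)

left_missing = data["a"].iloc[2]   # None
right_missing = out["a"].iloc[2]   # nan

# Comparing after normalization sidesteps the None-vs-nan distinction.
pd.testing.assert_series_equal(
    normalize_missing(data["a"]), normalize_missing(out["a"])
)
```

Whether the actual fix should normalize the data before comparison or adjust what `_assert_frames_equal` checks is a separate design choice; the sketch only isolates the behavior shown in the traceback.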
      

            Assignee:
            steve.silvester@mongodb.com Steve Silvester
            Reporter:
            steve.silvester@mongodb.com Steve Silvester