pymongoarrow / ARROW-82

With pymongo>=3.12 pymongoarrow is slower than the naive approach

    • Type: Bug
    • Resolution: Fixed
    • Priority: Unknown
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      After the changes in PYTHON-1860 (pymongo 3.12), pymongoarrow is slower than the naive pd.DataFrame(list(coll.find())) approach in most of the benchmarks below.

      Before PYTHON-1860:

      $ python benchmark.py  # With pymongo 3.11 (pre PYTHON-1860)
      100000 small docs, 40 bytes each with 3 keys
      1000 large docs, 153k each with 2600 keys
      
                          BENCH:   SMALL   LARGE
        conventional-to-ndarray:    0.29    1.34 
          pymongoarrow-to-numpy:    0.11    1.31 
         conventional-to-pandas:    0.37    2.01 
         pymongoarrow-to-pandas:    0.11    1.28 
          pymongoarrow-to-arrow:    0.11    1.29 
      $ pip list
      Package         Version    Editable project location
      --------------- ---------- --------------------------------------------
      Cython          0.29.22
      numpy           1.20.1
      pandas          1.2.3
      pip             22.0.4
      pyarrow         7.0.0
      pymongo         3.11.4
      pymongoarrow    0.4.0.dev0 /Users/shane/git/mongo-arrow/bindings/python
      python-dateutil 2.8.1
      pytz            2021.1
      pyupgrade       2.13.0
      setuptools      53.0.0
      six             1.15.0
      tokenize-rt     4.1.0
      wheel           0.37.0
      

      After PYTHON-1860:

      $ pip install --upgrade 'pymongo<4'
      ...
      $ python benchmark.py  # With pymongo 3.12 (post PYTHON-1860)
      100000 small docs, 40 bytes each with 3 keys
      1000 large docs, 153k each with 2600 keys
      
                          BENCH:   SMALL   LARGE
        conventional-to-ndarray:    0.29    1.29 
          pymongoarrow-to-numpy:    0.30    1.76 
         conventional-to-pandas:    0.36    2.29 
         pymongoarrow-to-pandas:    0.39    2.11 
          pymongoarrow-to-arrow:    0.31    1.76 
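
To make the regression easier to read, the pymongoarrow rows of the two runs can be compared directly. The following is a minimal sketch that copies the timings from the output above (the output does not state units) and computes the after/before ratio per benchmark. The names `before`, `after`, and `slowdown` are illustrative and not part of benchmark.py.

```python
# Timings copied from the two benchmark runs above: (SMALL, LARGE).
before = {  # pymongo 3.11, pre PYTHON-1860
    "pymongoarrow-to-numpy": (0.11, 1.31),
    "pymongoarrow-to-pandas": (0.11, 1.28),
    "pymongoarrow-to-arrow": (0.11, 1.29),
}
after = {  # pymongo 3.12, post PYTHON-1860
    "pymongoarrow-to-numpy": (0.30, 1.76),
    "pymongoarrow-to-pandas": (0.39, 2.11),
    "pymongoarrow-to-arrow": (0.31, 1.76),
}

# Ratio > 1 means the benchmark got slower after the upgrade.
slowdown = {
    name: tuple(a / b for a, b in zip(after[name], before[name]))
    for name in before
}

for name, (small, large) in slowdown.items():
    print(f"{name:>24}: SMALL x{small:.2f}  LARGE x{large:.2f}")
```

Every ratio is above 1, and the small-document case regresses noticeably more than the large-document case.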
      

      One way to fix this would be for the server to finally implement OP_MSG Payload Type 1 stream responses.
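
      For context, an OP_MSG Payload Type 1 section is a "document sequence": a kind byte of 1, an int32 size (which counts itself but not the kind byte), a NUL-terminated identifier, and then zero or more BSON documents laid end to end. The sketch below only illustrates that layout with the standard library; it is not pymongo or pymongoarrow code. The helper name `split_document_sequence` and the identifier "nextBatch" are hypothetical, since the ticket does not say what such a server response would look like.

```python
import struct


def split_document_sequence(section: bytes):
    """Split a kind-1 OP_MSG section into (identifier, [raw BSON docs])."""
    if section[0] != 1:
        raise ValueError("not a Payload Type 1 section")
    (size,) = struct.unpack_from("<i", section, 1)
    end = 1 + size  # size counts itself but not the kind byte
    nul = section.index(b"\x00", 5)  # identifier is a C string
    identifier = section[5:nul].decode("utf-8")
    docs = []
    pos = nul + 1
    while pos < end:
        # Each BSON document starts with its own int32 total length.
        (doc_len,) = struct.unpack_from("<i", section, pos)
        if doc_len < 5 or pos + doc_len > end:
            raise ValueError("malformed BSON document length")
        docs.append(section[pos:pos + doc_len])
        pos += doc_len
    return identifier, docs


# Two hand-encoded BSON documents: {"a": 1} and {"a": 2} (int32 values).
doc_a = bytes.fromhex("0c0000001061000100000000")
doc_b = bytes.fromhex("0c0000001061000200000000")

# Hypothetical identifier; assembled the way a kind-1 section is laid out.
body = b"nextBatch\x00" + doc_a + doc_b
section = b"\x01" + struct.pack("<i", 4 + len(body)) + body

identifier, docs = split_document_sequence(section)
print(identifier, len(docs))
```

      The point of the layout is that the documents arrive as one contiguous run of raw BSON, so a consumer can slice them out by length prefix without first decoding an enclosing document.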

            Assignee: Steve Silvester (steve.silvester@mongodb.com)
            Reporter: Shane Harvey (shane.harvey@mongodb.com)

