Type: Bug
Resolution: Fixed
Priority: Unknown
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Quarter:
- FY23Q3
Confidence Status:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Link:
None
Goal Name(s):
None

After the changes in ~~PYTHON-1860~~, pymongoarrow is slower than the naive pd.DataFrame(list(coll.find())) approach.

Before ~~PYTHON-1860~~:

$ python benchmark.py  # With pymongo 3.11 (pre PYTHON-1860)
100000 small docs, 40 bytes each with 3 keys
1000 large docs, 153k each with 2600 keys

                    BENCH:   SMALL   LARGE
  conventional-to-ndarray:    0.29    1.34 
    pymongoarrow-to-numpy:    0.11    1.31 
   conventional-to-pandas:    0.37    2.01 
   pymongoarrow-to-pandas:    0.11    1.28 
    pymongoarrow-to-arrow:    0.11    1.29 
$ pip list
Package         Version    Editable project location
--------------- ---------- --------------------------------------------
Cython          0.29.22
numpy           1.20.1
pandas          1.2.3
pip             22.0.4
pyarrow         7.0.0
pymongo         3.11.4
pymongoarrow    0.4.0.dev0 /Users/shane/git/mongo-arrow/bindings/python
python-dateutil 2.8.1
pytz            2021.1
pyupgrade       2.13.0
setuptools      53.0.0
six             1.15.0
tokenize-rt     4.1.0
wheel           0.37.0

After ~~PYTHON-1860~~:

$ pip install --upgrade 'pymongo<4'
...
$ python benchmark.py  # With pymongo 3.12 (post PYTHON-1860)
100000 small docs, 40 bytes each with 3 keys
1000 large docs, 153k each with 2600 keys

                    BENCH:   SMALL   LARGE
  conventional-to-ndarray:    0.29    1.29 
    pymongoarrow-to-numpy:    0.30    1.76 
   conventional-to-pandas:    0.36    2.29 
   pymongoarrow-to-pandas:    0.39    2.11 
    pymongoarrow-to-arrow:    0.31    1.76
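
To make the regression easier to read, the two tables above can be reduced to after/before ratios. This is only a sketch: the timings are copied verbatim from the two runs (assumed here to be seconds), and the `slowdown` helper is a name made up for illustration, not part of benchmark.py.

```python
# Timings copied from the two benchmark runs above, as (SMALL, LARGE).
# The unit is assumed to be seconds; the ticket does not state it.
before = {
    "conventional-to-ndarray": (0.29, 1.34),
    "pymongoarrow-to-numpy": (0.11, 1.31),
    "conventional-to-pandas": (0.37, 2.01),
    "pymongoarrow-to-pandas": (0.11, 1.28),
    "pymongoarrow-to-arrow": (0.11, 1.29),
}
after = {
    "conventional-to-ndarray": (0.29, 1.29),
    "pymongoarrow-to-numpy": (0.30, 1.76),
    "conventional-to-pandas": (0.36, 2.29),
    "pymongoarrow-to-pandas": (0.39, 2.11),
    "pymongoarrow-to-arrow": (0.31, 1.76),
}


def slowdown(before, after):
    """Return after/before ratios per benchmark (hypothetical helper)."""
    return {
        name: tuple(round(a / b, 2) for a, b in zip(after[name], before[name]))
        for name in before
    }


ratios = slowdown(before, after)
for name, (small, large) in ratios.items():
    print(f"{name:>25}: {small:5.2f}x {large:5.2f}x")
```

Ratios near 1.0 mean no change. On these numbers the conventional paths stay roughly flat on SMALL, while every pymongoarrow path becomes about 2.7x to 3.5x slower on SMALL and about 1.3x to 1.6x slower on LARGE.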

One way to fix this would be for the server to finally implement OP_MSG Payload Type 1 stream responses.
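
For context, an OP_MSG Payload Type 1 section is a "document sequence": a kind byte of 1, an int32 size, a C-string identifier, then BSON documents laid end to end. The sketch below uses only the standard library to build and split such a section, to show why a client could slice raw documents out of it without decoding an enclosing reply document. It is an illustration of the wire layout, not pymongo code: the helper names and the "documents" identifier are made up, and as the sentence above notes, the server does not currently send replies in this form.

```python
import struct


def make_doc(n):
    """Hand-encode the BSON document {"x": n} with an int32 value."""
    body = b"\x10" + b"x\x00" + struct.pack("<i", n) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


def build_kind1_section(identifier, docs):
    """Build a Payload Type 1 section (hypothetical helper)."""
    ident = identifier.encode("utf-8") + b"\x00"
    # The size field counts itself, the identifier, and all documents.
    size = 4 + len(ident) + sum(len(d) for d in docs)
    return b"\x01" + struct.pack("<i", size) + ident + b"".join(docs)


def split_kind1_section(section):
    """Split a Payload Type 1 section into (identifier, raw BSON docs)."""
    if section[0] != 1:
        raise ValueError("not a Payload Type 1 section")
    (size,) = struct.unpack_from("<i", section, 1)
    end = 1 + size
    nul = section.index(b"\x00", 5)  # end of the C-string identifier
    identifier = section[5:nul].decode("utf-8")
    docs = []
    pos = nul + 1
    while pos < end:
        # Each BSON document starts with its own int32 length,
        # so documents can be sliced out without being decoded.
        (doc_len,) = struct.unpack_from("<i", section, pos)
        docs.append(section[pos:pos + doc_len])
        pos += doc_len
    return identifier, docs


docs = [make_doc(i) for i in range(3)]
section = build_kind1_section("documents", docs)
identifier, raw_docs = split_kind1_section(section)
```

Because every document carries its own length prefix, splitting is a matter of reading lengths and slicing bytes, which is the kind of raw batch that find_raw_batches and aggregate_raw_batches hand to pymongoarrow.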

depends on: PYTHON-2722 Improve performance of find/aggregate_raw_batches (Closed)

is caused by: PYTHON-1860 Use OP_MSG not OP_GET_MORE in find_raw_batches and aggregate_raw_batches (Closed)

is depended on by: INTPYTHON-101 Use pymongoarrow in dask-mongo (Backlog)

Assignee: Steve Silvester
Reporter: Shane Harvey
Votes: 0
Watchers: 2

Created: Apr 05 2022 09:24:58 PM UTC
Updated: Oct 28 2023 10:21:38 AM UTC
Resolved: Oct 19 2022 08:05:36 PM UTC
