- Type: Bug
- Resolution: Fixed
- Priority: Unknown
- Affects Version/s: None
- Component/s: None
After the changes in PYTHON-1860, pymongoarrow is roughly three times slower on the small-document benchmark, and for small documents it is now slower than the naive pd.DataFrame(list(coll.find())) approach.
Before PYTHON-1860:
$ python benchmark.py  # With pymongo 3.11 (pre PYTHON-1860)
100000 small docs, 40 bytes each with 3 keys
1000 large docs, 153k each with 2600 keys
                    BENCH:   SMALL   LARGE
  conventional-to-ndarray:    0.29    1.34 
    pymongoarrow-to-numpy:    0.11    1.31 
   conventional-to-pandas:    0.37    2.01 
   pymongoarrow-to-pandas:    0.11    1.28 
    pymongoarrow-to-arrow:    0.11    1.29 
$ pip list
Package         Version    Editable project location
--------------- ---------- --------------------------------------------
Cython          0.29.22
numpy           1.20.1
pandas          1.2.3
pip             22.0.4
pyarrow         7.0.0
pymongo         3.11.4
pymongoarrow    0.4.0.dev0 /Users/shane/git/mongo-arrow/bindings/python
python-dateutil 2.8.1
pytz            2021.1
pyupgrade       2.13.0
setuptools      53.0.0
six             1.15.0
tokenize-rt     4.1.0
wheel           0.37.0
After PYTHON-1860:
$ pip install --upgrade 'pymongo<4'
...
$ python benchmark.py  # With pymongo 3.12 (post PYTHON-1860)
100000 small docs, 40 bytes each with 3 keys
1000 large docs, 153k each with 2600 keys
                    BENCH:   SMALL   LARGE
  conventional-to-ndarray:    0.29    1.29 
    pymongoarrow-to-numpy:    0.30    1.76 
   conventional-to-pandas:    0.36    2.29 
   pymongoarrow-to-pandas:    0.39    2.11 
    pymongoarrow-to-arrow:    0.31    1.76 
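To put numbers on the regression, the post/pre-PYTHON-1860 ratios can be computed straight from the two runs above. This is only a small arithmetic sketch: the timings are copied verbatim from the benchmark output, and the names BEFORE, AFTER and slowdown are illustrative, not part of benchmark.py.

```python
# Timings copied from the two benchmark.py runs above: (SMALL, LARGE),
# in whatever units benchmark.py prints.
BEFORE = {  # pymongo 3.11.4, pre PYTHON-1860
    "conventional-to-ndarray": (0.29, 1.34),
    "pymongoarrow-to-numpy": (0.11, 1.31),
    "conventional-to-pandas": (0.37, 2.01),
    "pymongoarrow-to-pandas": (0.11, 1.28),
    "pymongoarrow-to-arrow": (0.11, 1.29),
}
AFTER = {  # pymongo 3.12, post PYTHON-1860
    "conventional-to-ndarray": (0.29, 1.29),
    "pymongoarrow-to-numpy": (0.30, 1.76),
    "conventional-to-pandas": (0.36, 2.29),
    "pymongoarrow-to-pandas": (0.39, 2.11),
    "pymongoarrow-to-arrow": (0.31, 1.76),
}


def slowdown(before, after):
    """Return {bench: (small_ratio, large_ratio)}; a ratio > 1 means the later run was slower."""
    return {
        name: tuple(a / b for a, b in zip(after[name], before[name]))
        for name in before
    }


for name, (small, large) in slowdown(BEFORE, AFTER).items():
    print(f"{name:>24}: small x{small:.2f}, large x{large:.2f}")

# Post-PYTHON-1860, pymongoarrow-to-pandas relative to the naive pandas path.
small_vs_naive = AFTER["pymongoarrow-to-pandas"][0] / AFTER["conventional-to-pandas"][0]
large_vs_naive = AFTER["pymongoarrow-to-pandas"][1] / AFTER["conventional-to-pandas"][1]
print(f"pymongoarrow vs naive pandas: small x{small_vs_naive:.2f}, large x{large_vs_naive:.2f}")
```

The pymongoarrow rows slow down by about 2.7x to 3.5x on small documents, while the conventional rows change by at most about 14%; relative to the naive pandas path, pymongoarrow is slower on small documents but still slightly faster on large ones.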
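One way to fix this would be for the server to finally implement OP_MSG Payload Type 1 (document sequence) sections in cursor responses, so that a batch could be returned as a contiguous sequence of documents instead of being nested inside the reply's cursor.nextBatch array.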
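The overhead behind this proposal can be illustrated with a stdlib-only sketch. An OP_REPLY batch (the OP_GET_MORE path) already carries its documents back to back, i.e. the concatenated BSON stream a raw-batch consumer wants. An OP_MSG getMore reply instead returns one body document in which the batch is the cursor.nextBatch array, so each array element has to be located and copied out to rebuild that stream. The helpers below (encode_doc, encode_int32_doc, encode_array_of_docs, array_to_stream) are hypothetical and are not pymongo's or pymongoarrow's code; the encoder only handles int32 fields, and array_to_stream assumes every array element is an embedded document.

```python
import struct


def encode_doc(elements):
    """Hypothetical helper: wrap a BSON element list as int32 total length + elements + NUL."""
    body = elements + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


def encode_int32_doc(fields):
    """Hypothetical helper: build a BSON document containing only int32 (type 0x10) fields."""
    elems = b"".join(
        b"\x10" + key.encode() + b"\x00" + struct.pack("<i", value)
        for key, value in fields.items()
    )
    return encode_doc(elems)


def encode_array_of_docs(docs):
    """A BSON array is a document keyed "0", "1", ...; each element here is type 0x03."""
    elems = b"".join(
        b"\x03" + str(i).encode() + b"\x00" + doc for i, doc in enumerate(docs)
    )
    return encode_doc(elems)


def array_to_stream(array_bson):
    """Copy each embedded document out of a BSON array into one contiguous stream.

    This per-element walk and copy is the extra step an OP_MSG reply needs and an
    OP_REPLY batch does not (assumption: the consumer wants concatenated documents).
    """
    (total,) = struct.unpack_from("<i", array_bson, 0)
    pos = 4  # skip the array's own int32 length prefix
    out = bytearray()
    while pos < total - 1:  # the final byte is the array's terminating NUL
        if array_bson[pos] != 0x03:
            raise ValueError("sketch only handles embedded-document elements")
        key_end = array_bson.index(b"\x00", pos + 1)  # skip the "0", "1", ... key
        pos = key_end + 1
        (doc_len,) = struct.unpack_from("<i", array_bson, pos)
        out += array_bson[pos:pos + doc_len]  # copy the whole embedded document
        pos += doc_len
    return bytes(out)


docs = [encode_int32_doc({"a": 1}), encode_int32_doc({"a": 2, "b": 3})]
stream = array_to_stream(encode_array_of_docs(docs))
print(len(stream), stream == docs[0] + docs[1])
```

With Payload Type 1 responses, the documents would already arrive as a sequence, and this copy step could be skipped.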
- depends on: PYTHON-2722 Improve performance of find/aggregate_raw_batches (Closed)
- is caused by: PYTHON-1860 Use OP_MSG not OP_GET_MORE in find_raw_batches and aggregate_raw_batches (Closed)
- is depended on by: INTPYTHON-101 Use pymongoarrow in dask-mongo (Backlog)