[PyMongoArrow] fails when different batches have different schemas due to type inference.


         Bug Description

         Problem: When using parallel batch processing (parallelism="threads" or
         parallelism="processes"), PyMongoArrow fails when different batches have different schemas due
         to type inference.

         Specific scenario:
           • First batch contains small integers (inferred as int32)
           • Later batch contains large integers requiring int64
           • The parallel code paths process each batch independently, creating separate Arrow tables
           • When concatenating these tables with pa.concat_tables(), the schemas don't match (int32 vs
             int64), causing an error
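
         The mismatch can be reproduced directly with pyarrow (a minimal sketch using standalone
         tables, not PyMongoArrow's actual internals): two batches whose inferred integer widths
         differ cannot be concatenated under the strict promotion mode.

         ```python
         import pyarrow as pa

         # Two "batches" whose inferred integer widths differ:
         # the first fits in int32, the second requires int64.
         small = pa.table({"x": pa.array([1, 2], type=pa.int32())})
         large = pa.table({"x": pa.array([2**40], type=pa.int64())})

         # promote_options="default" does not promote int32 -> int64,
         # so concatenation fails with an Arrow error.
         try:
             pa.concat_tables([small, large], promote_options="default")
         except (pa.ArrowInvalid, pa.ArrowTypeError) as exc:
             print("concat failed:", exc)

         # promote_options="permissive" widens int32 -> int64 and succeeds.
         combined = pa.concat_tables([small, large], promote_options="permissive")
         print(combined.schema.field("x").type)  # int64
         ```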

         Root causes:
  1. In `api.py`: Used promote_options="default", which unifies nullability but does not
     promote between integer widths, so the int32 and int64 batch schemas cannot be merged
           2. In `lib.pyx`: When promoting int32→int64 during schema inference, the old builder was
              discarded, losing all previously appended int32 values

         The fix:
           1. Change promote_options to "permissive" so that type promotion is allowed when
              concatenating tables
           2. Preserve existing int32 values by casting them to int64 and re-appending them to the
              new int64 builder
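
         The second part of the fix can be sketched in plain Python (assumed names for
         illustration; the real change is in Cython builder code in `lib.pyx`): instead of
         discarding the values accumulated before the promotion, cast them to the wider type and
         continue appending.

         ```python
         import pyarrow as pa

         # Values appended while the column was still inferred as int32.
         values_int32 = pa.array([1, 2, 3], type=pa.int32())

         # An incoming value that forces promotion to int64.
         incoming = 2**40

         # Cast the already-appended values instead of discarding them,
         # then append the wide value to the promoted column.
         promoted = values_int32.cast(pa.int64())
         column = pa.concat_arrays([promoted, pa.array([incoming], type=pa.int64())])
         print(column.type)  # int64
         ```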

         This ensures parallel and non-parallel code paths produce consistent results when schema
         inference encounters mixed integer sizes.

            Assignee:
            Casey Clements
            Reporter:
            Casey Clements
