- Type: Bug
- Resolution: Works as Designed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
'code': 225, 'codeName': 'TransactionTooOld', 'errmsg': 'Cannot start transaction 3 on session ab828da0-dead-4cbd-beef-ac612334a5c1 - ugh4jez5/+Zo0w7yt4WMrZ1cJoa3zmk86txJfzwiQ18= because a newer transaction 4 has already started.
I'm running a sharded cluster, where each replica set is a PSA (primary, secondary, arbiter).
I'm processing a very large CSV file where the data looks like this:
| _id | tag | 
|---|---|
| 1 | 100 | 
| 1 | 101 | 
| 2 | 100 | 
| 3 | 100 | 
| 3 | 101 | 
I need to group the tags by _id, so I use bulk operations in pymongo like:
UpdateOne({"_id": row["_id"]}, {"$addToSet": {"tag": row["tag"]}}, upsert=True)
which I run in batches of 5000.
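As a rough sketch of the batching described above, the rows can be collapsed per _id before they are sent, so that a batch with many repeats of the same _id produces one update instead of many. The helper names `batched` and `group_batch` are hypothetical and not part of pymongo; `$addToSet` with `$each` is a standard MongoDB update form. This only reduces repeated writes to one document and is not claimed to fix the error.

```python
from collections import defaultdict
from itertools import islice


def batched(rows, size=5000):
    """Yield lists of at most `size` rows from any iterable (hypothetical helper)."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def group_batch(rows):
    """Collapse rows that share an _id into one (filter, update) pair per _id.

    Tags are de-duplicated and kept in first-seen order within the batch.
    """
    tags = defaultdict(list)
    for row in rows:
        seen = tags[row["_id"]]
        if row["tag"] not in seen:
            seen.append(row["tag"])
    return [
        ({"_id": _id}, {"$addToSet": {"tag": {"$each": values}}})
        for _id, values in tags.items()
    ]
```

Each pair can then be wrapped as `UpdateOne(flt, upd, upsert=True)` and the resulting list passed to `collection.bulk_write(...)`, where `collection` stands for whatever collection object the script already uses.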
If I run only one thread, I get no errors. If I split the CSV file into 8 parts and run 8 parallel processes, it runs successfully for a few minutes and then I start getting the error above. I suspect that I hit a region of the CSV file where the same _id repeats over and over. This looks similar to #14322, which I also had problems with a few years ago in an identical scenario.
What does that error even mean? What workaround can I try?
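The error text says the server was asked to start transaction number 3 on a session that had already started number 4. One reading, consistent with the linked PYTHON-1660, is that the worker processes were forked after a MongoClient had been opened, so they inherited the same pooled session ids and sent writes on a shared session. Under that assumption, a possible workaround is to make sure each process builds its own client after the fork. The class below is a hypothetical sketch, not a pymongo API; it only tracks the process id and calls a factory you supply.

```python
import os


class PerProcessClient:
    """Hand out one client per process, rebuilding it when the pid changes.

    `factory` is any zero-argument callable that creates a new client,
    for example a lambda wrapping pymongo's MongoClient (assumed usage).
    """

    def __init__(self, factory):
        self._factory = factory
        self._client = None
        self._pid = None

    def get(self):
        pid = os.getpid()
        # A different pid means we are in a forked child: do not reuse
        # the parent's client (and its session pool), create a fresh one.
        if self._client is None or self._pid != pid:
            self._client = self._factory()
            self._pid = pid
        return self._client
```

In a worker this might be used as `clients = PerProcessClient(lambda: MongoClient(uri))` followed by `clients.get()` wherever the client is needed, with `uri` standing for the cluster's connection string. The simpler variant of the same idea is to create the MongoClient inside the function each worker process runs, rather than at module level before the processes are started.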
- is caused by
  - PYTHON-1660 Driver session pools must be cleared after forking (Closed)
- related to
  - PYTHON-1745 Raise an error if an opened MongoClient is used after a fork (Closed)