[SERVER-46466] Race with findAndModify retryable write and session migration Created: 27/Feb/20  Updated: 29/Oct/23  Resolved: 04/Mar/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.0, 4.0.0
Fix Version/s: 3.6.18, 4.0.17

Type: Bug Priority: Critical - P2
Reporter: Randolph Tan Assignee: Randolph Tan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File test.js    
Issue Links:
Duplicate
duplicates SERVER-44055 All secondary crashed in SessionUpdat... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.0, v3.6
Sprint: Sharding 2020-03-09

 Description   

Race:

1. A findAndModify retryable write with txnNumber 10 is executed on shardA.
2. Migration of a chunk from shardA to shardB starts.
3. The session migration thread pulls the oplog entry for the write in step 1, passes all the checks, and is about to write the oplog entry here.
4. A new retryable write with txnNumber 11 starts and successfully writes to the oplog.
5. The session migration thread writes the oplog entry for txnNumber 10. The primary has now written an oplog entry with a higher optime but a lower txnNumber.
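The interleaving above can be replayed deterministically in a small Python model. Everything in it is hypothetical: `DestinationShard`, `migration_check`, and the integer optimes are stand-ins, not server code, and the model assumes the out-of-order entries land in the oplog of the shard that receives the migrated session data (the destination, going by the fix's commit message).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OplogEntry:
    optime: int      # position in the oplog; larger means later
    session_id: str
    txn_number: int


@dataclass
class DestinationShard:
    # Hypothetical model of the shard receiving migrated session entries.
    oplog: List[OplogEntry] = field(default_factory=list)
    last_txn: Dict[str, int] = field(default_factory=dict)
    next_optime: int = 1

    def append(self, session_id: str, txn_number: int) -> None:
        # Every write gets the next optime, whatever its txnNumber is.
        self.oplog.append(OplogEntry(self.next_optime, session_id, txn_number))
        self.next_optime += 1

    def migration_check(self, session_id: str, txn_number: int) -> bool:
        # Step 3: validate the migrated entry against the current session state.
        return txn_number >= self.last_txn.get(session_id, -1)

    def retryable_write(self, session_id: str, txn_number: int) -> None:
        # Step 4: a regular retryable write advances the session.
        self.last_txn[session_id] = txn_number
        self.append(session_id, txn_number)


def replay_race() -> DestinationShard:
    shard = DestinationShard()
    session = "lsid-A"  # hypothetical logical session id
    passed = shard.migration_check(session, 10)  # step 3: check passes
    shard.retryable_write(session, 11)           # step 4: slips in between
    if passed:
        shard.append(session, 10)                # step 5: stale write lands
    return shard


for entry in replay_race().oplog:
    print(entry.optime, entry.txn_number)  # prints "1 11" then "2 10"
```

Because the check in step 3 and the write in step 5 are not atomic with respect to other writes on the same session, the txnNumber 11 write can land between them.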

Consequence:

Secondaries can potentially hit this fassert:
https://github.com/mongodb/mongo/blob/r4.0.15/src/mongo/db/repl/session_update_tracker.cpp#L98
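A simplified model of what a secondary checks while replaying the oplog. It assumes the fassert guards a per-session invariant that txnNumbers never decrease in optime order; the actual condition at the linked line may be more specific. `apply_in_optime_order` and `OutOfOrderTxnNumber` are hypothetical names, and the model raises an exception where the real process would abort.

```python
from typing import Dict, Iterable, Tuple


class OutOfOrderTxnNumber(Exception):
    """Stand-in for the fassert; a real secondary terminates instead."""


def apply_in_optime_order(entries: Iterable[Tuple[int, str, int]]) -> Dict[str, int]:
    """Replay (optime, session_id, txn_number) entries in optime order,
    tracking the latest txnNumber applied for each session."""
    latest: Dict[str, int] = {}
    for optime, session_id, txn_number in sorted(entries):
        prev = latest.get(session_id)
        # Equal txnNumbers are allowed, e.g. a batched retryable write
        # logs one entry per statement under the same txnNumber.
        if prev is not None and txn_number < prev:
            raise OutOfOrderTxnNumber(
                f"session {session_id}: txnNumber {txn_number} at optime {optime} "
                f"is lower than already-applied txnNumber {prev}"
            )
        latest[session_id] = txn_number
    return latest


print(apply_in_optime_order([(1, "lsid-A", 10), (2, "lsid-A", 11)]))  # {'lsid-A': 11}
try:
    # The oplog produced by the race: txnNumber 11 at optime 1, txnNumber 10 at optime 2.
    apply_in_optime_order([(1, "lsid-A", 11), (2, "lsid-A", 10)])
except OutOfOrderTxnNumber as exc:
    print("would fassert:", exc)
```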

Note: this race is no longer possible in v4.2 because the session migration thread checks out the session when it processes the oplog entries, which prevents this interleaving.
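A sketch of why checking out the session closes the window: if the migration thread holds the session for both its check and its oplog write, a concurrent retryable write on the same session runs either entirely before or entirely after them. The per-session `threading.Lock` stands in for session checkout, and all names are hypothetical. Skipping a migrated entry whose txnNumber is already stale is also an assumption; this ticket does not say how the actual patch handles that case.

```python
import threading
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class Shard:
    """Toy shard in which every writer checks out the session first."""

    def __init__(self) -> None:
        self._registry_lock = threading.Lock()
        self._session_locks: DefaultDict[str, threading.Lock] = defaultdict(threading.Lock)
        self._oplog_lock = threading.Lock()
        self._last_txn: Dict[str, int] = {}
        self._next_optime = 1
        self.oplog: List[Tuple[int, str, int]] = []  # (optime, session_id, txn_number)

    def _checkout(self, session_id: str) -> threading.Lock:
        # One lock per session: holding it models having the session checked out.
        with self._registry_lock:
            return self._session_locks[session_id]

    def _check_and_write(self, session_id: str, txn_number: int) -> bool:
        with self._checkout(session_id):
            # The check and the oplog write happen under the same checkout,
            # so no other write on this session can interleave between them.
            if txn_number < self._last_txn.get(session_id, -1):
                return False  # the session has already moved past this txnNumber
            self._last_txn[session_id] = txn_number
            with self._oplog_lock:
                self.oplog.append((self._next_optime, session_id, txn_number))
                self._next_optime += 1
            return True

    def retryable_write(self, session_id: str, txn_number: int) -> bool:
        return self._check_and_write(session_id, txn_number)

    def apply_migrated_entry(self, session_id: str, txn_number: int) -> bool:
        # Hypothetical: with the session checked out, a stale migrated entry is skipped.
        return self._check_and_write(session_id, txn_number)


shard = Shard()
shard.retryable_write("lsid-A", 11)
print(shard.apply_migrated_entry("lsid-A", 10))  # False
print(shard.oplog)  # [(1, 'lsid-A', 11)]
```

Run concurrently, `apply_migrated_entry("lsid-A", 10)` and `retryable_write("lsid-A", 11)` can only produce oplogs with txnNumbers `[10, 11]` or `[11]`, never `[11, 10]`.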

All of the following conditions are required to hit this race:

  • running a version older than v4.2
  • using retryable writes with findAndModify
  • chunk migrations happening while retryable writes are in use
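The conditions can be expressed as a small check, which may help when deciding whether an upgrade is needed. The version ranges are an assumption derived from this ticket's Affects Version/s and Fix Version/s fields and cover only the 3.6 and 4.0 release series; `may_hit_race`, `is_affected_version`, and their parameters are hypothetical.

```python
from typing import Tuple

Version = Tuple[int, int, int]

# Assumed from the "Affects Version/s" and "Fix Version/s" fields of this ticket.
AFFECTED_RANGES = [
    ((3, 6, 0), (3, 6, 18)),  # 3.6.0 up to, but not including, 3.6.18
    ((4, 0, 0), (4, 0, 17)),  # 4.0.0 up to, but not including, 4.0.17
]


def parse_version(text: str) -> Version:
    # Assumes a plain "MAJOR.MINOR[.PATCH]" string such as "4.0.15".
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def is_affected_version(text: str) -> bool:
    v = parse_version(text)
    return any(low <= v < high for low, high in AFFECTED_RANGES)


def may_hit_race(version: str, retryable_find_and_modify: bool,
                 migrations_running: bool) -> bool:
    # All three conditions from the list above must hold at once.
    return is_affected_version(version) and retryable_find_and_modify and migrations_running


print(may_hit_race("4.0.15", True, True))  # True
print(may_hit_race("4.0.17", True, True))  # False
print(may_hit_race("4.2.0", True, True))   # False
```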


 Comments   
Comment by Randolph Tan [ 24/Jun/21 ]

Yes, this is a script used to demonstrate the race. However, as noted in the comment inside the script, you will need a custom mongod build with a sleep inserted at a certain location to make the race more likely to occur.

Comment by xudong gao [ 24/Jun/21 ]

What is this test.js used for? Is it for triggering this bug in a test? I have also encountered this bug, and I want to know how to trigger it manually, then upgrade to a fix version to see whether the fix works.

Comment by Githook User [ 05/Mar/20 ]

Author:

{'name': 'Randolph Tan', 'username': 'renctan', 'email': 'randolph@10gen.com'}

Message: SERVER-46466 Make session migration destination check out session

(cherry picked from commit 6931d7b2d6b5f6864a3995554f9af9e30fe859e9)
Branch: v3.6
https://github.com/mongodb/mongo/commit/b3e25a1f353cdd7e47e3efe0119ef3a0770c093e

Comment by Githook User [ 04/Mar/20 ]

Author:

{'username': 'renctan', 'name': 'Randolph Tan', 'email': 'randolph@10gen.com'}

Message: SERVER-46466 Make session migration destination check out session
Branch: v4.0
https://github.com/mongodb/mongo/commit/6931d7b2d6b5f6864a3995554f9af9e30fe859e9

Generated at Thu Feb 08 05:11:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.