[SERVER-33991] Pass txnNumber to the getMore command on mongos Created: 19/Mar/18  Updated: 29/Oct/23  Resolved: 23/Apr/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.7.6

Type: Task Priority: Major - P3
Reporter: Misha Tyulenev Assignee: Jack Mulrow
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-33702 Move sessionId and txnNumber addition... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2018-04-23, Sharding 2018-05-07
Participants:

 Description   

Old Description:

txnNumber must be passed through the getMore command to implement global snapshot reads.

Proposed approach:

1. Add boost::optional private field _txnNumber to:

  • AsyncResultsMerger
  • ClusterClientCursor(Impl / Mock)
  • ClusterCursorManager::PinnedCursor

2. In the ClusterClientCursorImpl constructors

  • Take the txnNumber from the opCtx

3. In AsyncResultsMerger constructor

  • Take the txnNumber from the opCtx
    • I'm pretty sure this is guaranteed to be the same opCtx and txnNumber as the one used in the ClusterClientCursorImpl constructors, but if it's not, I can also thread the txnNumber from the ClusterClientCursor down into the AsyncResultsMerger some other way. Using the opCtx seemed cleaner for a first attempt though.

Note: I need to store it in two places. The ARM needs the txnNumber to attach it to getMore requests, and the ClusterClientCursor needs it to surface it to the PinnedCursor checked out in ClusterFind::runGetMore, so we can validate the txnNumber sent by the user in subsequent getMores.
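Steps 1 to 3 can be sketched roughly as follows. This is only an illustration, not the POC code: OperationContext, ResultsMerger, and ClientCursor are hypothetical stand-ins for the real mongos classes, and std::optional is used in place of boost::optional so the snippet is self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins; none of these mirror the real mongos classes.
using TxnNumber = std::int64_t;

struct OperationContext {
    // Unset when the originating command carried no txnNumber.
    std::optional<TxnNumber> txnNumber;
};

// Plays the role of the AsyncResultsMerger: keeps its own copy of the
// txnNumber so it can later attach it to outgoing getMore requests.
class ResultsMerger {
public:
    explicit ResultsMerger(const OperationContext& opCtx) : _txnNumber(opCtx.txnNumber) {}
    std::optional<TxnNumber> getTxnNumber() const { return _txnNumber; }

private:
    std::optional<TxnNumber> _txnNumber;
};

// Plays the role of the ClusterClientCursorImpl: keeps a second copy so the
// value can be surfaced to whoever checks the cursor out.
class ClientCursor {
public:
    explicit ClientCursor(const OperationContext& opCtx)
        : _txnNumber(opCtx.txnNumber), _merger(opCtx) {}
    std::optional<TxnNumber> getTxnNumber() const { return _txnNumber; }
    const ResultsMerger& merger() const { return _merger; }

private:
    std::optional<TxnNumber> _txnNumber;
    ResultsMerger _merger;
};

// Usage sketch: both copies come from the same opCtx, so they agree.
inline void exampleTwoCopiesAgree() {
    OperationContext opCtx{TxnNumber{5}};
    ClientCursor cursor(opCtx);
    assert(cursor.getTxnNumber() == cursor.merger().getTxnNumber());
}
```

Because both constructors read from the same opCtx in this sketch, the two stored values cannot diverge, which is the assumption the proposal relies on.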

4. In AsyncResultsMerger::_askForNextBatch

  • If the ARM has a txnNumber, attach it to the getMore request (by std::moving the getMore cmdObj into a BSONObjBuilder to avoid copying)
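A minimal sketch of step 4, assuming the command can be modeled as a simple field map. CommandFields and attachTxnNumber are hypothetical names invented for this illustration; the real code works with BSONObj and BSONObjBuilder, which are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

using TxnNumber = std::int64_t;

// Hypothetical stand-in for a BSON command object: field name -> integer value.
using CommandFields = std::map<std::string, std::int64_t>;

// Takes the command by value so a caller can std::move it in. The field is
// then appended to the moved-in object rather than to a copy.
CommandFields attachTxnNumber(CommandFields getMoreCmd, std::optional<TxnNumber> txnNumber) {
    if (txnNumber) {
        getMoreCmd["txnNumber"] = *txnNumber;
    }
    return getMoreCmd;  // returned by move, so no copy here either
}

// Usage sketch: the original command is moved in and comes back extended.
inline void exampleAttach() {
    CommandFields cmd{{"getMore", 123}, {"batchSize", 10}};
    CommandFields sent = attachTxnNumber(std::move(cmd), TxnNumber{7});
    assert(sent.at("txnNumber") == 7);
}
```

When no txnNumber is stored, the command passes through unchanged, so getMores for cursors created outside a transaction are unaffected.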

5. In ClusterFind::runGetMore

  • Assert the txnNumber on the opCtx equals the txnNumber on the checked out pinned cursor
  • If they don't match, return the cursor to the CursorManager so it isn't deleted and can be used by subsequent requests (this seems to match the behavior on mongod)
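The check in step 5 might look something like the sketch below. Cursor, CursorManager, GetMoreStatus, and this runGetMore are hypothetical simplifications written for illustration, not the real ClusterCursorManager or ClusterFind API; the point is only that a mismatch returns the cursor to the manager instead of destroying it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

using TxnNumber = std::int64_t;
using CursorId = std::int64_t;

// Hypothetical cursor that remembers the txnNumber it was created with.
struct Cursor {
    std::optional<TxnNumber> txnNumber;
};

// Hypothetical manager: checking a cursor out removes it until it is checked in.
class CursorManager {
public:
    void checkIn(CursorId id, Cursor cursor) { _cursors[id] = cursor; }

    std::optional<Cursor> checkOut(CursorId id) {
        auto it = _cursors.find(id);
        if (it == _cursors.end()) {
            return std::nullopt;
        }
        Cursor cursor = it->second;
        _cursors.erase(it);
        return cursor;
    }

    bool contains(CursorId id) const { return _cursors.count(id) > 0; }

private:
    std::map<CursorId, Cursor> _cursors;
};

enum class GetMoreStatus { kOk, kCursorNotFound, kTxnNumberMismatch };

GetMoreStatus runGetMore(CursorManager& manager,
                         CursorId id,
                         std::optional<TxnNumber> requestTxnNumber) {
    auto cursor = manager.checkOut(id);
    if (!cursor) {
        return GetMoreStatus::kCursorNotFound;
    }
    if (cursor->txnNumber != requestTxnNumber) {
        // Mismatch: hand the cursor back so later, valid requests can use it.
        manager.checkIn(id, *cursor);
        return GetMoreStatus::kTxnNumberMismatch;
    }
    // A real implementation would fetch the next batch here.
    manager.checkIn(id, *cursor);
    return GetMoreStatus::kOk;
}

// Usage sketch: a failed getMore leaves the cursor available.
inline void exampleMismatchKeepsCursor() {
    CursorManager manager;
    manager.checkIn(1, Cursor{TxnNumber{3}});
    assert(runGetMore(manager, 1, TxnNumber{4}) == GetMoreStatus::kTxnNumberMismatch);
    assert(manager.contains(1));
}
```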

6. Delete the getMore txnNumber check in ShardingTaskExecutor

Here's a POC I made that passes the existing global snapshot reads tests we have. If this approach is approved, I'll add more testing, esp. unit tests: https://mongodbcr.appspot.com/201070002



 Comments   
Comment by Githook User [ 23/Apr/18 ]

Author: Jack Mulrow (jsmulrow) <jack.mulrow@mongodb.com>

Message: SERVER-33991 Pass txnNumber in getMore requests through mongos
Branch: master
https://github.com/mongodb/mongo/commit/8941fac630e0ac31649a1e32b0ea54c3930bdfec

Comment by Charlie Swanson [ 12/Apr/18 ]

jack.mulrow this plan sounds reasonable to me. We will likely have to do the same thing in the ARM for the lsid for SERVER-34204. I wonder if we should consolidate all the things that should be the same into one object to attach to each getMore? e.g. batchSize, lsid, and txnNumber, possibly maxTimeMS for tailable cursors. I can take a stab at that for SERVER-34204 though.

Comment by Jack Mulrow [ 11/Apr/18 ]

misha.tyulenev and charlie.swanson, can you guys review the approach / POC above? Thanks!
