[SERVER-58938] Update DBClientCursor::_postBatchResumeToken when running an aggregate command and pass it to the callback function in RSLocalClient::runAggregation
Created: 28/Jul/21  Updated: 06/Dec/22  Resolved: 01/Dec/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Janna Golden Assignee: [DO NOT USE] Backlog - Sharding NYC
Resolution: Won't Do Votes: 0
Labels: PM-234, PM-234-M3, PM-234-T-oplog-fetch, sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-49897 Insert no-op entries into oplog buffe... Closed
Related
related to SERVER-61841 Delete TODO SERVER-58938 comment from... Closed
Assigned Teams:
Sharding NYC
Participants:
Story Points: 3

 Description   

Currently we don't set _postBatchResumeToken on DBClientCursor for an aggregate command. We should, and then should pass it to the callback function in RSLocalClient::runAggregation to allow a resharding recipient that is fetching from itself (because it's also a donor) to resume more efficiently.
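
A minimal sketch of the proposed flow, not the server's actual code: the cursor records the token from every reply regardless of which command opened it, and the aggregation loop hands that token to the callback together with the batch. CursorReply, SketchCursor, BatchCallback, and runAggregationSketch are hypothetical stand-ins invented for illustration; the real classes operate on BSON and have different signatures.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical reply shape: documents plus the token reported with the batch.
struct CursorReply {
    std::vector<std::string> batch;
    std::optional<std::string> postBatchResumeToken;
};

// Hypothetical callback type: receives each batch and the cursor's current token.
using BatchCallback = std::function<void(const std::vector<std::string>&,
                                         const std::optional<std::string>&)>;

// Hypothetical cursor that tracks the token for any command, not only find.
class SketchCursor {
public:
    void onReply(const CursorReply& reply) {
        // Overwrite the stored token with whatever this reply carried.
        _postBatchResumeToken = reply.postBatchResumeToken;
    }

    const std::optional<std::string>& getPostBatchResumeToken() const {
        return _postBatchResumeToken;
    }

private:
    std::optional<std::string> _postBatchResumeToken;
};

// Feeds each reply through the cursor, then invokes the callback with the
// batch and the token the cursor now holds.
void runAggregationSketch(const std::vector<CursorReply>& replies,
                          const BatchCallback& callback) {
    SketchCursor cursor;
    for (const CursorReply& reply : replies) {
        cursor.onReply(reply);
        callback(reply.batch, cursor.getPostBatchResumeToken());
    }
}
```

With this shape, a callback that receives an empty batch still sees a token and can record progress.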



 Comments   
Comment by Max Hirschhorn [ 01/Dec/21 ]

Confirmed with Kal that ShardLocal is used only by the config server locally to talk to itself. SERVER-49897 solved the entirety of the issue that shards face around efficient resuming in the absence of user writes to the source collection during a resharding operation. Closing this ticket as "Won't Do".

Comment by Max Hirschhorn [ 18/Nov/21 ]

"Either write a JavaScript test to verify that documents are now written to the localOplogBuffer collection when they weren't before."

We should write this JavaScript test first because, based on the state of a core dump I was debugging in BF-23392, it seems like replica set shards will use ShardRemote even for their own shard name. Perhaps ShardLocal is only a special case on the config server replica set?

Comment by Max Hirschhorn [ 08/Nov/21 ]

Acceptance criteria:

  • Verify that DBClientCursor no longer restricts updating its _postBatchResumeToken member variable to the case where the command being issued is a find command. (Resharding uses an aggregate command.)
  • Follow the semantics in ShardRemote::runAggregation() to ignore the postBatchResumeToken value except when the getMore batch was empty.
  • Either write a JavaScript test to verify that documents are now written to the localOplogBuffer collection when they weren't before, or add a C++ test that does the same, since it can take advantage of everything being ShardLocal.
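
The second criterion can be sketched roughly as follows. This is an illustration under stated assumptions, not the logic in ShardRemote::runAggregation() itself: Batch and nextResumePosition are hypothetical names, strings stand in for BSON values, and resuming from the last document of a non-empty batch is an assumption about what happens when the token is ignored.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one getMore result.
struct Batch {
    std::vector<std::string> docIds;                  // _id of each returned document, in order
    std::optional<std::string> postBatchResumeToken;  // token reported with the batch, if any
};

// Returns the position to resume from after `batch`, or `previous` when the
// batch carries no new information.
std::optional<std::string> nextResumePosition(const Batch& batch,
                                              const std::optional<std::string>& previous) {
    if (!batch.docIds.empty()) {
        // Non-empty batch: the token is ignored. Using the last document's _id
        // instead is an assumption made for this sketch.
        return batch.docIds.back();
    }
    if (batch.postBatchResumeToken) {
        // Empty batch: the token is the only evidence of progress.
        return batch.postBatchResumeToken;
    }
    return previous;
}
```

The empty-batch branch is the case the ticket cares about: without it, a fetcher that sees no matching documents would have nothing new to record.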