[SERVER-54959] Prevent cursors established by ReshardingCollectionCloner from being timed out while in use on some shards Created: 04/Mar/21  Updated: 29/Oct/23  Resolved: 22/Apr/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.0.0-rc0

Type: Task Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Max Hirschhorn
Resolution: Fixed Votes: 0
Labels: PM-234-M2.5, PM-234-T-data-clone
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2021-03-22, Sharding 2021-04-05, Sharding 2021-04-19, Sharding 2021-05-03
Participants:
Story Points: 2

 Description   

A donor shard could have only a small number of documents destined for a particular recipient and may therefore return batches infrequently. If a batch isn't returned within 10 minutes, the cursors sitting idle on the other donor shards will time out. We should therefore set noCursorTimeout on the aggregation request sent to all donor shards in the ReshardingCollectionCloner.



 Comments   
Comment by Githook User [ 22/Apr/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-54959 Avoid cursor timeout during resharding collection cloning.
Branch: master
https://github.com/mongodb/mongo/commit/a40b234aa48880b8e5d72889353832411b631150

Comment by Max Hirschhorn [ 07/Apr/21 ]

It looks like #1 is sufficient on its own and that doing #2 isn't needed (referring to my earlier comment).

Jason Carey pointed me to the part of LogicalSessionCacheImpl::_refresh() which bumps the lastUse for any logical session associated with an active operation context. This means that, so long as a cursor for the collection cloning pipeline is active on some shard, the cursors across all shards will be kept alive.
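
A heavily simplified, hypothetical sketch of the behavior described above (the real logic lives in LogicalSessionCacheImpl::_refresh(); the helper names below are made up for illustration): each refresh pass advances lastUse for every logical session tied to an in-flight operation, which in turn keeps the cursors registered under those sessions alive on every shard.

// Illustrative sketch only -- hypothetical helper names, not the actual
// LogicalSessionCacheImpl::_refresh() implementation.
void refreshPassSketch() {
    // Hypothetical helper: collect the lsids of every operation currently
    // running on this node (i.e. every logical session with an active
    // operation context).
    std::vector<LogicalSessionId> activeLsids = collectLsidsOfActiveOperations();

    for (const auto& lsid : activeLsids) {
        // Hypothetical helper: advance lastUse for the session record, which
        // resets the ~30-minute session expiry clock. Cursors owned by the
        // session are exempt from the 10-minute idle-cursor reaping, so they
        // stay alive on every shard, including shards where they have been
        // idle the whole time.
        advanceLastUse(lsid);
    }
}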

Comment by Max Hirschhorn [ 23/Mar/21 ]

The following satisfies #1, although I think it'd be better for ReshardingCollectionCloner to receive the LogicalSessionId as an argument to the constructor. This is because implementing #2 is going to require having a dedicated thread for running LogicalSessionCache::vivify(lsid) while the thread in ReshardingCollectionCloner::run() is blocked waiting for a result from Pipeline::getNext(). If a new logical session ID is generated each time ReshardingCollectionCloner::_restartPipeline() is called, then that other thread needs to know about the new logical session ID too so it can switch which logical session it is refreshing. (A rough sketch of that keep-alive pattern follows the diff below.)

diff --git a/src/mongo/db/s/resharding/resharding_collection_cloner.cpp b/src/mongo/db/s/resharding/resharding_collection_cloner.cpp
index 61c2c4d7c5..ed7425fe68 100644
--- a/src/mongo/db/s/resharding/resharding_collection_cloner.cpp
+++ b/src/mongo/db/s/resharding/resharding_collection_cloner.cpp
@@ -42,6 +42,7 @@
 #include "mongo/db/concurrency/write_conflict_exception.h"
 #include "mongo/db/curop.h"
 #include "mongo/db/exec/document_value/document.h"
+#include "mongo/db/logical_session_id_helpers.h"
 #include "mongo/db/pipeline/aggregation_request_helper.h"
 #include "mongo/db/pipeline/document_source_lookup.h"
 #include "mongo/db/pipeline/document_source_match.h"
@@ -215,6 +216,12 @@ std::unique_ptr<Pipeline, PipelineDeleter> ReshardingCollectionCloner::makePipel
 
 std::unique_ptr<Pipeline, PipelineDeleter> ReshardingCollectionCloner::_targetAggregationRequest(
     OperationContext* opCtx, const Pipeline& pipeline) {
+    // We associate the aggregation cursors established on each donor shard with a logical session
+    // to prevent them from killing the cursor when it is idle locally. Due to the cursor's merging
+    // behavior across all donor shards, it is possible for the cursor to be active on one donor
+    // shard while idle for a long period on another donor shard.
+    opCtx->setLogicalSessionId(makeLogicalSessionId(opCtx));
+
     AggregateCommand request(_sourceNss, pipeline.serializeToBson());
     request.setCollectionUUID(_sourceUUID);
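
For completeness, a rough sketch of what the #2 keep-alive thread could have looked like (hypothetical helper names; the exact LogicalSessionCache::vivify() signature and its OperationContext plumbing are glossed over). As the 07/Apr/21 comment above notes, this turned out not to be needed.

#include <atomic>
#include <chrono>
#include <thread>

// Illustrative sketch only -- hypothetical helpers, not a real implementation.
void cloneWithSessionKeepAliveSketch(Pipeline& pipeline, const LogicalSessionId& lsid) {
    std::atomic<bool> done{false};

    // Dedicated thread that periodically marks the session as in use (the same
    // effect the refreshSessions command achieves via LogicalSessionCache::vivify()),
    // so neither the donor shards' cursors nor the session itself expire while
    // the cloner thread below is blocked.
    std::thread keepAlive([&] {
        while (!done.load()) {
            vivifySession(lsid);  // hypothetical wrapper around LogicalSessionCache::vivify()
            std::this_thread::sleep_for(std::chrono::minutes(1));  // interval chosen arbitrarily
        }
    });

    // The cloner thread may block here for a long time waiting on the next
    // merged batch from the donor shards.
    while (auto doc = pipeline.getNext()) {
        insertClonedDocumentSketch(*doc);  // hypothetical insert step
    }

    done.store(true);
    keepAlive.join();
}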

Comment by Max Hirschhorn [ 23/Mar/21 ]

I hadn't realized aggregation cursors don't support the noCursorTimeout option. The changes from SERVER-6036 made it so CursorManager won't kill idle cursors after 10 minutes if they are associated with a logical session. Instead, the cursor will be killed as part of logical session expiry (30 minutes by default). I think work on this ticket needs to do two things:

  1. Include an 'lsid' field in the aggregate request being sent to the donor shards. This would make it so each donor shard's CursorManager won't kill the cursor even if it is idle for >10 minutes.
  2. The recipient must periodically run LogicalSessionCache::vivify(lsid) to prevent the cursors established on the donor shards from being killed. Note that LogicalSessionCache::vivify(lsid) is the function underlying the refreshSessions command.
Comment by Max Hirschhorn [ 04/Mar/21 ]

One thought for testing this change would be to check the "cursor.open.noTimeout" serverStatus metric reported by mongod while collection cloning is paused with the cursors still open. We can reuse the trick from resharding_clones_duplicate_key.js of using large documents to ensure there's more than one batch of documents to clone.
