[SERVER-31978] Add an invariant that DocumentSourceCloseCursor does not execute on a mongod for a sharded $changeStream Created: 15/Nov/17  Updated: 30/Oct/23  Resolved: 20/Nov/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Querying
Affects Version/s: None
Fix Version/s: 3.6.1, 3.7.1

Type: Task Priority: Major - P3
Reporter: David Storch Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-30834 Make mongos reload the shard registry... Closed
is related to SERVER-63020 Investigate relevance of SERVER-31978... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v3.6
Sprint: Query 2017-12-04
Participants:
Linked BF Score: 0

 Description   

DocumentSourceCloseCursor is part of the internal $changeStream machinery. It closes change stream cursors that have been invalidated by an event such as a collection drop or a database drop.
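To illustrate the stage's role, it can be modeled as a filter that passes events through unchanged and ends the stream once it has returned an invalidate entry. This is a minimal sketch under stated assumptions: ChangeEvent, CloseCursorStage, getNext() and shouldCloseCursor() are hypothetical names, and events are reduced to an operation type instead of BSON documents. It is not the server's DocumentSourceCloseCursor implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified change event: only the operation type is kept.
struct ChangeEvent {
    std::string operationType;  // e.g. "insert", "drop", "invalidate"
};

// Hypothetical model of a close-cursor stage. Events pass through unchanged;
// after an "invalidate" event has been returned, every later call reports
// end-of-stream and the owning cursor is flagged for closing.
class CloseCursorStage {
public:
    explicit CloseCursorStage(std::vector<ChangeEvent> input) : _input(std::move(input)) {}

    // Returns the next event, or std::nullopt once the stream has ended.
    std::optional<ChangeEvent> getNext() {
        if (_shouldCloseCursor || _pos >= _input.size()) {
            return std::nullopt;
        }
        ChangeEvent next = _input[_pos++];
        if (next.operationType == "invalidate") {
            _shouldCloseCursor = true;  // deliver the invalidate, then stop
        }
        return next;
    }

    bool shouldCloseCursor() const { return _shouldCloseCursor; }

private:
    std::vector<ChangeEvent> _input;
    std::size_t _pos = 0;
    bool _shouldCloseCursor = false;
};
```

The invalidate entry itself is still delivered to the client; only the calls after it report end-of-stream. That ordering is what the race described in the comments below breaks.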

When a $changeStream runs in a sharded configuration, DocumentSourceCloseCursor should always run on mongos. The mongos cursor manager is not prepared to correctly handle its child cursor being closed out from under it. The cursor should therefore be closed by the DocumentSourceCloseCursor stage running on mongos, and cleanup of the mongos cursor should in turn clean up the underlying cursors on the shards.
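The ownership direction argued for above can be sketched as follows. Mongos decides when the stream ends, and closing the mongos cursor cleans up the shard cursors it owns. ShardCursors and MongosCursor are hypothetical types. The real cleanup involves the mongos cursor manager and remote requests to the shards, which this sketch does not model.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the cursors currently open on the shards.
struct ShardCursors {
    std::set<std::int64_t> openIds;
};

// Hypothetical mongos cursor that owns one remote cursor per shard. Closing it,
// explicitly or on destruction, removes every remote cursor it still owns, so
// cleanup always flows from mongos down to the shards and never the reverse.
class MongosCursor {
public:
    MongosCursor(ShardCursors& shards, std::vector<std::int64_t> remoteIds)
        : _shards(shards), _remoteIds(std::move(remoteIds)) {
        for (auto id : _remoteIds) {
            _shards.openIds.insert(id);
        }
    }
    ~MongosCursor() { close(); }
    MongosCursor(const MongosCursor&) = delete;
    MongosCursor& operator=(const MongosCursor&) = delete;

    // Called when mongos itself decides the stream is over, for example after
    // its close-cursor stage has returned the invalidate entry.
    void close() {
        for (auto id : _remoteIds) {
            _shards.openIds.erase(id);
        }
        _remoteIds.clear();  // makes a second close() a no-op
    }

private:
    ShardCursors& _shards;
    std::vector<std::int64_t> _remoteIds;
};
```

The destructor calls close(), so a mongos cursor that goes out of scope without an explicit close still releases its shard cursors.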

As of commit d4a526fdcf under SERVER-30834, $changeStream is never passed through in its entirety to the shards, so DocumentSourceCloseCursor should always end up on mongos. However, passing the whole change stream through to the shards would not obviously look wrong to someone changing this code, so we should add an invariant to make sure that this doesn't regress.
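The check this ticket calls for can be sketched as follows. It assumes a simplified request type. ShardAggRequest, its three flags, invariantSketch() and checkChangeStreamWasSplit() are all hypothetical. The server's own invariant() aborts the process, while this stand-in throws so that the check can be exercised in a test.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical, simplified view of an aggregate request arriving at a shard.
struct ShardAggRequest {
    bool hasChangeStream = false;   // pipeline starts with $changeStream
    bool fromMongos = false;        // request was sent by a mongos router
    bool pipelineWasSplit = false;  // mongos kept the merging half for itself
};

// Hypothetical stand-in for invariant(): throws instead of aborting.
inline void invariantSketch(bool condition, const std::string& what) {
    if (!condition) {
        throw std::logic_error("Invariant failure: " + what);
    }
}

// A $changeStream that mongos forwards to a shard must have been split, so that
// the close-cursor stage stays on mongos instead of running on the shard.
inline void checkChangeStreamWasSplit(const ShardAggRequest& request) {
    if (request.hasChangeStream && request.fromMongos) {
        invariantSketch(request.pipelineWasSplit,
                        "$changeStream sent from mongos must have been split");
    }
}
```

Requests that do not come from mongos, or that are not change streams, are left unchecked. Only the mongos-to-shard passthrough case trips the invariant.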



 Comments   
Comment by Githook User [ 06/Dec/17 ]

Author:

{'name': 'David Storch', 'username': 'dstorch', 'email': 'david.storch@10gen.com'}

Message: SERVER-31978 Add invariant that changeStream pipeline sent from mongos to mongod has been split.

(cherry picked from commit 11c3a16c20532b77e6e8a2b45ddb18c45913699d)
Branch: v3.6
https://github.com/mongodb/mongo/commit/7431c4c6d9cfb91d85259687966dea5b1a8f1435

Comment by Githook User [ 20/Nov/17 ]

Author:

{'name': 'David Storch', 'username': 'dstorch', 'email': 'david.storch@10gen.com'}

Message: SERVER-31978 Add invariant that changeStream pipeline sent from mongos to mongod has been split.
Branch: master
https://github.com/mongodb/mongo/commit/11c3a16c20532b77e6e8a2b45ddb18c45913699d

Comment by David Storch [ 15/Nov/17 ]

The problem with running DocumentSourceCloseCursor on mongod in a sharded cluster is subtle, and it was exposed by our continuous integration testing. The problem is a race between the check for remotesExhausted() and the awaitData timeout in RouterStageMerge. I can reliably reproduce the failure at githash f19da233fa after applying the following patch:

diff --git a/src/mongo/s/query/async_results_merger.cpp b/src/mongo/s/query/async_results_merger.cpp
index 45f4cbf350..358df1e3ad 100644
--- a/src/mongo/s/query/async_results_merger.cpp
+++ b/src/mongo/s/query/async_results_merger.cpp
@@ -538,6 +538,12 @@ void AsyncResultsMerger::_processBatchResults(WithLock lk,
     // Update the cursorId; it is sent as '0' when the cursor has been exhausted on the shard.
     remote.cursorId = cursorResponse.getCursorId();
 
+    if (remote.cursorId == 0 && _params->tailableMode == TailableMode::kTailableAndAwaitData) {
+        log() << "AsyncResultsMerger()::_processBatchResults going to sleep";
+        sleepmillis(5000);
+        log() << "AsyncResultsMerger()::_processBatchResults woke up";
+    }
+
     // Save the batch in the remote's buffer.
     if (!_addBatchToBuffer(lk, remoteIndex, cursorResponse)) {
         return;

This patch instruments the code with a sleep to induce the following sequence of events:

  1. The change stream is forwarded entirely to the shard using the aggPassthrough() path. At this revision, the pipeline is never split. All of the change stream machinery ends up on the shard.
  2. The test causes the shard to generate an invalidate entry and close its cursor. It sends back a batch with a single invalidate entry and a cursor id of zero.
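  3. The mongos processes the batch. It marks the remote cursor as closed by copying over the cursor id of zero (the remote.cursorId assignment in AsyncResultsMerger::_processBatchResults(), just before the sleep added by the patch).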
  4. Around the same time that the response batch is being handled on mongos, the awaitData timeout in RouterStageMerge expires. This causes RouterStageMerge to propagate an EOF up through the layers of router stages inside the mongos cursor.
  5. ClusterFind::runGetMore() handles the EOF. Typically the mongos will keep a tailable cursor open even when it is exhausted. However, a check in ClusterFind::runGetMore() marks a tailable mongos cursor as exhausted when the remote cursors on the shards are closed. Since the cursor has already been closed on the remote node due to the change stream invalidation, we mark the mongos cursor as exhausted. As a result, we fail to return the invalidate entry to the client.
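The interleaving in steps 3 to 5 can be replayed deterministically in a small single-threaded model. MergerModel, GetMoreOutcome and getMoreAfterAwaitDataTimeout() are hypothetical simplifications of AsyncResultsMerger, RouterStageMerge and ClusterFind::runGetMore(). The model assumes a single shard and treats an empty buffer at the awaitData timeout as EOF.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>
#include <string>

// Hypothetical single-remote model of the mongos merger. As in the patched
// _processBatchResults(), the cursor id is updated before the batch is buffered.
struct MergerModel {
    std::int64_t remoteCursorId = 42;  // nonzero: the shard cursor is open
    std::deque<std::string> buffer;    // results visible to the router stages

    bool remotesExhausted() const { return remoteCursorId == 0; }
    void updateCursorId(std::int64_t id) { remoteCursorId = id; }              // first half
    void addBatchToBuffer(const std::string& doc) { buffer.push_back(doc); }  // second half
};

enum class GetMoreOutcome { kReturnedDoc, kKeptOpenNoDocs, kMarkedExhausted };

// Hypothetical getMore on a tailable, awaitData mongos cursor whose awaitData
// timeout has already fired. An empty buffer means EOF. A tailable cursor
// normally survives EOF, unless every remote reports cursor id 0, in which
// case the mongos cursor is marked exhausted.
GetMoreOutcome getMoreAfterAwaitDataTimeout(MergerModel& merger,
                                            std::optional<std::string>& out) {
    if (!merger.buffer.empty()) {
        out = merger.buffer.front();
        merger.buffer.pop_front();
        return GetMoreOutcome::kReturnedDoc;
    }
    out.reset();
    return merger.remotesExhausted() ? GetMoreOutcome::kMarkedExhausted
                                     : GetMoreOutcome::kKeptOpenNoDocs;
}
```

Calling updateCursorId(0), then getMoreAfterAwaitDataTimeout(), then addBatchToBuffer("invalidate") reproduces the failure. The mongos cursor is marked exhausted before the invalidate entry becomes visible. If the batch is buffered first, the same getMore returns the invalidate entry. This is the interleaving that keeping DocumentSourceCloseCursor on mongos is meant to avoid.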