- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Cluster Scalability
- ALL
During a resharding operation, if a donor shard's primary node fails over shortly before entering the critical section, the recipient shard is not notified promptly.
The recipient's aggregation request, which reads from the donor's oplog, is not interrupted immediately. Instead, it is only interrupted after a long delay, even though the primary node stepped down much earlier.
This was observed on a 3-shard cluster. The recipient on shard1 failed to apply changes because the primary on the donor, shard2, failed over. The recipient's OplogFetcher aggregation query took approximately 2.5 minutes to be interrupted after the failover occurred on shard2.
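For reference, a minimal pymongo sketch of how the recipient's oplog-fetch aggregation can be located on the donor while reproducing this; the connection string is a placeholder, and the filter simply looks for aggregations over local.oplog.rs, matching the slow query shown below:

# Sketch (assumed hostnames): list in-progress oplog aggregations on the donor primary via $currentOp.
from pymongo import MongoClient

donor = MongoClient("mongodb://shard2-node0:27018/?replicaSet=rs2")  # hypothetical donor (shard2)

pipeline = [
    {"$currentOp": {"allUsers": True, "idleConnections": False}},
    # The recipient's OplogFetcher runs an aggregate over local.oplog.rs, as in the slow query below.
    {"$match": {"ns": "local.oplog.rs", "command.aggregate": "oplog.rs"}},
]

for op in donor.admin.aggregate(pipeline):
    print(op.get("opid"), op.get("secs_running"), op.get("desc"))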
The logs below show the primary on shard2 stepping down at 04:06:37, but the recipient's query isn't interrupted until 04:08:54.
Shard2:
{"t":{"$date":"2025-07-17T04:06:37.235+00:00"},"s":"I", "c":"REPL", "id":21358, "svc":"S", "ctx":"ReplCoord-197","msg":"Replica set state transition","attr":{"newState":"SECONDARY","oldState":"PRIMARY"}} .... .... {"t":{"$date":"2025-07-17T04:08:54.092+00:00"},"s":"I", "c":"COMMAND", "id":51803, "svc":"S", "ctx":"conn522","msg":"Slow query","attr":{"type":"command","isFromUserConnection":true,"ns":"local.oplog.rs","collectionType":"local","command":{"aggregate":"oplog.rs","pipeline":[{"$match":{"ts":{"$gte":{"$timestamp":{"t":1752724855,"i":61}}}}},{"$match":{"$or":[{"op":{"$in":["i","u","d","n"]},"ui":{"$uuid":"b7c6c53e-a831-4502-b5b0-aceaca7ecc1c"},"destinedRecipient":"rs1"},{"op":"c","o.applyOps":{"$exists":true},"o.partialTxn":{"$exists":false},"o.prepare":{"$exists":false}},{"op":"c","o.commitTransaction":{"$exists":true}},{"op":"c","o.abortTransaction":{"$exists":true}},{"op":"c","ui":{"$uuid":"b7c6c53e-a831-4502-b5b0-aceaca7ecc1c"}}]}},{"$_internalReshardingIterateTransaction":{"includeCommitTransactionTimestamp":true}},{"$_internalFindAndModifyImageLookup":{"includeCommitTransactionTimestamp":true}},{"$_addReshardingResumeId":{}},{"$match":{"$or":[{"op":{"$ne":"c"}},{"op":"c","o.applyOps":{"$exists":false}},{"op":"c","o.applyOps":{"$elemMatch":{"destinedRecipient":"rs1","ui":{"$uuid":"b7c6c53e-a831-4502-b5b0-aceaca7ecc1c"}}}}]}},{"$match":{"_id":{"$gt":{"clusterTime":{"$timestamp":{"t":1752724855,"i":61}},"ts":{"$timestamp":{"t":1752724855,"i":61}}}}}},{"$addFields":{"o":{"applyOps":{"$cond":[{"$eq":["$op",{"$const":"c"}]},{"$filter":{"input":"$o.applyOps","as":"this","cond":{"$and":[{"$eq":["$$this.ui",{"$const":{"$uuid":"b7c6c53e-a831-4502-b5b0-aceaca7ecc1c"}}]},{"$eq":["$$this.destinedRecipient",{"$const":"rs1"}]}]}}},"$o.applyOps"]}}}}],"cursor":{"batchSize":101},"hint":{"$natural":1},"$_requestReshardingResumeToken":true,"readConcern":{"level":"majority","afterClusterTime":{"$timestamp":{"t":1752724855,"i":61}}},"writeConcern":{"w":1,"wtimeout":0},"$queryOptions":{"$readPreference":{"mode":"primary"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1752724856,"i":300}}},"$configTime":{"$timestamp":{"t":1752724852,"i":120}},"$topologyTime":{"$timestamp":{"t":1752707771,"i":32}},"$db":"local"},"planSummary":"COLLSCAN","planningTimeMicros":11550118,"numYields":150,"planCacheShapeHash":"06FAEC01","queryHash":"06FAEC01","planCacheKey":"93AD6807","queryFramework":"classic","ok":0,"errMsg":"Executor error during aggregate command on namespace: local.oplog.rs :: caused by :: operation was interrupted","errName":"InterruptedDueToReplStateChange","errCode":11602,"reslen":386,"locks":{"Global":{"acquireCount":{"r":179}}},"readConcern":{"level":"majority","afterClusterTime":{"$timestamp":{"t":1752724855,"i":61}},"provenance":"clientSupplied"},"writeConcern":{"w":1,"wtimeout":0,"provenance":"clientSupplied"},"cpuNanos":946771602,"remote":"10.2.0.203:55912","protocol":"op_msg","queues":{"ingress":{"admissions":1},"execution":{"admissions":180}},"workingMillis":459558,"interruptLatencyNanos":31498342493,"durationMillis":459558}} }
Resharding Critical Section start/end times:
{ "state": "blocking-writes", "startTime": "2025-07-17T04:07:58.055+00:00", "endTime": "2025-07-17T04:08:03.060+00:00", "durationSeconds": 5.005 }
As a result, the resharding operation fails with a critical section timeout and loses all of its progress.
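While reproducing this, the state of the resharding operation can be watched with the usual $currentOp monitoring pipeline through mongos, filtered on the originating reshardCollection command; the namespace and host below are placeholders:

# Sketch: monitor resharding progress through mongos.
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos0:27017")

pipeline = [
    {"$currentOp": {"allUsers": True, "localOps": False}},
    {"$match": {"type": "op", "originatingCommand.reshardCollection": "test.coll"}},
]

for op in mongos.admin.aggregate(pipeline):
    # donorState / recipientState show how far each shard got before the failure.
    print(op.get("shard"), op.get("donorState"), op.get("recipientState"),
          op.get("remainingOperationTimeEstimatedSecs"))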