- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Cluster Scalability
- ALL
During a resharding operation, if a donor shard's primary node fails over shortly before entering the critical section, the recipient shard is not notified promptly.
The recipient's aggregation request, which reads from the donor's oplog, is not interrupted immediately. Instead, it is only interrupted after a long delay, even though the primary node stepped down much earlier.
This was observed on a 3-shard cluster. The recipient on shard1 failed to apply changes because the primary on the donor, shard2, failed over. The recipient's OplogFetcher aggregation query took approximately 2.5 minutes to be interrupted after the failover occurred on shard2.
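For reference, a minimal pymongo sketch of how the recipient's oplog-fetch aggregation can be located on the donor while reproducing this; the connection string is a placeholder, and the filter simply looks for aggregations over local.oplog.rs, matching the slow query shown below:

# Sketch (assumed hostnames): list in-progress oplog aggregations on the donor primary via $currentOp.
from pymongo import MongoClient

donor = MongoClient("mongodb://shard2-node0:27018/?replicaSet=rs2")  # hypothetical donor (shard2)

pipeline = [
    {"$currentOp": {"allUsers": True, "idleConnections": False}},
    # The recipient's OplogFetcher runs an aggregate over local.oplog.rs, as in the slow query below.
    {"$match": {"ns": "local.oplog.rs", "command.aggregate": "oplog.rs"}},
]

for op in donor.admin.aggregate(pipeline):
    print(op.get("opid"), op.get("secs_running"), op.get("desc"))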
The logs below show the primary on shard2 stepping down at 04:06:37, but the recipient's query isn't interrupted until 04:08:54.
Shard2:
{"t":{"$date":"2025-07-17T04:06:37.235+00:00"},"s":"I", "c":"REPL", "id":21358, "svc":"S", "ctx":"ReplCoord-197","msg":"Replica set state transition","attr":{"newState":"SECONDARY","oldState":"PRIMARY"}} .... .... {"t":{"$date":"2025-07-17T04:08:54.092+00:00"},"s":"I", "c":"COMMAND", "id":51803, "svc":"S", "ctx":"conn522","msg":"Slow query","attr":{"type":"command","isFromUserConnection":true,"ns":"local.oplog.rs","collectionType":"local","command":{"aggregate":"oplog.rs","pipeline":[{"$match":{"ts":{"$gte":{"$timestamp":{"t":1752724855,"i":61}}}}},{"$match":{"$or":[{"op":{"$in":["i","u","d","n"]},"ui":{"$uuid":"b7c6c53e-a831-4502-b5b0-aceaca7ecc1c"},"destinedRecipient":"rs1"},{"op":"c","o.applyOps":{"$exists":true},"o.partialTxn":{"$exists":false},"o.prepare":{"$exists":false}},{"op":"c","o.commitTransaction":{"$exists":true}},{"op":"c","o.abortTransaction":{"$exists":true}},{"op":"c","ui":{"$uuid":"b7c6c53e-a831-4502-b5b0-aceaca7ecc1c"}}]}},{"$_internalReshardingIterateTransaction":{"includeCommitTransactionTimestamp":true}},{"$_internalFindAndModifyImageLookup":{"includeCommitTransactionTimestamp":true}},{"$_addReshardingResumeId":{}},{"$match":{"$or":[{"op":{"$ne":"c"}},{"op":"c","o.applyOps":{"$exists":false}},{"op":"c","o.applyOps":{"$elemMatch":{"destinedRecipient":"rs1","ui":{"$uuid":"b7c6c53e-a831-4502-b5b0-aceaca7ecc1c"}}}}]}},{"$match":{"_id":{"$gt":{"clusterTime":{"$timestamp":{"t":1752724855,"i":61}},"ts":{"$timestamp":{"t":1752724855,"i":61}}}}}},{"$addFields":{"o":{"applyOps":{"$cond":[{"$eq":["$op",{"$const":"c"}]},{"$filter":{"input":"$o.applyOps","as":"this","cond":{"$and":[{"$eq":["$$this.ui",{"$const":{"$uuid":"b7c6c53e-a831-4502-b5b0-aceaca7ecc1c"}}]},{"$eq":["$$this.destinedRecipient",{"$const":"rs1"}]}]}}},"$o.applyOps"]}}}}],"cursor":{"batchSize":101},"hint":{"$natural":1},"$_requestReshardingResumeToken":true,"readConcern":{"level":"majority","afterClusterTime":{"$timestamp":{"t":1752724855,"i":61}}},"writeConcern":{"w":1,"wtimeout":0},"$queryOptions":{"$readPreference":{"mode":"primary"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1752724856,"i":300}}},"$configTime":{"$timestamp":{"t":1752724852,"i":120}},"$topologyTime":{"$timestamp":{"t":1752707771,"i":32}},"$db":"local"},"planSummary":"COLLSCAN","planningTimeMicros":11550118,"numYields":150,"planCacheShapeHash":"06FAEC01","queryHash":"06FAEC01","planCacheKey":"93AD6807","queryFramework":"classic","ok":0,"errMsg":"Executor error during aggregate command on namespace: local.oplog.rs :: caused by :: operation was interrupted","errName":"InterruptedDueToReplStateChange","errCode":11602,"reslen":386,"locks":{"Global":{"acquireCount":{"r":179}}},"readConcern":{"level":"majority","afterClusterTime":{"$timestamp":{"t":1752724855,"i":61}},"provenance":"clientSupplied"},"writeConcern":{"w":1,"wtimeout":0,"provenance":"clientSupplied"},"cpuNanos":946771602,"remote":"10.2.0.203:55912","protocol":"op_msg","queues":{"ingress":{"admissions":1},"execution":{"admissions":180}},"workingMillis":459558,"interruptLatencyNanos":31498342493,"durationMillis":459558}} }
Resharding Critical Section start/end times:
{ "state": "blocking-writes", "startTime": "2025-07-17T04:07:58.055+00:00", "endTime": "2025-07-17T04:08:03.060+00:00", "durationSeconds": 5.005 }
As a result, the resharding operation fails with a critical section timeout and loses all of its progress.
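While reproducing this, the state of the resharding operation can be watched with the usual $currentOp monitoring pipeline through mongos, filtered on the originating reshardCollection command; the namespace and host below are placeholders:

# Sketch: monitor resharding progress through mongos.
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos0:27017")

pipeline = [
    {"$currentOp": {"allUsers": True, "localOps": False}},
    {"$match": {"type": "op", "originatingCommand.reshardCollection": "test.coll"}},
]

for op in mongos.admin.aggregate(pipeline):
    # donorState / recipientState show how far each shard got before the failure.
    print(op.get("shard"), op.get("donorState"), op.get("recipientState"),
          op.get("remainingOperationTimeEstimatedSecs"))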