The changes from d591387 as part of SERVER-45100 made it so writes are no longer re-issued to shards we've already received a successful response from.
For updateOne by _id operations, which target all shards that may own data for the collection, this can lead to a WriteOp being put in the WriteOpState_Pending state without any more work being scheduled on the ARS (AsyncRequestsSender). BatchWriteOp::noteBatchResponse() must be called to transition the WriteOp to the WriteOpState_Completed or WriteOpState_Error state, and that only happens when a response arrives from the ARS. With no request outstanding, the WriteOp is left stranded in the WriteOpState_Pending state until it exhausts the numRoundsWithoutProgress counter.
The following sequence of events leads to the failure:
- Collection is sharded with all chunks on shard0 (e.g. when using range-based sharding).
- Chunk migration from shard0 to shard1 begins, but its commit does not complete for the remainder of these steps.
- UpdateOne by _id operation targets both shard0 and shard1.
- MongoS receives acknowledgement of successful update from shard0.
- MongoS receives StaleShardVersion error response from shard1.
- MongoS re-targets and only considers resending updateOne by _id operation to shard0 because shard1 doesn't yet own any chunks.
- MongoS doesn't resend the updateOne by _id to shard0 because it has already received a successful response from shard0. The WriteOp is put into the WriteOpState_Pending state without there being any more requests to send.
- MongoS reports a write result with a NoProgressMade error:
{ ok: 1, nModified: 1, n: 1, writeErrors: [ { index: 0, code: 82, codeName: "NoProgressMade", errmsg: "no progress was made executing batch write op in test2_fsmdb0.fsmcoll0 after 5 rounds (0 ops completed in 6 rounds total)" } ] }
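The sequence above can be sketched as a toy model of the retry loop. This is not the mongos implementation: the function `simulate`, the `skip_successful_shards` flag, the result dictionary, and the way rounds are counted are all hypothetical names and simplifications made up for illustration. The limit of 5 is borrowed from the "after 5 rounds" wording in the error message. The only behaviour taken from the description is that a WriteOp can leave the pending state only while a response is being processed, and that shards which already responded successfully are skipped on retry.

```python
from enum import Enum


class WriteOpState(Enum):
    PENDING = "WriteOpState_Pending"
    COMPLETED = "WriteOpState_Completed"


# Illustrative limit, borrowed from "after 5 rounds" in the error message.
MAX_ROUNDS_WITHOUT_PROGRESS = 5


def simulate(targets_for_round, respond, skip_successful_shards):
    """Toy retry loop for a single updateOne by _id (hypothetical model).

    targets_for_round(n) returns the set of shards targeted in round n.
    respond(shard) returns True for success, False for a stale-version error.
    """
    successful = set()
    requests = []
    rounds_without_progress = 0
    round_no = 0
    while True:
        round_no += 1
        targets = targets_for_round(round_no)
        # After SERVER-45100, shards that already succeeded are not re-sent to.
        to_send = targets - successful if skip_successful_shards else set(targets)
        for shard in sorted(to_send):
            requests.append((round_no, shard))
            if respond(shard):
                successful.add(shard)
        # Stand-in for noteBatchResponse(): it can only run when at least one
        # response arrived in this round, i.e. when something was sent.
        if to_send and targets <= successful:
            return {"state": WriteOpState.COMPLETED, "error": None,
                    "rounds": round_no, "requests": requests}
        rounds_without_progress += 1
        if rounds_without_progress > MAX_ROUNDS_WITHOUT_PROGRESS:
            return {"state": WriteOpState.PENDING, "error": "NoProgressMade",
                    "rounds": round_no, "requests": requests}


def targets_for_round(round_no):
    # Round 1 targets both shards; after re-targeting only shard0 owns chunks.
    return {"shard0", "shard1"} if round_no == 1 else {"shard0"}


def respond(shard):
    # shard0 acknowledges the update; shard1 reports a stale shard version.
    return shard == "shard0"


stuck = simulate(targets_for_round, respond, skip_successful_shards=True)
resent = simulate(targets_for_round, respond, skip_successful_shards=False)
print(stuck["state"].value, stuck["error"], stuck["requests"])
print(resent["state"].value, resent["error"], resent["requests"])
```

In this model, skipping shard0 on retry means no request is sent after the first round, so nothing ever moves the op out of the pending state and the loop ends with NoProgressMade. Re-sending to shard0, as happened before SERVER-45100, lets the op complete in the second round.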
- is caused by: SERVER-45100 Make the BatchWriteExecutor retry multi-writes only against unsuccessful shards (Closed)
- is depended on by: SERVER-32198 Missing collection metadata on the shard implies both UNSHARDED and "metadata not loaded yet" (Closed)