- Type: Bug
- Status: Closed
- Priority: Major - P3
- Resolution: Fixed
- Affects Version/s: 4.2.3
- Labels:
- Backwards Compatibility: Fully Compatible
- Operating System: ALL
- Backport Requested: v4.4, v4.2, v4.0
- Sprint: Sharding 2020-04-20
- Linked BF Score: 17
The changes from d591387 as part of SERVER-45100 made it so writes are no longer re-issued to shards we've already received a successful response from.
For updateOne by _id operations, which target all shards that may own data for the collection, this can leave a WriteOp in the WriteOpState_Pending state without any more work being scheduled on the ARS. BatchWriteOp::noteBatchResponse() must be called to transition the WriteOp to the WriteOpState_Completed or WriteOpState_Error state, and that call only happens when a response arrives from the ARS. The WriteOp is therefore stranded in the WriteOpState_Pending state until it exhausts the numRoundsWithoutProgress counter.
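The stranded state can be illustrated with a minimal Python model. This is a sketch under stated assumptions, not the server's C++ implementation: the `WriteOp` class, its `target` and `note_response` methods, and the `outstanding` counter are hypothetical stand-ins for the WriteOp states and the BatchWriteOp::noteBatchResponse() call described above.

```python
from enum import Enum, auto


class WriteOpState(Enum):
    READY = auto()
    PENDING = auto()
    COMPLETED = auto()
    ERROR = auto()


class WriteOp:
    """Hypothetical, simplified model of a single write in a batch."""

    def __init__(self):
        self.state = WriteOpState.READY
        self.outstanding = 0  # requests dispatched but not yet answered

    def target(self, requests_sent):
        # Assumption: the op becomes PENDING when it is targeted, even if
        # every target shard was skipped and no request was dispatched.
        self.state = WriteOpState.PENDING
        self.outstanding = requests_sent

    def note_response(self, ok):
        # Stand-in for noteBatchResponse(): the only way out of PENDING.
        if self.state is not WriteOpState.PENDING:
            raise RuntimeError("response for an op that is not pending")
        self.outstanding -= 1
        if not ok:
            self.state = WriteOpState.ERROR
        elif self.outstanding == 0:
            self.state = WriteOpState.COMPLETED
```

Calling `WriteOp().target(requests_sent=0)` leaves the op in `PENDING` with nothing outstanding, so `note_response` is never invoked and the state never changes. In this model that is the situation the ticket describes.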
- Collection is sharded with all chunks on shard0 (e.g. when using range-based sharding).
- Chunk migration from shard0 to shard1 begins (but its commit does not complete for the remainder of these steps).
- UpdateOne by _id operation targets both shard0 and shard1.
- MongoS receives acknowledgement of successful update from shard0.
- MongoS receives StaleShardVersion error response from shard1.
- MongoS re-targets and only considers resending updateOne by _id operation to shard0 because shard1 doesn't yet own any chunks.
- MongoS doesn't resend the updateOne by _id to shard0 because it has already received a successful response from shard0. The WriteOp is put into the WriteOpState_Pending state without there being any more requests to send.
- MongoS reports a write result with a NoProgressMade error:
{
    ok: 1,
    nModified: 1,
    n: 1,
    writeErrors: [
        {
            index: 0,
            code: 82,
            codeName: "NoProgressMade",
            errmsg: "no progress was made executing batch write op in test2_fsmdb0.fsmcoll0 after 5 rounds (0 ops completed in 6 rounds total)"
        }
    ]
}
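The steps above can be sketched as a small Python simulation of the retry loop. It is a rough model, not the BatchWriteExecutor code: `simulate_update_one_by_id`, its parameters, and the "ok"/"stale" response strings are hypothetical. The limit of 5 is taken from the error message, and the strict `>` comparison is an assumption chosen so that the run ends after 6 rounds in total, as the message reports.

```python
MAX_ROUNDS_WITHOUT_PROGRESS = 5  # from the errmsg: "after 5 rounds"


def simulate_update_one_by_id(responses, retargeted_shards):
    """Model the executor rounds for one updateOne by _id.

    responses: dict of shard name -> "ok" or "stale" (hypothetical encoding).
    retargeted_shards: shards chosen after a stale-version refresh.
    """
    successful = set()        # shards that already acknowledged the write
    targets = set(responses)  # round 1 targets every shard that may own data
    rounds = 0
    rounds_without_progress = 0
    while True:
        rounds += 1
        # Behaviour after SERVER-45100: never re-send to a successful shard.
        to_send = targets - successful
        saw_stale = False
        for shard in sorted(to_send):
            if responses[shard] == "ok":
                successful.add(shard)
            else:
                saw_stale = True
        # The op completes only if this round dispatched requests and all of
        # them succeeded. A round that sends nothing yields no response.
        if to_send and not saw_stale:
            return {"ok": 1, "rounds": rounds}
        targets = set(retargeted_shards)
        rounds_without_progress += 1
        if rounds_without_progress > MAX_ROUNDS_WITHOUT_PROGRESS:
            return {"code": 82, "codeName": "NoProgressMade", "rounds": rounds}


result = simulate_update_one_by_id(
    responses={"shard0": "ok", "shard1": "stale"},
    retargeted_shards={"shard0"},
)
```

In this model, round 1 records the success from shard0 and the stale response from shard1. Rounds 2 through 6 re-target only shard0, skip it because it already succeeded, and send nothing, so the counter runs out and `result` carries code 82 after 6 rounds.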
- is caused by: SERVER-45100 Make the BatchWriteExecutor retry multi-writes only against unsuccessful shards (Closed)
- is depended on by: SERVER-32198 Missing collection metadata on the shard implies both UNSHARDED and "metadata not loaded yet" (Closed)