Type: Bug
Resolution: Won't Fix
Priority: Major - P3
Affects Version/s: None
Component/s: None
Cluster Scalability
ALL
200
There are two bugs in the batch write exec code path for a retryable write without a shard key. In the first, a sub-batch's error is not promoted to a top-level error. In the second, a write concern error (WCE) is not correctly extracted from a shard's write error response.
The following explains how this happens:
1. mongos sends the update, which is a write without a shard key, to both shards as part of the two-phase write protocol and receives a {{WouldChangeOwningShard}} error from the shard that owns the queried document. This error is supposed to be propagated to ClusterWriteCmd::InvocationBase::runImpl, which handles the WouldChangeOwningShard error. However, this does not happen correctly, as described in the next steps.
2. mongos aborts the transaction by sending abortTransaction to both shards; however, one of them responds with a WCE.
3. The WCE from the abortTransaction response is handled on the path where !responseStatus.isOK, and the response is processed in processErrorResponseFromLocal. This function passes the error as a WriteError to BatchWriteOp::noteBatchError, which builds an emulated response that incorrectly sets the top-level status to OK (ok: 1) and does not extract or parse the write concern error it received. noteBatchResponse has logic to extract a WCE when it is called with the emulated response, but that branch is not taken.
4. The WouldChangeOwningShard (WCOS) error therefore arrives as a write error at the point where the router is expected to catch it and retry, but the router decides it does not need to be handled because the top-level status does not contain the WCOS error.
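The failure mode in steps 3 and 4 can be illustrated with a toy model. This is a minimal sketch, not the mongos implementation: the types Status, WriteError and BatchResponse and the functions emulate_batch_error and router_handles_wcos are hypothetical simplifications of BatchWriteOp::noteBatchError and of the WouldChangeOwningShard check in ClusterWriteCmd::InvocationBase::runImpl, and the "WriteConcernError" code name is a placeholder. It shows how an emulated response with an OK top-level status hides both the WCE and the WCOS error from a router that only inspects the top-level status.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Status:
    """Hypothetical stand-in for a command status: ok flag plus error code name."""
    ok: bool
    code: str = "OK"


@dataclass
class WriteError:
    """Hypothetical stand-in for a per-statement write error."""
    index: int
    status: Status


@dataclass
class BatchResponse:
    """Hypothetical, simplified batch response as seen by the router."""
    top_level: Status
    write_errors: List[WriteError] = field(default_factory=list)
    write_concern_error: Optional[Status] = None


def emulate_batch_error(error: WriteError,
                        received_wce: Optional[Status]) -> BatchResponse:
    """Toy model of the path in step 3 (not real mongos code).

    The shard error is recorded only as a per-write error inside an emulated
    response whose top-level status is OK.
    """
    # Bug 2 (modeled): received_wce is intentionally never copied into the
    # response, so the write concern error is lost.
    # Bug 1 (modeled): the sub-batch error is not promoted to the top level.
    return BatchResponse(top_level=Status(ok=True), write_errors=[error])


def router_handles_wcos(resp: BatchResponse) -> bool:
    """Toy model of step 4: the router only inspects the top-level status."""
    return (not resp.top_level.ok
            and resp.top_level.code == "WouldChangeOwningShard")


if __name__ == "__main__":
    wcos = WriteError(index=0,
                      status=Status(ok=False, code="WouldChangeOwningShard"))
    wce = Status(ok=False, code="WriteConcernError")  # placeholder code name
    resp = emulate_batch_error(wcos, wce)
    print(resp.top_level.ok)           # top level looks successful
    print(resp.write_concern_error)    # the WCE was dropped
    print(router_handles_wcos(resp))   # the WCOS retry path is skipped
```

Running it prints True, None and False: the batch looks successful at the top level, the WCE is gone, and the router never takes the WouldChangeOwningShard handling path even though the WCOS error is still present in write_errors.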
Related to:
- SERVER-98461 findAndModify where query does not have shard key does not return WCE on failure (Closed)
- SERVER-102404 Exclude updateOne_without_shard_key/*.js tests from concurrency_sharded_with_stepdowns suite (Closed)
- SERVER-102951 Complete TODO listed in SERVER-102111 (Backlog)
- SERVER-100435 Handling WCE in server code POC (Closed)