- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: 8.0.0, 8.2.0
- Component/s: None
- Catalog and Routing
- 🟦 Shard Catalog
When a shard executing a bulk write made up of multiple operations fails some of them (for example, due to a StaleConfig error), it produces an ok: 1 response whose payload indicates which writes succeeded and which did not. This allows the operation's outcome to be reported correctly and lets mongos retry the failed writes when it is safe to do so.
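To illustrate why the per-operation payload matters, here is a minimal sketch of how a router could pick the operations to re-target from such a reply. It assumes an unordered batch and a reply modelled loosely on the `ok`/`n`/`writeErrors` shape of write command responses. The `STALE_CONFIG` value (13388), the helper name `indices_to_retry`, and the simplified dict layout are illustrative assumptions, not mongos internals.

```python
from typing import Any

# Assumption: numeric code used for StaleConfig; for illustration only.
STALE_CONFIG = 13388


def indices_to_retry(response: dict[str, Any], num_ops: int) -> list[int]:
    """Return the batch indices a router could safely re-target.

    Only meaningful when the shard replied ok: 1 with a per-operation
    payload; operations absent from writeErrors are treated as succeeded.
    """
    if response.get("ok") != 1:
        # A top-level error says nothing about individual operations.
        raise ValueError("top-level error: per-operation outcome unknown")
    retry = []
    for err in response.get("writeErrors", []):
        idx = err["index"]
        if not 0 <= idx < num_ops:
            raise ValueError(f"writeErrors index {idx} out of range")
        if err["code"] == STALE_CONFIG:
            retry.append(idx)
    return sorted(retry)


# Example: a 4-operation unordered batch where ops 1 and 3 hit StaleConfig.
reply = {
    "ok": 1,
    "n": 2,
    "writeErrors": [
        {"index": 1, "code": STALE_CONFIG, "errmsg": "stale routing info"},
        {"index": 3, "code": STALE_CONFIG, "errmsg": "stale routing info"},
    ],
}
print(indices_to_retry(reply, 4))  # [1, 3]
```

Only operations 1 and 3 need to be re-sent; operations 0 and 2 are known to have succeeded. If the same batch instead came back as a bare top-level ok: 0, the sketch raises, because nothing in that reply identifies which writes are safe to retry.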
Before responding to mongos, shards attempt to recover the sharding metadata. However, if that recovery fails with an Interruption error, shards overwrite the ok: 1 response and return a top-level ok: 0 response instead. The detailed per-operation outcome is lost, so mongos cannot determine which writes to retry. The error is then propagated to the driver without any information about which operations succeeded and which failed.
With retryableWrites=true, the driver can safely retry the whole operation, so the problem is transparent to the application, although somewhat inefficient because operations that had definitely already succeeded are retried as well.
With retryableWrites=false, the driver cannot retry, and the application simply gets a top-level error that does not report the outcomes of the individual writes.
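The two cases can be contrasted with a simplified model of what a driver is left with once only a top-level error reaches it. This is a sketch, not real driver logic: it assumes the driver sees nothing but ok: 0, and the names `handle_top_level_error`, `BulkWriteFailed`, and `fake_resend` are hypothetical. Real drivers apply further rules (for example, which error codes count as retryable) that are not modelled here.

```python
from typing import Any, Callable, Sequence


class BulkWriteFailed(Exception):
    """Hypothetical app-facing error that carries no per-operation detail."""


def handle_top_level_error(
    ops: Sequence[str],
    retryable_writes: bool,
    resend: Callable[[list[str]], dict[str, Any]],
) -> dict[str, Any]:
    """Model what a driver can do after a top-level ok: 0 reply.

    Because the reply says nothing about individual operations, the only
    safe retry unit is the whole batch.
    """
    if retryable_writes:
        # Every operation is sent again, including ones that had
        # already succeeded on the shard.
        return resend(list(ops))
    # No retry possible: the application sees a single error and cannot
    # tell which writes were applied.
    raise BulkWriteFailed("bulk write failed; individual write outcomes unknown")


sent: list[str] = []


def fake_resend(batch: list[str]) -> dict[str, Any]:
    # Stand-in for a second round trip; records what was re-sent.
    sent.extend(batch)
    return {"ok": 1, "n": len(batch)}


print(handle_top_level_error(["op0", "op1", "op2"], True, fake_resend))  # {'ok': 1, 'n': 3}
```

In the retryable case all three operations are re-sent even if only one of them originally failed, which is the inefficiency described above; in the non-retryable case the application receives a single opaque error.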
Shards should avoid discarding the response payload indicating the individual write operation outcomes.
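One possible shape for the requested behaviour is sketched below, next to a model of the reported one. If metadata recovery fails after the writes have been attempted, the preserving variant keeps the ok: 1 per-operation payload instead of replacing it. The function names, the `recover_metadata` callback, and the use of Python's built-in `InterruptedError` as a stand-in for the server's Interruption error category are assumptions for illustration only; this is not the server's code path, and whether the interruption should also be surfaced some other way is left open.

```python
from typing import Any, Callable

Reply = dict[str, Any]


def finalize_buggy(reply: Reply, recover_metadata: Callable[[], None]) -> Reply:
    """Models the reported behaviour: a recovery failure replaces the reply."""
    try:
        recover_metadata()
    except InterruptedError as exc:
        # The per-operation payload (n, writeErrors) is discarded here.
        return {"ok": 0, "errmsg": str(exc)}
    return reply


def finalize_preserving(reply: Reply, recover_metadata: Callable[[], None]) -> Reply:
    """Models the requested behaviour: the per-operation payload survives."""
    try:
        recover_metadata()
    except InterruptedError:
        # The writes' outcomes are already decided, so the detailed
        # ok: 1 reply is still returned to the router.
        pass
    return reply


def interrupted_recovery() -> None:
    # Simulates metadata recovery being interrupted.
    raise InterruptedError("metadata refresh interrupted")


per_op = {"ok": 1, "n": 1, "writeErrors": [{"index": 1, "code": 13388, "errmsg": "stale"}]}
print(finalize_buggy(per_op, interrupted_recovery))  # {'ok': 0, 'errmsg': 'metadata refresh interrupted'}
print(finalize_preserving(per_op, interrupted_recovery) is per_op)  # True
```

With the preserving variant, the router still receives the writeErrors entry for index 1 and can retry just that operation, instead of forwarding an opaque ok: 0 to the driver.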
- is caused by: SERVER-84623 Shard-local re-execution of a command might bubble up a misleading StaleConfig exception to the router (Closed)