Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Catalog and Routing
As part of the investigation into ExceededTimeLimit retriability (SERVER-117235), we identified that the behavior introduced by SERVER-84623 is a minor regression for multi-document transactions.
Currently, when a transaction waits for a refresh inside a critical section and times out, the ExceededTimeLimit error bubbles up to the driver, labeled as TransientTransactionError. Before SERVER-84623, the same timeout surfaced as a StaleConfig exception, which the MongoS strategy layer retried transparently without involving the driver.
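To make the difference concrete, here is a minimal toy model in Python. It is not MongoS code: `run_on_router`, `send_to_shard`, and `refresh_routing_info` are hypothetical names, and the retry limit is an assumption. It only illustrates that a StaleConfig is absorbed by a router-side retry loop, while an ExceededTimeLimit passes straight through to the caller (the driver).

```python
class StaleConfig(Exception):
    """Toy stand-in for the StaleConfig error."""


class ExceededTimeLimit(Exception):
    """Toy stand-in for the ExceededTimeLimit error."""


def run_on_router(send_to_shard, refresh_routing_info, max_attempts=3):
    """Hypothetical router-side retry loop (max_attempts must be >= 1).

    StaleConfig is retried after refreshing routing information.
    Any other error, including ExceededTimeLimit, propagates to the caller.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return send_to_shard()
        except StaleConfig as exc:
            last_error = exc
            refresh_routing_info()
    raise last_error


def make_shard(errors):
    """Return a fake shard call that raises the given errors in order, then succeeds."""
    pending = list(errors)

    def send_to_shard():
        if pending:
            raise pending.pop(0)
        return "ok"

    return send_to_shard


refreshes = []

# StaleConfig: retried inside the router, the caller never sees it.
stale_result = run_on_router(
    make_shard([StaleConfig()]), lambda: refreshes.append(1)
)

# ExceededTimeLimit: not retried here, so it reaches the caller.
try:
    run_on_router(make_shard([ExceededTimeLimit()]), lambda: refreshes.append(1))
    timeout_result = "ok"
except ExceededTimeLimit:
    timeout_result = "ExceededTimeLimit reached the caller"
```

In this model the first call returns after one internal refresh, while the second surfaces the error, which is the regression described above when seen from the driver's side.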
Investigation Proposal
We would like to investigate whether the original "shutdown masking" problem (SERVER-84623) can be solved by changing the order of operations in the command execution path, rather than by overriding errors.
We propose investigating the following:
- Can we check the replication status (verifying the node is not shutting down) before we check the versioning protocol?
- If we can detect that a node is shutting down before entering the versioning logic, we should no longer need to override refresh errors to catch other exceptions.
If this hypothesis holds, we should implement this reordering and revert SERVER-84623. This would restore the behavior where ExceededTimeLimit in transactions is converted to StaleConfig, allowing MongoS to handle the retry internally.
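The hypothesis can be sketched with a small Python model. Everything here is illustrative: `NodeState`, `check_replication_state`, `check_versioning`, and the two check orders are hypothetical names, and the model assumes that a failed refresh in the versioning step surfaces as StaleConfig (the behavior this ticket proposes to restore). It is not the server's command execution path.

```python
from dataclasses import dataclass


class ShutdownInProgress(Exception):
    """Toy stand-in for a shutdown error."""


class StaleConfig(Exception):
    """Toy stand-in for the StaleConfig error."""


@dataclass
class NodeState:
    shutting_down: bool = False
    refresh_times_out: bool = False


def check_replication_state(node):
    # Fails fast if the node is shutting down.
    if node.shutting_down:
        raise ShutdownInProgress("node is shutting down")


def check_versioning(node):
    # Assumption: any refresh failure (shutdown or timeout) is reported
    # as StaleConfig by the versioning step.
    if node.shutting_down or node.refresh_times_out:
        raise StaleConfig("refresh did not complete")


# Assumed current order versus the order proposed in this ticket.
VERSIONING_FIRST = (check_versioning, check_replication_state)
REPLICATION_FIRST = (check_replication_state, check_versioning)


def outcome(node, checks):
    """Run the checks in order and report which error, if any, surfaces."""
    try:
        for check in checks:
            check(node)
        return "ok"
    except (ShutdownInProgress, StaleConfig) as exc:
        return type(exc).__name__


masked = outcome(NodeState(shutting_down=True), VERSIONING_FIRST)
unmasked = outcome(NodeState(shutting_down=True), REPLICATION_FIRST)
timed_out = outcome(NodeState(refresh_times_out=True), REPLICATION_FIRST)
```

Under these assumptions, checking replication state first reports a shutdown as a shutdown, while a timed-out refresh still becomes StaleConfig and can be retried by the router without an error override.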
- is related to
  - SERVER-84623 Shard-local re-execution of a command might bubble up a misleading StaleConfig exception to the router (Closed)
  - SERVER-117235 Investigate ExceededTimeLimit retriability in multi-document transactions when waiting for refreshes (Closed)