[SERVER-48318] Risk of StaleChunkHistory errors in sharded transactions Created: 20/May/20 Updated: 29/Oct/23 Resolved: 09/Sep/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | A. Jesse Jiryu Davis | Assignee: | Cheahuychou Mao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Sharding 2020-09-21 |
| Participants: |
| Description |
|
While reviewing the changes for …, I noticed the following. A new transaction always chooses a recent timestamp, even with readConcern majority; this is the "speculative majority" behavior. But transactions have a default 60-second lifetime, while chunk history is only kept for 10 seconds. Could the following sequence occur?
Transactions cannot retry on StaleChunkHistory (SERVER-39704), and I think this particular case could never be retried anyway, since the history is truly gone. If the client uses a driver's withTransaction API, the TransientTransactionError label will cause it to retry the whole transaction from the start, and the retry will probably succeed. withTransaction can keep retrying for up to 120 seconds, so the client would have to be very unlucky for the sequence above to repeat for that long. Still, I think we can reduce how often retries happen by keeping chunk history for at least transactionLifetimeLimitSeconds. |
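The failure mode and the proposed fix can be illustrated with a toy model. This is a minimal Python sketch, not MongoDB's implementation: the ChunkHistory class, the StaleChunkHistory Python exception, run_scenario, and the assumption that history is trimmed only when the chunk migrates are all hypothetical simplifications. The transaction keeps reading at the timestamp it picked when it started; if the history entry covering that timestamp has already been trimmed, routing fails.

```python
from dataclasses import dataclass, field


class StaleChunkHistory(Exception):
    """HYPOTHETICAL: raised when no retained entry covers the requested timestamp."""


@dataclass
class ChunkHistory:
    # HYPOTHETICAL model of one chunk's ownership history, not MongoDB's real structure.
    window_secs: float
    entries: list = field(default_factory=list)  # (valid_after, shard), newest first

    def migrate(self, now: float, to_shard: str) -> None:
        """Record a migration, then trim history older than the window (assumed policy)."""
        self.entries.insert(0, (now, to_shard))
        cutoff = now - self.window_secs
        kept = [self.entries[0]]  # the current owner is always kept
        for newer, older in zip(self.entries, self.entries[1:]):
            # `older` owned the chunk during [older.valid_after, newer.valid_after).
            # Keep it only if that interval still reaches into the window.
            if newer[0] >= cutoff:
                kept.append(older)
            else:
                break  # every remaining entry is even older
        self.entries = kept

    def owner_at(self, ts: float) -> str:
        """Return the shard that owned the chunk at `ts`, if that history survives."""
        for valid_after, shard in self.entries:
            if valid_after <= ts:
                return shard
        raise StaleChunkHistory(f"no chunk history for ts={ts}")


def run_scenario(window_secs: float) -> str:
    """Txn picks read ts=2, the chunk moves at t=5 and t=20, the txn routes at t=20."""
    history = ChunkHistory(window_secs, [(0.0, "shardA")])
    txn_read_ts = 2.0                  # a recent timestamp chosen at transaction start
    history.migrate(5.0, "shardB")
    history.migrate(20.0, "shardC")    # trims with cutoff = 20 - window_secs
    try:
        return history.owner_at(txn_read_ts)
    except StaleChunkHistory:
        return "StaleChunkHistory"


print(run_scenario(10))  # StaleChunkHistory: the 10-second window dropped ts=2
print(run_scenario(60))  # shardA: a window >= the 60-second lifetime keeps it
```

With a 10-second window, the entry covering the transaction's read timestamp is gone by the time it routes at t=20, which is the non-retryable case described above; widening the window to the transaction lifetime keeps that entry.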
| Comments |
| Comment by Githook User [ 11/Sep/20 ] |
|
Author: Cheahuychou Mao (cheahuychou.mao@mongodb.com, username: cheahuychou)
Message: |
| Comment by Githook User [ 09/Sep/20 ] |
|
Author: Cheahuychou Mao (cheahuychou.mao@mongodb.com, username: cheahuychou)
Message: |
| Comment by Max Hirschhorn [ 28/Aug/20 ] |
|
There's a concern that users who modify transactionLifetimeLimitSeconds won't know to do the same on their CSRS, so using the higher of minSnapshotHistoryWindowInSeconds and transactionLifetimeLimitSeconds as the effective minSnapshotHistoryWindowInSeconds won't be effective for them.
The … line should be updated to mention raising transactionLifetimeLimitSeconds on the config servers as well, with a rationale along the lines of "so that routing table history on the config server is maintained for at least as long as the transaction lifetime limit on shards." |
| Comment by Randolph Tan [ 30/Jul/20 ] |
|
We should consider increasing the window to max(transactionLifetime, snapshotWindow, 10). However, it is worth noting that the transactionLifetime and snapshotWindow settings are per mongod, so this is only best effort. I don't think this will substantially increase the size of the chunkHistory unless the same chunk gets moved multiple times during the window. |
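The proposed window and the per-mongod caveat can be sketched as follows. This is a minimal Python illustration under stated assumptions: NodeParams and chunk_history_window_secs are hypothetical names, and the parameter values are illustrative, not server defaults. Because chunk history is trimmed on the config server, the window is computed from the CSRS's own settings, which is why raising transactionLifetimeLimitSeconds only on shards does not help.

```python
from dataclasses import dataclass

MIN_CHUNK_HISTORY_SECS = 10  # the existing 10-second floor on chunk history


@dataclass
class NodeParams:
    # HYPOTHETICAL per-mongod view of the two server parameters.
    transaction_lifetime_limit_secs: int
    min_snapshot_history_window_secs: int


def chunk_history_window_secs(params: NodeParams) -> int:
    """Proposed window: max(transactionLifetime, snapshotWindow, 10) for one node."""
    return max(params.transaction_lifetime_limit_secs,
               params.min_snapshot_history_window_secs,
               MIN_CHUNK_HISTORY_SECS)


# Illustrative values: the user raised the lifetime on the shard only.
shard = NodeParams(transaction_lifetime_limit_secs=120, min_snapshot_history_window_secs=5)
csrs = NodeParams(transaction_lifetime_limit_secs=60, min_snapshot_history_window_secs=5)

# The config server trims chunk history, so its own settings decide the window.
print(chunk_history_window_secs(csrs))  # 60
print(chunk_history_window_secs(csrs) >= shard.transaction_lifetime_limit_secs)  # False
```

The mismatch disappears only when the CSRS is given the same transactionLifetimeLimitSeconds as the shards, which is the documentation change requested above.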
| Comment by Randolph Tan [ 24/Jun/20 ] |
|
I think this makes sense and probably won't have a negative impact unless the same chunk is being moved constantly. |