[SERVER-48318] Risk of StaleChunkHistory errors in sharded transactions Created: 20/May/20  Updated: 29/Oct/23  Resolved: 09/Sep/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Improvement Priority: Major - P3
Reporter: A. Jesse Jiryu Davis Assignee: Cheahuychou Mao
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-13868 Investigate changes in SERVER-48318: ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2020-09-21
Participants:

 Description   

While reviewing the changes for SERVER-47785 with renctan, we wondered if the previous version of the code had a bug. Before that change, ShardingCatalogManager::commitChunkMigration removed all chunk history entries older than 10 seconds whenever it wrote a new entry. Even after the change, it removes all but one of them.

A new transaction always chooses a recent timestamp, even with readConcern majority. This is the "speculative majority" behavior. But transactions have a default 60-second lifetime, and chunk history only lasts 10 seconds. Could the following sequence occur?

  • Start a sharded transaction
  • Choose transaction read timestamp T
  • 10 seconds pass
  • A chunkMove clears history entries before T for chunk C
  • The transaction continues and targets C
  • ChunkInfo::getShardIdAt tries to read at T, throws StaleChunkHistory error
  • mongos returns error to the client with TransientTransactionError label
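
The sequence above can be modelled with a small sketch. This is a toy model, not the server's code: the `Chunk` class, its `move` and `shard_id_at` methods, and the pruning rule (drop entries whose valid-after time is older than the window) are assumptions made for illustration, based only on the description in this ticket.

```python
from dataclasses import dataclass, field


class StaleChunkHistory(Exception):
    """Stand-in for the server error of the same name."""


@dataclass
class Chunk:
    # History entries as (valid_after_seconds, shard_id), newest first.
    history: list = field(default_factory=list)

    def move(self, now, to_shard, window_seconds):
        # Record the new owner, then drop entries older than the window
        # (assumed pruning rule, simplified from the ticket's description).
        self.history.insert(0, (now, to_shard))
        cutoff = now - window_seconds
        self.history = [entry for entry in self.history if entry[0] >= cutoff]

    def shard_id_at(self, ts):
        # Owner at time ts is the newest entry with valid_after <= ts.
        for valid_after, shard in self.history:
            if valid_after <= ts:
                return shard
        raise StaleChunkHistory(f"no history entry covers t={ts}")


def owner_seen_by_transaction(window_seconds):
    chunk = Chunk()
    chunk.move(now=95, to_shard="shardA", window_seconds=window_seconds)
    read_timestamp = 100  # transaction picks T
    chunk.move(now=115, to_shard="shardB", window_seconds=window_seconds)
    return chunk.shard_id_at(read_timestamp)
```

In this model, `owner_seen_by_transaction(10)` raises `StaleChunkHistory` because the entry covering T was pruned by the second move, while `owner_seen_by_transaction(60)` still finds the owner at T.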

Transactions cannot retry StaleChunkHistory (SERVER-39704) and I think this particular case could never be retried, since the history is truly gone.

If the client uses a driver's withTransaction API then TransientTransactionError will compel it to retry the transaction from the start and probably succeed. It can retry for up to 120 seconds. It would have to be unlucky for the sequence above to repeat for that long.
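
As a rough illustration of that retry behavior, here is a simplified loop in the spirit of a driver's withTransaction helper. It is a sketch under stated assumptions, not any driver's actual implementation: the `with_transaction` function, the `TransientTransactionError` exception class, and the injectable `clock` are hypothetical names, and only the 120-second limit comes from the text above.

```python
import time


class TransientTransactionError(Exception):
    """Stand-in for an error carrying the TransientTransactionError label."""


def with_transaction(callback, timeout_seconds=120, clock=time.monotonic):
    # Re-run the whole transaction callback from the start on a transient
    # error, until it succeeds or the time limit has elapsed.
    start = clock()
    while True:
        try:
            return callback()
        except TransientTransactionError:
            if clock() - start >= timeout_seconds:
                raise
```

Each retry starts a new transaction with a fresh read timestamp, which is why a retry is likely to succeed unless another migration prunes the history again.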

However, I think we can reduce the incidence of retries by keeping chunk history for at least transactionLifetimeLimitSeconds.



 Comments   
Comment by Githook User [ 11/Sep/20 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com', 'username': 'cheahuychou'}

Message: SERVER-48318 Make snapshot window equal to max of minSnapshotHistoryWindowInSeconds and transactionLifetimeLimitSeconds
Branch: svilen-optimizer-poc
https://github.com/mongodb/mongo/commit/2795d76c634e778a6d0bf672cd26c45c8193d2f4

Comment by Githook User [ 09/Sep/20 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com', 'username': 'cheahuychou'}

Message: SERVER-48318 Make snapshot window equal to max of minSnapshotHistoryWindowInSeconds and transactionLifetimeLimitSeconds
Branch: master
https://github.com/mongodb/mongo/commit/2795d76c634e778a6d0bf672cd26c45c8193d2f4

Comment by Max Hirschhorn [ 28/Aug/20 ]

There's a concern that users who modify transactionLifetimeLimitSeconds won't know to do the same on their CSRS, so using the higher of minSnapshotHistoryWindowInSeconds and transactionLifetimeLimitSeconds for the minSnapshotHistoryWindowInSeconds setting won't be effective.

Raising the default value for minSnapshotHistoryWindowInSeconds to 60 seconds to match transactionLifetimeLimitSeconds would satisfy the default case without much code complexity. And we should consider requesting a DOCS ticket to mention changing minSnapshotHistoryWindowInSeconds for sharded clusters in situations where a user would be changing transactionLifetimeLimitSeconds.

Edit: Raising the default value for minSnapshotHistoryWindowInSeconds has other implications for history retention in WiredTiger. We still want the line

    "To set the parameter for a sharded cluster, the parameter must be modified for all shard replica set members."

from https://docs.mongodb.com/manual/reference/parameters/#param.transactionLifetimeLimitSeconds to be updated to mention raising transactionLifetimeLimitSeconds on the config servers as well, with a rationale along the lines of "so that routing table history on the config server is maintained for at least as long as the transaction lifetime limit on shards."

Comment by Randolph Tan [ 30/Jul/20 ]

We should consider increasing the window to be max(transactionLifetime, snapshotWindow, 10). However, it is worth noting that the transactionLifetime and snapshotWindow settings are per mongod, so it's only best effort. I don't think this will substantially increase the size of the chunkHistory unless the same chunk gets moved multiple times during the window.
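
The proposed window can be written out as a one-line computation. This is only a sketch of the formula in the comment above: the function name and the `LEGACY_HISTORY_SECONDS` constant are hypothetical, and the inputs are whatever values the local mongod happens to have configured.

```python
# The fixed 10-second retention mentioned in this ticket (hypothetical name).
LEGACY_HISTORY_SECONDS = 10


def chunk_history_window_seconds(transaction_lifetime, snapshot_window):
    # Keep chunk history for the longest of the three durations, so a
    # transaction that is still within its lifetime can find its timestamp.
    return max(transaction_lifetime, snapshot_window, LEGACY_HISTORY_SECONDS)
```

For example, with a transaction lifetime of 60 seconds and an illustrative snapshot window of 5 seconds, the function returns 60. Because the inputs are read per mongod, the result on the config server only protects transactions if the config server's settings are at least as large as those on the shards.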

Comment by Randolph Tan [ 24/Jun/20 ]

I think this makes sense and probably won't have a negative impact unless the same chunk is being moved constantly.
