Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.7.0
Affects Version/s: None
Component/s: Sharding
Labels: sharding-wfbf-day
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2020-09-21
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

While reviewing the changes for ~~SERVER-47785~~ with renctan, we wondered if the previous version of the code had a bug. Before that change, ShardingCatalogManager::commitChunkMigration removed all chunk history entries older than 10 seconds whenever it wrote a new entry. Even after the change, it still removes all but one of them.

A new transaction always chooses a recent timestamp, even with readConcern majority. This is the "speculative majority" behavior. But transactions have a default 60-second lifetime, and chunk history only lasts 10 seconds. Could the following sequence occur?

1. Start a sharded transaction
2. Choose transaction read timestamp T
3. 10 seconds pass
4. A chunkMove clears history entries before T for chunk C
5. The transaction continues and targets C
6. ChunkInfo::getShardIdAt tries to read at T, throws StaleChunkHistory error
7. mongos returns error to the client with TransientTransactionError label
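
The sequence above can be modeled with a toy sketch. This is not the server code: `Chunk`, `get_shard_id_at`, and `commit_migration` are hypothetical stand-ins for ChunkInfo::getShardIdAt and ShardingCatalogManager::commitChunkMigration, timestamps are plain seconds, and the pruning rule is simplified to "drop every entry older than the window".

```python
from dataclasses import dataclass, field


class StaleChunkHistory(Exception):
    """Raised when no history entry covers the requested timestamp."""


@dataclass
class Chunk:
    # (valid_after_seconds, shard_id) pairs, newest first. Hypothetical layout.
    history: list = field(default_factory=list)

    def get_shard_id_at(self, ts):
        # The owner at ts is the newest entry that became valid at or before ts.
        for valid_after, shard_id in self.history:
            if valid_after <= ts:
                return shard_id
        raise StaleChunkHistory(f"no history entry covers timestamp {ts}")

    def commit_migration(self, now, to_shard, window_secs=10):
        # Simplified pruning: keep only entries newer than the window,
        # then record the new owner as the newest entry.
        kept = [(t, s) for t, s in self.history if t >= now - window_secs]
        self.history = [(now, to_shard)] + kept


def run_scenario():
    chunk = Chunk(history=[(95, "shardA")])
    read_ts = 100                      # transaction read timestamp T
    owner_before = chunk.get_shard_id_at(read_ts)
    chunk.commit_migration(now=112, to_shard="shardB")  # more than 10 s after the old entry
    try:
        chunk.get_shard_id_at(read_ts)  # the transaction targets C again at T
        outcome = "ok"
    except StaleChunkHistory:
        outcome = "StaleChunkHistory"
    return owner_before, outcome
```

In this model the entry that covered T is pruned by the migration, so the second lookup at T has nothing to match and fails.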

Transactions cannot retry StaleChunkHistory (SERVER-39704) and I think this particular case could never be retried, since the history is truly gone.

If the client uses a driver's withTransaction API then TransientTransactionError will compel it to retry the transaction from the start and probably succeed. It can retry for up to 120 seconds. It would have to be unlucky for the sequence above to repeat for that long.
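
A rough sketch of that retry behavior, to make the 120-second budget concrete. This is not a driver API: `with_transaction_sketch` and the `TransientTransactionError` exception class are hypothetical stand-ins, and a real driver inspects error labels rather than exception types.

```python
import time

RETRY_LIMIT_SECS = 120  # retry budget mentioned above


class TransientTransactionError(Exception):
    """Stand-in for an error carrying the TransientTransactionError label."""


def with_transaction_sketch(callback, clock=time.monotonic,
                            limit_secs=RETRY_LIMIT_SECS):
    """Run callback, restarting from scratch on transient errors.

    Returns (result, attempts). Re-raises the transient error once the
    retry budget has elapsed.
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        try:
            # Each attempt is a brand-new transaction, so it would pick a
            # fresh read timestamp rather than reuse the stale one.
            return callback(), attempts
        except TransientTransactionError:
            if clock() - start >= limit_secs:
                raise
```

Because every attempt starts over with a new read timestamp, a retry only fails again if another migration prunes history during that attempt too.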

However, I think we can reduce the incidence of retries by keeping chunk history for at least transactionLifetimeLimitSeconds.
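
One way to picture the proposal, as a sketch only. The helper names are hypothetical, and taking the larger of the 10-second window and transactionLifetimeLimitSeconds is my assumption about how "at least" could be applied, not a description of the eventual fix.

```python
DEFAULT_HISTORY_WINDOW_SECS = 10  # window described in this ticket


def history_window_secs(transaction_lifetime_limit_secs):
    # Assumption: never keep less history than a transaction may live.
    return max(DEFAULT_HISTORY_WINDOW_SECS, transaction_lifetime_limit_secs)


def prune_history(history, now, window_secs):
    """Keep (valid_after, shard_id) entries no older than the window.

    history is ordered newest first; timestamps are plain seconds.
    """
    return [(t, s) for t, s in history if t >= now - window_secs]


# A migration at t=112 with an older entry from t=95.
history = [(112, "shardB"), (95, "shardA")]
short = prune_history(history, now=112, window_secs=DEFAULT_HISTORY_WINDOW_SECS)
longer = prune_history(history, now=112, window_secs=history_window_secs(60))
```

With the 10-second window only the new entry survives, while the longer window keeps the entry from t=95, so a transaction reading at T=100 could still be routed.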

Assignee: Cheahuychou Mao
Reporter: A. Jesse Jiryu Davis
Participants: A. Jesse Jiryu Davis, Cheahuychou Mao, Githook User, Max Hirschhorn, Randolph Tan
Votes: 0
Watchers: 5

Created: May 20 2020 12:52:04 PM UTC
Updated: Oct 29 2023 10:07:58 PM UTC
Resolved: Sep 09 2020 07:14:19 PM UTC
Confidence Status Last Update: 03/Sep/20 6:11 PM
