  1. Core Server
  2. SERVER-48318

Risk of StaleChunkHistory errors in sharded transactions


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7.0
    • Component/s: Sharding
    • Backwards Compatibility:
      Fully Compatible
    • Sprint:
      Sharding 2020-09-21

      Description

      While reviewing the changes for SERVER-47785 with Randolph Tan, we wondered whether the previous version of the code had a bug. Before that change, ShardingCatalogManager::commitChunkMigration removed all chunk history entries older than 10 seconds whenever it wrote a new entry. Even after the change, it removes all but one of them.

      A new transaction always chooses a recent timestamp, even with readConcern majority. This is the "speculative majority" behavior. But transactions have a default 60-second lifetime, and chunk history only lasts 10 seconds. Could the following sequence occur?

      • Start a sharded transaction
      • Choose transaction read timestamp T
      • 10 seconds pass
      • A chunk migration (moveChunk) clears the history entries before T for chunk C
      • The transaction continues and targets C
      • ChunkInfo::getShardIdAt tries to read at T and throws a StaleChunkHistory error
      • mongos returns the error to the client with a TransientTransactionError label
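
      The sequence above can be sketched with a toy model. This is a simplified illustration, not the server's implementation: the Chunk class, its method names, and the integer timestamps are hypothetical, and pruning is modeled only as described in this ticket (entries older than 10 seconds are dropped when a migration commits).

```python
from dataclasses import dataclass, field


class StaleChunkHistory(Exception):
    """Stand-in for the server's StaleChunkHistory error."""


@dataclass
class Chunk:
    # Hypothetical model: (valid_after_secs, shard_id) pairs, newest first.
    history: list = field(default_factory=list)

    def commit_migration(self, now, to_shard, window_secs=10):
        """Record the new owner, then drop entries older than the window."""
        self.history.insert(0, (now, to_shard))
        cutoff = now - window_secs
        self.history = [(t, s) for (t, s) in self.history if t >= cutoff]

    def get_shard_id_at(self, ts):
        """Return the shard that owned the chunk at timestamp ts."""
        for valid_after, shard in self.history:
            if valid_after <= ts:
                return shard
        raise StaleChunkHistory(f"no chunk history at or before ts={ts}")


# Chunk C has lived on shardA since t=95.
chunk_c = Chunk(history=[(95, "shardA")])

# The transaction picks read timestamp T=100.
read_ts = 100

# More than 10 seconds later, a migration commits and prunes the t=95 entry.
chunk_c.commit_migration(now=112, to_shard="shardB")

# The transaction now targets C at T and finds no usable history.
try:
    chunk_c.get_shard_id_at(read_ts)
    outcome = "routed"
except StaleChunkHistory:
    outcome = "StaleChunkHistory"
```

      In this model the lookup at T fails because the only entry valid at T was pruned, which is why retrying the same statement at the same timestamp cannot help.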

      Transactions cannot retry StaleChunkHistory (SERVER-39704) and I think this particular case could never be retried, since the history is truly gone.

      If the client uses a driver's withTransaction API, then TransientTransactionError will compel it to retry the transaction from the start, and the retry will probably succeed. It can retry for up to 120 seconds. It would have to be unlucky for the sequence above to repeat for that long.
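
      The retry behavior described here can be approximated as a loop with a time budget. The sketch below is a hypothetical stand-in, not any driver's actual withTransaction code: with_transaction, the exception class, and the injectable clock are assumptions made for illustration, and only the 120-second budget comes from the text above.

```python
import time


class TransientTransactionError(Exception):
    """Stand-in for an error carrying the TransientTransactionError label."""


def with_transaction(callback, timeout_secs=120.0, clock=time.monotonic):
    """Rerun callback from the start on transient errors until the budget is spent."""
    start = clock()
    attempt = 0
    while True:
        attempt += 1
        try:
            return callback(attempt)
        except TransientTransactionError:
            # Give up once the retry budget has elapsed; otherwise start over.
            if clock() - start >= timeout_secs:
                raise


def flaky_transaction(attempt):
    """Fails once (as if StaleChunkHistory was hit), then succeeds."""
    if attempt == 1:
        raise TransientTransactionError("StaleChunkHistory")
    return f"committed on attempt {attempt}"


result = with_transaction(flaky_transaction)
```

      Each retry starts a new transaction with a fresh read timestamp, so it only fails again if another migration prunes the history that the new timestamp needs.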

      However, I think we can reduce the incidence of retries by keeping chunk history for at least transactionLifetimeLimitSeconds.
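
      A minimal sketch of the proposed retention rule, under the same simplifying assumptions as the rest of this ticket: prune_history and the integer timestamps are hypothetical, and the 60-second value is the default transaction lifetime mentioned above rather than a value read from a real server parameter.

```python
# Default transaction lifetime stated in the ticket (assumed here, not read from a server).
TRANSACTION_LIFETIME_LIMIT_SECS = 60


def prune_history(history, now, window_secs):
    """Keep only (valid_after_secs, shard_id) entries within window_secs of now."""
    cutoff = now - window_secs
    return [(t, s) for (t, s) in history if t >= cutoff]


# A migration commits at t=112; a live transaction is still reading at T=100,
# which is covered only by the t=95 entry.
history = [(112, "shardB"), (95, "shardA")]

# 10-second window: the t=95 entry is dropped, so a read at T=100 has no history.
short_window = prune_history(history, now=112, window_secs=10)

# Window equal to the transaction lifetime: the t=95 entry survives.
long_window = prune_history(history, now=112, window_secs=TRANSACTION_LIFETIME_LIMIT_SECS)
```

      With the longer window, any entry that a still-running transaction could need is at most one transaction lifetime old in this model, so it is not pruned before that transaction ends.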

              People

              Assignee:
              cheahuychou.mao Cheahuychou Mao
              Reporter:
              jesse A. Jesse Jiryu Davis