  1. Core Server
  2. SERVER-43229

Merge chunk can fail if config failover just after metadata committed on primary config

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.1
    • Component/s: Sharding
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Sprint:
      Sharding 2019-10-07
    • Linked BF Score:
      8

      Description

      The mergeChunk command will fail if there is a config failover just after _configsvrCommitChunkMerge finishes and the new primary does not have the updated metadata in its majority committed snapshot. The following is the scenario in which this happens:

      1. Shard is running mergeChunk, sends _configsvrCommitChunkMerge to the config server.
      2. Config server completes _configsvrCommitChunkMerge and updates its local metadata.
      3. Config server primary steps down immediately.
      4. Shard gets the response from the config server and flushes its filtering metadata before checking the response. The resulting refresh goes to the new config primary, which does not yet have the updated metadata in its majority committed snapshot.
      5. Shard gets a write concern error from the config.
      6. The command is retried. The retry resends _configsvrCommitChunkMerge to the new primary with the old chunks to be merged.
      7. The new primary now has the updated metadata in its majority committed snapshot, so it fails to find the chunks to be merged and fails the command.
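
The sequence above can be sketched as a small toy model. This is only an illustration under stated assumptions, not MongoDB code: `ConfigNode`, `commit_chunk_merge`, `advance_majority`, and `run_scenario` are hypothetical names, chunks are modelled as plain `(min, max)` tuples, and replication is reduced to copying a set.

```python
# Toy model only: all class and function names here are hypothetical,
# not MongoDB internals.

class ConfigNode:
    """A config server node with a local view and a majority-committed view."""

    def __init__(self, chunks):
        self.local = set(chunks)     # latest locally applied metadata
        self.majority = set(chunks)  # metadata visible in the majority committed snapshot

    def advance_majority(self):
        # Model the majority commit point catching up with local writes.
        self.majority = set(self.local)

    def commit_chunk_merge(self, old_chunks, merged):
        # Stand-in for _configsvrCommitChunkMerge: the chunks to merge must still exist.
        if not set(old_chunks) <= self.local:
            raise LookupError("chunks to be merged not found")
        self.local -= set(old_chunks)
        self.local.add(merged)


def run_scenario():
    old_chunks = [(0, 10), (10, 20)]
    merged = (0, 20)
    old_primary = ConfigNode(old_chunks)
    new_primary = ConfigNode(old_chunks)

    # Steps 1-2: the merge commits on the current primary.
    old_primary.commit_chunk_merge(old_chunks, merged)
    # The write has replicated to the other node but is not yet majority committed there.
    new_primary.local = set(old_primary.local)

    # Steps 3-4: failover, then the shard refreshes from the new primary's
    # majority committed snapshot and still sees the unmerged chunks.
    shard_view = set(new_primary.majority)

    # Steps 5-7: the shard sees a write concern error and retries with the old
    # chunks, by which time the new primary's snapshot has caught up.
    new_primary.advance_majority()
    try:
        new_primary.commit_chunk_merge(old_chunks, merged)
        error = None
    except LookupError as exc:
        error = str(exc)
    return shard_view, error


if __name__ == "__main__":
    view, err = run_scenario()
    print("shard view after refresh:", sorted(view))
    print("retry result:", err)
```

In this model the retry fails even though the merge it asks for has already been applied, which is the situation described in step 7.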
