- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Fully Compatible
- ALL
- v4.2, v4.0
- Sharding 2019-10-07
- 8
The mergeChunk command will fail if there is a config failover just after _configsvrCommitChunkMerge finishes and the new primary does not have the updated metadata in its majority committed snapshot. The following is the scenario in which this happens:
1. Shard is running mergeChunk, sends _configsvrCommitChunkMerge to the config server.
2. Config server completes _configsvrCommitChunkMerge and updates its local metadata.
3. Config server primary steps down immediately.
4. Shard gets the response from the config server and flushes its filtering metadata before checking that response. The resulting refresh goes to the new config primary, which does not yet have the updated metadata in its majority committed snapshot.
5. Shard gets a write concern error from the config.
6. The command is retried. The retry resends _configsvrCommitChunkMerge to the new primary with the old chunks to be merged.
7. The new primary now has the updated metadata in its majority committed snapshot, so it fails to find the chunks to be merged and fails the command.
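
The sequence above can be sketched as a toy simulation. This is not MongoDB code: `ConfigNode`, `commit_chunk_merge`, `advance_majority`, `ChunksNotFound` and `run_scenario` are hypothetical names invented for illustration, chunks are modeled as plain `(min, max)` tuples, and the assumption that the commit checks the node's latest local metadata is a simplification made for this sketch.

```python
class ChunksNotFound(Exception):
    """Raised when the chunks to merge are no longer in the metadata."""


class ConfigNode:
    """Toy config server node (hypothetical, for illustration only)."""

    def __init__(self, chunks):
        self.local = set(chunks)     # latest locally applied metadata
        self.majority = set(chunks)  # majority committed snapshot, may lag

    def commit_chunk_merge(self, to_merge):
        """Toy stand-in for _configsvrCommitChunkMerge."""
        to_merge = sorted(to_merge)
        if not set(to_merge) <= self.local:
            raise ChunksNotFound(to_merge)
        merged = (to_merge[0][0], to_merge[-1][1])
        self.local = (self.local - set(to_merge)) | {merged}

    def advance_majority(self):
        """The majority commit point catches up with local metadata."""
        self.majority = set(self.local)


def run_scenario():
    initial = [(0, 10), (10, 20)]
    old_primary = ConfigNode(initial)
    new_primary = ConfigNode(initial)
    to_merge = [(0, 10), (10, 20)]

    # Steps 1-2: the shard sends the merge; the old primary applies it.
    old_primary.commit_chunk_merge(to_merge)
    # The write replicates, but is not yet majority committed on the
    # node that is about to become primary.
    new_primary.local = set(old_primary.local)

    # Step 3: the old primary steps down; new_primary takes over.
    # Step 4: the shard refreshes from the new primary's majority
    # snapshot before checking the response, so it sees stale chunks.
    shard_view = set(new_primary.majority)

    # Step 5: the shard sees a write concern error and decides to retry.
    # Meanwhile the majority commit point advances on the new primary.
    new_primary.advance_majority()

    # Steps 6-7: the retry resends the old chunks, which no longer exist.
    try:
        new_primary.commit_chunk_merge(to_merge)
        outcome = "ok"
    except ChunksNotFound:
        outcome = "failed"
    return shard_view, new_primary.majority, outcome
```

In this model the shard's refreshed view still contains the two unmerged chunks, while the retry runs against metadata that already holds the merged chunk, which is why the retry fails even though the original merge succeeded.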