Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.3.1
Affects Version/s: None
Component/s: Sharding
Labels:
- sharding-csrs-stepdown-only
- sharding-wfbf-day

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v4.2, v4.0
Sprint:
Sharding 2019-10-07
Linked BF Score:
8
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The mergeChunk command will fail if there is a config failover just after _configsvrCommitChunkMerge finishes and the new primary does not have the updated metadata in its majority committed snapshot. The following is the scenario in which this happens:

1. The shard runs mergeChunk and sends _configsvrCommitChunkMerge to the config server.
2. The config server completes _configsvrCommitChunkMerge and updates its local metadata.
3. The config server primary steps down immediately.
4. The shard gets the response from the config server and flushes its filtering metadata before checking that response. The refresh goes to the new config primary, which does not yet have the updated metadata in its majority committed snapshot.
5. The shard gets a write concern error from the config server.
6. The command is retried. The shard resends _configsvrCommitChunkMerge to the new primary with the old chunks to be merged.
7. The new primary now has the updated metadata in its majority committed snapshot, so it fails to find the chunks to be merged and fails the command.
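
The scenario above can be sketched as a small toy model. This is only an illustration of the race, not MongoDB server code: `ConfigNode`, `commit_chunk_merge`, `run_scenario`, and the `idempotent` flag are hypothetical names, and the idempotent retry check is an assumed mitigation shown for contrast, not necessarily the fix applied for this ticket.

```python
# Toy model only: every name here is hypothetical and does not
# correspond to MongoDB server internals.

class ChunksNotFound(Exception):
    """Raised when the chunks to merge are absent from the node's metadata."""


class ConfigNode:
    def __init__(self, chunks):
        # Each chunk is a (min_key, max_key) range.
        self.local = list(chunks)     # latest applied metadata
        self.majority = list(chunks)  # majority committed snapshot

    def apply_merge(self, to_merge):
        """Apply the merge to local metadata (not yet majority committed)."""
        if any(c not in self.local for c in to_merge):
            raise ChunksNotFound(to_merge)
        merged = (to_merge[0][0], to_merge[-1][1])
        rest = [c for c in self.local if c not in to_merge]
        self.local = sorted(rest + [merged])
        return merged

    def advance_majority(self):
        """Model the majority commit point catching up with local state."""
        self.majority = list(self.local)


def commit_chunk_merge(node, to_merge, idempotent=False):
    """Model of the commit step. With idempotent=True (an assumed
    mitigation), a retry succeeds if the merged range already exists."""
    merged = (to_merge[0][0], to_merge[-1][1])
    if idempotent and merged in node.local:
        return merged
    return node.apply_merge(to_merge)


def run_scenario(idempotent):
    chunks = [(0, 10), (10, 20), (20, 30)]
    old_primary, new_primary = ConfigNode(chunks), ConfigNode(chunks)
    to_merge = [(0, 10), (10, 20)]

    # Steps 1-2: the old primary commits the merge to its local metadata.
    commit_chunk_merge(old_primary, to_merge)
    # The new primary has applied the write but not majority committed it.
    new_primary.local = list(old_primary.local)
    # Steps 3-4: step down; the shard refreshes from the new primary's
    # majority snapshot and therefore still sees the old chunks.
    shard_view = list(new_primary.majority)
    # Steps 5-6: write concern error, so the shard retries with stale chunks.
    stale = [c for c in shard_view if c in to_merge]
    # Step 7: the majority snapshot catches up before the retry arrives.
    new_primary.advance_majority()
    try:
        commit_chunk_merge(new_primary, stale, idempotent=idempotent)
        return "ok"
    except ChunksNotFound:
        return "failed"
```

In this model, `run_scenario(False)` returns `"failed"` because the retried request names chunks that no longer exist, while `run_scenario(True)` returns `"ok"` because the retry recognizes that the merged range is already present.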

Assignee: Janna Golden
Reporter: Janna Golden
Participants: Githook User, Janna Golden
Votes: 0
Watchers: 3

Created: Sep 09 2019 03:34:06 PM UTC
Updated: Oct 29 2023 10:17:23 PM UTC
Resolved: Sep 24 2019 03:04:19 PM UTC
Confidence Status Last Update: 23/Sep/19 2:52 PM
