Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.4.0-rc4, 4.7.0
Affects Version/s: 4.5.1, 4.4.0-rc1
Component/s: Replication
Labels:
- safe-reconfig-related

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v4.4
Steps To Reproduce:

Applying this diff (force_reconfig_drain_mode_repro.diff) on this commit and running the following commands should reproduce the bug:

ninja -j400 build/ninja/mongo/db/repl/db_repl_coordinator_test
build/ninja/mongo/db/repl/db_repl_coordinator_test --suite ReplCoordTest --filter NodeReturnsNotMasterWhenRunningForceReconfigWhileInDrainMode
Sprint:
Repl 2020-05-04
Linked BF Score:
42
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

After a node has been elected primary and has drained the ops from its buffer, it checks whether it needs to run a reconfig to increment its config term. It performs this check while holding the replication coordinator mutex, but releases the mutex before running the reconfig. If a force reconfig is running concurrently, it may install a new config with term -1 after we do the check and release the mutex, but before we run the reconfig.

If this happens, we then try to run a reconfig that sets the config version to the version installed by the force reconfig and the config term to the node's current term. If the force reconfig installed version 'version' and the node's current term is 'term', we run a reconfig to (version, term) while our current config is (version, -1). Since terms are ignored in config comparison if either term is -1, only the versions are compared, and they are equal, so the new config fails the validation check that it must be newer than the current config. We return this error and then fassert.
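
The failing comparison can be sketched as follows. This is a minimal Python model, not the server's C++ code: the names `ConfigVersionAndTerm`, `is_newer_than`, `UNINITIALIZED_TERM`, and `simulate_drain_mode_race` are hypothetical, and the ordering used when both terms are set is an assumption. Only the rule that terms are ignored when either term is -1 comes from the description above.

```python
from dataclasses import dataclass

# Term value installed by a force reconfig, per the description above.
UNINITIALIZED_TERM = -1


@dataclass(frozen=True)
class ConfigVersionAndTerm:
    """Hypothetical stand-in for a replica set config's (version, term) pair."""

    version: int
    term: int

    def is_newer_than(self, other: "ConfigVersionAndTerm") -> bool:
        # If either term is -1, terms are ignored and only versions are compared.
        if UNINITIALIZED_TERM in (self.term, other.term):
            return self.version > other.version
        # Assumed ordering when both terms are set: term first, then version.
        return (self.term, self.version) > (other.term, other.version)


def simulate_drain_mode_race(version: int, term: int) -> bool:
    """Replay the interleaving described above; return whether validation passes."""
    # 1. The force reconfig installs (version, -1) after the new primary
    #    has done its check and released the mutex.
    current = ConfigVersionAndTerm(version, UNINITIALIZED_TERM)
    # 2. The new primary builds its term-bump config from the installed
    #    version and its own current term.
    proposed = ConfigVersionAndTerm(current.version, term)
    # 3. Validation requires the proposed config to be newer than the current one.
    return proposed.is_newer_than(current)
```

In this model, `simulate_drain_mode_race(version=5, term=3)` returns `False`: the proposed (5, 3) is compared against (5, -1) on version alone, and 5 is not greater than 5.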

To address this, we may want to consider preventing force reconfigs from running on a node while it is in drain mode. Non-force reconfigs should already be prevented in this case, since we check canAcceptNonLocalWrites for them, but force reconfigs bypass this check because they can run on a secondary.
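
One way to picture the proposed guard is the toy model below. It is a sketch under stated assumptions, not the actual fix: `ReplCoordinatorModel`, `NotMasterError`, and every attribute name are hypothetical, and the error name simply echoes the repro test's name. The only behaviour taken from the description is that non-force reconfigs are gated on the ability to accept non-local writes, while force reconfigs skip that gate and install term -1.

```python
import threading


class NotMasterError(Exception):
    """Hypothetical error, named after the repro test in this ticket."""


class ReplCoordinatorModel:
    """Toy model of a node; not the server's replication coordinator."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self.draining = False  # True while the newly elected primary is in drain mode
        self.can_accept_non_local_writes = False
        self.current_term = 0
        self.config = (1, -1)  # (version, term)

    def process_reconfig(self, new_version: int, force: bool) -> None:
        with self._mutex:
            if force:
                # Proposed guard (assumption): refuse force reconfigs in drain mode,
                # so they cannot interleave with the pending term-bump reconfig.
                if self.draining:
                    raise NotMasterError("force reconfig rejected during drain mode")
                term = -1  # force reconfigs install term -1
            else:
                # Non-force reconfigs are already gated on being a writable primary.
                if not self.can_accept_non_local_writes:
                    raise NotMasterError("node cannot accept non-local writes")
                term = self.current_term
            self.config = (new_version, term)
```

With this guard, a force reconfig issued while `draining` is `True` raises instead of installing a config, so the (version, -1) config from the scenario above can never appear between the check and the term-bump reconfig.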

force_reconfig_drain_mode_repro.diff
3 kB
Apr 17 2020 06:19:49 PM UTC

related to

SERVER-47142 Check primary before writing replset config and no-op

Closed

Assignee: Will Schultz
Reporter: Will Schultz
Participants: Githook User, Siyuan Zhou, Will Schultz
Votes: 0
Watchers: 3

Created: Apr 17 2020 06:22:05 PM UTC
Updated: Oct 29 2023 10:09:19 PM UTC
Resolved: Apr 22 2020 03:07:58 PM UTC
Confidence Status Last Update: 20/Apr/20 8:12 PM
