Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.3.1, 4.2.2, 4.0.14
Affects Version/s: None
Component/s: Replication
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.2, v4.0, v3.6, v3.4
Linked BF Score: 8
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

In periodic_kill_secondaries.py we call _kill_secondaries() at the start of each test. For each secondary, that function disables the rsSyncApplyStop failpoint, sleeps 100 ms, then kills the secondary.

In sync_tail.cpp, in _oplogApplication, we have:

        if (MONGO_FAIL_POINT(rsSyncApplyStop)) {
            while (MONGO_FAIL_POINT(rsSyncApplyStop)) {
                // Tests should not trigger clean shutdown while that failpoint is active. If we
                // think we need this, we need to think hard about what the behavior should be.
                if (inShutdown()) {
                    severe() << "Turn off rsSyncApplyStop before attempting clean shutdown";
                    fassertFailedNoTrace(40304);
                }
                sleepmillis(10);
            }
        }

I think there's a clear race: if the oplog application thread happens to hang for longer than 100 ms between checking MONGO_FAIL_POINT(rsSyncApplyStop) and checking inShutdown(), then periodic_kill_secondaries.py can turn off the failpoint and start shutting down mongod during that hang. By the time the thread calls inShutdown(), it returns true and we fassert.
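The interleaving can be replayed deterministically in a small model. This is only a sketch: failPointEnabled, shuttingDown, Outcome, and runOriginalLoop are hypothetical stand-ins, not MongoDB symbols, and the callback stands in for the hang between the two checks.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

namespace race_sketch {

// Hypothetical stand-ins for MONGO_FAIL_POINT(rsSyncApplyStop) and inShutdown().
std::atomic<bool> failPointEnabled{true};
std::atomic<bool> shuttingDown{false};

enum class Outcome { kLeftLoop, kFasserted };

// Models the original loop. The callback runs in the window between the
// failpoint check and the shutdown check, which is where the real thread
// would have to hang for more than 100 ms.
Outcome runOriginalLoop(const std::function<void()>& stallAfterFailPointCheck) {
    while (failPointEnabled.load()) {
        stallAfterFailPointCheck();
        if (shuttingDown.load()) {
            // Corresponds to fassertFailedNoTrace(40304) in the real code.
            return Outcome::kFasserted;
        }
        // The real loop calls sleepmillis(10) here; omitted to keep the model deterministic.
    }
    return Outcome::kLeftLoop;
}

}  // namespace race_sketch
```

If the callback both disables the failpoint and sets shuttingDown, as the test script effectively does during the hang, the model returns kFasserted even though the failpoint was turned off before shutdown began.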

As the comment says, we need to think hard about what the behavior should be. One idea is for periodic_kill_secondaries.py to wait to shut down mongod until the mongod code has definitely left the while loop; we could add a log message after the while loop which periodic_kill_secondaries.py could wait for.

I prefer a different idea: let's handle the case where rsSyncApplyStop is still enabled when mongod shuts down. As a side effect, this change also fixes the race that occurs when rsSyncApplyStop is disabled immediately before mongod shuts down. We can simply exit the while loop if inShutdown() is true, and from there proceed to the normal shutdown path.
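A minimal sketch of that idea, using the same kind of model rather than the real sync_tail.cpp code. The names failPointEnabled, shuttingDown, Outcome, and waitWhileFailPointEnabled are hypothetical, and this is not necessarily the change that was committed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

namespace fix_sketch {

// Hypothetical stand-ins for MONGO_FAIL_POINT(rsSyncApplyStop) and inShutdown().
std::atomic<bool> failPointEnabled{true};
std::atomic<bool> shuttingDown{false};

enum class Outcome { kFailPointDisabled, kShutdown };

// Waits while the failpoint is enabled, but leaves the loop as soon as
// shutdown is observed instead of treating it as a fatal error.
Outcome waitWhileFailPointEnabled(const std::function<void()>& stallAfterFailPointCheck) {
    while (failPointEnabled.load()) {
        stallAfterFailPointCheck();  // models a hang between the two checks
        if (shuttingDown.load()) {
            // Proposed behavior: exit and let the caller continue to normal shutdown.
            return Outcome::kShutdown;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    return Outcome::kFailPointDisabled;
}

}  // namespace fix_sketch
```

In this model, shutdown ends the wait whether the failpoint is still enabled or was disabled just before shutdown started, so the ordering of the two events no longer matters.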

Assignee: A. Jesse Jiryu Davis
Reporter: A. Jesse Jiryu Davis
Participants: A. Jesse Jiryu Davis, Githook User
Votes: 0
Watchers: 2

Created: Sep 28 2019 10:25:51 PM UTC
Updated: Oct 29 2023 10:16:41 PM UTC
Resolved: Oct 05 2019 03:10:46 AM UTC