Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.9.0, 4.4.2, 4.2.12
Affects Version/s: 4.2.8
Component/s: Replication
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v4.4, v4.2
Sprint:
Repl 2020-10-05, Repl 2020-10-19
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The bgsync _producer() method runs in a loop until stop() is called asynchronously.

The problem arises with the following interleaving. After this critical section has run

https://github.com/mongodb/mongo/blob/ea1ad14260cad77823a549a22a32a97621d58a35/src/mongo/db/repl/bgsync.cpp#L425

stop() is called (as it would be during step-up). If the primary then clears the applied-through time (as it normally does) before

https://github.com/mongodb/mongo/blob/ea1ad14260cad77823a549a22a32a97621d58a35/src/mongo/db/repl/bgsync.cpp#L441

is reached, the applied-through time will be re-set to the last applied optime. This state will persist until the next time the node becomes secondary and applies a batch. If the node restarts during that time, it will fail an invariant and need to be re-synced.

We need to hold the mutex and ensure the producer is running while checking if applied-through is clear and setting it.
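
A minimal sketch of the check-and-set described above, not the actual patch. BackgroundSyncSketch, the simplified OpTime, and setAppliedThroughIfRunning() are hypothetical names used for illustration only, and a single mutex stands in for whatever synchronization the real code uses. The point is that the "producer still running" check, the "applied-through is clear" check, and the write all happen under the same lock that stop() takes, so a stop() followed by a clear can no longer be undone by a late write.

```cpp
#include <mutex>

// Hypothetical stand-in for an optime; 0 means "null/clear".
struct OpTime {
    long long ts = 0;
    bool isNull() const { return ts == 0; }
};

// Hypothetical model of the producer state; not the real BackgroundSync class.
class BackgroundSyncSketch {
public:
    // Called asynchronously, e.g. during step-up.
    void stop() {
        std::lock_guard<std::mutex> lk(_mutex);
        _running = false;
    }

    // Models the primary clearing applied-through after step-up.
    void clearAppliedThrough() {
        std::lock_guard<std::mutex> lk(_mutex);
        _appliedThrough = OpTime();
    }

    // Sets applied-through only if the producer is still running and
    // applied-through is currently clear. Both checks and the write
    // happen while holding the mutex. Returns true if the write happened.
    bool setAppliedThroughIfRunning(OpTime lastApplied) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (!_running || !_appliedThrough.isNull()) {
            return false;
        }
        _appliedThrough = lastApplied;
        return true;
    }

    OpTime appliedThrough() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _appliedThrough;
    }

private:
    mutable std::mutex _mutex;
    bool _running = true;
    OpTime _appliedThrough;
};
```

In this sketch, a call to setAppliedThroughIfRunning() that arrives after stop() and clearAppliedThrough() returns false and leaves applied-through clear, which is the behavior the fix is after.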

related to

SERVER-62379 Fix deadlock between ReplicationCoordinator and BackgroundSync on stepUp

Closed

Assignee: Samyukta Lanka
Reporter: Matthew Russotto
Participants: Githook User, Matthew Russotto, Samyukta Lanka, Siyuan Zhou
Votes: 0
Watchers: 10

Created: Sep 10 2020 06:57:43 PM UTC
Updated: Oct 29 2023 10:03:22 PM UTC
Resolved: Oct 07 2020 10:49:33 PM UTC
Confidence Status Last Update: 25/Sep/20 10:43 PM
