[SERVER-39776] Initial sync and replication threads simultaneous startup and shutdown races Created: 22/Feb/19  Updated: 29/Oct/23  Resolved: 16/Aug/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.2.1, 4.3.1

Type: Bug Priority: Major - P3
Reporter: Matthew Russotto Assignee: Matthew Russotto
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.2
Sprint: Repl 2019-07-15, Repl 2019-07-29, Repl 2019-08-12, Repl 2019-08-26
Linked BF Score: 57

 Description   

If a mongod is shut down while it is still starting up (but after its config state has been set to steady state), there are at least two races:

1) The initial syncer may be created after the point at which shutdown would have stopped it. This can be fixed by checking _inShutdown inside the critical section in _startDataReplication where the initial syncer is created; if the flag is set, the initial syncer should not be created.

2) The data replication threads in ReplicationCoordinatorExternalStateImpl may be started after shutdown has been called. This can be fixed by setting _inShutdown to true in ReplicationCoordinatorExternalStateImpl::shutdown() even when _startedThreads is false, and by checking _inShutdown in ReplicationCoordinatorExternalStateImpl::startThreads().
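Both fixes follow one pattern. The "start something" step has to check the shutdown flag inside the same critical section that performs the start. Shutdown has to set that flag unconditionally, whether or not anything was started yet. The C++ sketch below is a simplified, hypothetical illustration of that pattern and is not MongoDB's implementation. StartupShutdownSketch, InitialSyncerStub, startInitialSyncer(), hasInitialSyncer() and runDataReplication() are invented names. For brevity, the sketch also folds the two components the ticket names (the initial syncer creation in _startDataReplication, and ReplicationCoordinatorExternalStateImpl) into one class with a single _mutex and _inShutdown flag.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the initial syncer; not MongoDB's class.
struct InitialSyncerStub {
    bool started = false;
    void startup() { started = true; }
    void shutdown() { started = false; }
};

// Hypothetical, simplified class: one mutex and one _inShutdown flag guard
// both the initial syncer creation (race 1) and thread startup (race 2).
class StartupShutdownSketch {
public:
    ~StartupShutdownSketch() { shutdown(); }

    // Race 1 fix: the _inShutdown check and the creation of the initial
    // syncer happen in one critical section, so shutdown() cannot run in between.
    bool startInitialSyncer() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_inShutdown || _initialSyncer) return false;
        _initialSyncer = std::make_unique<InitialSyncerStub>();
        _initialSyncer->startup();
        return true;
    }

    // Race 2 fix, part 1: refuse to start threads once shutdown has begun.
    bool startThreads() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_inShutdown || _startedThreads) return false;
        _worker = std::thread([this] { runDataReplication(); });
        _startedThreads = true;
        return true;
    }

    // Race 2 fix, part 2: set _inShutdown even if nothing was started yet,
    // so a late startThreads() or startInitialSyncer() call backs off.
    void shutdown() {
        std::unique_ptr<InitialSyncerStub> syncer;
        std::thread worker;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _inShutdown = true;                  // set unconditionally
            syncer = std::move(_initialSyncer);  // take ownership under the lock
            worker = std::move(_worker);
        }
        _cv.notify_all();                        // wake the worker, if any
        if (syncer) syncer->shutdown();          // stop outside the lock
        if (worker.joinable()) worker.join();    // join outside the lock
    }

    bool hasInitialSyncer() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _initialSyncer != nullptr;
    }

private:
    // Placeholder for the data replication loop: idles until shutdown.
    void runDataReplication() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return _inShutdown; });
    }

    mutable std::mutex _mutex;
    std::condition_variable _cv;
    bool _inShutdown = false;
    bool _startedThreads = false;
    std::unique_ptr<InitialSyncerStub> _initialSyncer;
    std::thread _worker;
};
```

If shutdown() runs before either start method, both start calls return false and nothing is left running. This is the ordering the two races above would otherwise mishandle. The syncer and the thread are moved out under the lock and are stopped and joined only after the lock is released. Joining while still holding _mutex would deadlock, because the worker must reacquire the same mutex to observe _inShutdown.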



 Comments   
Comment by Githook User [ 23/Aug/19 ]

Author: Matthew Russotto (mtrussotto) <matthew.russotto@10gen.com>

Message: SERVER-39776 Fix race between initial syncer startup, data replication thread startup and shutdown

(cherry picked from commit d362c1c39ca79dd20e0aa6e9f93171fc5bd2cdec)
Branch: v4.2
https://github.com/mongodb/mongo/commit/7897a0a4ef9ee0e6beb9d384f5ea5a6e7c187fce

Comment by Githook User [ 23/Aug/19 ]

Author: Matthew Russotto (mtrussotto) <matthew.russotto@10gen.com>

Message: SERVER-39776 Remove legacy UniqueLock and LockGuard from repl_coordinator_external_state_impl.cpp

(cherry picked from commit 025c02738625a57ff738942b660832b794510fb1)
Branch: v4.2
https://github.com/mongodb/mongo/commit/adecec418601ccc2690c4e4750d06087ad43fb10

Comment by Githook User [ 16/Aug/19 ]

Author: Matthew Russotto (mtrussotto) <matthew.russotto@10gen.com>

Message: SERVER-39776 Fix race between initial syncer startup, data replication thread startup and shutdown
Branch: master
https://github.com/mongodb/mongo/commit/d362c1c39ca79dd20e0aa6e9f93171fc5bd2cdec

Comment by Githook User [ 16/Aug/19 ]

Author: Matthew Russotto (mtrussotto) <matthew.russotto@10gen.com>

Message: SERVER-39776 Remove legacy UniqueLock and LockGuard from repl_coordinator_external_state_impl.cpp
Branch: master
https://github.com/mongodb/mongo/commit/025c02738625a57ff738942b660832b794510fb1

Comment by Matthew Russotto [ 31/Jul/19 ]

Code Review URL: https://mongodbcr.appspot.com/502410005/

Generated at Thu Feb 08 04:53:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.