[SERVER-59005] Storage engine clean shutdown can race with startup
Created: 02/Aug/21  Updated: 29/Oct/23  Resolved: 27/Aug/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Louis Williams Assignee: Benety Goh
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-38128 Create a periodic task associated wit... Closed
is related to SERVER-38962 The second phase of two-phase drop sh... Closed
is related to SERVER-52562 Turn on Lock-Free reads for standalon... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Execution Team 2021-09-06
Participants:
Linked BF Score: 10

 Description   

In certain circumstances, storage engine startup can race with clean shutdown, and lead to the following invariant failure:

Invariant failure !listenerNotRegistered

The shutdown task, which is called from the signal handler to cleanly shut down the storage engine, holds a Global X lock. The initAndListen thread, which initializes the storage engine and registers the TimestampMonitor listener, does not hold this lock.

The shutdown path assumes that the storage engine has been completely initialized, which is not guaranteed. As a result, the server can crash if it is shut down cleanly before the storage engine finishes starting up.

I'm surprised we don't already hold the Global X lock during storage engine initialization; perhaps we should. An alternative to taking a global lock would be to keep shutdown expeditious and tolerate this race by relaxing the existing invariant.
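To make the failure mode and the "relax the invariant" alternative concrete, here is a minimal sketch. It is not the server code: `ToyMonitor`, `ToyListener`, `removeListenerStrict`, `removeListenerRelaxed`, and `strictShutdownFailsBeforeStartup` are hypothetical names, and the strict variant throws an exception where the server would hit an invariant, so that the failure is observable without aborting the process.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a timestamp listener; not the MongoDB type.
struct ToyListener {
    int id;
};

// Toy monitor, loosely modeled on the description above (an assumption,
// not the actual TimestampMonitor implementation).
class ToyMonitor {
public:
    // Called by the startup path once initialization reaches this point.
    void addListener(ToyListener* listener) {
        assert(listener != nullptr);
        std::lock_guard<std::mutex> lk(_mutex);
        _listeners.push_back(listener);
    }

    // Relaxed removal: reports whether the listener was registered instead
    // of treating its absence as a fatal error.
    bool removeListenerRelaxed(ToyListener* listener) {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = std::find(_listeners.begin(), _listeners.end(), listener);
        if (it == _listeners.end()) {
            return false;
        }
        _listeners.erase(it);
        return true;
    }

    // Strict removal: models the invariant by throwing when the listener
    // was never registered.
    void removeListenerStrict(ToyListener* listener) {
        if (!removeListenerRelaxed(listener)) {
            throw std::logic_error("listener was never registered");
        }
    }

    std::size_t listenerCount() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _listeners.size();
    }

private:
    mutable std::mutex _mutex;
    std::vector<ToyListener*> _listeners;
};

// Replays the problematic ordering on a single thread: shutdown runs
// before startup has registered the listener. Returns true if the strict
// removal fails.
bool strictShutdownFailsBeforeStartup() {
    ToyMonitor monitor;
    ToyListener listener{1};
    try {
        monitor.removeListenerStrict(&listener);
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

With the relaxed variant, a shutdown that arrives early gets `false` back and can continue. The other option discussed above, holding the Global X lock during initialization, would instead prevent the early ordering from occurring at all.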



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 27/Aug/21 ]

Author: Benety Goh <benety@mongodb.com> (username: benety)

Message: SERVER-59005 rename TimestampMonitor::addListener() to be test-only
Branch: master
https://github.com/mongodb/mongo/commit/e502f2d3965ac4147d303e956a582b7c4eef8232

Comment by Githook User [ 27/Aug/21 ]

Author: Benety Goh <benety@mongodb.com> (username: benety)

Message: SERVER-59005 replace TimestampMonitor::removeListener() with clearListeners
Branch: master
https://github.com/mongodb/mongo/commit/775f3f943d2c6ee13c67c495a24fffa958e98df3

Comment by Githook User [ 27/Aug/21 ]

Author: Benety Goh <benety@mongodb.com> (username: benety)

Message: SERVER-59005 TimestampMonitor accepts initial listener on construction
Branch: master
https://github.com/mongodb/mongo/commit/efafb725229d0388549b7f7c0389dfa50f2142e5

Comment by Githook User [ 26/Aug/21 ]

Author: Benety Goh <benety@mongodb.com> (username: benety)

Message: SERVER-59005 TimestampMonitor starts periodic job on construction
Branch: master
https://github.com/mongodb/mongo/commit/845ddb7e68889dfe32c5adec421a714dd14060f4
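Read together, the four commit messages above suggest a monitor that is fully set up by its constructor and torn down with an idempotent clear. The sketch below illustrates that shape only. It is inferred from the commit titles, not copied from the commits, and `SketchMonitor`, `SketchListener`, `addListener_forTest`, `isRunning`, and `listenerCount` are hypothetical names. A boolean flag stands in for starting the periodic job.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a timestamp listener; not the MongoDB type.
struct SketchListener {
    int id;
};

// Illustrative monitor whose constructor registers the initial listener
// and "starts" the periodic job (an assumption based on the commit titles).
class SketchMonitor {
public:
    // Because registration happens during construction, no other thread
    // can observe a constructed monitor that lacks its initial listener.
    explicit SketchMonitor(SketchListener* initialListener) {
        assert(initialListener != nullptr);
        _listeners.push_back(initialListener);
        _running = true;  // stands in for scheduling the periodic job
    }

    // Test-only hook for registering extra listeners after construction.
    void addListener_forTest(SketchListener* listener) {
        assert(listener != nullptr);
        std::lock_guard<std::mutex> lk(_mutex);
        _listeners.push_back(listener);
    }

    // Idempotent teardown: safe to call whether or not any listener is
    // registered, so shutdown has no "must be registered" precondition.
    void clearListeners() {
        std::lock_guard<std::mutex> lk(_mutex);
        _listeners.clear();
    }

    bool isRunning() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _running;
    }

    std::size_t listenerCount() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _listeners.size();
    }

private:
    mutable std::mutex _mutex;
    std::vector<SketchListener*> _listeners;
    bool _running = false;
};
```

In this shape, shutdown calls `clearListeners()` without needing to know how far startup got, and calling it a second time is a no-op.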

Comment by Benety Goh [ 24/Aug/21 ]

SERVER-52562 is not directly related but is interesting for the standalone scenario.

Comment by Benety Goh [ 24/Aug/21 ]

Each server instance registers a single TimestampListener to observe changes in TimestampMonitor::TimestampType::kMinOfCheckpointAndOldest. We register the listener at process startup and remove it at shutdown.

This TimestampType constant was introduced in SERVER-38962.

Comment by Benety Goh [ 24/Aug/21 ]

The StorageEngineImpl::removeListener() invariant was added in SERVER-38128.
