[SERVER-61007] ReplSetGetStatus calls storage with no lock Created: 26/Oct/21  Updated: 07/Dec/23  Resolved: 01/Nov/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.2.0

Type: Bug Priority: Major - P3
Reporter: Matthew Russotto Assignee: Huayu Ouyang
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Problem/Incident
causes SERVER-83955 Fix wrong warning messages in ReplSet... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2021-11-01
Participants:
Linked BF Score: 151

 Description   

If we run replSetGetStatus with getLastStableRecoveryTimestamp, we can access storage without holding a lock; this races with shutdown. File Copy Based Initial Sync makes this much more likely to happen, because it shuts down storage at times other than shutdown.

https://github.com/10gen/mongo/blob/02add56a2100bef135281938a0cadaf374279f03/src/mongo/db/repl/repl_set_commands.cpp#L138

We should fix this by doing what we do in curop:

http://morningstar/mongodb/source/src/mongo/db/curop.cpp#460

Try to take a global lock with a very short timeout; if we don't get it, just return that we have no stable recovery timestamp.

We might also consider dasserts or invariants in the storage interface for this and similar cases; however, some of the routines take ServiceContext and not OperationContext and so don't have access to the locker.
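
A debug assertion of this kind might look roughly like the sketch below. All names here (`SketchLocker`, `SketchOpCtx`, `StorageInterfaceSketch`) are hypothetical and only model the idea; the standard `assert` stands in for a dassert or invariant.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified stand-ins; not the real server classes.
struct SketchLocker {
    bool globalLockHeld = false;
};

struct SketchOpCtx {
    SketchLocker locker;  // an operation context gives access to its locker
};

class StorageInterfaceSketch {
public:
    // The check is possible here because the operation context exposes the
    // locker. A routine that only receives a service-level context has no
    // locker to inspect, which is the limitation described above.
    std::optional<std::uint64_t> getLastStableRecoveryTimestamp(
        const SketchOpCtx& opCtx) const {
        assert(opCtx.locker.globalLockHeld &&
               "storage accessed without holding the global lock");
        return _stableTs;
    }

private:
    std::uint64_t _stableTs = 42;  // placeholder value for the example
};
```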



 Comments   
Comment by Githook User [ 29/Oct/21 ]

Author: Huayu Ouyang <huayu.ouyang@mongodb.com> (username: huayu-ouyang)

Message: SERVER-61007 ReplSetGetStatus calls storage with no lock
Branch: master
https://github.com/mongodb/mongo/commit/b08e22b646d2ba2893bf890bf20d25ce5d4ff1b6

Generated at Thu Feb 08 05:51:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.