[SERVER-57290] Shutting down SingleServerDiscoveryMonitor can lead to a nullptr dereference Created: 28/May/21  Updated: 29/Oct/23  Resolved: 29/Jun/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Haley Connelly
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2021-06-28
Participants:
Linked BF Score: 12

 Description   

SingleServerDiscoveryMonitor::_executor is reset as part of SingleServerDiscoveryMonitor::shutdown() but may still be referenced by a task running on the task executor.

It looks like a similiar issue also affects the non-streamable version of the SingleServerDiscoveryMonitor (--setParameter replicaSetMonitorProtocol=sdam).



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 25/Jun/21 ]

Author:

{'name': 'Haley Connelly', 'email': 'haley.connelly@mongodb.com', 'username': 'haleyConnelly'}

Message: SERVER-57290 Prevent nullptr dereference when SingleServerDiscoveryMonitor is shutting down
Branch: master
https://github.com/mongodb/mongo/commit/93a3c12fb7bf3bbd6689c4728c27cc3cb12eb98a

Comment by Lamont Nelson [ 28/May/21 ]

I think the fix for this is to just not null the executor pointer. We check if the instance is shutdown already, which should cause the callbacks to exit. Once they all exit the instance will be destroyed.

Comment by Lamont Nelson [ 28/May/21 ]

In general, the state of those classes should be protected by the mutex. We shouldn't need to call the shutdown method of the executor at all. I think we might be able to remove setting the executor to nullptr without consequence. It will not change after it's passed it to the constructor.

The linked examples are called under the mutex here: https://github.com/mongodb/mongo/blob/8d5547cdd6c45dd68c95dcf353e459e62da5222b/src/mongo/client/server_discovery_monitor.cpp#L212.

The executor's lifetime is governed by the ReplicaSetMonitorManager, which should outlive any instance of the RSM: created / shutdown

Comment by Max Hirschhorn [ 28/May/21 ]

lamont.nelson, what exactly are the synchronization rules for accessing SingleServerDiscoveryMonitor::_executor? I see SingleServerDiscoveryMonitor::_executor is reset while holding SingleServerDiscoveryMonitor::_mutex but not every access of _executor is done while holding the mutex.

I'm back to wondering if the fix here should be to explicitly call _executor->shutdown() instead of resetting _executor. It also seems deadlock prone to me that we implicitly call _executor->_join() while holding SingleServerDiscoveryMonitor::_mutex. (I'm assuming nothing else is keeping the reference alive.)

Generated at Thu Feb 08 05:41:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.