[SERVER-57290] Shutting down SingleServerDiscoveryMonitor can lead to a nullptr dereference Created: 28/May/21 Updated: 29/Oct/23 Resolved: 29/Jun/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 5.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Max Hirschhorn | Assignee: | Haley Connelly |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Sprint: | Sharding 2021-06-28 | ||||
| Participants: | |||||
| Linked BF Score: | 12 | ||||
| Description |
|
SingleServerDiscoveryMonitor::_executor is reset as part of SingleServerDiscoveryMonitor::shutdown() but may still be referenced by a task running on the task executor. It looks like a similiar issue also affects the non-streamable version of the SingleServerDiscoveryMonitor (--setParameter replicaSetMonitorProtocol=sdam). |
| Comments |
| Comment by Vivian Ge (Inactive) [ 06/Oct/21 ] |
|
Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you! |
| Comment by Githook User [ 25/Jun/21 ] |
|
Author: {'name': 'Haley Connelly', 'email': 'haley.connelly@mongodb.com', 'username': 'haleyConnelly'}Message: |
| Comment by Lamont Nelson [ 28/May/21 ] |
|
I think the fix for this is to just not null the executor pointer. We check if the instance is shutdown already, which should cause the callbacks to exit. Once they all exit the instance will be destroyed. |
| Comment by Lamont Nelson [ 28/May/21 ] |
|
In general, the state of those classes should be protected by the mutex. We shouldn't need to call the shutdown method of the executor at all. I think we might be able to remove setting the executor to nullptr without consequence. It will not change after it's passed it to the constructor. The linked examples are called under the mutex here: https://github.com/mongodb/mongo/blob/8d5547cdd6c45dd68c95dcf353e459e62da5222b/src/mongo/client/server_discovery_monitor.cpp#L212. The executor's lifetime is governed by the ReplicaSetMonitorManager, which should outlive any instance of the RSM: created / shutdown |
| Comment by Max Hirschhorn [ 28/May/21 ] |
|
lamont.nelson, what exactly are the synchronization rules for accessing SingleServerDiscoveryMonitor::_executor? I see SingleServerDiscoveryMonitor::_executor is reset while holding SingleServerDiscoveryMonitor::_mutex but not every access of _executor is done while holding the mutex. I'm back to wondering if the fix here should be to explicitly call _executor->shutdown() instead of resetting _executor. It also seems deadlock prone to me that we implicitly call _executor->_join() while holding SingleServerDiscoveryMonitor::_mutex. (I'm assuming nothing else is keeping the reference alive.) |