[SERVER-77851] Address recursive lock acquisition while shutting down the `ServerDiscoveryMonitor` Created: 06/Jun/23  Updated: 27/Oct/23  Resolved: 07/Jun/23

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Amirsaman Memaripour Assignee: [DO NOT USE] Backlog - Sharding NYC
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Problem/Incident
is caused by SERVER-77403 Remove TaskExecutor fulfilling respon... Closed
Related
is related to SERVER-77707 Always invoke onReply callback out-of... Closed
Assigned Teams:
Sharding NYC
Operating System: ALL
Participants:
Linked BF Score: 135

 Description   

ServerDiscoveryMonitor cancels its ongoing remote calls during shutdown while holding SingleServerDiscoveryMonitor::_mutex. This may lead to recursive lock acquisition, which could cause deadlocks: https://linux.die.net/man/3/pthread_mutex_lock
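
To illustrate the hazard, here is a minimal, self-contained sketch of one common way to avoid it: move the outstanding calls out of the guarded state while holding the mutex, then cancel them after releasing it. This is not the actual server code and not the resolution of this ticket. `FakeCall`, `Monitor`, and every member name are hypothetical stand-ins, and the sketch assumes that cancellation runs the callback inline on the cancelling thread.

```cpp
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a remote call whose cancel() invokes its
// callback inline, on the thread that requested the cancellation.
class FakeCall {
public:
    explicit FakeCall(std::function<void()> onCancel) : _onCancel(std::move(onCancel)) {}

    void cancel() {
        if (_onCancel) {
            _onCancel();
        }
    }

private:
    std::function<void()> _onCancel;
};

// Hypothetical monitor, loosely modeled on the situation in the description.
class Monitor {
public:
    void startCall() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_isShutdown) {
            return;
        }
        _calls.emplace_back([this] { onReply(); });
    }

    void shutdown() {
        std::vector<FakeCall> toCancel;
        {
            // Only mutate state under the lock; do not cancel here.
            std::lock_guard<std::mutex> lk(_mutex);
            _isShutdown = true;
            toCancel.swap(_calls);
        }
        // The lock is released, so an inline callback can safely take _mutex.
        for (auto& call : toCancel) {
            call.cancel();
        }
    }

    int cancelledReplies() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _cancelledReplies;
    }

private:
    void onReply() {
        // If shutdown() still held _mutex while calling cancel(), this would
        // re-lock a non-recursive mutex on the same thread.
        std::lock_guard<std::mutex> lk(_mutex);
        ++_cancelledReplies;
    }

    std::mutex _mutex;
    std::vector<FakeCall> _calls;
    bool _isShutdown = false;
    int _cancelledReplies = 0;
};
```

Calling `cancel()` inside the locked scope of `shutdown()` instead would make `onReply()` lock `_mutex` a second time on the same thread, which is the recursive acquisition described above.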



 Comments   
Comment by George Wangensteen [ 07/Jun/23 ]

amirsaman.memaripour@mongodb.com max.hirschhorn@mongodb.com, just to be clear, we plan on re-merging SERVER-77403 after the changes from SERVER-77707 land. Based on my understanding of this ticket and the linked BF, that should be fine and should also fix the BF. Let me know if that doesn't sound correct to you.

Comment by Amirsaman Memaripour [ 07/Jun/23 ]

Thank you max.hirschhorn@mongodb.com for bringing SERVER-77403 to my attention. Now that it is reverted, this is no longer an issue, since ThreadPoolTaskExecutor no longer runs its callback (inline) upon cancellation.

Comment by Max Hirschhorn [ 06/Jun/23 ]

amirsaman.memaripour@mongodb.com, I believe we also reverted the changes from SERVER-77403 in 0943108 and so BF-28965 shouldn't have any reoccurrences. I think we can probably close SERVER-77851 as Won't Do. CC george.wangensteen@mongodb.com

Comment by Amirsaman Memaripour [ 06/Jun/23 ]

Linked SERVER-77707 as related: if we make the suggested change to NetworkInterface, the recursive lock acquisition described here won't happen, so this ticket can become a duplicate if SERVER-77707 gets merged first/soon.
