[SERVER-64641] Deadlock invariant tripped in shard split SingleServerDiscoveryMonitor
Created: 18/Mar/22  Updated: 29/Oct/23  Resolved: 22/Mar/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Matt Broadstone
Assignee: Didier Nadeau
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:
Linked BF Score: 140

 Description   

The issue appears to stem from improper shutdown of EventsPublisher/SingleServerDiscoveryMonitor. Following logic similar to that of StreamableReplicaSetMonitor should let us shut these components down gracefully.
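As a rough illustration of what a graceful shutdown of a monitor/publisher pair can look like, here is a minimal sketch. It is not the actual fix and not MongoDB code: the `Publisher` and `Monitor` classes, their members, and the event strings are all hypothetical. The idea it shows is that the monitor marks itself shut down under its own mutex, releases that mutex, and only then calls into the publisher, so the two mutexes are never held at the same time.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical publisher with its own queue mutex (not TopologyEventsPublisher).
class Publisher {
public:
    void publish(const std::string& event) {
        std::lock_guard<std::mutex> lk(_queueMutex);
        if (_closed) {
            return;  // Drop events that arrive after shutdown.
        }
        _events.push_back(event);
    }

    void close() {
        std::lock_guard<std::mutex> lk(_queueMutex);
        _closed = true;
    }

    std::vector<std::string> events() {
        std::lock_guard<std::mutex> lk(_queueMutex);
        return _events;
    }

private:
    std::mutex _queueMutex;
    bool _closed = false;
    std::vector<std::string> _events;
};

// Hypothetical monitor (not SingleServerDiscoveryMonitor).
class Monitor {
public:
    explicit Monitor(Publisher& publisher) : _publisher(publisher) {}

    void onError(const std::string& error) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (_isShutdown) {
                return;  // Nothing is published once shutdown has begun.
            }
        }
        // The monitor mutex is released here, before the publisher takes its own.
        _publisher.publish(error);
    }

    void shutdown() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (_isShutdown) {
                return;  // Shutdown is idempotent.
            }
            _isShutdown = true;
        }
        // Close the publisher only after the monitor mutex has been released.
        _publisher.close();
    }

private:
    Publisher& _publisher;
    std::mutex _mutex;
    bool _isShutdown = false;
};
```

In this sketch, an error reported after `shutdown()` is dropped instead of reaching the publisher's queue, and neither method ever holds both mutexes at once.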

 

From this build failure:
https://spruce.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_80_64_bit_dynamic_all_feature_flags_required_serverless_patch_b1dc7f546a006efa5edf063286e4368ca603fe48_62349a030ae6061771e6d5da_22_03_18_14_41_15/tests?execution=0&sortBy=STATUS&sortDir=ASC

[js_test:shard_split_basic_test] d20270| 2022-03-18T15:16:15.247+00:00 I  -        4333222 [ShardSplitDonorService-3] "RSM received error response","attr":{"host":"ip-10-122-17-171.ec2.internal:20275","error":"ShutdownInProgress: Shutdown in progress","replicaSet":"","response":{}}
[js_test:shard_split_basic_test] d20270| 2022-03-18T15:16:15.247+00:00 F  -        5106800 [ShardSplitDonorService-3] "Theoretical deadlock found on use of latch","attr":{"reason":"Latch acquired after other latch of lower level","latch":{"name":"TopologyEventsPublisher::_eventQueueMutex","latchId":11855,"level":6,"file":"src/mongo/client/sdam/topology_listener.h","line":99},"latchesHeld":[{"name":"SingleServerDiscoveryMonitor::mutex","latchId":11858,"level":4,"file":"src/mongo/client/server_discovery_monitor.cpp","line":85}]}
[js_test:shard_split_basic_test] d20270| 2022-03-18T15:16:15.247+00:00 F  ASSERT   23089   [ShardSplitDonorService-3] "Fatal assertion","attr":{"msgid":5106800,"file":"src/mongo/util/latch_analyzer.cpp","line":229}
[js_test:shard_split_basic_test] d20270| 2022-03-18T15:16:15.247+00:00 F  ASSERT   23090   [ShardSplitDonorService-3] "\n\n***aborting after fassert() failure\n\n"
[js_test:shard_split_basic_test] d20270| 2022-03-18T15:16:15.247+00:00 F  CONTROL  4757800 [ShardSplitDonorService-3] "Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}
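The fatal message above reports a level-6 latch (TopologyEventsPublisher::_eventQueueMutex) being acquired while a level-4 latch (SingleServerDiscoveryMonitor::mutex) was held. The sketch below models only that ordering rule, as stated in the log's "reason" field. It is an assumption-laden illustration, not the implementation in latch_analyzer.cpp: `LeveledLatch`, `heldLatches`, `violatesHierarchy`, `acquire`, and `releaseLast` are hypothetical names, and no real locking is performed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a latch that carries a hierarchy level.
struct LeveledLatch {
    std::string name;
    int level;
};

// Latches currently held by this thread, in acquisition order.
thread_local std::vector<const LeveledLatch*> heldLatches;

// Returns true if acquiring `next` matches the condition in the log:
// a latch is acquired while another latch of lower level is already held.
bool violatesHierarchy(const LeveledLatch& next) {
    for (const LeveledLatch* held : heldLatches) {
        if (held->level < next.level) {
            return true;
        }
    }
    return false;
}

// Records the acquisition and reports whether it respected the ordering.
bool acquire(const LeveledLatch& latch) {
    const bool ok = !violatesHierarchy(latch);
    heldLatches.push_back(&latch);
    return ok;
}

// Releases the most recently acquired latch.
void releaseLast() {
    assert(!heldLatches.empty());
    heldLatches.pop_back();
}
```

With the two latches from the log, acquiring level 4 and then level 6 makes `acquire` return false for the second call, while acquiring level 6 and then level 4 returns true for both.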



 Comments   
Comment by Githook User [ 22/Mar/22 ]

Author: Didier Nadeau <didier.nadeau@mongodb.com> (nadeaudi)

Message: SERVER-64641 Shutdown listeners upon completion of recipient future in shard split
Branch: master
https://github.com/mongodb/mongo/commit/433fd32967a9e78ff896affb4be6d69ea4f5bcd0
