[SERVER-41869] Reverse mutex acquisition order in CatalogCache::_scheduleCollectionRefresh Created: 21/Jun/19  Updated: 29/Oct/23  Resolved: 03/Jul/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.13, 4.0.10
Fix Version/s: 3.6.14, 4.0.11

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: Kaloian Manassiev
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6
Sprint: Sharding 2019-07-15
Participants:
Case:

 Description   

CatalogCache::_scheduleCollectionRefresh holds CatalogCache::_mutex while calling ShardServerCatalogCacheLoader::getChunksSince, which tries to grab ShardServerCatalogCacheLoader::_mutex at the beginning:
https://github.com/mongodb/mongo/blob/r4.0.10/src/mongo/s/catalog_cache.cpp#L603

Inside the async task, the loader tries to run the callback while still holding ShardServerCatalogCacheLoader::_mutex:
https://github.com/mongodb/mongo/blob/r4.0.10/src/mongo/db/s/shard_server_catalog_cache_loader.cpp#L416

and the callback tries to grab CatalogCache::_mutex, so the two threads acquire the same two mutexes in opposite order and can deadlock:
https://github.com/mongodb/mongo/blob/r4.0.10/src/mongo/s/catalog_cache.cpp#L577
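
The lock-order inversion described above, and one way to avoid it, can be sketched as follows. This is a minimal illustration, not the server code: the Loader and Cache classes, their members, and the "term mismatch" result string are hypothetical stand-ins for ShardServerCatalogCacheLoader and CatalogCache. The sketch assumes the remedy is to move the callback out from under the loader mutex before invoking it, which matches the wording of the fix commit but not necessarily its exact shape.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for ShardServerCatalogCacheLoader.
class Loader {
public:
    using Callback = std::function<void(const std::string&)>;

    // Called by the cache while the cache holds its own mutex,
    // so the order on this path is: cache mutex -> loader mutex.
    void getChunksSince(Callback cb) {
        std::lock_guard<std::mutex> lk(_mutex);
        _pending = std::move(cb);
    }

    // Body of the async task. The callback is moved out under the
    // loader mutex but invoked only after the mutex is released, so
    // this path never holds the loader mutex while taking the cache mutex.
    void runTask() {
        Callback cb;
        std::string result;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            cb = std::move(_pending);
            _pending = nullptr;
            result = "term mismatch";  // illustrative result only
        }
        if (cb) {
            cb(result);  // loader mutex is no longer held here
        }
    }

private:
    std::mutex _mutex;
    Callback _pending;
};

// Hypothetical stand-in for CatalogCache.
class Cache {
public:
    explicit Cache(Loader& loader) : _loader(loader) {}

    void scheduleRefresh() {
        std::lock_guard<std::mutex> lk(_mutex);
        _loader.getChunksSince([this](const std::string& result) {
            // The callback re-acquires the cache mutex. This is safe only
            // because the loader does not hold its mutex when calling it.
            std::lock_guard<std::mutex> cbLock(_mutex);
            _lastResult = result;
        });
    }

    std::string lastResult() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _lastResult;
    }

private:
    Loader& _loader;
    std::mutex _mutex;
    std::string _lastResult;
};
```

If runTask instead invoked the callback inside the locked scope, a thread in scheduleRefresh (holding the cache mutex, waiting for the loader mutex) and a thread in runTask (holding the loader mutex, waiting for the cache mutex) could block each other indefinitely.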



 Comments   
Comment by Githook User [ 03/Jul/19 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-41869 On term mismatch do not invoke the getChunkSince callback under mutex

As part of this change also backports the following cleanup:

  • Use alias for the callback of CatalogCacheLoader::getChunksSince
  • Use StringMap in CollectionShardingState instead of std::unordered_map

(cherry picked from commit cb0393248d26e21e69efde15d9d3965293ead29b)
(cherry picked from commit 059b8a9dc777e3940caa26f2a909d4988b605645)
Branch: v3.6
https://github.com/mongodb/mongo/commit/6296d2f907565ec44410a43868ee41cf46fbdd16

Comment by Githook User [ 03/Jul/19 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-41869 Rename `struct dbTask` to DBTask

... to follow naming conventions

(cherry picked from commit b919fb48eb611b3c8cbba9d7f03f6df1d25d4cd5)
Branch: v4.0
https://github.com/mongodb/mongo/commit/6fa35fa8ffa5edd47a2e78e11f524a03fd99567d

Comment by Githook User [ 03/Jul/19 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-41869 On term mismatch do not invoke the getChunkSince callback under mutex

As part of this change also backports the following cleanup:

  • Use alias for the callback of CatalogCacheLoader::getChunksSince
  • Use StringMap in CollectionShardingState instead of std::unordered_map

(cherry picked from commit cb0393248d26e21e69efde15d9d3965293ead29b)
Branch: v4.0
https://github.com/mongodb/mongo/commit/059b8a9dc777e3940caa26f2a909d4988b605645

Comment by Randolph Tan [ 21/Jun/19 ]

Note: this deadlock no longer exists in v4.2 due to the changes made in this commit: https://github.com/mongodb/mongo/commit/cb0393248d26e21e69efde15d9d3965293ead29b
