[SERVER-45191] Data race in CollectionShardingState::getCriticalSectionSignal
Created: 17/Dec/19  Updated: 27/Oct/23  Resolved: 03/Nov/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Matthew Saltz (Inactive)
Assignee: [DO NOT USE] Backlog - Sharding EMEA
Resolution: Gone away
Votes: 0
Labels: sharding-wfbf-sprint
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Sharding EMEA
Operating System: ALL
Sprint: Sharding EMEA 2021-07-26, Sharding EMEA 2021-08-09
Participants:

 Description   

CollectionShardingState::getCriticalSectionSignal does not take or enforce any locks, and ShardingMigrationCriticalSection::getSignal does not do any synchronization either.

When entering the critical section in migration, we use an X lock, but when exiting (which calls .reset() on the signal), we only use an IX lock.

Furthermore, in setShardVersion, we hold only an IS lock on the collection when we call getCriticalSectionSignal. IS does not conflict with the IX lock held in exitCriticalSection, so nothing synchronizes reads and writes of the shared_ptr that holds the critical section signal. The same is true in _flushRoutingTableCacheUpdates.
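To make the lock-mode argument concrete, here is a minimal sketch assuming the standard intent-lock compatibility matrix. The LockMode enum, the kConflicts table and the conflicts() helper are hypothetical names for illustration, not the server's own lock manager types.

```cpp
#include <cassert>

// Hypothetical stand-in for the four lock modes discussed in this ticket.
enum class LockMode { IS, IX, S, X };

// Assumed standard multi-granularity compatibility matrix.
// true means the two modes cannot be held on the same resource at once.
// Rows and columns are ordered IS, IX, S, X.
constexpr bool kConflicts[4][4] = {
    /* IS */ {false, false, false, true},
    /* IX */ {false, false, true,  true},
    /* S  */ {false, true,  false, true},
    /* X  */ {true,  true,  true,  true},
};

constexpr bool conflicts(LockMode a, LockMode b) {
    return kConflicts[static_cast<int>(a)][static_cast<int>(b)];
}

// Reader in setShardVersion (IS) vs. exitCriticalSection (IX):
// both can hold the collection lock together, so the read is unguarded.
static_assert(!conflicts(LockMode::IS, LockMode::IX), "IS and IX coexist");

// Entering the critical section (X) excludes every other holder.
static_assert(conflicts(LockMode::X, LockMode::IS), "X excludes IS");
```

Under this matrix, only the enter path is exclusive with respect to readers; the exit path that resets the signal is not.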

Fortunately, in the normal path for inspecting the critical section, we take a shared lock on the CSR (CollectionShardingRuntime), which conflicts with the exclusive lock taken on the CSR when we exit the critical section.
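The safe pattern can be sketched as follows: every read and write of the shared_ptr goes through the same lock. This is an illustration only. GuardedCriticalSection and Signal are hypothetical names, and a plain std::mutex stands in for the shared and exclusive modes of the CSR lock.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical placeholder for whatever object waiters block on.
struct Signal {};

// Hypothetical wrapper: all access to _signal happens under _mutex,
// so a reader can never observe the pointer while it is being reset.
class GuardedCriticalSection {
public:
    void enter() {
        std::lock_guard<std::mutex> lk(_mutex);
        _signal = std::make_shared<Signal>();
    }

    void exit() {
        std::lock_guard<std::mutex> lk(_mutex);
        _signal.reset();
    }

    // Returns a copy taken under the lock; null when not in the
    // critical section. The copy keeps the Signal alive for the caller.
    std::shared_ptr<Signal> getSignal() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _signal;
    }

private:
    mutable std::mutex _mutex;
    std::shared_ptr<Signal> _signal;
};
```

A caller that obtained the signal before exit() still holds a valid object afterwards, because the copy made under the lock owns its own reference. Without the lock, copying one shared_ptr instance while another thread resets it is a data race, even though the reference count itself is atomic.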



 Comments   
Comment by Kaloian Manassiev [ 03/Nov/21 ]

As of version 5.0, the critical section is only operated on under the CSRLock, so this ticket is no longer relevant.
