[SERVER-47566] CollectionShardingRuntime::getCriticalSectionSignal accesses _critSec without locks Created: 15/Apr/20  Updated: 29/Oct/23  Resolved: 23/Apr/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: Randolph Tan
Resolution: Fixed Votes: 0
Labels: PM-1645-Milestone-1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-47563 Add concurrency control to the Collec... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2020-05-04
Participants:

 Description   

Currently, _critSec by itself is not thread safe. All other methods using _critSec takes the CSRLock except for this method.

ShardingMigrationCriticalSection::exitCriticalSection() must lock such that ShardingMigrationCriticalSection::getSignal cannot read the _critSecSignal value while _critSecSignal is being written simultaneously – C++ memory bug.



 Comments   
Comment by Githook User [ 23/Apr/20 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-47566 Take CSRLock inside CollectionShardingRuntime::getCriticalSectionSignal
Branch: master
https://github.com/mongodb/mongo/commit/7cde21f53367705a06e7580d63648938ea688a1d

Comment by Randolph Tan [ 16/Apr/20 ]

Yes. ExitCrit was taking X lock in 4.0 and downgraded to IX in 4.2.

Comment by Kaloian Manassiev [ 16/Apr/20 ]

Yeah, Dianna is right that it is still a bug for ShardingMigrationCriticalSection::getSignal, because even though it is called under a Collection Intent-lock everywhere, we seem to reset it under Collection IX + CSR X lock, which is not sufficient. We should be reading it with CSR S-lock.

I think this might be since 4.2 if I am not mistaken, when we relaxed some of the collection X-lock requirements. renctan, I am going to assign it to you for the next iteration, can you please confirm if it is in 4.2/4.4?

Comment by Dianna Hohensee (Inactive) [ 15/Apr/20 ]

I took a spin through the DatabaseShardingState critical section flag concurrency. It takes the DSSLock for all critical section things, so it looks good.

I got to wondering – noting it here in case someone else wonders.

Comment by Dianna Hohensee (Inactive) [ 15/Apr/20 ]

I think this is still a bug, even with collection locks taken today.

MODE_IS and MODE_IX locks are not exclusive of one another. So ShardingMigrationCriticalSection::exitCriticalSection() resets the _criticalSection value under a MODE_IX collection lock, and the ShardingMigrationCriticalSection::getSignal reads the _criticalSection value under a MODE_IS collection lock. A read and write of the same memory at the same time.

Comment by Randolph Tan [ 15/Apr/20 ]

Note: This is currently ok since all callers take collection IS, but this should be tighten up to allow lock free reads.

Generated at Thu Feb 08 05:14:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.