[SERVER-47563] Add concurrency control to the CollectionShardingState class such that it no longer depends upon collection locks for concurrency control Created: 15/Apr/20  Updated: 27/Oct/23  Resolved: 20/Jul/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-47566 CollectionShardingRuntime::getCritica... Closed
Duplicate
is duplicated by SERVER-47564 Add concurrency control to setting an... Closed
Assigned Teams:
Sharding
Participants:

 Description   

Execution's Lock Free Reads project will allow reads to run without collection and database intent (MODE_IS) locks. The CollectionShardingState class currently relies on collection locks for concurrency control. The implementations of CollectionShardingState must create their own read concurrency control mechanisms that do not rely on reads taking collection IS locks.

The CollectionShardingRuntime (an implementation of CSS) appears to already have a CSRLock that can be taken in shared and exclusive mode: perhaps its use simply needs to be expanded.
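
To illustrate the shared/exclusive idea, here is a minimal sketch of a state object that owns its own reader/writer lock, so readers need no collection lock. It is not the MongoDB implementation: `MetadataSketch`, `ShardingRuntimeSketch`, and their members are hypothetical names, and `std::shared_mutex` stands in for the CSRLock.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for routing information (not MongoDB's CollectionMetadata).
struct MetadataSketch {
    int version;
    std::string shardKey;
};

// Hypothetical stand-in for a sharding runtime that guards its own state,
// rather than assuming callers hold a collection lock.
class ShardingRuntimeSketch {
public:
    // Readers take the lock in shared mode and copy out a shared_ptr, so the
    // snapshot they hold stays valid after the lock is released.
    std::shared_ptr<const MetadataSketch> getMetadata() const {
        std::shared_lock<std::shared_mutex> lk(_mutex);
        return _metadata;
    }

    // Writers build the new value outside the lock, then take the lock in
    // exclusive mode only to swap the pointer.
    void setMetadata(MetadataSketch next) {
        auto fresh = std::make_shared<const MetadataSketch>(std::move(next));
        std::unique_lock<std::shared_mutex> lk(_mutex);
        _metadata = std::move(fresh);
    }

private:
    mutable std::shared_mutex _mutex;
    std::shared_ptr<const MetadataSketch> _metadata;
};
```

A reader that fetched a snapshot before a writer installs new metadata keeps seeing the old, immutable value; only later calls to `getMetadata()` observe the new one.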

The two areas of the CollectionShardingRuntime concurrency that appear to be of particular interest are:

1. Installing new and fetching the current routing table information (CollectionMetadata) for callers.

2. The ShardingMigrationCriticalSection for each collection has no concurrency control itself. It depends upon the CollectionShardingRuntime to provide concurrency control. The two ShardingMigrationCriticalSection::entering* functions expect a collection X lock, the ShardingMigrationCriticalSection::exit* function expects an IX lock, and ShardingMigrationCriticalSection::get* presumably expects callers to hold a collection IS lock.
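
As a sketch of the second point, one option is for the critical section object to serialize its own transitions, so that correctness does not depend on which collection lock mode the caller holds. Everything below is hypothetical: `CriticalSectionSketch`, its `Phase` enum, and its method names are illustrative only and are not the real ShardingMigrationCriticalSection interface.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical critical-section state machine guarded by its own mutex.
class CriticalSectionSketch {
public:
    enum class Phase { kNone, kCatchUp, kCommit };

    // Enter the first phase. Returns false if a critical section is already active.
    bool enterCatchUp() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_phase != Phase::kNone) {
            return false;
        }
        _phase = Phase::kCatchUp;
        return true;
    }

    // Promote to the second phase. Only valid from the catch-up phase.
    bool enterCommit() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_phase != Phase::kCatchUp) {
            return false;
        }
        _phase = Phase::kCommit;
        return true;
    }

    // Leave the critical section from any phase.
    void exit() {
        std::lock_guard<std::mutex> lk(_mutex);
        _phase = Phase::kNone;
    }

    // Readers observe the current phase without holding any collection lock.
    Phase get() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _phase;
    }

private:
    mutable std::mutex _mutex;
    Phase _phase = Phase::kNone;
};
```

The trade-off in this sketch is an extra mutex acquisition on every read; reusing an existing lock owned by the enclosing runtime object would avoid adding a second lock.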



 Comments   
Comment by Dianna Hohensee (Inactive) [ 20/Jul/20 ]

With the resolution of SERVER-47566, I believe the sharding state has the concurrency control it needs in order to work properly without user reads holding MODE_IS coll/db locks.

Comment by Dianna Hohensee (Inactive) [ 01/May/20 ]

kaloian.manassiev, I believe we determined that sharding has no more work (after SERVER-47564) to do here. Is it alright to close this?

Kal and I discussed the CollectionShardingState lifetime compared to Collection and determined that it is safe both for now and under the known planned changes, so we are not going to make CSS a decoration of Collection.

Comment by Dianna Hohensee (Inactive) [ 16/Apr/20 ]

I think I agree that the CollectionShardingState is concurrency safe due to mutexes, from what I can figure out. It looks like the _metadataManagerLock takes care of concurrency for fetching and setting the routing information, and the CSRLock protects the critical section reads/writes. SERVER-47566 still needs to fix a concurrency bug protecting the critical section, though.

I'll look into the decoration idea. I think I need more details on what happens between fetching the CollectionShardingState and the Collection instances for an operation, to see what could go wrong without the atomicity of a MODE_X collection lock.

Comment by Kaloian Manassiev [ 16/Apr/20 ]

I don't think this is necessary - the CSRLock already serves this purpose, so we should be good there, but I think the returned CollectionShardingState is a direct pointer into a map, which is intended to parallel the CollectionCatalog. Currently, we never delete from this map and rely on the collection X lock, taken around drop and create, to ensure a barrier between the transitions of the objects there. However, with lock-free reads, this may be a bit hard to reason about.

Since I see that the Collection class is already decorable, can I propose that we get rid of that map and put the CollectionShardingState as a decoration on Collection? That way we don't need to worry about the lifetime of the CSS and the CSRLock is only for the state of the CSS.
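
A rough sketch of the lifetime argument, assuming the sharding state is owned by the collection object instead of living in a parallel map. `ShardingStateSketch`, `CollectionSketch`, and `CatalogSketch` are hypothetical names; this does not use MongoDB's actual decoration mechanism, and catalog locking is omitted for brevity.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical per-collection sharding state.
struct ShardingStateSketch {
    int epoch = 0;
};

// The collection owns its sharding state, so both share one lifetime.
struct CollectionSketch {
    std::string ns;
    ShardingStateSketch shardingState;
};

// Hypothetical catalog. It is single-threaded here; a real catalog would need
// its own synchronization.
class CatalogSketch {
public:
    void create(const std::string& ns, int epoch) {
        auto coll = std::make_shared<CollectionSketch>();
        coll->ns = ns;
        coll->shardingState.epoch = epoch;
        _collections[ns] = std::move(coll);
    }

    void drop(const std::string& ns) {
        _collections.erase(ns);
    }

    // Returns a handle that keeps the collection, and therefore its sharding
    // state, alive even if the collection is dropped afterwards.
    std::shared_ptr<CollectionSketch> lookup(const std::string& ns) const {
        auto it = _collections.find(ns);
        if (it == _collections.end()) {
            return nullptr;
        }
        return it->second;
    }

private:
    std::map<std::string, std::shared_ptr<CollectionSketch>> _collections;
};
```

In this arrangement a reader that looked up the collection before a drop and re-create keeps a consistent pair of collection and sharding state, and never sees the state of the new incarnation through the old handle.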
