[SERVER-53257] Determine whether we need const CollectionShardingState/DatabaseShardingState instances for Lock-Free Reads Created: 07/Dec/20  Updated: 08/Feb/21  Resolved: 08/Feb/21

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Dianna Hohensee (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-53392 Determine what longer term sharding c... Closed
Problem/Incident
causes SERVER-54401 Move shard filtering metadata out of ... Closed
causes SERVER-54402 CollectionShardingState and DatabaseS... Closed
causes SERVER-54403 The paradigm of acquiring DSSLocks an... Closed
causes SERVER-54404 Move the dbVersion check out of AutoG... Closed
Related
related to SERVER-51319 Check DatabaseShardingState::checkDbV... Closed
Sprint: Execution Team 2020-12-28, Execution Team 2021-01-11, Execution Team 2021-01-25, Execution Team 2021-02-08, Execution Team 2021-02-22
Participants:

 Description   

Currently we're adding shared_ptr getters to the interfaces in SERVER-51319.

Ideally, CSS::getShared, etc., would return const pointers, but would require further interface changes to accept const pointers. Also, without const pointers, or more accurately without copy-on-write, the metadata instance fetched can be modified during a read. (internally, Collection/DatabaseShardingState have concurrency control via mutex, but state still changes??)

Maybe this is safe, but need to double check, particularly with things like the resharding project coming up?



 Comments   
Comment by Dianna Hohensee (Inactive) [ 08/Feb/21 ]

I have filed several SERVER tickets based on the above investigation and thoughts. Closing this ticket as resolved.

Comment by Dianna Hohensee (Inactive) [ 04/Feb/21 ]
  • DatabaseShardingState and CollectionShardingState use DSSLock and CSSLock mutex RAII types that can only be accessed AFTER the DSS or CSS is fetched. A collection or database level-lock currently protects the DSS::get() and CSS::get() methods. We will not have coll and db locks in not too distant future, so these expectations must change. Either DSS and CSS must always be safe to access without lock or mutex – the current implementation is, but not officially –, or the DSSLock and CSSLock concepts need to work somehow before fetching the DSS or CSS. (currently the DSS/CSSLock types lock the mutex IN the DSS/CSS).
    • FYI, the Execution team has intentions of removing the Database class eventually. Today, it mostly behaves as a wrapper around operations applied to all Collection}}s in the {{CollectionCatalog with the same db namespace – lots of for loops. It still contains a small amount of in-memory state that needs to be moved elsewhere before we can eliminate – or rework as a utility library – the class entirely.
  • The DatabaseShardingState and CollectionShardingState classes are considered global and unversioned state, I believe, merely accessors for various sharding state. Therefore, it would best to be able to safely fetch versioned state THROUGH the DSS and CSS objects, to then stash for a lock-free read operation to use until finished.
    • An alternative would be to make these classes copy-on-write, so a lock-free read can grab a versioned instance of the CSS or DSS, but that doesn’t seem desirable from past conversations.
    • As a note, the execution layer Collection class is copy-on-write and can have several Collection instances for a single collection. Furthermore, a Collection instance can have versioned decorations OR unversioned shared decorations via decorating the ShardCollectionDecorations that exists as shared state across all Collection instances for a single collection.
  • The shard filtering/routing information should be moved from the query layer to the AutoGetCollection*/CollectionPtr interfaces
    • With this change, we should reconsider whether the AutoGetCollection* classes still need to explicitly check the shardVersion, as fetching the filtering metadata does the check implicitly.
    • ScopedCollectionDescription, or whatever class is the filtering metadata, must not have any other sharding linking dependencies, and should be documented strongly to keep it that way. We do NOT want to link sharding into the execution catalog / code layer.
  • The dbVersion check should be moved out of the AutoGetDb and AutoGetCollectionForReadLockFree helpers and instead explicitly called in operations where needed. Execution plans to eliminate database level locking in the near future. (this would likely be Execution work)
  • Query has a use case for doing an aggregation on a view namespace wherein they must check whether the underlying collection is sharded or not — an optimization to reduce round trips between mongos and shard. Aggregation is running lock-free, so I recommended using CSS::getSharedForLockFreeReads() that bypasses the collection lock requirements for lock-free read operations. It seems likely that we will have an increasing need to access the CSS without collection locks, especially down the line when Execution runs projects to remove more locks.
  • The CSS::getSharedForLockFreeReads() should return a const object. This requires wider changes to allow the CSSLock related code, and maybe some other places, to operate on a const CSS object. This would prevent ostensible readers from running CSS functions that may modify the CSS or other sub-state, which is not currently protected against.
    • Sharding may want to consider changes to the CSS and/or DSS classes such that there can be const readers and writers such that readers will not be allowed/able to do something they should not by accident

@ kaloian.manassiev these are the changes that I've thought of to make the Execution-Sharding layers smoother in a lock-free and forward thinking world. They are probably a good discussion starting point. at least. I tried to limit the suggestions to those at immediate hand, and to provide some larger context of which to be aware. CC geert.bosch

Generated at Thu Feb 08 05:30:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.