[SERVER-75976] ShardingRecoveryService should provide blocking acquire API Created: 11/Apr/23  Updated: 12/May/23  Resolved: 12/May/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 7.0.0-rc0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Abdul Qadeer Assignee: Sergi Mateo Bellido
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-75977 MovePrimaryRecipientService should wa... Closed
Assigned Teams:
Sharding EMEA
Sprint: Sharding EMEA 2023-05-15
Participants:

 Description   

In the current ShardingRecoveryService API, acquireRecoverableCriticalSectionBlockWrites tasserts if the CS is held by some thread on the same namespace and expects the caller to check. This is problematic for two reasons:

  1. The caller only cares about acquiring CS and releasing it, it would need to know multiple additional APIs to check for others holding the CS.
  2. The semantics of acquire/release in critical section APIs is usually block until acquire succeeds. Between checking for CS in IS mode and acquiring, there is a race and theoretically there is a starvation problem.

ShardingRecoveryService can be improved to provide additional APIs to block until acquire and consolidate waiting/signals. This would also prevent incorrect usage bugs and simplify code. Although DDL operations are serialized at ShardingDDLCoordinator, there are cases like these where acquiring CS can fail.



 Comments   
Comment by Sergi Mateo Bellido [ 12/May/23 ]

The proposed solution is to rely on another high level synchronization mechanism instead of using the Recoverable Critical Section. This is aligned with some discussions that we internally had at Sharding EMEA.

Comment by Abdul Qadeer [ 12/May/23 ]

That makes sense, thanks for the explanation. I am fine with the solution you proposed.

Comment by Sergi Mateo Bellido [ 12/May/23 ]

The in-memory CS section is acquired as part of the commit of the WUOW associated with the write of the recoverable CS

About resharding, if it didn't have any top level synchronization it would be hitting the same kind of error. More specifically, resharding spawns a DDL Coordinator on the primary shard of the dbName before reaching the CSRS. This prevents the execution of any other DDL that could potentially acquire the CS, except for migrations. Then, for migrations, the resharding coordinator service on the CSRS disallows them during the reshard op. Thus, a this point we are sure that no one else can use the CS.

We did a hacky temporary solution for the movePrimary, but we would like to avoid exposing more synchronization levels unless it is unavoidable.  Happy to jump in a zoom if you wanna discuss further  

Comment by Abdul Qadeer [ 11/May/23 ]

sergi.mateo-bellido@mongodb.com I am not adding any in-memory CS with CollectionShardingRuntime before interacting with ShardingRecoveryService explicitly - not sure if you mean something implicit happening in the bg. Per this comment in the API, I was under the impression that an in-memory CS is acquired after persisting (which makes sense to persist before updating in-memory). The critical section is also taken currently by the resharding recipient and donor both, and both don't spawn a DDL coordinator. I am not sure if spawning a coordinator in all these places is feasible.

Comment by Sergi Mateo Bellido [ 08/May/23 ]

abdul.qadeer@mongodb.com what do you think? We discussed some time ago about this issue, I guess that at some point we will tackle but rather than fixing it at the level of the critical section we will do it trying to spawn a coordinator also on the recipient.

Comment by Sergi Mateo Bellido [ 08/May/23 ]

The Recoverable Critical Section is just an utility that added persistence to the already existing in-memory critical section (i.e. the one you obtain through the CollectionShardingRuntime). Note that the recoverable critical section keeps synchronized the persisted representation of the critical section with the in-memory one, so internally all recoverable critical sections are also in-memory critical sections.

In terms of synchronization, the Recoverable Critical Section is also similar to the in-memory one: their acquisition happen in a context in which it should be impossible that the critical section was already acquired by someone else. This is usually guaranteed through the usage of a higher level synchronization mechanism (e.g. DDL Coordinators that stop migrations). Note that in general we don't want to wait until the CS is released to do something: in most of our cases what we want is to throw an exception if the critical section is held! You have to imagine the critical section as the last resource you need to acquire to modify the shard authoritative metadata. If someone else is already modifying that metadata, you don't really know what's going to happen to the collection: perhaps the shard will get a new chunk, perhaps the collection is renamed or, even worse, perhaps it is dropped.

For your specific case, I think that the real issue is that the recipient shard of a movePrimary doesn't spawn a DDL Coordinator. Thus, the higher level synchronization mechanism I mentioned above is not present.

Generated at Thu Feb 08 06:31:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.