[SERVER-58417] POC: fine-grained collection critical sections Created: 12/Jul/21  Updated: 06/Dec/22  Resolved: 04/Aug/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Sergi Mateo Bellido Assignee: [DO NOT USE] Backlog - Sharding EMEA
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-58418 Fine-grained implementation of the Sh... Closed
Assigned Teams:
Sharding EMEA
Sprint: Sharding 2021-07-12, Sharding EMEA 2021-07-26, Sharding EMEA 2021-08-09
Participants:

 Description   

The goal of this task is to reduce the scope of the critical section acquired by moveChunk so that it only blocks the chunk being migrated. Concurrent CRUD operations that don't target that chunk would then be unaffected by the critical section.
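As an illustration of that goal, here is a minimal sketch of a critical section scoped to chunk ranges rather than to the whole collection. It is not taken from the server code: `ChunkRange`, `RangeCriticalSection`, and the integer shard keys are all hypothetical names and simplifications chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChunkRange:
    """Half-open shard-key interval [min_key, max_key). Hypothetical type."""
    min_key: int
    max_key: int

    def contains(self, key: int) -> bool:
        return self.min_key <= key < self.max_key

    def overlaps(self, other: "ChunkRange") -> bool:
        return self.min_key < other.max_key and other.min_key < self.max_key


@dataclass
class RangeCriticalSection:
    """Toy critical section that tracks only the ranges being migrated."""
    held: set = field(default_factory=set)

    def enter(self, chunk: ChunkRange) -> None:
        self.held.add(chunk)

    def exit(self, chunk: ChunkRange) -> None:
        self.held.discard(chunk)

    def blocks_key(self, key: int) -> bool:
        # Point operations are blocked only if their key is in a held range.
        return any(r.contains(key) for r in self.held)

    def blocks_range(self, query: ChunkRange) -> bool:
        # Range operations are blocked if they overlap any held range.
        return any(r.overlaps(query) for r in self.held)


cs = RangeCriticalSection()
cs.enter(ChunkRange(100, 200))   # chunk currently being migrated
print(cs.blocks_key(150))        # key inside the migrating chunk
print(cs.blocks_key(250))        # key in a different chunk
```

In this sketch, an operation on key 250 proceeds while the chunk [100, 200) is in its critical section, which is the behavior the task is aiming for.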



 Comments   
Comment by Sergi Mateo Bellido [ 04/Aug/21 ]

We did two high-level explorations of possible implementations:

  • The first one would be built on top of our current infrastructure. Note that this idea has been explored before. The main changes would be:
    • We would have to keep the chunk ranges in the CS.
    • When a CRUD operation is executed, we would have to verify whether that operation is trying to access a range that is held by the CS. Note that for inserts/deletes/updates we can do that in the shard op observer, whereas for reads we would have to modify every operation.
  • The second implementation would try to move the filtering of orphan documents and the management of the CS to the Storage Execution level. The idea is quite nice and would help us offer a friendlier sharding API internally: can we build something on top of the RecordStore that only returns the documents that the current shard owns? Could this new component also manage the CS?
    • It would mean that we have to do a completely new implementation of the CS.
    • We discussed whether the CS could live inside the Catalog Collection. We discarded this idea because some DDL operations take advantage of acquiring the CS for a name that is not associated with a collection yet. We also discussed using the LockManager to implement it.
    • It would also directly imply a new implementation of the RecoverableCriticalSection, since that one is built on top of the current CS.
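To make the second idea more concrete, the following is a rough sketch of a wrapper that sits on top of a record source, skips orphan documents, and also consults the critical section. Everything here is an assumption made for illustration: `OwnershipFilteredStore`, `CriticalSectionHeld`, the `(key, document)` record layout, and the choice to raise instead of waiting are hypothetical and do not reflect the actual RecordStore API.

```python
class CriticalSectionHeld(Exception):
    """Hypothetical signal that the caller must wait and retry."""


def in_any(key, ranges):
    # Ranges are half-open (lo, hi) pairs over the shard key.
    return any(lo <= key < hi for lo, hi in ranges)


class OwnershipFilteredStore:
    """Toy wrapper that returns only documents this shard owns."""

    def __init__(self, records, owned, in_critical_section=()):
        self._records = list(records)          # (shard key, document) pairs
        self._owned = list(owned)              # ranges owned by this shard
        self._in_cs = list(in_critical_section)  # ranges being migrated

    def scan(self):
        for key, doc in self._records:
            if not in_any(key, self._owned):
                continue  # orphan document: hide it from the caller
            if in_any(key, self._in_cs):
                # A real implementation would likely block; the sketch raises.
                raise CriticalSectionHeld(key)
            yield doc


records = [(5, {"_id": "a"}), (150, {"_id": "b"}), (900, {"_id": "c"})]
store = OwnershipFilteredStore(records, owned=[(0, 200)])
print([d["_id"] for d in store.scan()])  # the document with key 900 is an orphan
```

With a component like this, callers would not need their own orphan filtering or per-operation CS checks, which is the main appeal of the approach; the cost is the new CS and RecoverableCriticalSection implementations noted above.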

Finally, I want to say that I didn't have enough time to check how to implement it, apart from some ideas. It is also important to say that, in the context of both this spike and the real project (PM-2423), this feature is an optimization, and right now it is unclear to us whether we will really need it: it will depend on the new implementation of the migration protocol (PM-2423) and also on the impact of other projects such as PM-2321.

Generated at Thu Feb 08 05:44:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.