[SERVER-58416] POC: Implementing a 2-phase commit protocol on the moveChunk operation Created: 12/Jul/21  Updated: 03/Aug/21  Resolved: 03/Aug/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Sergi Mateo Bellido Assignee: Jordi Serra Torrens
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File 0001-SERVER-58416-POC.patch    
Issue Links:
Depends
depends on SERVER-58415 Evaluate the performance of the new w... Closed
Sprint: Sharding EMEA 2021-07-12, Sharding EMEA 2021-07-26, Sharding EMEA 2021-08-09
Participants:

 Description   

The current implementation of the moveChunk will have to be refactored to fit in the scheme of a 2-phase commit protocol. The new implementation should try to overlap as much work as possible, making the happy-path as cheap as possible at the expense of making the error path more expensive.



 Comments   
Comment by Jordi Serra Torrens [ 03/Aug/21 ]

POC 0001-SERVER-58416-POC.patch :

  • Recipient enters the critical section (after the donor has entered their critical section and the recipient has finished cloning the documents)
  • Then donor commits the chunk migration to the configsvr.
  • Finally, donor refreshes its filtering metadata (as it already did), and donor issues a new command to the recipient so that it refreshes its filtering metadata and releases the critical section.

The POC only considers the happy path. The recovery has not been dealt with.
As an improvement, the filtering info refresh on the donor and on the recipient can be parallelized to happen concurrently.


The POC above makes that both donor and recipient hold the critical section during the commit of the migration on the configsvr, and refresh before exiting the critical section. As a result, both donor and recipient are always aware of which chunks they own (outside of their critical section).
This would allow shards to use their current filtering information to avoid making writes to orphaned documents.

In this POC, the updates to the donor's and recipient's config.rangeDeletions collection has not been included to happen in the criticalSection. This is not strictly necessary if we don't use config.rangeDeletions in order to figure the ownership for reads/writes filtering. If we decided to use config.rangeDeletions for that purpose, the updates would need to happen within the criticalSection. Moreover, the updates on donor and recipient should become visible at the same clusterTime, which must also be the same time as the validAfter stored in the chunk history (which is used for routing). This may be achieved by:
1. The donor reserves an oplog slot at time T
2. The donor commits the chunk migration to the configsvr with

{validAfter: T}

in the history
3. The donor commits a multi-shard txn at the reserved oplog time T on both donor and recipient, with the updates on config.rangeDeletions.

Generated at Thu Feb 08 05:44:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.