[SERVER-85275] Resharding oplog application should ignore DuplicateKey error Created: 16/Jan/24  Updated: 01/Feb/24

Status: Needs Scheduling
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Cheahuychou Mao Assignee: Backlog - Cluster Scalability
Resolution: Unresolved Votes: 0
Labels: cs-product-sync, resharding-improvements
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Cluster Scalability
Operating System: ALL
Participants:

 Description   

Resharding, like replication, applies oplog entries in batches using multiple parallel threads. Oplog entries that touch the same document are batched together and applied on the same thread, so oplog application in resharding (and replication) preserves the (timestamp, _id) order per document; however, it does not preserve the overall write order. Consider a collection with a unique index {a: 1}: we insert the document {_id: 1, a: "foo"}, then delete {_id: 1, a: "foo"}, then insert {_id: 2, a: "foo"}. Resharding would apply these oplog entries on two threads:

  • Thread 1: insert {_id: 1, a: "foo"}, delete {_id: 1, a: "foo"}
  • Thread 2: insert {_id: 2, a: "foo"}

So if Thread 2 runs completely before Thread 1, or if Thread 2's insert interleaves before Thread 1's delete, oplog application would end up with a DuplicateKey error. It should simply ignore this DuplicateKey error, just as replication's oplog application does today.
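The race above can be reproduced deterministically with a small simulation. This is an illustrative sketch, not server code; the Collection class, its in-memory unique index, and all names are hypothetical stand-ins for the real storage engine behavior:

```python
# Hypothetical in-memory model of a collection with a unique index {a: 1}.
# Oplog entries touching the same _id are applied on the same worker, so
# per-document order is preserved, but cross-document order is not.
class UniqueIndexError(Exception):
    pass

class Collection:
    def __init__(self):
        self.docs = {}          # _id -> document
        self.unique_a = set()   # values currently held by the unique index on "a"

    def insert(self, doc):
        if doc["a"] in self.unique_a:
            raise UniqueIndexError(f"DuplicateKey: a={doc['a']!r}")
        self.docs[doc["_id"]] = doc
        self.unique_a.add(doc["a"])

    def delete(self, doc):
        self.docs.pop(doc["_id"], None)
        self.unique_a.discard(doc["a"])

coll = Collection()
# Oplog entries partitioned by _id, as in the ticket's example:
thread1_ops = [("insert", {"_id": 1, "a": "foo"}), ("delete", {"_id": 1, "a": "foo"})]
thread2_ops = [("insert", {"_id": 2, "a": "foo"})]

def apply(ops):
    for op, doc in ops:
        getattr(coll, op)(doc)

# Force the problematic schedule: Thread 2's insert lands before Thread 1's
# delete, even though the serialized oplog history was perfectly valid.
apply(thread2_ops)
try:
    apply(thread1_ops)
except UniqueIndexError as e:
    print(e)  # DuplicateKey: a='foo'
```

Any schedule in which Thread 2's insert precedes Thread 1's delete produces the same spurious error, which is why the applier must tolerate it rather than fail.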



 Comments   
Comment by Max Hirschhorn [ 01/Feb/24 ]

We decided to go forward with the mongosync approach. Resharding will create all secondary indexes as {unique: false} indexes and use the collMod procedure to convert the relevant ones to {unique: true} indexes (SERVER-61158) during resharding's critical section, after writes on the recipient shard have quiesced. The speed of this conversion process must be assessed to confirm it is compatible with the current resharding critical section window.
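The essence of this approach can be sketched as follows. This is an illustrative simulation with hypothetical names, not the server's collMod implementation: the index tolerates duplicates while oplog entries are applied in an arbitrary per-document-ordered schedule, and the conversion step verifies the index is actually unique before flipping the flag:

```python
# Hypothetical model of the build-non-unique-then-convert technique.
class NonUniqueIndex:
    def __init__(self):
        self.entries = []   # (key, _id) pairs; duplicates tolerated while copying
        self.unique = False

    def insert(self, key, _id):
        self.entries.append((key, _id))

    def delete(self, key, _id):
        self.entries.remove((key, _id))

    def convert_to_unique(self):
        # Mirrors the collMod verification scan: fail if any key is duplicated.
        keys = [k for k, _ in self.entries]
        if len(keys) != len(set(keys)):
            raise ValueError("cannot convert: duplicate keys present")
        self.unique = True

idx = NonUniqueIndex()
# The schedule that would break a unique index is harmless here:
idx.insert("foo", 2)   # Thread 2 runs first
idx.insert("foo", 1)   # Thread 1's insert: transient duplicate, allowed
idx.delete("foo", 1)   # Thread 1's delete restores uniqueness
idx.convert_to_unique()
print(idx.unique)  # True
```

The transient duplicate exists only while oplog application is in flight; by the time the critical section runs the conversion, a consistent source collection yields a unique index.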

Comment by Max Hirschhorn [ 17/Jan/24 ]

Alternatively, we could have resharding's oplog application phase create all secondary indexes as {unique: false} indexes and use the collMod procedure to convert the relevant ones to {unique: true} indexes (SERVER-61158) during resharding's critical section, after writes on the recipient shard have quiesced. The collMod procedure would confirm the resulting index is actually unique. This is the technique used by mongosync.

There is new risk in relaxing index constraints: it becomes possible for resharding not only to propagate existing index inconsistencies from the source collection but also, through bugs specific to resharding's implementation, to introduce new index inconsistencies. For comparison, chunk migration does not relax index constraints. Engaging with the Product team could be helpful here, because there may be cases, such as draining a shard, where we don't want to require manual intervention, yet in accepting that we must also accept that indexes are not guaranteed to be consistent on the recipient shards following the shard's removal.

Comment by Max Hirschhorn [ 17/Jan/24 ]

In addition to not having resharding's oplog application phase fail due to a DuplicateKey error, we should similarly relax all index constraints during that phase. Relaxing index constraints can be achieved with OperationContext::setEnforceConstraints(false); this is the technique used by replication's secondary oplog application as well as by the TenantOplogApplier for Serverless.
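For comparison with the conversion approach, the simplest fix the ticket asks for is an applier that swallows the spurious DuplicateKey. This is an illustrative sketch with hypothetical names (not the server's applier or the actual setEnforceConstraints mechanism), showing why ignoring the error converges to the right final state for a valid serialized history:

```python
# Hypothetical model of an oplog applier that ignores DuplicateKey on insert.
class DuplicateKeyError(Exception):
    pass

class UniqueIndex:
    def __init__(self):
        self.keys = {}   # key -> _id

    def insert(self, key, _id):
        if key in self.keys:
            raise DuplicateKeyError(key)
        self.keys[key] = _id

    def delete(self, key, _id):
        # Only remove the entry if it still belongs to this document.
        if self.keys.get(key) == _id:
            del self.keys[key]

def apply_relaxed(idx, ops):
    for op, key, _id in ops:
        try:
            getattr(idx, op)(key, _id)
        except DuplicateKeyError:
            pass  # ignore, as replication's secondary oplog application does

idx = UniqueIndex()
# Thread 2's insert raced ahead of Thread 1's delete:
apply_relaxed(idx, [("insert", "foo", 2)])                        # Thread 2
apply_relaxed(idx, [("insert", "foo", 1), ("delete", "foo", 1)])  # Thread 1
print(idx.keys)  # {'foo': 2}
```

The ignored insert is safe precisely because, in any valid serialized oplog history, a conflicting document that raced ahead of its own delete was going to be removed anyway; the index converges to the state the serial history would have produced.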

Generated at Thu Feb 08 06:57:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.