[SERVER-52683] Relax restrictions for updates to new shard key fields during resharding Created: 07/Nov/20  Updated: 06/Dec/22  Resolved: 28/Jan/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Max Hirschhorn Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Won't Do Votes: 0
Labels: PM-234-M2.5, PM-234-T-oplog-fetch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-54040 Resharding may fail to throw WouldCha... Closed
related to SERVER-54067 Enforce identical restrictions to the... Closed
related to SERVER-54096 Complete TODO listed in SERVER-52683 Closed
is related to SERVER-49825 Replicate updates changing value unde... Closed
Assigned Teams:
Sharding
Sprint: Sharding 2020-12-28, Sharding 2021-01-11, Sharding 2021-01-25
Participants:
Story Points: 3

 Description   

The changes from 6508f5d as part of SERVER-49825 reused the WouldChangeOwningShard error response handling in mongos. This comes with the following limitations:

  • the update must be performed in a batch of size 1
  • the update must be performed as a retryable write or in a multi-document transaction
  • the update must not be performed as a multi=true update (this restriction was omitted from SERVER-49825 and may need to be added)

One thought would be to have the donor shard handle the WouldChangeOwningShard exception locally without bubbling it up to mongos.
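The restrictions above can be summarized as a single predicate. The following is a hypothetical sketch in Python, not the actual mongos code; the function and parameter names are illustrative:

```python
class WouldChangeOwningShard(Exception):
    """Raised by a shard when an update would move a document to another shard."""


def mongos_may_handle_owning_shard_change(batch_size, is_retryable_write,
                                          in_transaction, multi):
    """Sketch of the mongos-side checks described above: return True if the
    WouldChangeOwningShard error response handling is allowed to convert the
    update into a cross-shard delete-and-insert."""
    if batch_size != 1:
        return False          # the update must be in a batch of size 1
    if multi:
        return False          # multi=true updates are not supported
    # the update must be a retryable write or in a multi-document transaction
    return is_retryable_write or in_transaction


# A single-document retryable write qualifies; an ordinary write does not.
assert mongos_may_handle_owning_shard_change(1, True, False, False)
assert not mongos_may_handle_owning_shard_change(1, False, False, False)
```

Note that the predicate is evaluated per command on mongos, which is why a batch of multiple updates cannot simply retry each statement independently.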



 Comments   
Comment by Max Hirschhorn [ 28/Jan/21 ]

After discussing this ticket with Garaudy, we decided it isn't essential to relax these restrictions for updates to the shard key value under the new key pattern. These limitations will exist for updates on the shard key value after resharding completes. We will instead document these restrictions so users are made aware before they run reshardCollection.

Comment by Max Hirschhorn [ 06/Dec/20 ]

One thought would be to have the donor shard handle the WouldChangeOwningShard exception locally without bubbling it up to mongos.

One idea I recently had would be to have OpObserverImpl::onUpdate() on the donor shard detect when an update changes the document's owning shard under the new shard key pattern and generate the corresponding multi-document transaction applyOps oplog entries itself.

I like this approach over trying to add local WouldChangeOwningShard exception handling to the donor shard's service entry point. I suspect it would be difficult to relax the batch of size 1 and multi=true restrictions in the WouldChangeOwningShard exception handling approach because the multiple updates may trigger the WouldChangeOwningShard exception multiple times for the same command.

Another part of the OpObserverImpl::onUpdate() idea would be to avoid generating a new logical session ID for each multi-document transaction applyOps we'd generate. I think it is possible to use a single logical session ID per donor shard.

The challenge here is that OpObserverImpl::onUpdate() is called concurrently by updates to different documents within the same collection, so the real-time thread scheduling order cannot be used as an indication of the optime order in which secondaries would end up seeing the oplog entries. My idea here would be to use the timestamp component of the generated optime as the txnNumber. This guarantees the txnNumber appears to be always increasing to secondaries.

If we want to think slightly less hard about all of a donor shard's replica set members using the same logical session ID (because the timestamp component of an optime is only guaranteed to be unique once it has become majority-committed), then we could have the donor shard primary generate a new logical session ID in each new term.
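The txnNumber idea above can be sketched as follows. This is a hypothetical illustration, assuming the usual packing of a BSON timestamp (seconds and increment) into a single 64-bit value; it is not the actual server implementation:

```python
def txn_number_from_optime(ts_seconds, ts_increment):
    """Derive a txnNumber from the timestamp component of a generated optime.

    A BSON timestamp is a 64-bit value: the high 32 bits are the seconds
    since the epoch and the low 32 bits are an increment that orders
    operations within the same second. Because oplog entries are assigned
    strictly increasing timestamps, the derived txnNumber appears strictly
    increasing to secondaries regardless of the real-time order in which
    OpObserverImpl::onUpdate() ran on the primary.
    """
    return (ts_seconds << 32) | ts_increment


# Later optimes always yield larger txnNumbers, even across a seconds boundary.
assert txn_number_from_optime(100, 2) > txn_number_from_optime(100, 1)
assert txn_number_from_optime(101, 0) > txn_number_from_optime(100, 99)
```

This is what makes a single logical session ID per donor shard workable: the server only requires that txnNumbers on a session be monotonically increasing, and the oplog's timestamp order provides exactly that.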

Generated at Thu Feb 08 05:28:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.