[SERVER-83153] Consider handling WouldChangeOwningShard errors at a lower level Created: 12/Nov/23  Updated: 12/Dec/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Kaitlin Mahar Assignee: Backlog - Cluster Scalability
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Cluster Scalability
Participants:

 Description   

I'm filing this ticket as a suggestion to be considered whenever PM-2015 is done and we switch to using the internal transactions API for handling shard key updates that change a document's owning shard. 

Currently, the logic to handle WouldChangeOwningShard errors lives at the command processing layer on mongos. I think this logic is arguably better suited to live at the execution layer (batch_write_exec / bulk_write_exec) of the code.

Conceptually, I think this type of write is similar to other types of special writes we handle at the execution layer, such as writes that use the two-phase protocol or retryable timeseries updates, where we have a single user write statement that ends up getting executed as multiple transactional write statements.

I think WCOS could be treated like a "retryable error" akin to StaleConfig where, once we receive that error on our first attempt to execute the write, we mark it is a WouldChangeOwningShard write and save the WCOS info, and indicate that it needs retargeting, and then on our next round of targeting/execution we can execute it as the transactional delete + insert.

I think this change would help to lift the restriction we currently have that WCOS writes must be sent in their own batch from the client. The current design contributes to that restriction because for an ordered batch/bulk write, the execution layer code will always stop execution of the batch after seeing an error.  We can't just tell that logic to continue after seeing WCOS though, because we don't know whether the logic to handle the WCOS will end up succeeding later or not and so if we execute the later writes before WCOS is handled we can end up violating the contract of ordered writes. If the execution layer were to handle WCOS directly, then it could just continue to process more writes in the batch as normal after performing the WCOS update, without having to return control to the command layer.



 Comments   
Comment by Kaitlin Mahar [ 15/Nov/23 ]

max.hirschhorn@mongodb.com Nice! That sounds even better.

Comment by Max Hirschhorn [ 14/Nov/23 ]

I agree with Kaitlin that the WouldChangeOwningShard error ought to be handled at a lower layer of the system. I think we can get the behavior of updating a document's shard key value to be handled internally by mongod once shards are able to add new participants to transactions (PM-2844) when the client operation is already inside a transaction. When the client operation is part of a retryable write then a retryable internal transaction can be issued between the two shards by the shard which currently owns the document.

To say it another way, I expect we can eventually remove the WouldChangeOwningShard error altogether and instead do the write on the remote shard within UpdateStage directly.

Comment by Jason Zhang [ 14/Nov/23 ]

I do like this idea, since we already have precedent with special cased writes (timeseries, writewithoutshardkey, _id, etc.) It would require us to add in a new write type (WouldChangeOwningShard) that transactionally executes the delete and insert likely using the transaction API. Certainly do-able just would require a bit of work to add in and track this new write type (and delete the old path).

Generated at Thu Feb 08 06:51:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.