Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Cluster Scalability
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

I'm filing this ticket as a suggestion to be considered whenever PM-2015 is done and we switch to using the internal transactions API for handling shard key updates that change a document's owning shard.

Currently, the logic to handle WouldChangeOwningShard errors lives at the command processing layer on mongos. I think this logic would be better suited to the execution layer (batch_write_exec / bulk_write_exec) of the code.

Conceptually, I think this type of write is similar to other types of special writes we handle at the execution layer, such as writes that use the two-phase protocol or retryable timeseries updates, where we have a single user write statement that ends up getting executed as multiple transactional write statements.

I think WouldChangeOwningShard (WCOS) could be treated like a "retryable error" akin to StaleConfig: once we receive that error on our first attempt to execute the write, we mark it as a WCOS write, save the WCOS info, and indicate that it needs retargeting. Then, on our next round of targeting/execution, we can execute it as the transactional delete + insert.
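To make the idea a bit more concrete, here is a rough Python sketch of the per-write bookkeeping. It is not the actual mongos C++ code: `WriteOp`, `WcosInfo`, `note_error`, and `plan` are hypothetical names, and I'm assuming the saved WCOS info carries the document's pre-image and post-image.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional, Tuple


class WriteState(Enum):
    READY = auto()   # needs (re)targeting and execution
    FAILED = auto()  # hit a non-retryable error


@dataclass
class WcosInfo:
    """Hypothetical payload saved from a WouldChangeOwningShard error."""
    pre_image: dict
    post_image: dict


class WouldChangeOwningShard(Exception):
    def __init__(self, info: WcosInfo):
        super().__init__("WouldChangeOwningShard")
        self.info = info


@dataclass
class WriteOp:
    update: dict
    state: WriteState = WriteState.READY
    wcos_info: Optional[WcosInfo] = None  # set after the first attempt reports WCOS

    def note_error(self, err: Exception) -> bool:
        """Record an error; return True if the write should be retargeted."""
        if isinstance(err, WouldChangeOwningShard):
            # Treat WCOS like a retryable error: remember the info and
            # leave the write READY so the next round picks it up again.
            self.wcos_info = err.info
            self.state = WriteState.READY
            return True
        self.state = WriteState.FAILED
        return False

    def plan(self) -> List[Tuple[str, dict]]:
        """Statements to run on the next round of targeting/execution."""
        if self.wcos_info is None:
            return [("update", self.update)]
        # Second attempt: run as a transactional delete + insert.
        return [
            ("delete", self.wcos_info.pre_image),
            ("insert", self.wcos_info.post_image),
        ]
```

On the first round `plan()` yields the plain update; after `note_error` sees WCOS, the same write yields the delete + insert pair on the next round.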

I think this change would help lift the restriction we currently have that WCOS writes must be sent in their own batch from the client. The current design contributes to that restriction because, for an ordered batch/bulk write, the execution layer will always stop executing the batch after seeing an error. We can't just tell that logic to continue after seeing WCOS, though: we don't know whether the command layer's handling of the WCOS will end up succeeding, so if we execute the later writes before the WCOS is handled, we can end up violating the contract of ordered writes. If the execution layer were to handle WCOS directly, it could perform the WCOS update and then continue processing the rest of the batch as normal, without having to return control to the command layer.
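As an illustration of the ordered-batch behavior I have in mind, here is a toy Python model. It is not the real batch_write_exec / bulk_write_exec logic: `execute_ordered`, `run_update`, and `run_wcos_txn` are hypothetical names, and the two callbacks stand in for sending a write to a shard and for running the transactional delete + insert.

```python
class WouldChangeOwningShard(Exception):
    """Toy stand-in for the WCOS error, carrying assumed pre/post images."""

    def __init__(self, pre_image, post_image):
        super().__init__("WouldChangeOwningShard")
        self.pre_image = pre_image
        self.post_image = post_image


def execute_ordered(writes, run_update, run_wcos_txn):
    """Run writes in order, handling WCOS inline.

    Returns a list of (index, status) pairs. Execution stops at the first
    write that ultimately fails, which preserves ordered semantics.
    """
    results = []
    for i, write in enumerate(writes):
        try:
            try:
                run_update(write)
            except WouldChangeOwningShard as wcos:
                # Handle the WCOS right here instead of returning control
                # to the command layer; a failure in the transaction is
                # treated like any other error for this write.
                run_wcos_txn(wcos.pre_image, wcos.post_image)
        except Exception as err:
            results.append((i, f"error: {err}"))
            break  # ordered batch: nothing after a failed write may run
        results.append((i, "ok"))
    return results
```

Because the WCOS write is fully resolved before the loop advances, later writes only run once we know whether it succeeded, so the batch can keep going without breaking the ordered contract.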

Assignee: Unassigned
Reporter: Kaitlin Mahar
Participants: Jason Zhang, Kaitlin Mahar, Max Hirschhorn
Votes: 0
Watchers: 12

Created: Nov 12 2023 04:12:09 PM UTC
Updated: Apr 25 2024 04:20:49 PM UTC
