Core Server / SERVER-83153

Consider handling WouldChangeOwningShard errors at a lower level

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Cluster Scalability

      I'm filing this ticket as a suggestion to be considered whenever PM-2015 is done and we switch to using the internal transactions API for handling shard key updates that change a document's owning shard. 

      Currently, the logic to handle WouldChangeOwningShard errors lives at the command processing layer on mongos. I think this logic would be better suited to the execution layer of the code (batch_write_exec / bulk_write_exec).

      Conceptually, I think this type of write is similar to other types of special writes we handle at the execution layer, such as writes that use the two-phase protocol or retryable timeseries updates, where we have a single user write statement that ends up getting executed as multiple transactional write statements.

      I think a WouldChangeOwningShard (WCOS) error could be treated like a "retryable error" akin to StaleConfig. Once we receive that error on our first attempt to execute the write, we would mark the write as a WouldChangeOwningShard write, save the WCOS info, and indicate that it needs retargeting. Then, on our next round of targeting/execution, we can execute it as the transactional delete + insert.
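To make the proposed flow concrete, here is a toy Python model of it. It is only a sketch under stated assumptions, not the mongos implementation (which is C++): the two-shard layout, the `Cluster`, `UpdateOp`, and `WcosInfo` types, and the `execute_update` round loop are all hypothetical names invented for illustration. The first round targets the document's current shard; if that shard reports WCOS, the execution layer saves the pre/post images and marks the op for retargeting, and the next round runs it as a delete + insert that either fully applies or not at all.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def owning_shard(shard_key: int) -> str:
    # Hypothetical two-shard layout: keys below 100 live on "A", the rest on "B".
    return "A" if shard_key < 100 else "B"


@dataclass
class WcosInfo:
    pre_image: dict   # document before the update
    post_image: dict  # document as it would look after the update


class WouldChangeOwningShard(Exception):
    # Hypothetical stand-in for the WCOS error a shard returns.
    def __init__(self, info: WcosInfo) -> None:
        super().__init__("WouldChangeOwningShard")
        self.info = info


@dataclass
class UpdateOp:
    doc_id: int
    new_key: int
    wcos: Optional[WcosInfo] = None  # saved from the first attempt that hit WCOS
    needs_retarget: bool = False


class Cluster:
    def __init__(self) -> None:
        self.shards: Dict[str, Dict[int, dict]] = {"A": {}, "B": {}}

    def insert(self, doc: dict) -> None:
        self.shards[owning_shard(doc["key"])][doc["_id"]] = doc

    def find_shard(self, doc_id: int) -> str:
        return next(s for s, docs in self.shards.items() if doc_id in docs)

    def update_on_shard(self, shard: str, op: UpdateOp) -> None:
        # The shard refuses an update that would move the document elsewhere.
        pre = self.shards[shard][op.doc_id]
        post = {**pre, "key": op.new_key}
        if owning_shard(op.new_key) != shard:
            raise WouldChangeOwningShard(WcosInfo(pre, post))
        self.shards[shard][op.doc_id] = post

    def delete_and_insert_txn(self, info: WcosInfo) -> None:
        # Stand-in for a distributed transaction: validate first, then apply
        # both steps, so the delete and the insert both happen or neither does.
        doc_id = info.pre_image["_id"]
        src = self.find_shard(doc_id)
        dst = owning_shard(info.post_image["key"])
        if doc_id in self.shards[dst]:
            raise RuntimeError("duplicate _id on destination shard; aborting")
        del self.shards[src][doc_id]
        self.shards[dst][doc_id] = info.post_image


def execute_update(cluster: Cluster, op: UpdateOp, max_rounds: int = 3) -> int:
    """Run targeting/execution rounds until the op completes; return rounds used."""
    for round_no in range(1, max_rounds + 1):
        if op.needs_retarget and op.wcos is not None:
            # Retargeted round: execute the saved WCOS info as delete + insert.
            cluster.delete_and_insert_txn(op.wcos)
            op.needs_retarget = False
            return round_no
        shard = cluster.find_shard(op.doc_id)  # ordinary targeting
        try:
            cluster.update_on_shard(shard, op)
            return round_no
        except WouldChangeOwningShard as err:
            # Like StaleConfig: save the info, mark for retargeting, retry next round.
            op.wcos = err.info
            op.needs_retarget = True
    raise RuntimeError("write did not complete within max_rounds")


# Usage: doc 1 moves from shard "A" to "B"; doc 2 stays on "A".
cluster = Cluster()
cluster.insert({"_id": 1, "key": 10})
cluster.insert({"_id": 2, "key": 5})
moved_rounds = execute_update(cluster, UpdateOp(doc_id=1, new_key=150))
stayed_rounds = execute_update(cluster, UpdateOp(doc_id=2, new_key=20))
```

In this model the shard-key-changing update takes two rounds (the WCOS attempt, then the transactional retry), while an update that stays on its shard completes in one, mirroring how StaleConfig retries are folded into the normal targeting loop.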

      I think this change would help lift the restriction we currently have that WCOS writes must be sent in their own batch from the client. The current design contributes to that restriction because, for an ordered batch/bulk write, the execution layer code always stops executing the batch after seeing an error. We can't simply tell that logic to continue after seeing WCOS, though: we don't know whether the WCOS handling will end up succeeding, so if we execute the later writes before the WCOS is handled, we can end up violating the contract of ordered writes. If the execution layer handled WCOS directly, it could just continue processing the remaining writes in the batch as normal after performing the WCOS update, without having to return control to the command layer.
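The ordered-write argument can be illustrated with another toy Python model. This is a hedged sketch, not the real batch_write_exec / bulk_write_exec logic: `Write`, `BatchResult`, and `execute_ordered_batch` are hypothetical names, and each write is reduced to a callable. Because WCOS is resolved inside the loop, a later write runs only after the earlier WCOS write has actually succeeded as a delete + insert; if that transactional path fails, the batch stops there, preserving the ordered contract.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class WouldChangeOwningShard(Exception):
    """Hypothetical stand-in for the WCOS error returned by a shard."""


@dataclass
class Write:
    name: str
    run: Callable[[], None]                          # normal single-shard execution
    run_as_txn: Optional[Callable[[], None]] = None  # delete + insert path for WCOS


@dataclass
class BatchResult:
    n_ok: int
    error: Optional[str] = None


def execute_ordered_batch(writes: List[Write]) -> BatchResult:
    n_ok = 0
    for w in writes:
        try:
            try:
                w.run()
            except WouldChangeOwningShard:
                # Handled here in the execution layer instead of returning
                # control to the command layer.
                if w.run_as_txn is None:
                    raise
                w.run_as_txn()
        except Exception as exc:
            # Ordered contract: stop at the first write that ultimately fails,
            # so no later write in the batch is executed.
            return BatchResult(n_ok, f"{w.name}: {exc}")
        n_ok += 1
    return BatchResult(n_ok)


def raise_wcos() -> None:
    raise WouldChangeOwningShard()


def failing_txn() -> None:
    raise RuntimeError("transaction aborted")


# Usage 1: w2 hits WCOS, succeeds as a transaction, and w3 still runs.
log_ok: List[str] = []
ok = execute_ordered_batch([
    Write("w1", lambda: log_ok.append("w1")),
    Write("w2", raise_wcos, lambda: log_ok.append("w2-txn")),
    Write("w3", lambda: log_ok.append("w3")),
])

# Usage 2: the transactional retry of w2 fails, so w3 never executes.
log_fail: List[str] = []
failed = execute_ordered_batch([
    Write("w1", lambda: log_fail.append("w1")),
    Write("w2", raise_wcos, failing_txn),
    Write("w3", lambda: log_fail.append("w3")),
])
```

The key design point in this sketch is that the WCOS outcome is known before the loop advances, which is what would allow a WCOS write to share an ordered batch with other writes.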

            Assignee:
            Unassigned
            Reporter:
            kaitlin.mahar@mongodb.com Kaitlin Mahar
            Votes:
            0
            Watchers:
            9
