Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-84796

Disable single write shard optimization for updates that change a shard key value

    • Cluster Scalability
    • Fully Compatible
    • ALL
    • v8.0
    • Cluster Scalability 2024-1-22, Cluster Scalability 2024-2-5, Cluster Scalability 2024-2-19, Cluster Scalability 2024-3-4, Cluster Scalability 2024-3-18, Cluster Scalability 2024-4-1, Cluster Scalability 2024-4-15, Cluster Scalability 2024-4-29

      An upsert that targets a different shard than will own the inserted document triggers the WouldChangeOwningShard protocol, which spawns a transaction that in this case would do a noop update on the initial shard and an insert on the owning shard. This will trigger the single write shard optimization (a noop update is considered a read), so the read shard will not have a durable transaction record and may "forget" the transaction. If the client retries their update and an intervening writer inserted a matching document on the read shard, the retry will update that matching document, instead of triggering the WCOS protocol, breaking the semantics of a retryable write.

      Updates that change a shard key value have never been fully retryable by design, but before the single write shard optimization was enabled in SERVER-48340, they required two phase commit, which by implementation happens to write a transaction record on the "read" shard, which guarantees it throws an error on a retry even if there is an intervening write. When full retryability is added for such updates, this problem should go away, but in the meantime, to maximize safety we should disable the single write shard optimization for writes that trigger the WCOS protocol.

            Assignee:
            abdul.qadeer@mongodb.com Abdul Qadeer
            Reporter:
            jack.mulrow@mongodb.com Jack Mulrow
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: