Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-54711

Invariant on shardVersion, atClusterTime relationship for sharded write transactions

    XMLWordPrintableJSON

Details

    • Fully Compatible
    • 59

    Description

      When a moveChunk on a collection commits, the shardVersion of the collection is advanced and a timestamp is chosen such that:

      • Reads of moved documents after that timestamp on the recipient are considered "owned"
      • Reads of moved documents after that timestamp on the donor are considered "orphaned"

      Also note that a donor does not eagerly delete or mark its orphans in a way that would generate a storage engine write conflict. The orphans are eventually deleted at some arbitrary point in the future.

      Updating documents that are concurrently participating in a moveChunk result in those updates being transferred ("xfer mods") from the donor -> recipient. Those updates must be applied in a critical section before the moveChunk can be committed.

      Sharded transactions, however operate against a snapshot defined by the atClusterTime attached by a mongoS. Performing a transactional sharded update on a document that's concurrently being moved with a stale atClusterTime can result in write skew and lost data.

      This correctness problem is protected by the use of a shard version. But first consider this scenario that uses a stale read timestamp without a shard versioning protocol:

      | Client                                                 | Config                              | Shard0            | Shard1            |
      |--------------------------------------------------------+-------------------------------------+-------------------+-------------------|
      | Insert A                                               |                                     | Insert A @ TS(10) |                   |
      |                                                        | MoveChunkStarts                     |                   |                   |
      |                                                        |                                     |                   | Insert A @ TS(20) |
      |                                                        | MoveChunkCommits ValidAfter: TS(30) |                   |                   |
      | Delete A (broadcast transaction) atClusterTime: TS(15) |                                     | Delete A @ 40     | `A` Not Found     |
      

      In this example, a client inserts and deletes a document and received a success for both requests. However the document still remains as it wasn't deleted from the proper shard.

      In this example, the move chunk committing generates a new shard version which we'll call SV. If the client request is operating with an atClusterTime: TS(15), but also claims to be using the fresh shard version: SV, the same bug would manifest.

      Thus we can conclude there is a relationship that must be maintained for correctness of the system with chunk migrations and transactions that do updates:
      For a shard receiving a startTransaction request with a shardVersion SV, the request's atClusterTime must be at least the validAfter timestamp associated SV.

      Attachments

        Activity

          People

            sergi.mateo-bellido@mongodb.com Sergi Mateo Bellido
            daniel.gottlieb@mongodb.com Daniel Gottlieb (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: