SERVER-78050 (Core Server)

Chunk Migration Can Lose Data If Processing Deferred Modifications

    • Fully Compatible
    • ALL
    • v7.0
    • Sharding NYC 2023-06-26
    • 135

      If an update is performed in a transaction, we can't immediately read back the document unless we already have the post-image, so we instead insert it into a list of deferred updates. These deferred updates are processed in nextModsBatch, by reading the latest version of each document, when the recipient shard calls _transferMods.
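A minimal sketch of the deferral step follows. It assumes a simplified, single-threaded, in-memory model. `DeferredModsSketch`, `on_update`, `has_post_image` and `process_deferred` are hypothetical names, not the server's actual C++ code.

```python
class DeferredModsSketch:
    """Hypothetical, simplified model of how a donor could defer transactional
    updates whose post-image it cannot read yet."""

    def __init__(self, live_docs):
        self.live_docs = live_docs   # _id -> document; stands in for the collection
        self.deferred_ids = []       # updates still waiting to be read back
        self.update_list = []        # _ids whose latest state must be sent

    def on_update(self, doc_id, has_post_image):
        if has_post_image:
            # With a post-image in hand, the _id can be queued right away.
            self.update_list.append(doc_id)
        else:
            # Inside a transaction the document cannot be read back yet,
            # so only its _id is remembered for later.
            self.deferred_ids.append(doc_id)

    def process_deferred(self):
        # Run when the recipient calls _transferMods: read the latest version
        # of each deferred document and queue its _id like any other update.
        # (A missing document would be a delete in a fuller model; skipped here.)
        for doc_id in self.deferred_ids:
            if self.live_docs.get(doc_id) is not None:
                self.update_list.append(doc_id)
        self.deferred_ids.clear()


cloner = DeferredModsSketch({7: {"_id": 7, "v": 1}})
cloner.on_update(7, has_post_image=False)   # deferred
cloner.process_deferred()
print(cloner.update_list)                   # [7]
```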

      Next, the ids of the documents that have changed are pulled from the update list, and those documents are read to get their latest state to transfer to the recipient.
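The splice-then-read step might look roughly like this. `UpdateListSketch`, `splice` and `next_mods_batch` are hypothetical stand-ins for the logic in nextModsBatch. The lock is an assumption about how concurrent writers and the reader share the list.

```python
import threading


class UpdateListSketch:
    """Hypothetical shared list of _ids modified while the chunk is migrating."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = []

    def add(self, doc_id):
        # Called by writers on other threads.
        with self._lock:
            self._ids.append(doc_id)

    def splice(self):
        # Take every pending _id in one step and leave the shared list empty,
        # so writers can keep appending while this batch is being read.
        with self._lock:
            ids, self._ids = self._ids, []
        return ids


def next_mods_batch(update_list, find_by_id):
    """Read the current state of each spliced _id for transfer to the recipient."""
    batch = []
    for doc_id in update_list.splice():
        doc = find_by_id(doc_id)
        if doc is not None:  # a fuller model would report deletions separately
            batch.append(doc)
    return batch


live = {1: {"_id": 1, "v": 3}}
updates = UpdateListSketch()
updates.add(1)
print(next_mods_batch(updates, live.get))  # [{'_id': 1, 'v': 3}]
```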

      However, if we have already read documents while processing the deferred updates, a snapshot will have been opened and pinned to the operation context. That same snapshot continues to be used when we later read the documents in the update list to get their latest state.
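The following sketch illustrates the pinning behavior. It models a storage snapshot as a frozen copy of the data, taken on the first read. `RecoveryUnitSketch`, `find_by_id` and `abandon_snapshot` are hypothetical names; real snapshots come from the storage engine, not from copying.

```python
import copy


class RecoveryUnitSketch:
    """Hypothetical stand-in for a recovery unit: the first read opens a snapshot
    (modeled as a frozen copy of the data), and later reads keep using it until
    abandon_snapshot() is called."""

    def __init__(self, live_docs):
        self._live = live_docs
        self._snapshot = None

    def find_by_id(self, doc_id):
        if self._snapshot is None:
            self._snapshot = copy.deepcopy(self._live)  # open and pin
        return self._snapshot.get(doc_id)

    def abandon_snapshot(self):
        self._snapshot = None  # the next read opens a fresh snapshot


docs = {1: {"_id": 1, "v": 0}}
ru = RecoveryUnitSketch(docs)
ru.find_by_id(1)               # opens and pins the snapshot
docs[1] = {"_id": 1, "v": 1}   # a later write to the live data
ru.find_by_id(1)               # still v=0: the pinned snapshot is reused
ru.abandon_snapshot()
ru.find_by_id(1)               # now v=1
```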

      This means that an update can be lost because its document is read from a stale snapshot, in the following sequence of events:

      1. Deferred updates are processed and a snapshot is opened.
      2. Another thread updates a document in the chunk being moved, adding its id to the update list.
      3. The update list is spliced in nextModsBatch.
      4. The state of the documents in the spliced list is read using the same snapshot as in step 1, which predates the update made in step 2.
      5. The update from step 2 is never transferred to the recipient, so it is lost.
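The five steps above can be replayed deterministically in a small hypothetical model. `DonorSketch` and `replay_race` are illustrative names. A snapshot is again assumed to be a frozen copy of the data, opened on the first read and then pinned.

```python
import copy


class DonorSketch:
    """Hypothetical donor state. A storage snapshot is modeled as a frozen copy
    of the live data that is opened on the first read and then stays pinned."""

    def __init__(self, live_docs):
        self.live_docs = live_docs
        self.update_list = []
        self.deferred_ids = []
        self.snapshot = None

    def find_by_id(self, doc_id):
        if self.snapshot is None:
            self.snapshot = copy.deepcopy(self.live_docs)  # open and pin
        return self.snapshot.get(doc_id)


def replay_race():
    donor = DonorSketch({1: {"_id": 1, "v": 0}, 2: {"_id": 2, "v": 0}})
    donor.deferred_ids.append(1)

    # Step 1: processing deferred updates reads a document, opening a snapshot.
    for doc_id in donor.deferred_ids:
        if donor.find_by_id(doc_id) is not None:
            donor.update_list.append(doc_id)
    donor.deferred_ids.clear()

    # Step 2: another thread updates document 2 and records its _id.
    donor.live_docs[2] = {"_id": 2, "v": 1}
    donor.update_list.append(2)

    # Step 3: the update list is spliced.
    batch, donor.update_list = donor.update_list, []

    # Step 4: the "latest" state is read through the snapshot from step 1.
    return [donor.find_by_id(doc_id) for doc_id in batch]


sent = replay_race()
print(sent[1])  # {'_id': 2, 'v': 0}: the v=1 write from step 2 is never sent (step 5)
```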

      Calling abandonSnapshot on the operation context's recovery unit after splicing the update list should be sufficient to ensure that the subsequent reads use a snapshot at least as recent as the updates in the list. It is not clear, however, whether this is the best long-term solution to the problem.
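Here is a sketch of the proposed fix in the same kind of hypothetical model. It assumes, as the model does, that a writer adds an _id to the update list only after its write is visible to newly opened snapshots. Under that assumption, abandoning the snapshot right after the splice makes every spliced _id readable at its latest state. `DonorSketch`, `next_mods_batch`, the `concurrent_write` hook and `other_thread` are illustrative only.

```python
import copy


class DonorSketch:
    """Hypothetical donor state; a snapshot is a frozen copy of the live data,
    opened on the first read and pinned until abandon_snapshot() is called."""

    def __init__(self, live_docs):
        self.live_docs = live_docs
        self.update_list = []
        self.deferred_ids = []
        self.snapshot = None

    def find_by_id(self, doc_id):
        if self.snapshot is None:
            self.snapshot = copy.deepcopy(self.live_docs)
        return self.snapshot.get(doc_id)

    def abandon_snapshot(self):
        self.snapshot = None


def next_mods_batch(donor, concurrent_write=None):
    # Process deferred updates; the reads here open and pin a snapshot.
    for doc_id in donor.deferred_ids:
        if donor.find_by_id(doc_id) is not None:
            donor.update_list.append(doc_id)
    donor.deferred_ids.clear()

    if concurrent_write is not None:
        concurrent_write(donor)  # test hook: another "thread" writes at this point

    batch, donor.update_list = donor.update_list, []
    # The proposed fix: drop the pinned snapshot right after splicing, so the
    # reads below use a snapshot opened after every _id in `batch` was recorded.
    donor.abandon_snapshot()
    return [donor.find_by_id(doc_id) for doc_id in batch]


def other_thread(donor):
    donor.live_docs[2] = {"_id": 2, "v": 1}
    donor.update_list.append(2)


donor = DonorSketch({1: {"_id": 1, "v": 0}, 2: {"_id": 2, "v": 0}})
donor.deferred_ids.append(1)
print(next_mods_batch(donor, other_thread)[1])  # {'_id': 2, 'v': 1}
```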

            Assignee: Brett Nawrocki (brett.nawrocki@mongodb.com)
            Reporter: Brett Nawrocki (brett.nawrocki@mongodb.com)
            Votes: 0
            Watchers: 11
