Core Server / SERVER-86249

Consider changing findAndModify behavior when concurrent transaction re-inserts matching document

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Query Execution

      Consider the following scenario. A collection contains a document with _id: 123, and two separate clients are running. The following sequence happens:

      • Client 1 begins a transaction
      • Client 1 deletes _id: 123
      • Client 1 inserts a new document with _id: 123 (but with a new RecordId)
      • Client 2 runs a findAndModify or updateOne targeting _id: 123, and attempts to set a field 'X'
      • (Client 2 conflicts with the running transaction and is in a retry loop)
      • Client 1 commits its transaction
      • Client 2's operation completes
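
      To make the difference between an _id and a RecordId concrete, here is a toy Python model. It is not server code: ToyRecordStore and its methods are hypothetical, and it assumes every insert receives a fresh, never-reused RecordId. Client 1's delete-then-insert leaves _id 123 in the _id index, but pointing at a different record.

```python
from itertools import count


class ToyRecordStore:
    """Hypothetical toy model (not MongoDB code): RecordId -> document, plus an _id index."""

    def __init__(self):
        self._next_rid = count(1)  # in this model, RecordIds are never reused
        self.records = {}          # RecordId -> document
        self.id_index = {}         # _id -> RecordId

    def insert(self, doc):
        rid = next(self._next_rid)
        self.records[rid] = dict(doc)
        self.id_index[doc["_id"]] = rid
        return rid

    def delete(self, _id):
        rid = self.id_index.pop(_id)
        del self.records[rid]
        return rid


store = ToyRecordStore()
before = store.insert({"_id": 123})
# Client 1's transaction: delete _id 123, then insert a new document with _id 123.
store.delete(123)
after = store.insert({"_id": 123})
print(before, after)  # 1 2: same _id, different RecordId
```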

      Today (in versions 4.4-7.0), Client 2's findAndModify does not update anything, even though a document with _id: 123 existed the entire time. This is permitted under read-committed semantics, since queries under read-committed isolation can miss rows entirely, but it is confusing.

      What specifically causes Client 2's operation to "miss" the document?

      Client 2's operation runs as an UPDATE -> IDHACK plan. The plan reads from the _id index, fetches the document, and then receives a WriteConflict while attempting to update it. When it gets a write conflict, the UpdateStage stashes the document it read from the stage below it. After a WriteConflict, we abort our WiredTiger (WT) transaction and start a new one at a new point in time. The UpdateStage then "recovers" its state (namely, the stashed copy of the document it was trying to update and that document's RecordId): it re-fetches the document by RecordId and checks whether it still matches the filter.

      Since the stashed RecordId no longer exists once Client 1's transaction (which deleted it) has committed, no document is fetched. The UpdateStage then returns NEED_TIME, and on the subsequent call to work(), the IDHackStage returns EOF.
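
      The recovery path described above can be sketched as a toy Python model. It is not server code, and the names records, id_index, locked, client1_commit and update_by_id are hypothetical. It follows the scenario's timeline: Client 2 finds _id 123 at RecordId 1, hits a write conflict because Client 1's open transaction owns that record, retries after Client 1 commits, and re-fetches by the stashed RecordId, which no longer exists.

```python
# Hypothetical toy model, not MongoDB server code.
records = {1: {"_id": 123}}  # RecordId -> committed document
id_index = {123: 1}          # _id index: _id -> RecordId
locked = {1}                 # RecordIds with uncommitted writes from Client 1


def client1_commit():
    """Client 1's transaction commits: _id 123 now lives at RecordId 2."""
    del records[1]
    records[2] = {"_id": 123}
    id_index[123] = 2
    locked.clear()


def update_by_id(_id, field, value):
    rid = id_index.get(_id)          # IDHACK: a single seek on the _id index
    if rid is None:
        return 0                     # EOF
    stashed_rid = rid                # UpdateStage stashes the RecordId (and doc)
    while True:
        doc = records.get(stashed_rid)  # (re-)fetch by the stashed RecordId
        if doc is None or doc.get("_id") != _id:
            return 0                 # NEED_TIME, after which IDHACK returns EOF
        if stashed_rid in locked:
            # WriteConflict: abort, start a new snapshot and retry. In this
            # toy timeline, Client 1 commits while Client 2 is retrying.
            client1_commit()
            continue
        doc[field] = value
        return 1


n_modified = update_by_id(123, "X", 1)
print(n_modified, records)  # 0 {2: {'_id': 123}}
```

      Nothing is updated even though a document with _id 123 was visible at every point: the document Client 2 would need lives at RecordId 2, and this path never repeats the _id lookup that would find it.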

      What are our known options? (We can add to this)

      1. Update the documentation to make it clear that two documents with the same _id are not necessarily "the same document." Otherwise, make no change in server behavior. Today's behavior is allowed under read-committed isolation, so while it's inconvenient, it's not a bug. There are also two workarounds:
        1. Client 2 could use findAndModify and specify a sort. The sort acts as a sort + limit 1, and if the document that comes first in the sort order gets removed or no longer matches the predicate, the entire operation is retried via this code path.
        2. Client 1 could update the document instead of deleting and re-inserting it, which would preserve its RecordId.
      2. Change the behavior so that when a document is deleted and re-inserted (same _id, but new RecordId), concurrent updates succeed.
        1. One idea MaxH had for this was to have the IDHack stage continue seeking/fetching even after it has returned a document, when it sits beneath a write stage. Essentially, this removes the limit 1 that's baked into it today (only when beneath a write stage).
        2. Pass a flag via the UpdateParams indicating that the query reads by _id, and change this code to check that flag. This would make the operation behave just like findAndModify with a sort does today, without changing the IDHack stage.
        3. Make a more general change to the update code to retry when a conflict is hit and the document is later found to be missing.
          1. This would cause a performance hit in some scenarios, since it would fully retry operations that are not retried today.
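
      The flag-based option can be contrasted with today's behavior in a similar toy Python model. All names here are hypothetical: ToyCollection is not server code, and retry_whole_op merely stands in for the proposed UpdateParams flag; it is not an existing field. When the stashed RecordId has disappeared after a write conflict and the flag is set, the operation redoes the _id lookup instead of returning EOF, much as findAndModify with a sort retries today.

```python
class ToyCollection:
    """Hypothetical toy model: RecordId -> doc, plus an _id -> RecordId index."""

    def __init__(self):
        self.records = {1: {"_id": 123}}
        self.id_index = {123: 1}
        self.locked = {1}  # RecordIds with uncommitted writes from Client 1

    def client1_commit(self):
        # Client 1's transaction commits: _id 123 now lives at RecordId 2.
        del self.records[1]
        self.records[2] = {"_id": 123}
        self.id_index[123] = 2
        self.locked.clear()


def update_by_id(coll, _id, field, value, retry_whole_op):
    """retry_whole_op is a hypothetical stand-in for the proposed UpdateParams flag."""
    while True:
        rid = coll.id_index.get(_id)        # IDHACK: one seek on the _id index
        if rid is None:
            return 0
        while rid in coll.locked:           # WriteConflict: retry on a new snapshot
            coll.client1_commit()           # toy timeline: Client 1 commits meanwhile
        doc = coll.records.get(rid)         # recover by the stashed RecordId
        if doc is not None and doc.get("_id") == _id:
            doc[field] = value
            return 1
        if not retry_whole_op:
            return 0                        # today's behavior: NEED_TIME, then EOF
        # Flag set: loop back and redo the _id lookup, which finds the new RecordId.


print(update_by_id(ToyCollection(), 123, "X", 1, retry_whole_op=False))  # 0
print(update_by_id(ToyCollection(), 123, "X", 1, retry_whole_op=True))   # 1
```

      In this model the extra _id lookup only runs when a conflict is followed by a missing record, so the uncontended path is unchanged; the broader retry in option 3 would also apply to plans that do not read by _id.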

      Repro
      A repro script is attached below. It can be run with the following resmoke invocation:

      python3 buildscripts/resmoke.py run --installDir build/install/bin --suites=replica_sets fam-repro-replset.js 

       

            Assignee:
            evan.bergeron@mongodb.com Evan Bergeron
            Reporter:
            ian.boros@mongodb.com Ian Boros
            Votes:
            0
            Watchers:
            17

              Created:
              Updated: