  1. Core Server
  2. SERVER-65312

findAndModify pre/post image noop oplog entry forged by chunk migration donor should have the same statement id as the CRUD oplog entry it corresponds to

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 6.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Fully Compatible
    • ALL
    • Sharding NYC 2022-04-18

      Currently, each findAndModify pre/post image noop oplog entry is assigned a statement id as follows:

      1. For the case where the side image collection is enabled, the SessionCatalogMigrationSource and the DocumentSourceFindAndModifyImageLookup aggregation stage set the statement id of each forged image oplog entry to 0.
      2. For the case where the side image collection is not enabled, the OpObserverImpl indirectly sets the statement id of each image oplog entry to the statement id of the oplog entry that has the pre/post image.
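
To make the difference between (1) and (2) concrete, here is a minimal Python sketch. It is not the server code: `OplogEntry`, `forge_image_entry`, and the `inherit_stmt_id` flag are hypothetical names used only for illustration. With `inherit_stmt_id=False` the forged image entry gets statement id 0, as in (1); with `inherit_stmt_id=True` it reuses the statement id of the CRUD entry, as in (2) and as the title of this ticket proposes.

```python
from dataclasses import dataclass


@dataclass
class OplogEntry:
    """Toy stand-in for an oplog entry; only the fields needed here."""
    op: str          # "u" for an update, "n" for a noop
    stmt_id: int     # statement id within the retryable write
    is_image: bool = False


def forge_image_entry(crud_entry: OplogEntry, inherit_stmt_id: bool) -> OplogEntry:
    """Forge a pre/post image noop entry for the given CRUD entry.

    inherit_stmt_id=False models case (1): the statement id is hard-coded to 0.
    inherit_stmt_id=True models case (2): the image entry shares the statement
    id of the CRUD entry it corresponds to.
    """
    stmt_id = crud_entry.stmt_id if inherit_stmt_id else 0
    return OplogEntry(op="n", stmt_id=stmt_id, is_image=True)


update = OplogEntry(op="u", stmt_id=1)
image_case_1 = forge_image_entry(update, inherit_stmt_id=False)
image_case_2 = forge_image_entry(update, inherit_stmt_id=True)
```

In this sketch only `image_case_2` can be matched back to `update` by statement id alone.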

      (1) leads to issues in the following case. Consider a sharded collection with only one chunk that resides on shard0.

      1. The client performs a findAndModify with a pre-image in a retryable-write session S. The findAndModify has statement id 1.
      2. The chunk is moved from shard0 to shard1. During the migration, shard1 receives a forged pre-image noop oplog entry and an update oplog entry for the findAndModify. The noop oplog entry has statement id 0 and the update oplog entry has statement id 1.
      3. The chunk is moved from shard1 back to shard0. During the migration, shard0 receives a forged pre-image noop oplog entry and a noop oplog entry for the update oplog entry for the findAndModify. shard0 writes the forged pre-image noop oplog entry but not the update noop since checkStatementExecuted() returns false for statement id 0 but returns true for statement id 1. Upon writing the pre-image noop oplog entry, the primary does not update the config.transactions entry for S but the secondaries do since "fromMigrate" is true and "o2" is not null. As a result, the config.transactions entry for S on the primary and secondaries has mismatched content. 
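
The filtering in step 3 can be sketched as follows. This is a simplified model under stated assumptions, not the actual recipient logic: `entries_written` is a hypothetical helper, and the set of executed statement ids stands in for what checkStatementExecuted() consults for session S on shard0.

```python
def entries_written(executed_stmt_ids, incoming):
    """Return the names of incoming entries the recipient would write.

    executed_stmt_ids: statement ids already recorded as executed for the
        session on the recipient (hypothetical stand-in for the state behind
        checkStatementExecuted()).
    incoming: list of (name, stmt_id) pairs in migration order.
    """
    written = []
    for name, stmt_id in incoming:
        # An entry is skipped only if its own statement id was already executed.
        if stmt_id not in executed_stmt_ids:
            written.append(name)
    return written


# shard0 originally executed the findAndModify, so statement id 1 is known.
executed_on_shard0 = {1}

# Case (1): the forged pre-image carries statement id 0.
mismatched = entries_written(
    executed_on_shard0, [("pre-image noop", 0), ("update noop", 1)]
)

# Proposed behavior: the pre-image carries the same statement id as the update.
matched = entries_written(
    executed_on_shard0, [("pre-image noop", 1), ("update noop", 1)]
)
```

In this model `mismatched` contains only the pre-image noop, which is the orphaned write described in step 3, while `matched` is empty because both entries are recognized as already executed.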

      Also, if the client performs another write between steps 2 and 3, the migration in step 3 can fail this uassert if the write is done in the same session as the findAndModify but with a higher txnNumber, and this uassert if the write is done in a new session. The reason is that processSessionOplog() doesn't update 'lastResult' upon skipping the update noop, so it expects the next oplog entry to correspond to the same retryable write as the pre-image that is still being tracked by 'lastResult'.
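
The stale-tracking failure can be illustrated with a toy model. Everything here is an assumption made for illustration: `ToyRecipient`, its `pending_image` field (standing in for 'lastResult'), and the `AssertionError` (standing in for the uassert) are hypothetical and do not mirror the real processSessionOplog() implementation.

```python
class ToyRecipient:
    """Toy model of a recipient that tracks a pending pre-image."""

    def __init__(self, executed):
        # (session, txn_number, stmt_id) tuples already executed here.
        self.executed = set(executed)
        # Stand-in for 'lastResult': the (session, txn_number) of a written
        # pre-image that is still waiting for its CRUD entry.
        self.pending_image = None

    def process(self, session, txn_number, stmt_id, is_image):
        if (session, txn_number, stmt_id) in self.executed:
            # Skipped entries do not reset pending_image; this is the gap.
            return "skipped"
        if self.pending_image is not None:
            if self.pending_image != (session, txn_number):
                raise AssertionError("entry does not match the tracked pre-image")
            self.pending_image = None
        if is_image:
            self.pending_image = (session, txn_number)
        return "written"


# Pre-image forged with statement id 0 (case 1).
buggy = ToyRecipient(executed={("S", 5, 1)})
buggy_results = [
    buggy.process("S", 5, 0, is_image=True),    # written, now tracked
    buggy.process("S", 5, 1, is_image=False),   # skipped, tracking not reset
]
try:
    buggy.process("S", 6, 0, is_image=False)    # later write, higher txnNumber
    buggy_failed = False
except AssertionError:
    buggy_failed = True

# Pre-image forged with the same statement id as the update.
fixed = ToyRecipient(executed={("S", 5, 1)})
fixed_results = [
    fixed.process("S", 5, 1, is_image=True),    # skipped together with the update
    fixed.process("S", 5, 1, is_image=False),
    fixed.process("S", 6, 0, is_image=False),   # later write goes through
]
```

When the image and the update share a statement id, both are skipped together, so nothing is left pending when the later write arrives.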

            Assignee:
            cheahuychou.mao@mongodb.com Cheahuychou Mao
            Reporter:
            cheahuychou.mao@mongodb.com Cheahuychou Mao