Core Server / SERVER-60469

Retryable deletes reserve two optimes for preImage chaining despite not capturing a preImage

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker - P1
    • Fix Version/s: 5.2.0, 5.0.4, 5.1.0-rc1
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v5.1, v5.0
    • Sprint: Repl 2021-10-18
    • 120

      Traditionally, a retryable findAndModify makes its response reconstructible on retry by writing the returned document to the oplog, separately from the update/delete that the findAndModify performed.

      PM-2213 offers a second option: the returned document can be written to a separate image collection. However, for features such as tenant migrations and resharding, these images are still communicated via the oplog (as opposed to selectively copying that collection).

      To accomplish that, we reserve two oplog timestamps when recording an image as part of a findAndModify. This lets an aggregation stage insert the image into the oplog seamlessly, without having to choose a timestamp for it.
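
The two-slot reservation described above can be sketched with a toy model. This is only an illustration: `ToyOplogClock`, `ToyReservation`, and `reserveForImageAndWrite` are hypothetical names, not server code, and the choice to give the earlier slot to the image and the later one to the write is an assumption that the ticket does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for an oplog timestamp; not MongoDB's real Timestamp/OpTime type.
using ToyTimestamp = std::uint64_t;

// Hypothetical allocator that hands out strictly increasing timestamps.
class ToyOplogClock {
public:
    // Reserve `count` consecutive timestamps and return them in ascending order.
    std::vector<ToyTimestamp> reserve(int count) {
        assert(count > 0);
        std::vector<ToyTimestamp> slots;
        for (int i = 0; i < count; ++i) {
            slots.push_back(++_last);
        }
        return slots;
    }

    // Highest timestamp handed out so far.
    ToyTimestamp last() const { return _last; }

private:
    ToyTimestamp _last = 0;
};

// Assumed layout: the earlier slot is left free for the image entry,
// the later slot is used by the update/delete entry itself.
struct ToyReservation {
    ToyTimestamp imageSlot;
    ToyTimestamp writeSlot;
};

// Reserve two adjacent slots so that an image can later be placed in the
// oplog without anyone having to pick a timestamp for it.
ToyReservation reserveForImageAndWrite(ToyOplogClock& clock) {
    const auto slots = clock.reserve(2);
    return ToyReservation{slots.front(), slots.back()};
}
```

In this model every reservation advances the clock by two, whether or not an image is ever written into the first slot.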

      For regular update findAndModify operations, we (correctly) reserve two optimes only when we intend to store an image.

      However, deletes reserve two optimes unconditionally (i.e., regular retryable deletes don't record a preImage but still reserve both). This has been identified as a perf regression.

      A sample patch that corrects the perf regression:

      diff --git a/src/mongo/db/catalog/collection_impl.cpp b/src/mongo/db/catalog/collection_impl.cpp
      index 3de0a60632..5786026d33 100644
      --- a/src/mongo/db/catalog/collection_impl.cpp
      +++ b/src/mongo/db/catalog/collection_impl.cpp
      @@ -1157,7 +1161,10 @@ void CollectionImpl::deleteDocument(OperationContext* opCtx,
               uasserted(10089, "cannot remove from a capped collection");
           }
       
      -    const auto oplogSlot = reserveOplogSlotsForRetryableFindAndModify(opCtx);
      +    boost::optional<OplogSlot> oplogSlot = boost::none;
      +    if (storeDeletedDoc == Collection::StoreDeletedDoc::On) {
      +        oplogSlot = reserveOplogSlotsForRetryableFindAndModify(opCtx);
      +    }
           OpObserver::OplogDeleteEntryArgs deleteArgs{
               nullptr, fromMigrate, getRecordPreImages(), oplogSlot, oplogSlot != boost::none};
       
      
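
The effect of the conditional reservation in the patch can be sketched with a toy slot counter. This is a simplified model under stated assumptions, not the server implementation: `ToySlotCounter`, `deleteBeforePatch`, and `deleteAfterPatch` are hypothetical names, `std::optional` stands in for `boost::optional`, and the model assumes that a delete with no up-front reservation consumes exactly one slot when it is logged. It only counts slots and says nothing about where the measured cost actually comes from.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Mirrors the two states the patch checks; the real enum lives on Collection.
enum class StoreDeletedDoc { Off, On };

// Hypothetical counter standing in for oplog slot allocation.
class ToySlotCounter {
public:
    // Reserve `count` consecutive slots and return the last one.
    std::uint64_t reserve(int count) {
        assert(count > 0);
        _next += static_cast<std::uint64_t>(count);
        return _next;
    }

    // Total number of slots consumed so far.
    std::uint64_t used() const { return _next; }

private:
    std::uint64_t _next = 0;
};

// Behavior before the patch: two slots are reserved for every delete,
// regardless of whether the deleted document is stored.
std::uint64_t deleteBeforePatch(ToySlotCounter& oplog, StoreDeletedDoc) {
    return oplog.reserve(2);
}

// Behavior after the patch: two slots are reserved only when the deleted
// document will be stored as an image.
std::uint64_t deleteAfterPatch(ToySlotCounter& oplog, StoreDeletedDoc storeDeletedDoc) {
    std::optional<std::uint64_t> slot;
    if (storeDeletedDoc == StoreDeletedDoc::On) {
        slot = oplog.reserve(2);
    }
    // Assumption: with no reservation, logging the delete takes one slot.
    return slot ? *slot : oplog.reserve(1);
}
```

Running 100 plain deletes through each function consumes 200 slots in the first model and 100 in the second, while a delete that stores its document still gets two slots in both.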

            Assignee:
            jason.chan@mongodb.com Jason Chan
            Reporter:
            daniel.gottlieb@mongodb.com Daniel Gottlieb (Inactive)
            Votes:
            0
            Watchers:
            8
