Traditionally, retryable findAndModify calls reconstruct a response to a retry by writing the returned document to the oplog, separately from the update/delete the findAndModify performed.
PM-2213 offers a second option: the returned document can be written to a separate image collection. However, for features such as tenant migrations and resharding, these images are communicated via the oplog (as opposed to selectively copying that collection).
To accomplish that, we reserve two oplog timestamps when recording an image as part of a findAndModify. This allows us to add an aggregation stage that can seamlessly insert the image into the oplog and not worry about choosing a timestamp.
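As a rough illustration of the two-slot reservation described above, here is a toy sketch. It is not the server's code: ToySlotAllocator, ToyReservation and reserveForFindAndModifyWithImage are hypothetical names, timestamps are modeled as plain integers, and placing the image slot directly before the write's slot is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for an oplog timestamp; not the server's real type.
using Timestamp = std::uint64_t;

// Hypothetical allocator that hands out strictly increasing timestamps.
class ToySlotAllocator {
public:
    // Reserve `count` consecutive timestamps and return them in order.
    std::vector<Timestamp> reserve(std::size_t count) {
        std::vector<Timestamp> slots;
        slots.reserve(count);
        for (std::size_t i = 0; i < count; ++i) {
            slots.push_back(++_last);
        }
        return slots;
    }

    // Most recently handed-out timestamp (0 if none yet).
    Timestamp last() const { return _last; }

private:
    Timestamp _last = 0;
};

struct ToyReservation {
    Timestamp imageSlot;  // left free so an image entry can be inserted here later
    Timestamp writeSlot;  // used by the update/delete oplog entry itself
};

// Reserve two adjacent slots for a findAndModify that stores an image.
// Assumption for this sketch: the image slot directly precedes the write slot.
ToyReservation reserveForFindAndModifyWithImage(ToySlotAllocator& alloc) {
    const auto slots = alloc.reserve(2);
    return ToyReservation{slots.front(), slots.back()};
}
```

Because the image's timestamp is set aside at write time, a later stage that inserts the image into the oplog stream has a free slot next to the write and does not need to pick a timestamp of its own.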
For regular update findAndModifys, we (correctly) reserve two optimes only when we intend to store an image.
However, deletes reserve them unconditionally (i.e., regular retryable deletes don't record a preImage, but still reserve two optimes). This has been identified as a perf regression.
A sample patch that corrects the perf regression:
diff --git a/src/mongo/db/catalog/collection_impl.cpp b/src/mongo/db/catalog/collection_impl.cpp
index 3de0a60632..5786026d33 100644
--- a/src/mongo/db/catalog/collection_impl.cpp
+++ b/src/mongo/db/catalog/collection_impl.cpp
@@ -1157,7 +1161,10 @@ void CollectionImpl::deleteDocument(OperationContext* opCtx,
         uasserted(10089, "cannot remove from a capped collection");
     }
 
-    const auto oplogSlot = reserveOplogSlotsForRetryableFindAndModify(opCtx);
+    boost::optional<OplogSlot> oplogSlot = boost::none;
+    if (storeDeletedDoc == Collection::StoreDeletedDoc::On) {
+        oplogSlot = reserveOplogSlotsForRetryableFindAndModify(opCtx);
+    }
     OpObserver::OplogDeleteEntryArgs deleteArgs{
         nullptr, fromMigrate, getRecordPreImages(), oplogSlot, oplogSlot != boost::none};
Related to: SERVER-58740 Reserve multiple oplog slots when writing retryable findAndModify with storeFindAndModifyImagesInSideCollection=true (Closed)