- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Labels:
- Fully Compatible
- ALL
- Sharding 2018-12-17, Sharding 2018-12-31, Sharding 2019-01-14
- 45
The range deleter attempts to find documents which have since been migrated away from a shard and delete them. When performing the deletion, it correctly wraps the call to Collection::deleteDocument inside a writeConflictRetry:
    exec->saveState();

    writeConflictRetry(opCtx, "delete range", nss.ns(), [&] {
        WriteUnitOfWork wuow(opCtx);
        if (saver) {
            uassertStatusOK(saver->goingToDelete(obj));
        }
        collection->deleteDocument(opCtx, kUninitializedStmtId, rloc, nullptr, true);
        wuow.commit();
    });

    try {
        exec->restoreState();
The call to collection->deleteDocument() will end up looking up the document and asserting that the document exists. This should generally be true, but if we encounter a write conflict exception, the writeConflictRetry loop will abandon the snapshot in between attempts:
    int attempts = 0;
    while (true) {
        try {
            return f();
        } catch (WriteConflictException const&) {
            CurOp::get(opCtx)->debug().additiveMetrics.incrementWriteConflicts(1);
            WriteConflictException::logAndBackoff(attempts, opStr, ns);
            ++attempts;
            opCtx->recoveryUnit()->abandonSnapshot();
        }
    }
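Once the snapshot has been abandoned, the next attempt reads from a newer snapshot, in which the document may already have been deleted by a concurrent operation. The retried delete therefore needs to be able to handle the document no longer existing.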
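The failure mode can be illustrated with a small standalone model. This is only a sketch: ToyWriteConflict, ToyCollection, strictDelete, toyWriteConflictRetry and reproduceMissingDocument are hypothetical stand-ins invented for illustration, not the server's types, and the "concurrent delete" is simulated by erasing the record when the snapshot is abandoned.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration; none of these are real server types.
struct ToyWriteConflict {};
using RecordId = int;
using ToyCollection = std::map<RecordId, std::string>;

// Models a delete that insists the record exists when it looks it up.
void strictDelete(ToyCollection& coll, RecordId id) {
    auto it = coll.find(id);
    if (it == coll.end()) {
        throw std::logic_error("record not found");
    }
    coll.erase(it);
}

// Models the retry loop: on a conflict, abandon the snapshot and try again.
template <typename F>
auto toyWriteConflictRetry(F&& f, const std::function<void()>& onAbandonSnapshot) {
    while (true) {
        try {
            return f();
        } catch (const ToyWriteConflict&) {
            onAbandonSnapshot();
        }
    }
}

// Returns true if the retried attempt failed because the record was gone.
bool reproduceMissingDocument() {
    ToyCollection coll{{1, "orphan"}};
    int attempts = 0;
    try {
        toyWriteConflictRetry(
            [&] {
                // First attempt hits a simulated write conflict.
                if (attempts++ == 0) throw ToyWriteConflict{};
                strictDelete(coll, 1);
            },
            // Simulated: a concurrent delete becomes visible after the
            // snapshot is abandoned.
            [&] { coll.erase(1); });
    } catch (const std::logic_error&) {
        assert(attempts == 2);  // the failure happens on the second attempt
        return true;
    }
    return false;
}
```

Calling reproduceMissingDocument() in this model takes the failing path on the second attempt, which mirrors the sequence described above: conflict, snapshot abandoned, record gone, strict delete fails.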
I would therefore propose that the code in collection_range_deleter.cpp either (1) change the PlanExecutor it constructs to include a delete stage that can handle the document no longer existing, or (2) manually check before each delete whether the document still exists.
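Option (2) might look roughly like the following in a toy model. This is a sketch under stated assumptions, not the actual fix: ToyCollection, deleteIfStillPresent and deleteBatch are hypothetical names, and in the real code the existence check would have to happen inside each retry attempt, against the same snapshot the delete uses.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not real server types.
using RecordId = int;
using ToyCollection = std::map<RecordId, std::string>;

// Deletes the record only if it is still present.
// Returns true if a record was deleted, false if it was already gone.
bool deleteIfStillPresent(ToyCollection& coll, RecordId id) {
    auto it = coll.find(id);
    if (it == coll.end()) {
        return false;  // already removed elsewhere; nothing to do
    }
    coll.erase(it);
    return true;
}

// Deletes a batch of records and counts only those actually removed.
int deleteBatch(ToyCollection& coll, const std::vector<RecordId>& ids) {
    int numDeleted = 0;
    for (RecordId id : ids) {
        if (deleteIfStillPresent(coll, id)) {
            ++numDeleted;
        }
    }
    assert(numDeleted <= static_cast<int>(ids.size()));
    return numDeleted;
}
```

One consequence of this shape is that any count of deleted documents should only include records that were actually removed, so a record that vanished between attempts is skipped rather than treated as an error.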