Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: Sharding
Fully Compatible
ALL
Sharding 2018-12-17, Sharding 2018-12-31, Sharding 2019-01-14
45
The range deleter attempts to find documents that have since been migrated away from a shard and delete them. When performing the deletion, it correctly wraps the call to Collection::deleteDocument inside a writeConflictRetry:
exec->saveState();
writeConflictRetry(opCtx, "delete range", nss.ns(), [&] {
    WriteUnitOfWork wuow(opCtx);
    if (saver) {
        uassertStatusOK(saver->goingToDelete(obj));
    }
    collection->deleteDocument(opCtx, kUninitializedStmtId, rloc, nullptr, true);
    wuow.commit();
});
try {
    exec->restoreState();
The call to collection->deleteDocument() will end up looking up the document and asserting that the document exists. This should generally be true, but if we encounter a write conflict exception, the writeConflictRetry loop will abandon the snapshot in between attempts:
int attempts = 0;
while (true) {
    try {
        return f();
    } catch (WriteConflictException const&) {
        CurOp::get(opCtx)->debug().additiveMetrics.incrementWriteConflicts(1);
        WriteConflictException::logAndBackoff(attempts, opStr, ns);
        ++attempts;
        opCtx->recoveryUnit()->abandonSnapshot();
    }
}
Once the snapshot has been abandoned, we need to be able to handle the document no longer existing on the next attempt.
So I would propose that the code in collection_range_deleter.cpp either (1) change the PlanExecutor it constructs to include a delete stage, which is able to handle the document no longer existing, or (2) manually check before each delete whether the document still exists.
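
To illustrate option (2), here is a minimal, self-contained sketch. It is not server code: ToyCollection, WriteConflict, retryOnWriteConflict and deleteIfStillExists are all hypothetical stand-ins, and the abandonSnapshot callback only simulates the point at which another writer's delete becomes visible. The idea it tries to show is that the existence check has to happen inside the retried lambda, so that every attempt re-checks against the fresh snapshot.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>

// Hypothetical stand-in for WriteConflictException.
struct WriteConflict : std::runtime_error {
    WriteConflict() : std::runtime_error("write conflict") {}
};

// Hypothetical toy collection: the set of record ids that currently exist.
struct ToyCollection {
    std::set<int> records;
    int conflictsToThrow = 0;  // fault injection, for illustration only

    bool exists(int id) const { return records.count(id) != 0; }

    // Like the real deleteDocument described above, this asserts that the
    // record exists; a missing record is treated as a programming error.
    void deleteDocument(int id) {
        if (conflictsToThrow > 0) {
            --conflictsToThrow;
            throw WriteConflict();
        }
        const bool erased = records.erase(id) == 1;
        assert(erased && "deleteDocument: record must exist");
        (void)erased;
    }
};

// Simplified retry loop: after a write conflict, call abandonSnapshot
// (a stand-in for opCtx->recoveryUnit()->abandonSnapshot()) and try again.
template <typename F, typename Abandon>
auto retryOnWriteConflict(F&& f, Abandon&& abandonSnapshot) -> decltype(f()) {
    while (true) {
        try {
            return f();
        } catch (const WriteConflict&) {
            abandonSnapshot();
        }
    }
}

// Option (2): check inside every attempt whether the record still exists.
// Returns true if this call deleted it, false if it was already gone.
template <typename Abandon>
bool deleteIfStillExists(ToyCollection& coll, int id, Abandon&& abandonSnapshot) {
    return retryOnWriteConflict(
        [&]() -> bool {
            if (!coll.exists(id)) {
                return false;  // removed by someone else since the last snapshot
            }
            coll.deleteDocument(id);
            return true;
        },
        abandonSnapshot);
}
```

If the check were made once, outside the retried lambda, a conflict followed by a concurrent delete would still reach the assertion on the second attempt.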