Details
Bug
Resolution: Fixed
Major - P3
None
Fully Compatible
ALL
Sharding 2018-12-17, Sharding 2018-12-31, Sharding 2019-01-14
45
Description
The range deleter attempts to find documents that have since been migrated away from a shard and delete them. When performing the deletion, it correctly wraps the call to Collection::deleteDocument inside a writeConflictRetry:
collection_range_deleter.cpp

    exec->saveState();

    writeConflictRetry(opCtx, "delete range", nss.ns(), [&] {
        WriteUnitOfWork wuow(opCtx);
        if (saver) {
            uassertStatusOK(saver->goingToDelete(obj));
        }
        collection->deleteDocument(opCtx, kUninitializedStmtId, rloc, nullptr, true);
        wuow.commit();
    });

    try {
        exec->restoreState();

The call to collection->deleteDocument() will end up looking up the document and asserting that the document exists. This should generally be true, but if we encounter a write conflict exception, the writeConflictRetry loop will abandon the snapshot in between attempts:
write_conflict_exception.h

    int attempts = 0;
    while (true) {
        try {
            return f();
        } catch (WriteConflictException const&) {
            CurOp::get(opCtx)->debug().additiveMetrics.incrementWriteConflicts(1);
            WriteConflictException::logAndBackoff(attempts, opStr, ns);
            ++attempts;
            opCtx->recoveryUnit()->abandonSnapshot();
        }
    }

Once the snapshot has been abandoned, we need to be able to handle the document no longer existing on the next attempt.
I would therefore propose that the code in collection_range_deleter.cpp either (1) change the PlanExecutor it constructs to include a delete stage, which is able to handle the document no longer existing, or (2) manually check before each delete whether the document still exists.
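
To make option (2) concrete, here is a minimal, self-contained sketch of the idea in a toy model. It does not use any MongoDB code: ToyStore, ToyRecordId, ToyWriteConflict, toyWriteConflictRetry and toyDeleteIfPresent are all hypothetical names invented for illustration, and the betweenAttempts callback is only an assumed stand-in for whatever another writer may do once the snapshot has been abandoned. The point it tries to show is that the existence check has to live inside the retried body, so that it runs again on every attempt.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration only; none of these are MongoDB types.
struct ToyWriteConflict : std::runtime_error {
    ToyWriteConflict() : std::runtime_error("write conflict") {}
};

using ToyRecordId = int;
using ToyStore = std::map<ToyRecordId, std::string>;

// Simplified analogue of a write-conflict retry loop: rerun f until it stops
// throwing ToyWriteConflict. betweenAttempts models changes that become
// visible after the snapshot is abandoned (for example a concurrent delete).
template <typename F>
auto toyWriteConflictRetry(F&& f, const std::function<void()>& betweenAttempts)
    -> decltype(f()) {
    while (true) {
        try {
            return f();
        } catch (const ToyWriteConflict&) {
            betweenAttempts();
        }
    }
}

// Option (2): look the document up again on every attempt and skip the delete
// if it is gone. Returns true if this call removed the document and false if
// it had already disappeared. conflictsToInject simulates that many conflicts.
bool toyDeleteIfPresent(ToyStore& store,
                        ToyRecordId id,
                        int conflictsToInject,
                        const std::function<void()>& betweenAttempts) {
    assert(conflictsToInject >= 0);
    int remaining = conflictsToInject;
    return toyWriteConflictRetry(
        [&]() -> bool {
            auto it = store.find(id);  // re-checked on each attempt
            if (it == store.end()) {
                return false;  // document no longer exists; nothing to delete
            }
            if (remaining > 0) {
                --remaining;
                throw ToyWriteConflict();  // simulated conflict
            }
            store.erase(it);
            return true;
        },
        betweenAttempts);
}
```

In this model, calling toyDeleteIfPresent with one injected conflict and a betweenAttempts callback that erases the same id returns false instead of tripping an assertion, which is the behaviour option (2) asks for. Whether the real fix is better placed in a delete stage, as in option (1), is a separate design question that this sketch does not address.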