-
Type:
Improvement
-
Resolution: Won't Fix
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Replication
-
None
-
None
-
None
-
None
-
None
-
None
-
None
This bug was found during a review of surrounding code.
https://github.com/10gen/mongo/pull/13172/files/71cbb9540fa619f3aa07fdb7025274b0cc6fa920#r1206153406
I see there are many places using a particular style of FailPoint loop in these tests, which is, I believe, slow and not quite technically correct, and could be more compact.
A typical instance looks like this.
tenantMigrationHangCollectionClonerAfterHandlingBatchResponse.executeIf(
[&](const BSONObj&) {
while (
MONGO_unlikely(
tenantMigrationHangCollectionClonerAfterHandlingBatchResponse.shouldFail()) &&
!mustExit()) {
LOGV2(4884506,
"tenantMigrationHangCollectionClonerAfterHandlingBatchResponse fail point "
"enabled. Blocking until fail point is disabled",
logAttrs(_sourceNss),
"tenantId"_attr = _tenantId);
mongo::sleepsecs(1);
}
},
[&](const BSONObj& data) {
// Only hang when cloning the specified collection, or if no collection was specified.
auto nss = data["nss"].str();
return nss.empty() || nss == _sourceNss.toString();
});
Because there's a nested while loop inside, the executed lambda, we would be better off using scopedIf for these.
Suggestion with several different improvements together.
{
auto &fp = tenantMigrationHangCollectionClonerAfterHandlingBatchResponse;
auto sfp = fp.scopedIf([&](const BSONObj& data) {
// Only hang when cloning the specified collection,
// or if no collection was specified.
auto nss = data["nss"].str();
return nss.empty() || nss == _sourceNss.toString();
});
if (MONGO_unlikely(sfp.isActive()) {
while (sfp.isStillEnabled())) {
if (mustExit())
break;
LOGV2(4884506,
"Blocking while fail point is disabled",
"failPoint"_attr = fp.getName(),
logAttrs(_sourceNss),
"tenantId"_attr = _tenantId);
sleepsecs(1);
}
}
}
The idea is that a checking a failpoint and taking action on it is a kind of transaction with the failpoint, during which it cannot toggle and cause an ABA problem.
FailPoint nTimes and other modes are expecting this kind of session-based access idiom.
This pattern is repeated several times in these two test fies and would be a good candidate for refactor into a helper function.