Details
-
Improvement
-
Resolution: Won't Fix
-
Major - P3
-
None
-
None
-
None
-
None
-
Replication
Description
This bug was found during a review of surrounding code.
https://github.com/10gen/mongo/pull/13172/files/71cbb9540fa619f3aa07fdb7025274b0cc6fa920#r1206153406
I see there are many places using a particular style of FailPoint loop in these tests, which is, I believe, slow and not quite technically correct, and could be more compact.
A typical instance looks like this.
|
|
tenantMigrationHangCollectionClonerAfterHandlingBatchResponse.executeIf(
|
[&](const BSONObj&) { |
while ( |
MONGO_unlikely(
|
tenantMigrationHangCollectionClonerAfterHandlingBatchResponse.shouldFail()) &&
|
!mustExit()) {
|
LOGV2(4884506,
|
"tenantMigrationHangCollectionClonerAfterHandlingBatchResponse fail point " |
"enabled. Blocking until fail point is disabled", |
logAttrs(_sourceNss),
|
"tenantId"_attr = _tenantId); |
mongo::sleepsecs(1);
|
}
|
},
|
[&](const BSONObj& data) { |
// Only hang when cloning the specified collection, or if no collection was specified. |
auto nss = data["nss"].str(); |
return nss.empty() || nss == _sourceNss.toString(); |
});
|
Because there's a nested while loop inside, the executed lambda, we would be better off using scopedIf for these.
Suggestion with several different improvements together.
{
|
auto &fp = tenantMigrationHangCollectionClonerAfterHandlingBatchResponse;
|
auto sfp = fp.scopedIf([&](const BSONObj& data) { |
// Only hang when cloning the specified collection, |
// or if no collection was specified. |
auto nss = data["nss"].str(); |
return nss.empty() || nss == _sourceNss.toString(); |
});
|
if (MONGO_unlikely(sfp.isActive()) { |
while (sfp.isStillEnabled())) { |
if (mustExit()) |
break; |
LOGV2(4884506,
|
"Blocking while fail point is disabled", |
"failPoint"_attr = fp.getName(), |
logAttrs(_sourceNss),
|
"tenantId"_attr = _tenantId); |
sleepsecs(1);
|
}
|
}
|
}
|
The idea is that a checking a failpoint and taking action on it is a kind of transaction with the failpoint, during which it cannot toggle and cause an ABA problem.
FailPoint nTimes and other modes are expecting this kind of session-based access idiom.
This pattern is repeated several times in these two test fies and would be a good candidate for refactor into a helper function.