-
Type: Improvement
-
Resolution: Won't Fix
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
Labels:None
-
Replication
This bug was found during a review of surrounding code.
https://github.com/10gen/mongo/pull/13172/files/71cbb9540fa619f3aa07fdb7025274b0cc6fa920#r1206153406
I see there are many places using a particular style of FailPoint loop in these tests, which is, I believe, slow and not quite technically correct, and could be more compact.
A typical instance looks like this.
tenantMigrationHangCollectionClonerAfterHandlingBatchResponse.executeIf( [&](const BSONObj&) { while ( MONGO_unlikely( tenantMigrationHangCollectionClonerAfterHandlingBatchResponse.shouldFail()) && !mustExit()) { LOGV2(4884506, "tenantMigrationHangCollectionClonerAfterHandlingBatchResponse fail point " "enabled. Blocking until fail point is disabled", logAttrs(_sourceNss), "tenantId"_attr = _tenantId); mongo::sleepsecs(1); } }, [&](const BSONObj& data) { // Only hang when cloning the specified collection, or if no collection was specified. auto nss = data["nss"].str(); return nss.empty() || nss == _sourceNss.toString(); });
Because there's a nested while loop inside, the executed lambda, we would be better off using scopedIf for these.
Suggestion with several different improvements together.
{ auto &fp = tenantMigrationHangCollectionClonerAfterHandlingBatchResponse; auto sfp = fp.scopedIf([&](const BSONObj& data) { // Only hang when cloning the specified collection, // or if no collection was specified. auto nss = data["nss"].str(); return nss.empty() || nss == _sourceNss.toString(); }); if (MONGO_unlikely(sfp.isActive()) { while (sfp.isStillEnabled())) { if (mustExit()) break; LOGV2(4884506, "Blocking while fail point is disabled", "failPoint"_attr = fp.getName(), logAttrs(_sourceNss), "tenantId"_attr = _tenantId); sleepsecs(1); } } }
The idea is that a checking a failpoint and taking action on it is a kind of transaction with the failpoint, during which it cannot toggle and cause an ABA problem.
FailPoint nTimes and other modes are expecting this kind of session-based access idiom.
This pattern is repeated several times in these two test fies and would be a good candidate for refactor into a helper function.