[SERVER-77523] Use failpoint scopedIf in collection_cloner_test.cpp and tenant_collection_cloner_test.cpp Created: 26/May/23  Updated: 30/May/23  Resolved: 30/May/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Billy Donahue Assignee: Backlog - Replication Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Replication
Participants:

 Description   

This bug was found during a review of surrounding code.

https://github.com/10gen/mongo/pull/13172/files/71cbb9540fa619f3aa07fdb7025274b0cc6fa920#r1206153406

I see there are many places using a particular style of FailPoint loop in these tests, which is, I believe, slow and not quite technically correct, and could be more compact.

A typical instance looks like this.

https://github.com/mongodb/mongo/blob/c0e494a1220c4e67b104ab10f198f5181105ad3f/src/mongo/db/repl/tenant_collection_cloner.cpp#L501-L519

 
    tenantMigrationHangCollectionClonerAfterHandlingBatchResponse.executeIf(
        [&](const BSONObj&) {
            while (
                MONGO_unlikely(
                    tenantMigrationHangCollectionClonerAfterHandlingBatchResponse.shouldFail()) &&
                !mustExit()) {
                LOGV2(4884506,
                      "tenantMigrationHangCollectionClonerAfterHandlingBatchResponse fail point "
                      "enabled. Blocking until fail point is disabled",
                      logAttrs(_sourceNss),
                      "tenantId"_attr = _tenantId);
                mongo::sleepsecs(1);
            }
        },
        [&](const BSONObj& data) {
            // Only hang when cloning the specified collection, or if no collection was specified.
            auto nss = data["nss"].str();
            return nss.empty() || nss == _sourceNss.toString();
        });

Because there's a nested while loop inside, the executed lambda, we would be better off using scopedIf for these.

Suggestion with several different improvements together.

    {
        auto &fp = tenantMigrationHangCollectionClonerAfterHandlingBatchResponse;
        auto sfp = fp.scopedIf([&](const BSONObj& data) {
            // Only hang when cloning the specified collection,
            // or if no collection was specified.
            auto nss = data["nss"].str();
            return nss.empty() || nss == _sourceNss.toString();
        });
        if (MONGO_unlikely(sfp.isActive()) {
            while (sfp.isStillEnabled())) {
                if (mustExit())
                    break;
                LOGV2(4884506,
                      "Blocking while fail point is disabled",
                      "failPoint"_attr = fp.getName(),
                      logAttrs(_sourceNss),
                      "tenantId"_attr = _tenantId);
                sleepsecs(1);
            }
        }
    }

The idea is that a checking a failpoint and taking action on it is a kind of transaction with the failpoint, during which it cannot toggle and cause an ABA problem.
FailPoint nTimes and other modes are expecting this kind of session-based access idiom.

This pattern is repeated several times in these two test fies and would be a good candidate for refactor into a helper function.


Generated at Thu Feb 08 06:35:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.