Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Won't Fix
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Assigned Teams:

Replication

This bug was found during a review of surrounding code.

https://github.com/10gen/mongo/pull/13172/files/71cbb9540fa619f3aa07fdb7025274b0cc6fa920#r1206153406

I see there are many places using a particular style of FailPoint loop in these tests, which is, I believe, slow and not quite technically correct, and could be more compact.

A typical instance looks like this.

https://github.com/mongodb/mongo/blob/c0e494a1220c4e67b104ab10f198f5181105ad3f/src/mongo/db/repl/tenant_collection_cloner.cpp#L501-L519


    tenantMigrationHangCollectionClonerAfterHandlingBatchResponse.executeIf(
        [&](const BSONObj&) {
            while (
                MONGO_unlikely(
                    tenantMigrationHangCollectionClonerAfterHandlingBatchResponse.shouldFail()) &&
                !mustExit()) {
                LOGV2(4884506,
                      "tenantMigrationHangCollectionClonerAfterHandlingBatchResponse fail point "
                      "enabled. Blocking until fail point is disabled",
                      logAttrs(_sourceNss),
                      "tenantId"_attr = _tenantId);
                mongo::sleepsecs(1);
            }
        },
        [&](const BSONObj& data) {
            // Only hang when cloning the specified collection, or if no collection was specified.
            auto nss = data["nss"].str();
            return nss.empty() || nss == _sourceNss.toString();
        });

Because there's a nested while loop inside, the executed lambda, we would be better off using scopedIf for these.

Suggestion with several different improvements together.

    {
        auto &fp = tenantMigrationHangCollectionClonerAfterHandlingBatchResponse;
        auto sfp = fp.scopedIf([&](const BSONObj& data) {
            // Only hang when cloning the specified collection,
            // or if no collection was specified.
            auto nss = data["nss"].str();
            return nss.empty() || nss == _sourceNss.toString();
        });
        if (MONGO_unlikely(sfp.isActive()) {
            while (sfp.isStillEnabled())) {
                if (mustExit())
                    break;
                LOGV2(4884506,
                      "Blocking while fail point is disabled",
                      "failPoint"_attr = fp.getName(),
                      logAttrs(_sourceNss),
                      "tenantId"_attr = _tenantId);
                sleepsecs(1);
            }
        }
    }

The idea is that a checking a failpoint and taking action on it is a kind of transaction with the failpoint, during which it cannot toggle and cause an ABA problem.
FailPoint nTimes and other modes are expecting this kind of session-based access idiom.

This pattern is repeated several times in these two test fies and would be a good candidate for refactor into a helper function.

Assignee:: [DO NOT USE] Backlog - Replication Team

Reporter:: Billy Donahue

Participants:: [DO NOT USE] Backlog - Replication Team, Billy Donahue

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Created:: May 26 2023 04:01:45 PM UTC

Updated:: May 30 2023 08:44:31 PM UTC

Resolved:: May 30 2023 08:44:31 PM UTC

Details

Description

Attachments

Activity

People

Dates