Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-77523

Use failpoint scopedIf in collection_cloner_test.cpp and tenant_collection_cloner_test.cpp

    • Type: Icon: Improvement Improvement
    • Resolution: Won't Fix
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Replication

      This bug was found during a review of surrounding code.

      https://github.com/10gen/mongo/pull/13172/files/71cbb9540fa619f3aa07fdb7025274b0cc6fa920#r1206153406

      I see there are many places using a particular style of FailPoint loop in these tests, which is, I believe, slow and not quite technically correct, and could be more compact.

      A typical instance looks like this.

      https://github.com/mongodb/mongo/blob/c0e494a1220c4e67b104ab10f198f5181105ad3f/src/mongo/db/repl/tenant_collection_cloner.cpp#L501-L519

      
          tenantMigrationHangCollectionClonerAfterHandlingBatchResponse.executeIf(
              [&](const BSONObj&) {
                  while (
                      MONGO_unlikely(
                          tenantMigrationHangCollectionClonerAfterHandlingBatchResponse.shouldFail()) &&
                      !mustExit()) {
                      LOGV2(4884506,
                            "tenantMigrationHangCollectionClonerAfterHandlingBatchResponse fail point "
                            "enabled. Blocking until fail point is disabled",
                            logAttrs(_sourceNss),
                            "tenantId"_attr = _tenantId);
                      mongo::sleepsecs(1);
                  }
              },
              [&](const BSONObj& data) {
                  // Only hang when cloning the specified collection, or if no collection was specified.
                  auto nss = data["nss"].str();
                  return nss.empty() || nss == _sourceNss.toString();
              });
      

      Because there's a nested while loop inside, the executed lambda, we would be better off using scopedIf for these.

      Suggestion with several different improvements together.

          {
              auto &fp = tenantMigrationHangCollectionClonerAfterHandlingBatchResponse;
              auto sfp = fp.scopedIf([&](const BSONObj& data) {
                  // Only hang when cloning the specified collection,
                  // or if no collection was specified.
                  auto nss = data["nss"].str();
                  return nss.empty() || nss == _sourceNss.toString();
              });
              if (MONGO_unlikely(sfp.isActive()) {
                  while (sfp.isStillEnabled())) {
                      if (mustExit())
                          break;
                      LOGV2(4884506,
                            "Blocking while fail point is disabled",
                            "failPoint"_attr = fp.getName(),
                            logAttrs(_sourceNss),
                            "tenantId"_attr = _tenantId);
                      sleepsecs(1);
                  }
              }
          }
      

      The idea is that a checking a failpoint and taking action on it is a kind of transaction with the failpoint, during which it cannot toggle and cause an ABA problem.
      FailPoint nTimes and other modes are expecting this kind of session-based access idiom.

      This pattern is repeated several times in these two test fies and would be a good candidate for refactor into a helper function.

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            billy.donahue@mongodb.com Billy Donahue
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: