Core Server / SERVER-62368

Range deleter must honor rangeDeleterBatchDelayMS

    • Fully Compatible
    • v5.2, v5.0, v4.4

      Unit test proving the erroneous behavior in v4.4 (diff applied on commit 3bdd66d).

      diff --git a/src/mongo/db/s/range_deletion_util_test.cpp b/src/mongo/db/s/range_deletion_util_test.cpp
      index e2296557eb2..8ccd2d7b17b 100644
      --- a/src/mongo/db/s/range_deletion_util_test.cpp
      +++ b/src/mongo/db/s/range_deletion_util_test.cpp
      @@ -709,6 +709,11 @@ TEST_F(RangeDeleterTest, RemoveDocumentsInRangeRespectsDelayInBetweenBatches) {
               dbclient.insert(kNss.toString(), BSON(kShardKey << i));
           }
       
      +    const ChunkRange range2(BSON(kShardKey << 10), BSON(kShardKey << 20));
      +    for (auto i = 10; i < 10 + numDocsToInsert; i++) {
      +        dbclient.insert(kNss.toString(), BSON(kShardKey << i));
      +    }
      +
           auto cleanupComplete =
               removeDocumentsInRange(executor(),
                                      std::move(queriesComplete),
      @@ -721,10 +726,25 @@ TEST_F(RangeDeleterTest, RemoveDocumentsInRangeRespectsDelayInBetweenBatches) {
                                      Seconds(0) /* delayForActiveQueriesOnSecondariesToComplete */,
                                      delayBetweenBatches);
       
      +    auto cleanupComplete2 =
      +        removeDocumentsInRange(executor(),
      +                               std::move(queriesComplete),
      +                               kNss,
      +                               uuid(),
      +                               kShardKeyPattern,
      +                               range2,
      +                               boost::none,
      +                               numDocsToRemovePerBatch,
      +                               Seconds(0) /* delayForActiveQueriesOnSecondariesToComplete */,
      +                               Milliseconds(0) /* delayBetweenBatches */);
      +
           // A best-effort check that cleanup has not completed without advancing the clock.
           sleepsecs(1);
           ASSERT_FALSE(cleanupComplete.isReady());
       
      +    // The second task didn't wait for the delay scheduled by the first task
      +    ASSERT_TRUE(cleanupComplete2.isReady());
      +
           // Advance the time until cleanup is complete. This explicit advancement of the clock is
           // required in order to allow the delay between batches to complete. This cannot be made exact
           // because there's no way to tell when the sleep operation gets hit exactly, so instead we
      
    • Sharding EMEA 2022-02-07

The semantics of the rangeDeleterBatchDelayMS parameter have changed over time, and the original idea of using it to throttle range deletions is not honored in the majority of scenarios.

The objective of this ticket is to introduce a global delay applied to range deletions at the thread pool level, independently of collections/ranges.

The following is a summary of the current semantics.

      Versions gte v4.4

      The delay is applied to the deletion of batches belonging to a specific range.

Problem: if more than one range deletion task is scheduled (for different collections, or for the same collection on different ranges), deletions are not actually throttled.

      Versions until v4.2

      The batch delay is per collection and the rescheduling of a cleanup task is not bound to a specific range.

Problem: if more than one range deletion task is scheduled for different collections, deletions are not actually throttled.

      Assignee: Pierlauro Sciarelli (pierlauro.sciarelli@mongodb.com)
      Reporter: Pierlauro Sciarelli (pierlauro.sciarelli@mongodb.com)
      Votes: 0
      Watchers: 7
