Test deleteSorterEntriesOutsideRanges with a write conflict


    • Type: Engineering Test
    • Resolution: Fixed
    • Priority: Major - P3
    • 9.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Storage Execution
    • Fully Compatible
    • Storage Execution 2026-05-25, Storage Execution 2026-06-08

      Goal

      Add a deterministic unit test that exercises the writeConflictRetry block at src/mongo/db/index_builds/primary_driven/util.cpp:368, inside deleteSorterEntriesOutsideRanges. This function runs when a primary-driven index build (PDIB) resumes on step-up, and prunes sorter-table entries that fall outside the persisted ranges so the resumed build starts from a clean state.

      Phase 1 invariant: a cursor reset is safe because the loop continues only while entry->first < firstStart; even if the cursor restarts from the beginning, it re-traverses the same pre-range keys, and the loop bound prevents over-deletion.

      Phase 2 invariant: cursor reset would surface as a returned key < lastEnd after a flush; the inline re-seek to lastEnd restores the position before continuing.
      Neither invariant has a unit test today.
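
      The two invariants can be illustrated with a small standalone model. This is only a sketch under stated assumptions: std::map stands in for the sorter table; the names SorterTable, prunePreRange, prunePostRange and the resetAfter parameter are hypothetical; and the cursor reset is simulated by moving the iterator back to begin(), not by a real storage-engine write conflict. It does not reproduce the code in util.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the sorter table: integer key -> payload.
using SorterTable = std::map<std::int64_t, std::string>;

// Phase 1 model: delete every key < firstStart. After `resetAfter` deletes,
// the cursor is moved back to the beginning once (simulated reset).
// Returns the number of deleted keys.
inline int prunePreRange(SorterTable& table, std::int64_t firstStart, int resetAfter) {
    int deleted = 0;
    bool resetDone = false;
    auto it = table.begin();
    // The loop bound is what makes a restart harmless: a restarted cursor
    // can only see remaining pre-range keys or stop at firstStart.
    while (it != table.end() && it->first < firstStart) {
        it = table.erase(it);
        ++deleted;
        if (!resetDone && deleted == resetAfter) {
            it = table.begin();  // simulated cursor reset
            resetDone = true;
        }
    }
    return deleted;
}

// Phase 2 model: delete every key >= lastEnd. A simulated reset can leave the
// cursor on a key < lastEnd; the re-seek restores the position.
inline int prunePostRange(SorterTable& table, std::int64_t lastEnd, int resetAfter) {
    int deleted = 0;
    bool resetDone = false;
    auto it = table.lower_bound(lastEnd);
    while (it != table.end()) {
        if (it->first < lastEnd) {
            it = table.lower_bound(lastEnd);  // re-seek instead of deleting
            continue;
        }
        it = table.erase(it);
        ++deleted;
        if (!resetDone && deleted == resetAfter) {
            it = table.begin();  // simulated cursor reset
            resetDone = true;
        }
    }
    return deleted;
}

// Builds the 1..9 layout used as an example in this ticket.
inline SorterTable makeExampleTable() {
    SorterTable table;
    for (std::int64_t k = 1; k <= 9; ++k) {
        table.emplace(k, "payload");
    }
    return table;
}
```

      In this model, calling prunePreRange(table, 4, 2) and then prunePostRange(table, 7, 1) on the 1..9 layout removes three keys in each phase and leaves only 4..6, even though each phase restarts its cursor once.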

      What the test must do

      Cover both phases. Recommended as two test variants in src/mongo/db/index_builds/primary_driven/util_test.cpp (file exists – use the established fixture there):

      1. DeleteSorterEntriesOutsideRangesSurvivesWCEInPrePhase: WCE fires during phase 1 (pre-firstStart delete).
      2. DeleteSorterEntriesOutsideRangesSurvivesWCEInPostPhase: WCE fires during phase 2 (post-lastEnd delete).
      For each variant:
      1. Populate the sorter container (IntegerKeyedContainer via a LazyRecordStore) with a fixed key layout that has data in all three regions:
        • Keys < firstStart (phase 1 victims, e.g. 1..3).
        • Keys in [firstStart, lastEnd) (persisted – must survive, e.g. 4..6).
        • Keys >= lastEnd (phase 2 victims plus lastEnd anchor, e.g. 7..9).
      2. Build an IndexStateInfo whose ranges cover only [firstStart, lastEnd). Wrap it in a std::vector<IndexStateInfo> of size 1.
      3. Arm the engine-agnostic WCE failpoint with nTimes=1 immediately before calling deleteSorterEntriesOutsideRanges. Choose primaryDrivenIndexBuildSorterInsertionBatchSize so the batched flush lands the WCE in the targeted phase (e.g. a batch size of 3 with 3 pre-range victims gives exactly one flush in phase 1; tune the post-phase variant similarly).
        auto wce = enableStorageEngineWriteConflictForWrites(
            FailPoint::ModeOptions{.mode = FailPoint::Mode::nTimes, .val = 1});
        const auto before = wce->initialTimesEntered();

      4. Assert all of:
        • The call returned without exception.
        • Exactly one WCE fired (the count returned by setMode equals before + 1).
        • The persisted range [firstStart, lastEnd) is intact – every key in that range is still present.
        • All keys < firstStart are gone.
        • All keys >= lastEnd (including lastEnd itself) are gone.
      5. Bonus assertions for the variants:
        • Phase 1 variant: verify by direct cursor scan that no key >= firstStart was touched even though the cursor presumably restarted from the beginning post-WCE.
        • Phase 2 variant: verify by direct cursor scan that no key < lastEnd was deleted even after the WCE-triggered reset took the re-seek branch.
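
      The arm / call / assert shape of the steps above can be sketched without the server tree. Everything here is a hypothetical model, not the MongoDB API: OneShotFailPoint imitates an nTimes failpoint with an entry counter, WriteConflict imitates the exception, and pruneOutsideRange is a simplified batched deleter that does not distinguish the two phases. The real test should use the fixture and the enableStorageEngineWriteConflictForWrites helper named in this ticket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a write-conflict exception.
struct WriteConflict : std::runtime_error {
    WriteConflict() : std::runtime_error("write conflict") {}
};

// Hypothetical model of an nTimes failpoint: fails the next `remaining`
// evaluations and counts how many times it fired.
struct OneShotFailPoint {
    int remaining = 0;
    int timesEntered = 0;

    bool shouldFail() {
        if (remaining == 0) {
            return false;
        }
        --remaining;
        ++timesEntered;
        return true;
    }
};

// Deletes every key outside [firstStart, lastEnd) in batches of `batchSize`.
// A flush that hits the failpoint throws before applying anything and is
// retried, modelling a write-conflict retry loop.
inline void pruneOutsideRange(std::set<std::int64_t>& keys,
                              std::int64_t firstStart,
                              std::int64_t lastEnd,
                              std::size_t batchSize,
                              OneShotFailPoint& fp) {
    // Snapshot the victims first so erasing never invalidates the scan.
    std::vector<std::int64_t> victims;
    for (std::int64_t k : keys) {
        if (k < firstStart || k >= lastEnd) {
            victims.push_back(k);
        }
    }

    std::vector<std::int64_t> batch;
    auto flush = [&] {
        for (;;) {
            try {
                if (fp.shouldFail()) {
                    throw WriteConflict();
                }
                for (std::int64_t k : batch) {
                    keys.erase(k);
                }
                batch.clear();
                return;
            } catch (const WriteConflict&) {
                // Nothing was applied; retry the same batch.
            }
        }
    };

    for (std::int64_t k : victims) {
        batch.push_back(k);
        if (batch.size() == batchSize) {
            flush();
        }
    }
    if (!batch.empty()) {
        flush();
    }
}

// Usage: arm once, run, then check the counter and the surviving keys.
inline bool runExample() {
    std::set<std::int64_t> keys{1, 2, 3, 4, 5, 6, 7, 8, 9};
    OneShotFailPoint fp;
    fp.remaining = 1;  // models nTimes=1
    const int before = fp.timesEntered;
    pruneOutsideRange(keys, 4, 7, 3, fp);
    return fp.timesEntered == before + 1 && keys == std::set<std::int64_t>{4, 5, 6};
}
```

      With a batch size of 3 and the 1..9 layout, the first flush in this model covers keys 1..3, so the single injected conflict lands on the pre-range batch. Landing it in the post-range phase depends on how the real function batches across phases, which this model does not attempt to capture.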

      Where to put it

      src/mongo/db/index_builds/primary_driven/util_test.cpp (already exists). Use whatever fixture the existing tests use; add a new TEST_F (or two) at the end. If a BUILD.bazel change is needed to gate on use_wiredtiger_enabled (mirroring container_based_spiller_test), include it.

       

      Constraints

      • Use enableStorageEngineWriteConflictForWrites (SERVER-126328), not the WT-specific failpoint by name.
      • FailPoint::Mode::nTimes only – no Mode::random.
      • Test-only PR. Do not modify deleteSorterEntriesOutsideRanges or surrounding code. If the test surfaces a real bug (e.g. persisted-range keys get deleted, or the re-seek workaround fails), stop, file a follow-up SERVER ticket, and link from this ticket.
      • Depends on SERVER-126385 landing first (engine-agnostic helper + CatalogTestFixture compatibility).
      • Sibling but distinct from SERVER-126451: that ticket covers mergeSpills_remove during a live merge; this one covers resume-time cleanup. Cross-link both tickets so the relationship is visible.

      Reference patterns

      • MergeSpillsSurvivesCursorResetUnderWCE in src/mongo/db/sorter/container_based_spiller_test.cpp – canonical "exactly one WCE fired" assertion idiom using initialTimesEntered() + setMode.
      • PersistResumeStateSurvivesWriteConflict_* in multi_index_block_test.cpp (SERVER-126385) – canonical CatalogTestFixture + engine-agnostic-failpoint pattern.
      • Existing tests in primary_driven/util_test.cpp – for fixture setup (resume-state read paths, etc.).

      Acceptance

      • Both DeleteSorterEntriesOutsideRangesSurvivesWCEIn{Pre,Post}Phase tests pass.

      • "Exactly one WCE fired" assertion present in each.
      • Persisted range [firstStart, lastEnd) verified intact in each.
      • Phase-specific bonus assertion present (no over-delete from cursor reset).
      • No production code changed.
      • No Mode::random, no jstest changes.

            Assignee:
            Stephanie Eristoff
            Reporter:
            Stephanie Eristoff
