- Type: Engineering Test
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Storage Execution
- Fully Compatible
- Storage Execution 2026-05-25, Storage Execution 2026-06-08
Goal
Add a deterministic unit test that exercises the writeConflictRetry block at src/mongo/db/index_builds/primary_driven/util.cpp:368, inside deleteSorterEntriesOutsideRanges. This function runs on PDIB resume (step-up) and prunes sorter-table entries that fall outside the persisted ranges so the resumed build starts from a clean state.
Phase 1 invariant: a cursor reset is safe because the loop is bounded by the condition entry->first < firstStart; even if the cursor restarts from the beginning, it re-traverses the same pre-range keys and the loop's bound prevents over-deletion.
Phase 2 invariant: cursor reset would surface as a returned key < lastEnd after a flush; the inline re-seek to lastEnd restores the position before continuing.
Neither invariant has a unit test today.
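The two invariants can be illustrated with a standalone toy model. This is only a sketch under stated assumptions, not the code in util.cpp: a std::map stands in for the sorter table, a cursor reset is modeled as the scan position jumping back to the smallest key after a chosen flush, and the names Table, PruneStats, pruneOutsideRange and resetAtFlush are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <vector>

// Toy stand-in for the sorter table: ordered integer keys, payload ignored.
using Table = std::map<std::int64_t, int>;

struct PruneStats {
    int flushes = 0;  // non-empty batches applied
    int reseeks = 0;  // times phase 2 had to re-seek to lastEnd
};

// Deletes every key outside [firstStart, lastEnd). After the flush numbered
// `resetAtFlush` (1-based, 0 = never) the scan position is lost and restarts
// from the smallest key, which models a write-conflict cursor reset.
// Assumes keys are well below INT64_MAX so that `key + 1` cannot overflow.
PruneStats pruneOutsideRange(Table& table,
                             std::int64_t firstStart,
                             std::int64_t lastEnd,
                             std::size_t batchSize,
                             int resetAtFlush) {
    assert(batchSize > 0);
    constexpr auto kBegin = std::numeric_limits<std::int64_t>::min();
    PruneStats stats;
    std::vector<std::int64_t> batch;

    // Applies the pending deletes; returns true if this flush reset the cursor.
    auto flush = [&]() {
        if (batch.empty()) return false;
        for (auto key : batch) table.erase(key);
        batch.clear();
        return ++stats.flushes == resetAtFlush;
    };

    // Phase 1: delete keys < firstStart. A reset restarts at the smallest key,
    // but the bound below stops the loop at the first in-range key.
    std::int64_t pos = kBegin;
    for (;;) {
        auto it = table.lower_bound(pos);
        if (it == table.end() || it->first >= firstStart) break;
        batch.push_back(it->first);
        pos = it->first + 1;
        if (batch.size() >= batchSize && flush()) pos = kBegin;
    }
    flush();

    // Phase 2: delete keys >= lastEnd. A reset shows up as a returned key
    // < lastEnd; the re-seek restores the position before continuing.
    pos = lastEnd;
    for (;;) {
        auto it = table.lower_bound(pos);
        if (it == table.end()) break;
        if (it->first < lastEnd) {
            pos = lastEnd;
            ++stats.reseeks;
            continue;
        }
        batch.push_back(it->first);
        pos = it->first + 1;
        if (batch.size() >= batchSize && flush()) pos = kBegin;
    }
    flush();
    return stats;
}
```

With keys 1..9, firstStart = 4, lastEnd = 7 and a batch size of 3, setting resetAtFlush to 1 places the reset in phase 1 and setting it to 2 places it in phase 2; in both cases only keys 4..6 remain in the model.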
What the test must do
Cover both phases. Recommended as two test variants in src/mongo/db/index_builds/primary_driven/util_test.cpp (file exists – use the established fixture there):
- DeleteSorterEntriesOutsideRangesSurvivesWCEInPrePhase: WCE fires during phase 1 (pre-firstStart delete).
- DeleteSorterEntriesOutsideRangesSurvivesWCEInPostPhase: WCE fires during phase 2 (post-lastEnd delete).
For each:
- Populate the sorter container (IntegerKeyedContainer via a LazyRecordStore) with a fixed key layout that has data in all three regions:
- Keys < firstStart (phase 1 victims, e.g. 1..3).
- Keys in [firstStart, lastEnd) (persisted – must survive, e.g. 4..6).
- Keys >= lastEnd (phase 2 victims plus lastEnd anchor, e.g. 7..9).
- Build an IndexStateInfo whose ranges cover only [firstStart, lastEnd). Wrap it in a std::vector<IndexStateInfo> of size 1.
- Arm the engine-agnostic WCE failpoint with nTimes=1 immediately before calling deleteSorterEntriesOutsideRanges. Choose primaryDrivenIndexBuildSorterInsertionBatchSize so that the batched flush lands the WCE in the targeted phase (e.g. a batch size of 3 with 3 pre-range victims gives exactly one flush in phase 1; tune the post-phase variant similarly).
    auto wce = enableStorageEngineWriteConflictForWrites(
        FailPoint::ModeOptions{.mode = FailPoint::Mode::nTimes, .val = 1});
    const auto before = wce->initialTimesEntered();
- Assert all of:
- The call returned without exception.
- Exactly one WCE fired (setMode returns before + 1).
- The persisted range [firstStart, lastEnd) is intact – every key in that range still present.
- All keys < firstStart are gone.
- All keys >= lastEnd (including lastEnd itself) are gone.
- Bonus assertions for the variants:
- Phase 1 variant: verify by direct cursor scan that no key >= firstStart was touched even though the cursor presumably restarted from the beginning post-WCE.
- Phase 2 variant: verify by direct cursor scan that no key < lastEnd was deleted even after the WCE-reset triggered the re-seek branch.
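The "exactly one WCE fired" and "persisted range intact" assertions can be sketched in isolation. The block below is a self-contained toy, not the server fixture: ToyFailPoint, ToyWriteConflict, toyWriteConflictRetry, flushDeletes and survivesExactlyOneConflict are hypothetical stand-ins that only mimic an nTimes-style failpoint and a retry loop, and a std::set stands in for the sorter container.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a write conflict; not the server's exception type.
struct ToyWriteConflict : std::runtime_error {
    ToyWriteConflict() : std::runtime_error("toy write conflict") {}
};

// Hypothetical nTimes-style failpoint: fires on the next `n` evaluations and
// counts how many times it has fired.
class ToyFailPoint {
public:
    void armNTimes(int n) {
        assert(n >= 0);
        _remaining = n;
    }
    bool shouldFail() {
        if (_remaining <= 0) return false;
        --_remaining;
        ++_timesEntered;
        return true;
    }
    std::int64_t timesEntered() const { return _timesEntered; }

private:
    int _remaining = 0;
    std::int64_t _timesEntered = 0;
};

// Re-runs `op` until it completes without throwing ToyWriteConflict.
template <typename Op>
void toyWriteConflictRetry(Op&& op) {
    for (;;) {
        try {
            op();
            return;
        } catch (const ToyWriteConflict&) {
            // Nothing was applied by the failed attempt; retry.
        }
    }
}

// Deletes `victims` in one flush; the conflict is thrown before any delete.
void flushDeletes(std::set<std::int64_t>& keys,
                  const std::vector<std::int64_t>& victims,
                  ToyFailPoint& fp) {
    toyWriteConflictRetry([&] {
        if (fp.shouldFail()) throw ToyWriteConflict{};
        for (auto key : victims) keys.erase(key);
    });
}

// Mirrors the assertion list: one conflict fired, keys 4..6 survive, the
// pre-range (1..3) and post-range (7..9) keys are gone.
bool survivesExactlyOneConflict() {
    std::set<std::int64_t> keys{1, 2, 3, 4, 5, 6, 7, 8, 9};
    ToyFailPoint fp;
    fp.armNTimes(1);
    const auto before = fp.timesEntered();

    flushDeletes(keys, {1, 2, 3}, fp);  // conflict lands here, then retries
    flushDeletes(keys, {7, 8, 9}, fp);  // failpoint already exhausted

    const bool firedOnce = fp.timesEntered() == before + 1;
    const bool rangeIntact = keys == std::set<std::int64_t>{4, 5, 6};
    return firedOnce && rangeIntact;
}
```

Recording the counter before the call and comparing against before + 1 afterwards is what separates "the retry path ran exactly once" from "the test passed without ever hitting a conflict"; the real test would apply the same comparison to the engine-agnostic failpoint rather than to this toy.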
Where to put it
src/mongo/db/index_builds/primary_driven/util_test.cpp (already exists). Use whatever fixture the existing tests use; add a new TEST_F (or two) at the end. If a BUILD.bazel change is needed to gate on use_wiredtiger_enabled (mirroring container_based_spiller_test), include it.
Constraints
- Use enableStorageEngineWriteConflictForWrites (SERVER-126328), not the WT-specific failpoint by name.
- FailPoint::Mode::nTimes only – no Mode::random.
- Test-only PR. Do not modify deleteSorterEntriesOutsideRanges or surrounding code. If the test surfaces a real bug (e.g. persisted-range keys get deleted, or the re-seek workaround fails), stop, file a follow-up SERVER ticket, and link from this ticket.
- Depends on SERVER-126385 landing first (engine-agnostic helper + CatalogTestFixture compatibility).
- Sibling but distinct from SERVER-126451: that ticket covers mergeSpills_remove during a live merge; this one covers resume-time cleanup. Cross-link both tickets so the relationship is visible.
Reference patterns
- MergeSpillsSurvivesCursorResetUnderWCE in src/mongo/db/sorter/container_based_spiller_test.cpp – canonical "exactly one WCE fired" assertion idiom using initialTimesEntered() + setMode.
- PersistResumeStateSurvivesWriteConflict_* in multi_index_block_test.cpp (SERVER-126385) – canonical CatalogTestFixture + engine-agnostic-failpoint pattern.
- Existing tests in primary_driven/util_test.cpp – for fixture setup (resume-state read paths, etc.).
Acceptance
- Both DeleteSorterEntriesOutsideRangesSurvivesWCEIn{Pre,Post}Phase tests pass.
- "Exactly one WCE fired" assertion present in each.
- Persisted range [firstStart, lastEnd) verified intact in each.
- Phase-specific bonus assertion present (no over-delete from cursor reset).
- No production code changed.
- No Mode::random, no jstest changes.
Is related to
- SERVER-126326 Add PDIB test suites that artificially inject write conflicts (Closed)
- SERVER-126328 Have a storage engine-agnostic way to turn on write conflict fail points in unit tests (Closed)
- SERVER-126385 Make unit tests that exercise _writeStateToContainer write conflict retry (Closed)
- SERVER-126451 Test mergeSpills_remove with a write conflict (Closed)