Test drainBackgroundWrites batch retry under write conflict in primary-driven index builds


    • Type: Engineering Test
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 9.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Assigned Teams: Storage Execution
    • Backwards Compatibility: Fully Compatible
    • Sprint: Storage Execution 2026-05-25, Storage Execution 2026-06-08

      Add a deterministic unit test that exercises the writeConflictRetry block at
        src/mongo/db/index_builds/side_writes_tracker.cpp:423, which wraps each per-batch apply during the index-build
        drain. Of the remaining PDIB write-conflict (WCE) coverage gaps, this one guards against the most valuable bug
        class: a single batch applies index inserts and deletes the corresponding side-table records in one WUOW, so a
        buggy retry can double-apply (insert the same key into the index twice) or skip records (delete the side-table
        row but never insert the key into the index).
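      The hazard can be seen in a small toy model. This is a sketch, not MongoDB code: Storage, WriteConflict,
      applyBatchOnce and applyBatchWithRetry are hypothetical stand-ins for the index, the side-writes table,
      WriteConflictException and the writeConflictRetry wrapper, and a WUOW is modeled (by assumption) as a snapshot
      copy that is committed only when the whole batch succeeds. The indexInsertsInWuow flag switches to a
      deliberately broken variant in which index inserts escape the unit of work.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for WriteConflictException.
struct WriteConflict : std::runtime_error {
    WriteConflict() : std::runtime_error("write conflict") {}
};

// Toy storage: index keys (a multiset, so a double-apply is observable) and the
// side-writes table mapping recordId -> key still to be inserted.
struct Storage {
    std::multiset<int> index;
    std::map<int, int> sideTable;
};

// Applies one batch as a simulated WUOW: writes go to a snapshot copy that is
// committed only if the whole batch succeeds. With indexInsertsInWuow == false
// (a deliberately buggy variant) index inserts bypass the snapshot and survive
// the rollback, while the side-table deletes are still rolled back.
void applyBatchOnce(Storage& s, const std::vector<int>& batch, int& conflictsLeft,
                    bool indexInsertsInWuow) {
    Storage txn = s;
    for (std::size_t i = 0; i < batch.size(); ++i) {
        int key = txn.sideTable.at(batch[i]);
        (indexInsertsInWuow ? txn.index : s.index).insert(key);
        txn.sideTable.erase(batch[i]);
        if (i == 0 && conflictsLeft > 0) {  // conflict after a partial apply
            --conflictsLeft;
            throw WriteConflict();
        }
    }
    if (indexInsertsInWuow) {
        s = txn;  // commit everything at once
    } else {
        s.sideTable = txn.sideTable;  // index writes already went straight to s
    }
}

// Retry loop in the spirit of writeConflictRetry: rerun the whole batch after a
// conflict. Returns the number of attempts it took.
int applyBatchWithRetry(Storage& s, const std::vector<int>& batch, int conflictsToInject,
                        bool indexInsertsInWuow) {
    for (int attempt = 1;; ++attempt) {
        try {
            applyBatchOnce(s, batch, conflictsToInject, indexInsertsInWuow);
            return attempt;
        } catch (const WriteConflict&) {
            // Snapshot discarded; retry the batch from the last committed state.
        }
    }
}
```

      With records {1: 10, 2: 20, 3: 30} and one injected conflict, the correct variant needs two attempts and ends
      with three keys and an empty side table; the broken variant also needs two attempts but ends with key 10 in the
      index twice, which is exactly the double-apply the test must catch.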

      Testing Plan:

      1. Build a unique (or non-unique) index using the PDIB path. Set featureFlagPrimaryDrivenIndexBuilds=true so we exercise the container_write::remove branch.
      2. Populate the side-writes table with a known, fixed set of N records (N >= 2 so we have a real batch). N must
          be small enough that the whole drain is a single batch, OR large enough that there are multiple batches and
          the WCE lands in a specific one – pick one and document it.
      3. Arm the engine-agnostic WCE failpoint with nTimes=1 immediately before calling drainBackgroundWrites
          (or the equivalent entry point that triggers applySingleBatch).
      4. Drive drain to completion.
      5. Assert all four:
        • The drain returned OK.
        • Exactly one WCE fired during the drain: the entry count returned by setMode when the failpoint is turned
            off equals the count returned when it was armed, plus one.
        • The index contains exactly N keys – no duplicates from a double-applied batch, no missing keys from a
            half-rolled-back batch. Count via the index access method's cursor or by reading the underlying ident directly.
        • The side-writes table is empty (or contains exactly the records not yet drained, if multi-batch). All applied
             records are deleted; none survive across the rollback.
      6. Bonus: assert _numApplied on the tracker equals N – proves the counter wasn't double-incremented across
          the retry.

        Suggested test name: DrainBackgroundWritesSurvivesWriteConflict. If multiple variants are useful (single
        batch with the WCE on the first attempt, multi-batch with the WCE on the second batch only), suffix the name
        accordingly for each.
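      The plan above can be sketched end to end in a toy model. Nothing here is MongoDB code: ToyFailPoint, Drain and
      drainInBatches are hypothetical stand-ins for the WCE failpoint, the side-writes tracker and
      drainBackgroundWrites, and a WUOW is modeled (by assumption) as a snapshot copy committed only when the whole
      batch succeeds. The model takes the multi-batch option from step 2 (five records, batch size two) and the single
      nTimes=1 conflict fires on the first write of the first batch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>

// Hypothetical nTimes-style failpoint. Every mode change returns how many times
// the failpoint has fired so far, so "fired exactly once" is after == before + 1.
class ToyFailPoint {
public:
    long setNTimes(int n) { remaining_ = n; return timesEntered_; }
    long disable() { remaining_ = 0; return timesEntered_; }
    // Consulted on every simulated write; fires while the nTimes budget lasts.
    bool shouldFail() {
        if (remaining_ == 0) return false;
        --remaining_;
        ++timesEntered_;
        return true;
    }

private:
    int remaining_ = 0;
    long timesEntered_ = 0;
};

// Hypothetical stand-in for WriteConflictException.
struct WriteConflict : std::runtime_error {
    WriteConflict() : std::runtime_error("write conflict") {}
};

// Toy drain state: index keys (a multiset, so duplicates are visible), the
// side-writes table (recordId -> key), and a stand-in for _numApplied.
struct Drain {
    std::multiset<int> index;
    std::map<int, int> sideTable;
    long numApplied = 0;
};

// Drains the side table in batches of `batchSize`. Each batch runs against a
// snapshot copy and is committed, counter included, only if every write in the
// batch succeeds; a WriteConflict discards the snapshot and retries the batch.
// Returns the total number of batch attempts (returning at all models Status OK).
int drainInBatches(Drain& d, std::size_t batchSize, ToyFailPoint& fp) {
    int attempts = 0;
    while (!d.sideTable.empty()) {
        while (true) {
            ++attempts;
            Drain txn = d;  // simulated WUOW: uncommitted writes live here
            std::size_t applied = 0;
            try {
                for (auto it = d.sideTable.begin();
                     it != d.sideTable.end() && applied < batchSize; ++it, ++applied) {
                    txn.index.insert(it->second);    // apply the side write
                    txn.sideTable.erase(it->first);  // consume the side-table record
                    if (fp.shouldFail()) throw WriteConflict();
                }
            } catch (const WriteConflict&) {
                continue;  // roll back: drop txn and retry the same batch
            }
            txn.numApplied += static_cast<long>(applied);
            d = txn;  // commit
            break;
        }
    }
    return attempts;
}
```

      Arming with setNTimes(1) before the drain and calling disable() after it yields counts 0 and 1, four batch
      attempts (three batches plus one retry), five distinct index keys, an empty side table and numApplied == 5.
      Keeping the counter update inside the committed snapshot is what makes the step 6 check meaningful: if the
      counter were bumped before the conflict, the retried batch would be counted twice.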

      Where to put it

        Preferred: new file src/mongo/db/index_builds/side_writes_tracker_test.cpp, gated on
        use_wiredtiger_enabled in BUILD.bazel (the WCE failpoint requires WT). If a side_writes_tracker
        unit-test target doesn't exist yet, create one alongside the new file; mirror the BUILD.bazel structure of
        container_based_spiller_test.

        Alternative: extend src/mongo/db/index_builds/multi_index_block_test.cpp and exercise drain end-to-end through MultiIndexBlock::drainBackgroundWrites. Pick this if standalone testing of SideWritesTracker requires too much fixture scaffolding.

      Constraints

        • Use enableStorageEngineWriteConflictForWrites (SERVER-126328), not the WT-specific failpoint by name.
        • FailPoint::Mode::nTimes only; no Mode::random. Percentage-based injection is jstest territory (SERVER-126326).
        • Test-only PR: do not modify side_writes_tracker.cpp. If the test surfaces a real correctness bug (e.g.
          _numApplied drift, leftover side-table records, double-applied keys), stop, file a follow-up SERVER ticket
          for the fix, and link it from this ticket.
        • Depends on SERVER-126385 landing first (it brings in the enableStorageEngineWriteConflictForWrites +
          CatalogTestFixture plumbing).
        • The drain path requires !shard_role_details::getLocker(opCtx)->inAWriteUnitOfWork() (the line 232
          invariant), so don't wrap drain calls in an outer WUOW.
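      Why an outer WUOW is a problem can be sketched in the same toy style. ToyOpCtx, ToyWuow, toyWriteConflictRetry
      and toyDrain are hypothetical. toyWriteConflictRetry reflects our reading (an assumption to check against the
      current source) that writeConflictRetry does not retry when it is already nested inside a WUOW: it cannot roll
      back the enclosing unit's writes, so it lets the conflict escape. A std::logic_error stands in for the
      invariant so the behavior is testable.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for WriteConflictException.
struct WriteConflict : std::runtime_error {
    WriteConflict() : std::runtime_error("write conflict") {}
};

// Hypothetical operation context that only tracks WUOW nesting depth.
struct ToyOpCtx {
    int wuowDepth = 0;
    bool inAWriteUnitOfWork() const { return wuowDepth > 0; }
};

// RAII stand-in for WriteUnitOfWork: raises the nesting depth for its lifetime.
class ToyWuow {
public:
    explicit ToyWuow(ToyOpCtx& ctx) : ctx_(ctx) { ++ctx_.wuowDepth; }
    ~ToyWuow() { --ctx_.wuowDepth; }
    ToyWuow(const ToyWuow&) = delete;
    ToyWuow& operator=(const ToyWuow&) = delete;

private:
    ToyOpCtx& ctx_;
};

// Assumed behavior, modeled on our reading of writeConflictRetry: at top level a
// conflict rolls back this attempt's unit and retries; inside an enclosing WUOW
// the helper cannot undo the outer unit's writes, so it rethrows instead.
int toyWriteConflictRetry(ToyOpCtx& ctx, const std::function<void()>& f) {
    for (int attempt = 1;; ++attempt) {
        try {
            ToyWuow wuow(ctx);  // destroyed (rolled back) before the handler runs
            f();
            return attempt;
        } catch (const WriteConflict&) {
            if (ctx.inAWriteUnitOfWork()) throw;  // nested: the outer owner must retry
        }
    }
}

// Hypothetical drain entry point; the logic_error stands in for the invariant
// that the drain is never called from inside a WUOW.
int toyDrain(ToyOpCtx& ctx, int conflictsToInject) {
    if (ctx.inAWriteUnitOfWork()) {
        throw std::logic_error("drain must not run inside a WUOW");
    }
    return toyWriteConflictRetry(ctx, [&conflictsToInject] {
        if (conflictsToInject > 0) {
            --conflictsToInject;
            throw WriteConflict();
        }
    });
}
```

      At top level the injected conflict is absorbed on the second attempt. Inside an outer ToyWuow, toyDrain is
      rejected up front, and the retry helper on its own would let the conflict escape rather than retry the batch.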

      Acceptance

      • New DrainBackgroundWritesSurvivesWriteConflict test passes.
      • All four core assertions present: status OK, exactly one WCE fired, exact key count in the index, and an
          empty (or correct) side-writes table after the drain.
      • _numApplied sanity check present.
      • The PDIB branch (primaryDrivenFeatureFlagEnabled=true, the container_write::remove path at
          side_writes_tracker.cpp:365) is the one exercised.
      • No production code changed.
      • No Mode::random, no jstest changes.
      • New BUILD.bazel target (if any) is gated on use_wiredtiger_enabled like container_based_spiller_test

            Assignee: Stephanie Eristoff
            Reporter: Stephanie Eristoff
            Votes: 0
            Watchers: 3