Enable sharded collections passthrough for primary-driven index builds


    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • 9.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Storage Execution
    • Fully Compatible
    • Storage Execution 2026-03-30

      Goal / Objective

      Re-enable sharding passthrough testing for primary-driven index builds, and extend coverage to general jscore/passthrough workloads where collections are sharded, not only tests that explicitly build indexes.
      The goal is to:

      • Add sharding passthrough suites
      • Ensure that these suites run under realistic sharded configurations where collections are sharded and primary-driven index builds are enabled.
      • Systematically tag and exclude tests that are incompatible with primary-driven index builds, so that remaining sharding passthroughs can run cleanly and future enabling work is well-scoped, using:
        – Existing tags where appropriate (e.g. primary_driven_index_builds_incompatible).
        – More specific tags such as:
          • rolling_index_builds
          • requires_commit_quorum
          • requires_capped
          • requires_rollbacks
          • requires_index_build_resumability
          • primary_driven_index_builds_incompatible_due_to_abort_on_step_up

      Context & Background

      Background behavior:

      • Primary-driven index builds require that the top-level WriteUnitOfWork have the oplog entry group type kGroupForTransactions, to ensure:
        • Side writes’ container writes share the same timestamp as the “side write” itself.
        • We do not violate multi-timestamp constraints when writing side-write state.
      • Sharding creates its own WUOWs. The nested WUOWs used for index-build side writes inherit the parent sharding WUOW’s oplog group type, which is not necessarily kGroupForTransactions.
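
      The inheritance problem described above can be pictured with a toy model. This is only a sketch in Python, not server code: the class ToyWriteUnitOfWork, the GroupType enum, and the OTHER member (standing in for any group type that is not kGroupForTransactions) are hypothetical names introduced for illustration.

```python
from enum import Enum, auto
from typing import Optional


class GroupType(Enum):
    # Toy enum. GROUP_FOR_TRANSACTIONS mirrors kGroupForTransactions from the
    # ticket; OTHER is a hypothetical stand-in for any other group type.
    GROUP_FOR_TRANSACTIONS = auto()
    OTHER = auto()


class ToyWriteUnitOfWork:
    """Toy stand-in for a WUOW: a nested unit inherits its parent's group type."""

    def __init__(self, requested: GroupType,
                 parent: Optional["ToyWriteUnitOfWork"] = None):
        self.parent = parent
        # Only the top-level unit decides the grouping; nested units inherit it,
        # regardless of what they asked for.
        self.group_type = parent.group_type if parent is not None else requested


def side_write_is_safe(wuow: ToyWriteUnitOfWork) -> bool:
    # Per the ticket, index-build side writes need kGroupForTransactions.
    return wuow.group_type is GroupType.GROUP_FOR_TRANSACTIONS


# A sharding-owned top-level WUOW with some other group type.
sharding_wuow = ToyWriteUnitOfWork(GroupType.OTHER)
# The index-build side write asks for transaction grouping but inherits OTHER.
nested_side_write = ToyWriteUnitOfWork(GroupType.GROUP_FOR_TRANSACTIONS,
                                       parent=sharding_wuow)
# Without a sharding parent, the requested group type takes effect.
top_level_side_write = ToyWriteUnitOfWork(GroupType.GROUP_FOR_TRANSACTIONS)
```

      In this model the nested side write ends up with the sharding parent's group type, which is the situation the ticket identifies as problematic.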

      Recent change:

      • WUOW behavior has been changed so that primary-driven index builds always group their writes (kGroupForPossiblyRetryableOperation-style semantics). We want to run sharding passthrough suites that exercise sharded collections, provided we:
        • Ensure sharding WUOWs interact correctly with grouped WUOW behavior for primary-driven index builds.
        • Carefully exclude or re-tag tests whose assumptions about index builds no longer hold under primary-driven semantics (e.g., resumability, restart-on-failover).

      Acceptance Criteria

      1) Sharding passthroughs re-enabled under primary-driven index builds:

      • Relevant sharding passthrough suites (e.g. sharded jscore/passthrough, or similar) are updated or created such that:
        • They run with primary-driven index builds enabled (or with the default behavior, if that uses primary-driven builds). See the *_primary_driven_index_builds.yml passthroughs for models to use.
        • Collections are actually sharded in the test topology.
        • Suites complete successfully in Evergreen under normal variants used for the primary-driven index build passthroughs.

      2) jscore/passthrough tests run beyond index-focused scenarios; at least one suite configuration:

      • Runs standard jscore/passthrough workloads (not only those that explicitly build indexes) against:
        • Sharded collections.
        • Mixed workloads (CRUD, queries, aggregations) where index builds may occur or be triggered.
      • The suite excludes only the tests that are known-incompatible with primary-driven index semantics, via tags (see below).

      3) Tagging and test selection policy is defined and applied

      The following tags are used to manage compatibility:

      – primary_driven_index_builds_incompatible
        • General umbrella tag for tests that are not safe to run under primary-driven index builds and cannot yet be made compatible.
      – rolling_index_builds
        • For tests that require rolling index build behavior not supported or not representative under primary-driven semantics.
      – requires_commit_quorum
        • For tests that rely on commit quorum behavior that conflicts with current primary-driven index build configuration.
      – requires_capped
        • For tests requiring capped collections where primary-driven index build semantics are not yet supported or tested.
      – requires_rollbacks
        • For tests which explicitly depend on rollback behavior that conflicts with current primary-driven index build behavior.
      – requires_index_build_resumability
        • For tests that require resumable index builds; primary-driven index builds are not resumable.
      – primary_driven_index_builds_incompatible_due_to_abort_on_step_up
        • For tests that assume index builds restart on failover or step-up, which primary-driven index builds do not currently do. This tag is specifically tied to exclusions done under SERVER-111661 and is meant to be a clear “bucket” for tests potentially re-enabled by future work (e.g., emulating restart behavior or adding resumability).

      • For each test that fails under primary-driven index builds in local or Evergreen runs:
        – The reason for incompatibility is identified.
        – The test is tagged with the most specific applicable tag from the above, not just the broad primary_driven_index_builds_incompatible, when possible.
      • Suite definitions are updated to:
        – Include primary-driven index build configurations.
        – Exclude tests with these tags where necessary, using a clear, documented filter expression.
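
      As a rough illustration of the tag-based exclusion described above, the sketch below reads tags from a test file's header comment and drops any test carrying an excluded tag. It assumes tests declare tags in a "@tags: [ ... ]" comment; parse_tags and select_tests are hypothetical helpers written for this example, not part of resmoke.

```python
import re
from typing import Dict, FrozenSet, List, Set

# Tags listed in this ticket as reasons to exclude a test.
EXCLUDED_TAGS: FrozenSet[str] = frozenset({
    "primary_driven_index_builds_incompatible",
    "rolling_index_builds",
    "requires_commit_quorum",
    "requires_capped",
    "requires_rollbacks",
    "requires_index_build_resumability",
    "primary_driven_index_builds_incompatible_due_to_abort_on_step_up",
})


def parse_tags(js_source: str) -> Set[str]:
    """Extract tags from a '@tags: [ ... ]' header comment (assumed convention)."""
    match = re.search(r"@tags:\s*\[(.*?)\]", js_source, flags=re.DOTALL)
    if match is None:
        return set()
    tags: Set[str] = set()
    for line in match.group(1).splitlines():
        # Strip leading comment decoration such as " * " or "//".
        line = re.sub(r"^\s*(\*|//)\s?", "", line)
        # Drop any trailing '#' annotation explaining why the tag is present.
        line = line.split("#")[0]
        for token in line.split(","):
            token = token.strip()
            if token:
                tags.add(token)
    return tags


def select_tests(tests: Dict[str, Set[str]],
                 excluded: FrozenSet[str] = EXCLUDED_TAGS) -> List[str]:
    """Return the names of tests that carry none of the excluded tags."""
    return sorted(name for name, tags in tests.items()
                  if tags.isdisjoint(excluded))
```

      A test stays in the suite only if its tag set is disjoint from the excluded set, which matches the "exclude with any of these tags" policy the ticket asks for.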

      Constraints & Out of Scope

      Constraints:

      • Must not break non-sharded or non-primary-driven configurations:
        – Any harness or tag changes must preserve behavior for suites not intended to run with primary-driven index builds.
      • Re-enabled sharding passthrough suites must remain within reasonable Evergreen runtime limits comparable to previous sharding suites.
      • Tagging must be stable and intentional:
        – Avoid over-tagging or using tags as a blunt instrument; only mark tests incompatible when root cause is understood well enough to choose the correct tag.

      Out of Scope:

      • Redesigning primary-driven index build semantics, resumability, or restart-on-failover behavior:
        – Any such changes belong in separate SERVER tickets (e.g., follow-on work referenced by comments about resumability and restart behavior).
      • Comprehensive new feature coverage for every combination of index option and sharding topology:
        – This ticket focuses on re-enabling previously disabled passthroughs and running standard jscore/passthrough workloads, not on creating an exhaustive matrix or writing new tests.
      • New generic tagging frameworks beyond the specific tags listed; only incremental additions necessary to classify incompatibility reasons are in scope.

      Testing Instructions

      Local / developer testing:

      1. Identify and run affected suites:
        • Run sharding passthrough and jscore/passthrough suites that are being re-enabled or newly configured for primary-driven index builds.
        • Example patterns:
          – python buildscripts/resmoke.py run --suites <sharded_passthrough_suite_with_primary_driven>
          – python buildscripts/resmoke.py run --suites <jscore_sharded_passthrough_with_primary_driven>
      2. Iterate on failures:
        • For each failing test:
          – Determine if the failure is due to:
            • Known incompatibility with primary-driven index builds (resumability, restart-on-failover assumptions, commit quorum, rolling behavior, capped/rollback requirements, etc.).
            • A genuine bug in primary-driven index build + sharding behavior.
          – If incompatibility is expected:
            • Apply the appropriate tag:
              – primary_driven_index_builds_incompatible
              – rolling_index_builds
              – requires_commit_quorum
              – requires_capped
              – requires_rollbacks
              – requires_index_build_resumability
              – primary_driven_index_builds_incompatible_due_to_abort_on_step_up
            • Ensure the suite excludes that tag and rerun to confirm stability.
          – If behavior appears incorrect:
            • File a separate SERVER ticket for the bug.
            • Optionally tag the test as incompatible with a note referencing the bug ticket.
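
      The triage loop above amounts to a small decision procedure. The sketch below is one possible encoding, assuming the engineer has already classified each failure; the reason labels ("rolling", "capped", and so on), the action strings, and the triage function are hypothetical and exist only for this example.

```python
from typing import Optional, Tuple

UMBRELLA_TAG = "primary_driven_index_builds_incompatible"

# Hypothetical reason labels mapped to the specific tags named in this ticket.
REASON_TO_TAG = {
    "rolling": "rolling_index_builds",
    "commit_quorum": "requires_commit_quorum",
    "capped": "requires_capped",
    "rollback": "requires_rollbacks",
    "resumability": "requires_index_build_resumability",
    "restart_on_failover":
        "primary_driven_index_builds_incompatible_due_to_abort_on_step_up",
}


def triage(reason: Optional[str],
           looks_like_bug: bool = False) -> Tuple[str, Optional[str]]:
    """Return (action, tag) for one failing test.

    A suspected bug gets a separate SERVER ticket and no tag by default.
    An expected incompatibility gets the most specific tag available,
    falling back to the umbrella tag when no specific reason matches.
    """
    if looks_like_bug:
        return ("file_server_ticket", None)
    return ("tag_and_exclude", REASON_TO_TAG.get(reason, UMBRELLA_TAG))
```

      Falling back to the umbrella tag only when no specific reason matches keeps the "most specific applicable tag" rule explicit.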

      Evergreen / CI:

      1. Update suite definitions:
        • Ensure suites meant to test primary-driven index builds with sharding:
          – Enable primary-driven index builds in the configuration (if not default).
          – Exclude the incompatible tags identified above.
      2. Run on relevant Storage Execution variants:
        • Trigger Evergreen patch builds including the updated suites.
        • Confirm:
          – Suites complete and pass.
          – No unexpected widespread failures or timeouts.
      3. Stabilization:
        • If new sporadic failures appear:
          – Triage:
            • Is it a race exposed only under primary-driven index builds?
            • Is it due to mis-tagged tests or missing exclusions?
          – Fix, retag, or create follow-on tickets as appropriate.
      4. Ticket closure:
        • Once suites are green and stable:
          – Add the list of suites, their tag filters, and links to successful Evergreen runs into SERVER-111646.
          – Confirm that previously disabled sharding passthrough coverage for primary-driven index builds is meaningfully restored and includes non-index-specific jscore/passthrough tests against sharded collections.
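
      Before triggering a patch build, it can help to check that a suite definition excludes every tag applied during triage. The sketch below assumes the suite has been loaded into a dict whose selector section holds an exclude_with_any_tags list; that layout and the missing_exclusions helper are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List


def missing_exclusions(suite: Dict[str, Any],
                       tags_in_use: Iterable[str]) -> List[str]:
    """Return the tags applied during triage that the suite does not exclude.

    Assumes the suite dict looks like
    {"selector": {"exclude_with_any_tags": [...]}}; missing keys are
    treated as an empty exclusion list.
    """
    selector = suite.get("selector") or {}
    excluded = set(selector.get("exclude_with_any_tags") or [])
    return sorted(set(tags_in_use) - excluded)


# Hypothetical suite contents for demonstration only.
example_suite = {
    "selector": {
        "exclude_with_any_tags": [
            "primary_driven_index_builds_incompatible",
            "requires_capped",
        ],
    },
}
gaps = missing_exclusions(example_suite,
                          ["requires_capped", "requires_rollbacks"])
```

      An empty result means every tag applied during triage is covered by the suite's exclusions; any tag returned points to a missing exclusion, one of the stabilization questions listed above.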

            Assignee:
            Stephanie Eristoff
            Reporter:
            Stephanie Eristoff
            Votes:
            0
            Watchers:
            5

              Created:
              Updated:
              Resolved: