Type: Task
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: None
Storage Execution
Fully Compatible
Storage Execution 2026-03-30
Goal / Objective
Re-enable sharding passthrough testing for primary-driven index builds, and extend coverage to general jscore/passthrough workloads where collections are sharded, not only tests that explicitly build indexes.
The goal is to:
- Add sharding passthrough suites
- Ensure that these suites run under realistic sharded configurations where collections are sharded and primary-driven index builds are enabled.
- Systematically tag and exclude tests that are incompatible with primary-driven index builds using:
  – Existing tags where appropriate (e.g. primary_driven_index_builds_incompatible).
  – More specific tags such as:
    • rolling_index_builds
    • requires_commit_quorum
    • requires_capped
    • requires_rollbacks
    • requires_index_build_resumability
    • primary_driven_index_builds_incompatible_due_to_abort_on_step_up
  so that remaining sharding passthroughs can run cleanly and future enabling work is well-scoped.
Context & Background
Background behavior:
- Primary-driven index builds require that the top-level WriteUnitOfWork have the oplog entry group type kGroupForTransactions, to ensure:
  – Side writes’ container writes share the same timestamp as the “side write” itself.
  – We do not violate multi-timestamp constraints when writing side-write state.
- Sharding creates its own WUOWs. The nested WUOWs used for index-build side writes inherit the parent sharding WUOW’s oplog group type, which is not necessarily kGroupForTransactions.
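The inheritance problem described above can be illustrated with a toy model. This is a minimal Python sketch, not MongoDB's C++ implementation: the `WriteUnitOfWork` class, the `UNGROUPED` value, and the helper `meets_primary_driven_requirement` are hypothetical names chosen for illustration; only the two group-type names are taken from this ticket.

```python
from enum import Enum, auto
from typing import Optional


class OplogGroupType(Enum):
    UNGROUPED = auto()  # hypothetical placeholder for "no grouping"
    GROUP_FOR_TRANSACTIONS = auto()  # kGroupForTransactions in this ticket
    GROUP_FOR_POSSIBLY_RETRYABLE_OPERATION = auto()


class WriteUnitOfWork:
    """Toy WUOW: a nested unit inherits the group type of its parent."""

    def __init__(
        self,
        group_type: Optional[OplogGroupType] = None,
        parent: Optional["WriteUnitOfWork"] = None,
    ) -> None:
        self.parent = parent
        if parent is not None:
            # Nested WUOW: the requested type is ignored, the parent's wins.
            self.group_type = parent.group_type
        elif group_type is not None:
            self.group_type = group_type
        else:
            self.group_type = OplogGroupType.UNGROUPED


def meets_primary_driven_requirement(wuow: WriteUnitOfWork) -> bool:
    """True when the effective group type is the one side writes need."""
    return wuow.group_type is OplogGroupType.GROUP_FOR_TRANSACTIONS


# A sharding-owned top-level WUOW that does not group for transactions.
sharding_wuow = WriteUnitOfWork(OplogGroupType.UNGROUPED)
# The index-build side write asks for grouping but inherits the parent's type.
side_write_wuow = WriteUnitOfWork(
    OplogGroupType.GROUP_FOR_TRANSACTIONS, parent=sharding_wuow
)
```

In this model the nested side-write unit cannot satisfy the requirement on its own, because the top-level unit created by sharding decides the effective group type.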
Recent change:
- WUOW behavior has been changed so that primary-driven index builds always group their writes (kGroupForPossiblyRetryableOperation-style semantics).
We want to run sharding passthrough suites that exercise sharded collections, provided we:
- Ensure sharding WUOWs interact correctly with grouped WUOW behavior for primary-driven index builds.
- Carefully exclude or re-tag tests whose assumptions about index builds no longer hold under primary-driven semantics (e.g., resumability, restart-on-failover).
Acceptance Criteria
1) Sharding passthroughs re-enabled under primary-driven index builds:
- Relevant sharding passthrough suites (e.g. sharded jscore/passthrough, or similar) are updated or created such that:
- They run with primary-driven index builds enabled, or under a default configuration that uses primary-driven builds (see the *_primary_driven_index_builds.yml passthroughs for models to use).
- Collections are actually sharded in the test topology.
- Suites complete successfully in Evergreen under normal variants used for the primary-driven index build passthroughs.
2) jscore/passthrough tests run beyond index-focused scenarios; at least one suite configuration:
- Runs standard jscore/passthrough workloads (not only those that explicitly build indexes) against:
  • Sharded collections.
  • Mixed workloads (CRUD, queries, aggregations) where index builds may occur or be triggered.
- The suite excludes only the tests that are known-incompatible with primary-driven index semantics, via tags (see below).
3) Tagging and test selection policy is defined and applied:
- The following tags are used to manage compatibility:
  – primary_driven_index_builds_incompatible
    • General umbrella tag for tests that are not safe to run under primary-driven index builds and cannot yet be made compatible.
  – rolling_index_builds
    • For tests that require rolling index build behavior not supported or not representative under primary-driven semantics.
  – requires_commit_quorum
    • For tests that rely on commit quorum behavior that conflicts with current primary-driven index build configuration.
  – requires_capped
    • For tests requiring capped collections where primary-driven index build semantics are not yet supported or tested.
  – requires_rollbacks
    • For tests which explicitly depend on rollback behavior that conflicts with current primary-driven index build behavior.
  – requires_index_build_resumability
    • For tests that require resumable index builds; primary-driven index builds are not resumable.
  – primary_driven_index_builds_incompatible_due_to_abort_on_step_up
    • For tests that assume index builds restart on failover or step-up, which primary-driven index builds do not currently do.
    • This tag is specifically tied to exclusions done under SERVER-111661 and is meant to be a clear “bucket” for tests potentially re-enabled by future work (e.g., emulating restart behavior or adding resumability).
- For each test that fails under primary-driven index builds in local or Evergreen runs:
  – The reason for incompatibility is identified.
  – The test is tagged with the most specific applicable tag from the above, not just the broad primary_driven_index_builds_incompatible, when possible.
- Suite definitions are updated to:
  – Include primary-driven index build configurations.
  – Exclude tests with these tags where necessary, using a clear, documented filter expression.
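The tag-based exclusion described above amounts to "drop any test that carries at least one excluded tag". The following is a minimal Python sketch of that selection logic only. It is not resmoke's implementation: the function `select_tests` and the test file names are hypothetical, and in real suite YAML files this kind of filter is usually expressed declaratively in the suite's selector rather than in code.

```python
from typing import Dict, FrozenSet, Iterable, List

# Tags listed in this ticket as reasons to exclude a test.
EXCLUDED_TAGS: FrozenSet[str] = frozenset(
    {
        "primary_driven_index_builds_incompatible",
        "rolling_index_builds",
        "requires_commit_quorum",
        "requires_capped",
        "requires_rollbacks",
        "requires_index_build_resumability",
        "primary_driven_index_builds_incompatible_due_to_abort_on_step_up",
    }
)


def select_tests(
    tests: Dict[str, Iterable[str]],
    excluded: FrozenSet[str] = EXCLUDED_TAGS,
) -> List[str]:
    """Return the names of tests that carry none of the excluded tags."""
    return sorted(
        name for name, tags in tests.items() if excluded.isdisjoint(tags)
    )


# Hypothetical test files and tags, for illustration only.
example_tests = {
    "example_crud.js": ["assumes_balancer_off"],
    "example_capped.js": ["requires_capped"],
    "example_resume.js": ["requires_index_build_resumability"],
    "example_untagged.js": [],
}

selected = select_tests(example_tests)
```

Tags outside the excluded set, such as the hypothetical `assumes_balancer_off` above, do not affect selection, which keeps the filter limited to known primary-driven incompatibilities.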
Constraints & Out of Scope
Constraints:
- Must not break non-sharded or non-primary-driven configurations:
  – Any harness or tag changes must preserve behavior for suites not intended to run with primary-driven index builds.
- Re-enabled sharding passthrough suites must remain within reasonable Evergreen runtime limits comparable to previous sharding suites.
- Tagging must be stable and intentional:
  – Avoid over-tagging or using tags as a blunt instrument; only mark tests incompatible when root cause is understood well enough to choose the correct tag.
Out of Scope:
- Redesigning primary-driven index build semantics, resumability, or restart-on-failover behavior:
  – Any such changes belong in separate SERVER tickets (e.g., follow-on work referenced by comments about resumability and restart behavior).
- Comprehensive new feature coverage for every combination of index option and sharding topology:
  – This ticket focuses on re-enabling previously disabled passthroughs and running standard jscore/passthrough workloads, not on creating an exhaustive matrix or writing new tests.
- New generic tagging frameworks beyond the specific tags listed; only incremental additions necessary to classify incompatibility reasons are in scope.
Testing Instructions
Local / developer testing:
- Identify and run affected suites:
  • Run sharding passthrough and jscore/passthrough suites that are being re-enabled or newly configured for primary-driven index builds.
  • Example patterns:
    – python buildscripts/resmoke.py run --suites <sharded_passthrough_suite_with_primary_driven>
    – python buildscripts/resmoke.py run --suites <jscore_sharded_passthrough_with_primary_driven>
- Iterate on failures:
  • For each failing test:
    – Determine if the failure is due to:
      • Known incompatibility with primary-driven index builds (resumability, restart-on-failover assumptions, commit quorum, rolling behavior, capped/rollback requirements, etc.).
      • A genuine bug in primary-driven index build + sharding behavior.
    – If incompatibility is expected:
      • Apply the appropriate tag:
        – primary_driven_index_builds_incompatible
        – rolling_index_builds
        – requires_commit_quorum
        – requires_capped
        – requires_rollbacks
        – requires_index_build_resumability
        – primary_driven_index_builds_incompatible_due_to_abort_on_step_up
      • Ensure the suite excludes that tag and rerun to confirm stability.
    – If behavior appears incorrect:
      • File a separate SERVER ticket for the bug.
      • Optionally tag the test as incompatible with a note referencing the bug ticket.
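The triage loop above can be summarized as a small decision function: prefer the most specific tag, fall back to the umbrella tag, and route genuine bugs to a new ticket instead of a tag. This is a minimal Python sketch; the reason labels (such as `"commit_quorum"`), the `TriageResult` type, and the `triage` function are hypothetical names for illustration, while the tag names come from this ticket.

```python
from dataclasses import dataclass
from typing import Optional

UMBRELLA_TAG = "primary_driven_index_builds_incompatible"

# Hypothetical reason labels mapped to the specific tags listed in the ticket.
REASON_TO_TAG = {
    "rolling_behavior": "rolling_index_builds",
    "commit_quorum": "requires_commit_quorum",
    "capped_collection": "requires_capped",
    "rollback": "requires_rollbacks",
    "resumability": "requires_index_build_resumability",
    "restart_on_failover": (
        "primary_driven_index_builds_incompatible_due_to_abort_on_step_up"
    ),
}


@dataclass(frozen=True)
class TriageResult:
    tag: Optional[str]  # tag to apply, or None when no tag is chosen
    file_ticket: bool  # True when a separate SERVER ticket should be filed


def triage(reason: Optional[str], genuine_bug: bool) -> TriageResult:
    """Pick the most specific tag, or request a ticket for a genuine bug."""
    if genuine_bug:
        # Tagging is optional in this case, so the sketch leaves it unset.
        return TriageResult(tag=None, file_ticket=True)
    # Unknown or unlisted reasons fall back to the broad umbrella tag.
    tag = REASON_TO_TAG.get(reason, UMBRELLA_TAG)
    return TriageResult(tag=tag, file_ticket=False)
```

For example, `triage("resumability", genuine_bug=False)` selects `requires_index_build_resumability`, while an unrecognized reason falls back to the umbrella tag.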
Evergreen / CI:
- Update suite definitions:
  • Ensure suites meant to test primary-driven index builds with sharding:
    – Enable primary-driven index builds in the configuration (if not default).
    – Exclude the incompatible tags identified above.
- Run on relevant Storage Execution variants:
  • Trigger Evergreen patch builds including the updated suites.
  • Confirm:
    – Suites run to completion and pass.
    – There are no unexpected widespread failures or timeouts.
- Stabilization:
  • If new sporadic failures appear:
    – Triage:
      • Is it a race exposed only under primary-driven index builds?
      • Is it due to mis-tagged tests or missing exclusions?
    – Fix, retag, or create follow-on tickets as appropriate.
- Ticket closure:
  • Once suites are green and stable:
    – Add the list of suites, their tag filters, and links to successful Evergreen runs into SERVER-111646.
    – Confirm that previously disabled sharding passthrough coverage for primary-driven index builds is meaningfully restored and includes non-index-specific jscore/passthrough tests against sharded collections.
is related to:
- SERVER-110840 Use the container_writes api for the side writes table (Closed)
- SERVER-111661 Abort primary-driven index build when stepping up to primary (Closed)
related to:
- SERVER-114117 Investigate why mr_preserve_indexes.js fails on disagg sharding passthrough (Closed)
- SERVER-122092 Complete TODO listed in SERVER-111646 (Closed)