- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
Summary
SERVER-99827 currently excludes $merge aggregation tests from config-transition passthrough suites because no test grid maps which whenMatched × whenNotMatched combinations remain bit-identical under a forced primary stepdown that lands mid-pipeline. As internal counter materialisation migrates onto $merge (SERVER-123687 and related), passthrough authors need a concrete safe/unsafe table so they can selectively re-include excluded tests instead of relying on the current blanket skip.
Proposal
Add jstests/sharding/merge_stepdown_idempotency_matrix.js (~266 lines) that enumerates the full cross-product of whenMatched ∈ {replace, merge, keepExisting, fail, pipeline} against whenNotMatched ∈ {insert, fail, discard}, drives each cell through a real mid-pipeline stepdown, and prints a structured pass/unsafe verdict.
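The cross-product described above can be sketched in plain JavaScript. This is only an illustration: the helper names (`isSpecInvalid`, `enumerateCells`) are hypothetical and not taken from the proposed test, and the parse-time rule encoded here is the one stated under Coverage notes (keepExisting and fail are only accepted with whenNotMatched: insert).

```javascript
// Mode lists come from the proposal text; helper names are hypothetical.
const WHEN_MATCHED = ["replace", "merge", "keepExisting", "fail", "pipeline"];
const WHEN_NOT_MATCHED = ["insert", "fail", "discard"];

// Assumption (from the coverage notes): keepExisting and fail are rejected
// at parse time unless whenNotMatched is "insert".
function isSpecInvalid(whenMatched, whenNotMatched) {
    const insertOnly = whenMatched === "keepExisting" || whenMatched === "fail";
    return insertOnly && whenNotMatched !== "insert";
}

// Build one record per (whenMatched, whenNotMatched) cell.
function enumerateCells() {
    const cells = [];
    for (const whenMatched of WHEN_MATCHED) {
        for (const whenNotMatched of WHEN_NOT_MATCHED) {
            cells.push({
                whenMatched,
                whenNotMatched,
                specInvalid: isSpecInvalid(whenMatched, whenNotMatched),
            });
        }
    }
    return cells;
}

const cells = enumerateCells();
const runnable = cells.filter((c) => !c.specInvalid);
console.log(`${cells.length} cells, ${runnable.length} runnable`);
```

Keeping the invalid cells in the list, flagged rather than dropped, lets the printed table still show all fifteen combinations.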
Test shape
- Fixture: three-node ReplSetTest (no ShardingTest required; the semantics under test are local to the $merge writer's retry, not router / chunk migration). Placement under jstests/sharding/ follows SERVER-99827's stated home for stepdown-during-aggregation coverage.
- For each mode cell:
  - Seed a deterministic source (500 docs) and a target (half-overlapping prefix so both matched and unmatched paths fire).
  - Capture baseline: single-run aggregation against a stable primary.
  - Capture stepdown-replay: enable the hangWhileBuildingDocumentSourceMergeBatch failpoint on the current primary, launch the aggregation in a parallel shell, wait for the failpoint to be hit, force an election via ReplSetTest.stepUp on a secondary, release the failpoint, wait until the nodes agree on the new primary, then re-issue the same pipeline.
  - Fingerprint the target as (count, checksum, applySum).
  - Classify as pass (fingerprints equal) or unsafe (fingerprints diverge, so the retry is not idempotent for this mode).
- Pipeline mode increments a per-document applyCount so double-application is detectable in the fingerprint even if count matches.
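The fingerprint-and-classify step can be illustrated without a server. The sketch below makes several assumptions that are not in the ticket: the checksum is a 32-bit FNV-1a over canonicalised documents (the real test may use something else), applySum is taken to be the sum of per-document applyCount, "half-overlapping prefix" is read as the first 250 source `_id`s, and `applyOnce` is a hypothetical in-memory stand-in for a pipeline-mode $merge. It models a full double application only and says nothing about how the server actually behaves for any mode.

```javascript
// FNV-1a over a string, kept in the unsigned 32-bit range.
// Chosen only because it is short and deterministic (assumption).
function fnv1a(str) {
    let h = 0x811c9dc5;
    for (let i = 0; i < str.length; i++) {
        h ^= str.charCodeAt(i);
        h = Math.imul(h, 0x01000193) >>> 0;
    }
    return h;
}

// Serialise top-level fields in sorted key order so field order is irrelevant.
function canonical(doc) {
    return JSON.stringify(Object.keys(doc).sort().map((k) => [k, doc[k]]));
}

// (count, checksum, applySum) over the target, ordered by numeric _id.
function fingerprint(docs) {
    const sorted = [...docs].sort((a, b) => a._id - b._id);
    return {
        count: sorted.length,
        checksum: fnv1a(sorted.map(canonical).join("\n")),
        applySum: sorted.reduce((sum, d) => sum + (d.applyCount || 0), 0),
    };
}

function classify(baseline, replay) {
    const equal = baseline.count === replay.count &&
        baseline.checksum === replay.checksum &&
        baseline.applySum === replay.applySum;
    return equal ? "pass" : "unsafe";
}

// Hypothetical stand-in: matched docs get applyCount + 1, unmatched are inserted.
function applyOnce(target, source) {
    const byId = new Map(target.map((d) => [d._id, {...d}]));
    for (const s of source) {
        const hit = byId.get(s._id);
        if (hit) {
            hit.applyCount = (hit.applyCount || 0) + 1;
        } else {
            byId.set(s._id, {_id: s._id, v: s.v, applyCount: 0});
        }
    }
    return [...byId.values()];
}

const source = Array.from({length: 500}, (_, i) => ({_id: i, v: i * 7}));
const target = source.slice(0, 250).map((d) => ({...d, applyCount: 0}));

const baseline = fingerprint(applyOnce(target, source));
const replayed = fingerprint(applyOnce(applyOnce(target, source), source));
console.log(classify(baseline, baseline)); // pass
console.log(classify(baseline, replayed)); // unsafe
```

In the simulated replay the document count is still 500, so only the checksum and applySum components expose the double application, which is the reason for carrying a per-document applyCount.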
Coverage notes
- Four cells (keepExisting × {fail, discard} and fail × {fail, discard}) are rejected at parse time per the documented mode matrix; the test marks them specInvalid: true and surfaces outcome: "n/a".
- The remaining eleven cells run end-to-end.
- The test does NOT fail on unsafe verdicts: the deliverable is the table itself. Passthrough authors consume the printed jsTestLog summary to decide which exclusions in SERVER-99827 can be lifted.
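One possible shape for the printed summary is sketched below. The record fields (`outcome`, `reason`), the `summarize` helper, and the column layout are all assumptions made for illustration, and the sample verdicts are made up; they are not results from a real run. The real test would hand the text to jsTestLog instead of console.log.

```javascript
// Hypothetical record shape:
// {whenMatched, whenNotMatched, specInvalid, outcome, reason}
function summarize(results) {
    const tally = {pass: 0, unsafe: 0, "n/a": 0};
    const lines = results.map((r) => {
        // Parse-time-rejected cells are reported as "n/a" rather than run.
        const outcome = r.specInvalid ? "n/a" : r.outcome;
        tally[outcome]++;
        const reason = r.reason ? ` (${r.reason})` : "";
        return `${r.whenMatched.padEnd(12)} x ${r.whenNotMatched.padEnd(7)} -> ${outcome}${reason}`;
    });
    return {text: lines.join("\n"), tally};
}

// Made-up verdicts, for illustration only.
const sample = [
    {whenMatched: "replace", whenNotMatched: "insert", specInvalid: false, outcome: "pass"},
    {whenMatched: "pipeline", whenNotMatched: "insert", specInvalid: false,
     outcome: "unsafe", reason: "applySum diverged"},
    {whenMatched: "fail", whenNotMatched: "discard", specInvalid: true},
];

const summary = summarize(sample);
// Report only: an unsafe verdict is logged, never thrown.
console.log(summary.text);
console.log(JSON.stringify(summary.tally));
```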
Verification
- node --input-type=module --check < jstests/sharding/merge_stepdown_idempotency_matrix.js exits 0 (matches neighbor range_deletion_ordering_with_stepdown.js).
- Imports ReplSetTest and configureFailPoint from canonical jstests/libs/ helpers.
- Test tags: requires_replication, requires_majority_read_concern.
Follow-ups
- Once the table lands, owners of SERVER-99827 can selectively re-enable passthrough inclusion for cells reported as pass, and file targeted follow-ups for any cell reported as unsafe together with its captured reason string.
- Sharded variant: same matrix can be re-driven inside ShardingTest with rs-backed shards to cover the routing-layer retry path; structure is unchanged.
Related
- SERVER-99827: excludes $merge tests from config-transition suites
- is related to
  - SERVER-99827 Exclude $merge tests from config transition suites due to unsafe retries (Backlog)
  - SERVER-123687 Create $merge lite parsed pipeline using `targetNss` instead of `nss` (Closed)