Develop black-box change stream tests for ignoreRemovedShards mode


    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • 9.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Query Execution
    • Fully Compatible
    • QE 2026-03-30, QE 2026-03-16, QE 2026-04-13

      Summary

      Add FSM-based black-box testing for change streams in ignoreRemovedShards mode. The test generates a deterministic command sequence covering all FSM state transitions, executes all commands against a 3-shard cluster, removes a random shard, then reads the change stream with {version: "v2", ignoreRemovedShards: true} and verifies that the returned events are an ordered subsequence of the FSM-predicted expected events. All three watch modes (collection, database, cluster) are tested.

      Background

      The existing FSM black-box test infrastructure (jstests/libs/util/change_stream/) validates change stream correctness in strict mode (no removed shards). It models the collection lifecycle as a state machine, generates a command sequence that covers all transitions, executes the commands, and verifies the resulting change events match predictions.

      The ignoreRemovedShards mode (cursor-level parameter on $changeStream) causes the change stream reader to silently skip events from shards that have been removed from the cluster, rather than failing with a ShardRemovedError.
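
      As an illustration of the cursor-level options described above, the stage specification could be assembled as in the sketch below. buildChangeStreamStage is a hypothetical helper and not part of the existing test library; only the option names (version, ignoreRemovedShards, startAtOperationTime) come from this ticket.

```javascript
// Hypothetical helper (an assumption for illustration): builds the
// $changeStream stage with the options named in this ticket.
function buildChangeStreamStage(startAtOperationTime) {
    const spec = {version: "v2", ignoreRemovedShards: true};
    // Only set the start time when the caller recorded one.
    if (startAtOperationTime !== undefined) {
        spec.startAtOperationTime = startAtOperationTime;
    }
    return {$changeStream: spec};
}

// A pipeline containing just the change stream stage.
const pipeline = [buildChangeStreamStage()];
```

      In a shell-based test, a pipeline like this would typically be passed to an aggregate() call on the watched collection or database; the exact entry point used by the reader library is not specified here.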

      Test Strategy

      1. Create a 3-shard cluster; all 3 shards are in the shardSet
      2. Generate FSM commands covering all state transitions
      3. Compute expected events for the target watch mode via Command.getChangeEvents(watchMode)
      4. Record startAtOperationTime, execute all commands via Writer
      5. Remove a random shard from the shardSet
      6. Open a change stream with {version: "v2", ignoreRemovedShards: true, startAtOperationTime} using a new kReadUntilDone reading mode that drains all available events
      7. Verify: every returned event matches an expected event in order via matchesOrSkip(), with skips allowed for events that were on the removed shard (subsequence matching)

      Each watch mode (kCollection, kDb, kCluster) is a separate it() test case in the mochalite framework.
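
      The subsequence check in step 7 can be sketched independently of the test library. The function below is a simplified stand-in (an assumption for illustration): it compares operationType strings only, whereas the real matchers compare full events.

```javascript
// Simplified stand-in for step 7: checks that `actualTypes` is an ordered
// subsequence of `expectedTypes`. Returns the indices of expected entries
// that were passed over before a match, or null if some actual entry
// cannot be placed in order.
function matchAsSubsequence(actualTypes, expectedTypes) {
    const skipped = [];
    let next = 0;
    for (const actual of actualTypes) {
        // Pass over expected entries until one matches the actual entry.
        while (next < expectedTypes.length && expectedTypes[next] !== actual) {
            skipped.push(next);
            next++;
        }
        if (next === expectedTypes.length) {
            return null;  // Ran out of expected entries: unexpected event.
        }
        next++;  // Consume the matched expected entry.
    }
    return skipped;
}

const expected = ["create", "insert", "update", "delete", "drop"];
const inOrder = matchAsSubsequence(["create", "update", "drop"], expected);
const outOfOrder = matchAsSubsequence(["insert", "create"], expected);
```

      Here inOrder is [1, 3] (the "insert" and "delete" entries were skipped) and outOfOrder is null, because "create" appears after "insert" in the returned events but before it in the expected sequence.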

      Changes

      jstests/libs/util/change_stream/change_stream_matcher.js

      Proposed matchesOrSkip() implementation:

       

      constructor(eventMatchers) {
          this.matchers = eventMatchers;
          this.index = 0;
          this.mismatch = null;
          this.skipped = [];  // NEW: track skipped expected events
      }
      
      /**
       * Match event against expected, skipping unmatched expected events.
       * Used for ignoreRemovedShards mode where some expected events may be
       * missing because they were on a removed shard.
       *
       * @param {Object} event - The change event to process
       * @param {boolean} cursorClosed - Whether the cursor has been closed
       * @returns {boolean} True if matched, false if no match in remaining expected
       */
      matchesOrSkip(event, cursorClosed) {
          while (this.index < this.matchers.length) {
              if (this.matchers[this.index].matches(event, cursorClosed)) {
                  this.index++;
                  return true;
              }
              this.skipped.push({
                  index: this.index,
                  type: this.matchers[this.index].event.operationType,
              });
              this.index++;
          }
          this.mismatch = {
              index: this.index,
              expected: "<end of expected>",
              actual: event.operationType,
          };
          return false;
      } 
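
      A small usage sketch of the proposed method follows. SketchMatcher and byType are hypothetical names introduced only for this example; the class body repeats the constructor and matchesOrSkip() from above so that the snippet runs on its own, and the stub matchers compare operationType only.

```javascript
// Hypothetical wrapper class so the proposed code can run standalone.
class SketchMatcher {
    constructor(eventMatchers) {
        this.matchers = eventMatchers;
        this.index = 0;
        this.mismatch = null;
        this.skipped = [];
    }

    matchesOrSkip(event, cursorClosed) {
        while (this.index < this.matchers.length) {
            if (this.matchers[this.index].matches(event, cursorClosed)) {
                this.index++;
                return true;
            }
            this.skipped.push({
                index: this.index,
                type: this.matchers[this.index].event.operationType,
            });
            this.index++;
        }
        this.mismatch = {
            index: this.index,
            expected: "<end of expected>",
            actual: event.operationType,
        };
        return false;
    }
}

// Hypothetical stub for a single-event matcher (operationType only).
const byType = (operationType) => ({
    event: {operationType},
    matches: (event, _cursorClosed) => event.operationType === operationType,
});

const matcher = new SketchMatcher([byType("insert"), byType("update"), byType("delete")]);
// "insert" is skipped, "update" matches.
const first = matcher.matchesOrSkip({operationType: "update"}, false);
// No remaining expected event matches, so the call fails.
const second = matcher.matchesOrSkip({operationType: "insert"}, false);
```

      One consequence visible in the sketch: a failed call consumes every remaining expected event, so the skipped list also contains the entries passed over during the failed search. The list therefore describes removed-shard events only when every call returned true.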

       

       

      jstests/libs/util/change_stream/change_stream_reader.js

      • Add kReadUntilDone reading mode: drains all available events from the change stream and treats the stream as exhausted once no further events arrive. This is needed because in ignoreRemovedShards mode the number of events is not known in advance.
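
      One possible shape for such a drain loop is sketched below. Everything here is an assumption for illustration: the cursor is a hypothetical object whose tryNext() returns the next event or null, and "no events arrive" is approximated by a configurable number of consecutive empty polls. The actual reader may detect exhaustion differently.

```javascript
// Sketch of a read-until-done loop. `cursor` is a hypothetical object with
// a tryNext() method returning the next event, or null if none is ready.
function readUntilDone(cursor, maxEmptyPolls = 3) {
    const events = [];
    let emptyPolls = 0;
    while (emptyPolls < maxEmptyPolls) {
        const event = cursor.tryNext();
        if (event === null) {
            emptyPolls++;  // Nothing available on this poll.
            continue;
        }
        emptyPolls = 0;  // Any event resets the exhaustion counter.
        events.push(event);
    }
    return events;
}

// Fake cursor for demonstration: replays a fixed list of poll results,
// then returns null forever.
function makeFakeCursor(pollResults) {
    const queue = [...pollResults];
    return {
        tryNext() {
            return queue.length > 0 ? queue.shift() : null;
        },
    };
}

const polls = [{id: 1}, null, {id: 2}, null, null, {id: 3}];
const patient = readUntilDone(makeFakeCursor(polls), 3);
const impatient = readUntilDone(makeFakeCursor(polls), 2);
```

      With a threshold of 3 the loop collects all three events; with a threshold of 2 it stops after the second event. This illustrates the trade-off in choosing the threshold: too low a value risks ending the read before late events arrive.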

      jstests/libs/util/change_stream/change_stream_verifier.js

      • Add IgnoreRemovedShardsTestCase class: feeds the test events through the matcher's matchesOrSkip(), asserts that the result set is non-empty, and asserts that every test event matched (no fabricated events). It does not require every expected event to be matched, because unmatched expected events were on the removed shard.
      • Proposed implementation:

      class IgnoreRemovedShardsTestCase {
          constructor(testInstanceName) {
              this._testInstanceName = testInstanceName;
          }

          run(conn, ctx) {
              const testEvents = ctx.getChangeEvents(conn, this._testInstanceName);
              const matcher = ctx.getChangeStreamMatcher(this._testInstanceName);

              assert.gt(testEvents.length, 0,
                  "ignoreRemovedShards mode returned no events");

              for (const rec of testEvents) {
                  const matched = matcher.matchesOrSkip(rec.changeEvent, rec.cursorClosed);
                  assert(matched,
                      `Unexpected event '${rec.changeEvent.operationType}' ` +
                      `not found in remaining expected events`);
              }

              jsTest.log.info(
                  `ignoreRemovedShards: matched ${matcher.getMatchedCount()} events, ` +
                  `skipped ${matcher.skipped.length} expected events`);
          }
      }

      jstests/libs/util/change_stream/change_stream_sharding_utils.js

      • Add removeRandomShardFromSet(st, dbName, shardSet) helper: picks a random shard, moves DB primary away if necessary, and calls removeShard.
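
      The control flow of such a helper might look like the sketch below. It is not the proposed implementation: the function name, the options object, and the injected runAdminCommand and random callbacks are all assumptions made so the logic can run without a cluster. It relies on the removeShard admin command being re-issued until it reports state "completed", and it assumes that chunk draining is handled elsewhere (for example by the balancer).

```javascript
// Sketch only. All parameter names are hypothetical:
//   shardSet        - array of shard names currently in the set
//   dbPrimary       - name of the shard that is primary for dbName
//   runAdminCommand - callback that runs an admin command, returns its reply
//   random          - source of randomness in [0, 1), injectable for tests
function removeRandomShardSketch(
    {shardSet, dbPrimary, dbName, runAdminCommand, random = Math.random, maxPolls = 100}) {
    if (shardSet.length < 2) {
        throw new Error("need at least two shards to remove one");
    }
    const victim = shardSet[Math.floor(random() * shardSet.length)];
    const survivors = shardSet.filter((name) => name !== victim);

    // Move the database primary away if it lives on the chosen shard.
    if (dbPrimary === victim) {
        runAdminCommand({movePrimary: dbName, to: survivors[0]});
    }

    // Re-issue removeShard until the reply reports completion.
    for (let i = 0; i < maxPolls; i++) {
        const reply = runAdminCommand({removeShard: victim});
        if (reply.state === "completed") {
            return {removed: victim, survivors};
        }
    }
    throw new Error(`removeShard for ${victim} did not complete in ${maxPolls} polls`);
}

// Demonstration with a fake command runner that records every command.
const issued = [];
const states = ["started", "ongoing", "completed"];
const result = removeRandomShardSketch({
    shardSet: ["shard0", "shard1", "shard2"],
    dbPrimary: "shard1",
    dbName: "test",
    random: () => 0.5,  // Deterministic pick: index 1.
    runAdminCommand: (cmd) => {
        issued.push(cmd);
        return cmd.removeShard ? {ok: 1, state: states.shift()} : {ok: 1};
    },
});
```

      Injecting the random source keeps the shard choice reproducible in a test run, which fits the deterministic command generation described in the Summary.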

      New: jstests/sharding/query/change_streams/test_change_stream_sharding_fsm_ignore_removed_shards.js

      • Three it() test cases (kCollection, kDb, kCluster), each creating a 3-shard cluster

      What This Tests

      • Change stream in ignoreRemovedShards mode returns events from surviving shards without errors
      • Every returned event matches an FSM-predicted expected event (no fabricated events)
      • Events arrive in correct relative order (ordered subsequence of expected events)
      • All 3 watch modes (collection, database, cluster) behave correctly
      • All FSM state transitions are exercised

            Assignee:
            Nicola Cabiddu
            Reporter:
            Denis Grebennicov
            Votes:
            0
            Watchers:
            2
