- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Query Execution
- QE 2026-03-16
Summary
SERVER-111381 implemented AllDatabasesChangeStreamShardTargeterImpl with C++ unit tests and JS smoke tests, but lacks white-box integration tests that verify observable shard-targeting behavior: which shards have open cursors after each lifecycle event, and that placement history is consulted at the right times.
Approach
Two jstest files using two observability mechanisms:
- $currentOp with {idleCursors: true} — inspects which shards have open change stream cursors. Cursors are identified via a unique comment per test case.
- Log offset snapshots — before each operation, snapshot checkLog.getGlobalLog(mongos).length. After the operation, search only the new log entries for expected LOGV2 IDs and attributes. This avoids false matches from prior test steps.
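The log-offset technique can be sketched with a stubbed log source; in the real tests the lines would come from checkLog.getGlobalLog(mongos), and the helper names here (getLog, snapshotLogOffset, findNewEntries) are illustrative, not actual jstest utilities:

```javascript
// Stubbed log source standing in for checkLog.getGlobalLog(mongos).
const log = [];
const getLog = () => log;

// Snapshot how many lines exist before the operation under test.
const snapshotLogOffset = () => getLog().length;

// Search only the lines emitted after the snapshot for a given LOGV2 ID.
function findNewEntries(offset, logId) {
    return getLog().slice(offset).filter((line) => line.includes(`"id":${logId}`));
}

// An entry logged before the snapshot must not produce a false match.
log.push('{"id":11138104,"attr":{"shards":["shard0"]}}');
const offset = snapshotLogOffset();
log.push('{"id":11138117,"attr":{"shards":["shard0","shard1"]}}');
console.log(findNewEntries(offset, 11138117).length); // 1 — new entry found
console.log(findNewEntries(offset, 11138104).length); // 0 — pre-snapshot entry ignored
```

Scoping the search to post-snapshot lines is what prevents a log ID emitted by an earlier test step from satisfying a later assertion.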
Shared fixture: A single ShardingTest created in before(). beforeEach()/afterEach() only clean up databases/collections. Verbose query logging on all nodes via logComponentVerbosity: {query: {verbosity: 3}}.
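The $currentOp mechanism above can be sketched as a filter over the aggregation output; the document shapes below are stubbed approximations of aggregate([{$currentOp: {idleCursors: true}}]) results, and field names such as `shard` and `cursor.originatingCommand.comment` are assumptions for illustration:

```javascript
// Derive the set of shards holding an open cursor for a given test comment.
function shardsWithCursors(currentOpDocs, comment) {
    const shards = new Set();
    for (const doc of currentOpDocs) {
        if (doc.type === "idleCursor" &&
            doc.cursor?.originatingCommand?.comment === comment) {
            shards.add(doc.shard);
        }
    }
    return [...shards].sort();
}

// Stubbed $currentOp documents: two cursors belong to this test, one does not.
const docs = [
    {type: "idleCursor", shard: "shard0", cursor: {originatingCommand: {comment: "test_1_1"}}},
    {type: "idleCursor", shard: "shard1", cursor: {originatingCommand: {comment: "test_1_1"}}},
    {type: "idleCursor", shard: "shard2", cursor: {originatingCommand: {comment: "other_test"}}},
];
console.log(shardsWithCursors(docs, "test_1_1")); // [ 'shard0', 'shard1' ]
```

Matching on the per-test comment is what keeps cursors left over from other test cases from polluting the assertion.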
Test Files
- jstests/sharding/query/change_streams/change_stream_all_databases_v2_strict_whitebox.js — 3 shards
- jstests/sharding/query/change_streams/change_stream_all_databases_v2_ignore_removed_shards_whitebox.js — 4 shards (needs spare after removals)
File 1: Strict Mode (change_stream_all_databases_v2_strict_whitebox.js)
Fixture: 3 shards. Tags: featureFlagChangeStreamPreciseShardTargeting, requires_sharding, uses_change_streams, assumes_balancer_off.
Test 1.1: Initialize with data on multiple shards
Setup: Create DB (primary shard0). Shard collection, split chunks across shard0 and shard1. Insert documents on both.
Steps:
- Snapshot log offset on mongos.
- Open cluster-wide v2 change stream.
- Assert cursors: shard0 (has data), shard1 (has data). No cursor on shard2 (no data). Config cursor open.
- Assert log: ID 11138104 on mongos with shards listing shard0 and shard1.
- Insert a document, verify event received.
Test 1.2: Initialize on empty cluster — config server cursor only
Setup: No user databases.
Steps:
- Snapshot log offset.
- Open cluster-wide v2 change stream.
- Assert cursors: No data shard cursors. Config cursor open.
- Assert log: ID 11138104 with empty shards.
- Snapshot log offset again.
- Create database + collection (triggers DatabaseCreatedControlEvent).
- Assert cursors: Data shard cursor now open on the DB's primary shard. Config cursor still open.
- Assert log: ID 11138117 ("Handling placement refresh") with updated shard set.
- Insert document, verify event received.
Test 1.3: DatabaseCreated on a different shard opens cursor on that shard
Setup: DB1 with primary on shard0, unsharded collection with data.
Steps:
- Open change stream. Assert cursors on shard0 + config.
- Snapshot log offset.
- Create DB2 with primaryShard: shard1. Create collection in DB2.
- Assert cursors: Cursor now also open on shard1. shard2 still no cursor. Config still open.
- Assert log: ID 11138117 with shards including shard1.
- Insert into DB2, verify event received.
Test 1.4: MoveChunk triggers placement refresh
Setup: Collection sharded across shard0 and shard1 (data on both).
Steps:
- Open change stream. Assert cursors on shard0, shard1 + config.
- Snapshot log offset.
- moveChunk all chunks from shard0 to shard2.
- Assert cursors: Cursor opened on shard2. Cursor on shard0 closed (no data). shard1 unchanged. Config still open.
- Assert log: ID 11138117 with updated shard set.
- Insert targeting shard2, verify event.
Test 1.5: MovePrimary triggers placement refresh
Setup: Unsharded collection on shard0 (shard0 is DB primary).
Steps:
- Open change stream. Assert cursor on shard0 + config.
- Snapshot log offset.
- movePrimary to shard1.
- Assert cursors: Cursor on shard1 opened. Cursor on shard0 closed. Config still open.
- Assert log: ID 11138117.
- Insert, verify event.
Test 1.6: NamespacePlacementChanged via reshardCollection triggers placement refresh
Setup: Collection sharded across shard0 and shard1 (key {_id: 1}).
Steps:
- Open change stream. Assert cursors on shard0, shard1 + config.
- Snapshot log offset.
- reshardCollection with new key {a: 1}, distributing chunks to shard1 and shard2.
- Assert cursors: Cursor opened on shard2. Cursor on shard0 closed. shard1 unchanged. Config still open.
- Assert log: ID 11138117 with updated shard set showing shard1, shard2.
- Insert, verify event with new document key shape.
Test 1.7: Multiple databases on different shards
Setup: DB1 (primary shard0), DB2 (primary shard1), DB3 (primary shard2). Each with an unsharded collection.
Steps:
- Open change stream.
- Assert cursors: shard0, shard1, shard2 all have cursors. Config cursor open.
- Insert one document into each DB, verify all 3 events received.
—
File 2: Ignore Removed Shards Mode (change_stream_all_databases_v2_ignore_removed_shards_whitebox.js)
Fixture: 4 shards (shard3 is kept as a spare so a shard survives the removals). Tags: additionally config_shard_incompatible, resource_intensive.
Test 2.1: Multi-database setup, single shard removed — bounded then unbounded
Goal: Verify the core IRS lifecycle with a multi-database whole-cluster placement. A bounded (degraded) segment targets only the surviving shard, then an unbounded (normal) segment opens a config server cursor.
Setup:
- Create DB1 with primary on shard0. Create an unsharded collection in DB1, insert doc A.
- Create DB2 with primary on shard1. Shard a collection in DB2 with key {_id: 1}, split chunks across shard0 and shard1. Insert doc B on shard0, doc C on shard1.
- Record startAtOperationTime = T1.
- Insert doc D on shard1 (into DB2's collection — provides an event in the bounded segment).
- Move DB2's chunks off shard0 to shard1. Move DB1's primary to shard1 (so shard0 is fully drained).
- Remove shard0.
Segment analysis:
- At T1: whole-cluster placement = [shard0, shard1] (DB1 on shard0, DB2 on shard0+shard1). shard0 removed → shards=[shard1], bounded [T1, T_drain).
- At T_drain: placement = [shard1], no removed shard → unbounded.
Steps:
- Snapshot log offset on mongos.
- Open cluster-wide v2 IRS stream from T1 with comment: "test_2_1".
- Assert log (first segment): ID 11138108 on mongos with:
- shards containing only shard1.
- nextPlacementChangedAt set (bounded).
- Assert cursors (bounded segment): Cursor on shard1 only. No cursor on shard0 (removed), shard2, shard3. No config server cursor.
- Verify events from bounded segment: doc D from shard1 arrives. Doc A (DB1 on shard0) and doc B (DB2 chunk on shard0) are lost.
- Snapshot log offset.
- After segment transition past T_drain:
- Assert log (second segment): ID 11138108 with nextPlacementChangedAt absent (unbounded).
- Assert cursors (unbounded segment): Cursor on shard1. Config server cursor now open.
- Insert doc E on shard1, verify event received.
Test 2.2: All original data shards removed — segment skips to new placement
Goal: Verify that when all shards that had data at T1 are removed, the fetcher's internal loop skips forward to the first timestamp where a surviving shard has data. The openCursorAt value advances past T1, and events from the skipped range are lost.
Setup:
- Create DB with primary on shard0. Create unsharded collection. Insert doc A on shard0.
- Record startAtOperationTime = T1.
- Move primary to shard3 (spare shard) at time ~T_move. This moves data to shard3.
- Remove shard0.
Segment analysis:
- At T1: placement = [shard0]. shard0 removed → surviving = [] (empty). Fetcher loops: skips to T_move.
- At T_move: placement = [shard3], no removed shard → single unbounded segment starting at T_move.
- Range [T1, T_move) is silently skipped.
Steps:
- Snapshot log offset on mongos.
- Open cluster-wide v2 IRS stream from T1 with comment: "test_2_2".
- Assert log: ID 11138108 with:
- openCursorAt = T_move (NOT T1 — demonstrating the skip).
- shards containing shard3.
- nextPlacementChangedAt absent (unbounded).
- Assert cursors: Cursor on shard3. Config server cursor open. No cursors on shard0, shard1, shard2.
- Verify no events from [T1, T_move) are returned (those were on removed shard0).
- Insert doc B on shard3, verify event received.
Test 2.3: Segments discover new shard — data migrated after T1 is not missed
Goal: Prove that reading in bounded segments is necessary for correctness. Without segments, data that migrated to a new shard (shard2) after the stream's start time would be invisible because shard2 wasn't in the original placement. The segment boundary forces re-evaluation, discovering shard2.
Setup:
- Create DB1 with primary on shard0. Create unsharded collection, insert doc A on shard0.
- Create DB2 with primary on shard0. Shard a collection in DB2 with key {_id: 1}, split chunks across shard0 and shard1. Insert doc B on shard0, doc C on shard1.
- Record startAtOperationTime = T1.
- Insert doc D on shard1 (into DB2's collection).
- Move DB2's chunks from shard1 to shard2 at time ~T_move. This creates a placement change: shard1 drops out of DB2's placement, shard2 enters.
- Insert doc E on shard2 (into DB2's collection, post-move).
- Remove shard1.
Segment analysis:
- At T1: whole-cluster placement = [shard0, shard1] (DB1 on shard0, DB2 on shard0+shard1). shard1 removed → shards=[shard0], bounded [T1, T_move).
- At T_move: placement = [shard0, shard2] (DB1 on shard0, DB2 on shard0+shard2). No removed shard → shards=[shard0, shard2], unbounded.
The key point: shard2 was NOT in the placement at T1. If cursors were opened only based on T1's placement minus removed shards, we'd have [shard0] and never see doc E on shard2. The segment boundary at T_move forces re-evaluation, discovering shard2.
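The segment logic exercised by tests 2.1–2.3 can be modeled with a small toy function (an assumption-level sketch, not the actual fetcher code): placement history is a sorted list of {ts, shards} entries, and each boundary re-evaluates which shards survive after removals.

```javascript
// Toy model: compute cursor-opening segments from placement history,
// a set of removed shard names, and the stream's start timestamp.
function computeSegments(placementHistory, removedShards, startTs) {
    const points = placementHistory.filter((p) => p.ts > startTs);
    // Include the placement already in effect at startTs.
    const active = placementHistory.filter((p) => p.ts <= startTs).pop();
    if (active) points.unshift({ts: startTs, shards: active.shards});
    const segments = points.map((p, i) => ({
        openCursorAt: p.ts,
        shards: p.shards.filter((s) => !removedShards.has(s)),
        // Bounded segments carry the next boundary; the last one is unbounded.
        nextPlacementChangedAt: i + 1 < points.length ? points[i + 1].ts : null,
    }));
    // Segments with no surviving shard are skipped (test 2.2's scenario).
    return segments.filter((seg) => seg.shards.length > 0);
}

// Test 2.3's scenario: shard2 enters the placement at T_move = 20.
const hist23 = [{ts: 0, shards: ["shard0", "shard1"]},
                {ts: 20, shards: ["shard0", "shard2"]}];
const segs23 = computeSegments(hist23, new Set(["shard1"]), 10);
console.log(segs23[0].shards); // [ 'shard0' ] — bounded first segment
console.log(segs23[1].shards); // [ 'shard0', 'shard2' ] — shard2 discovered

// Test 2.2's scenario: everything at T1 was on the removed shard.
const hist22 = [{ts: 0, shards: ["shard0"]}, {ts: 20, shards: ["shard3"]}];
const segs22 = computeSegments(hist22, new Set(["shard0"]), 10);
console.log(segs22[0].openCursorAt); // 20 — skipped past T1 to T_move
```

The model shows why a single cursor set computed at T1 would miss shard2: only the boundary at T_move re-evaluates the placement and adds it.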
Steps:
- Snapshot log offset.
- Open cluster-wide v2 IRS stream from T1 with comment: "test_2_3".
- Assert log (segment 1): ID 11138108 with shards=[shard0], nextPlacementChangedAt ~= T_move.
- Assert cursors (segment 1): Cursor on shard0 only. No cursor on shard1 (removed), shard2 (not yet in placement). No config cursor (bounded).
- Verify segment 1 events: doc A from shard0 (DB1). Doc D from shard1 lost (shard1 removed).
- Snapshot log offset.
- After segment transition at T_move:
- Assert log (segment 2): ID 11138108 with shards=[shard0, shard2], nextPlacementChangedAt absent.
- Assert cursors (segment 2): Cursors on shard0 AND shard2. Config cursor open.
- Verify doc E from shard2 is received — proving that segment-based reading discovered shard2.
- Insert doc F on shard2, verify event.
Test 2.4: Multiple databases on different shards — only surviving shard's events returned
Goal: Verify all-databases-specific behavior in IRS mode: with databases on different shards, removing one shard causes only that shard's database events to be lost, while other databases' events on surviving shards are preserved.
Setup:
- Create DB1 with primary on shard0. Create unsharded collection, insert doc A.
- Create DB2 with primary on shard1. Create unsharded collection, insert doc B.
- Record startAtOperationTime = T1.
- Insert doc C into DB2 (on shard1).
- Move DB1's primary to shard1 (so shard0 is drained). Remove shard0.
Segment analysis:
- At T1: whole-cluster placement = [shard0, shard1] (DB1 on shard0, DB2 on shard1). shard0 removed → shards=[shard1], bounded.
- After boundary: only shard1 in placement, unbounded.
Steps:
- Snapshot log offset.
- Open cluster-wide v2 IRS stream from T1 with comment: "test_2_4".
- Assert log: ID 11138108 with shards=[shard1].
- Assert cursors (bounded segment): Cursor on shard1 only. No config cursor.
- Verify only DB2's events returned: doc B and doc C. Doc A (DB1 on shard0) lost.
- After transition to unbounded segment:
- Assert cursors: shard1 + config cursor.
- Insert doc D into DB2, verify event received.
- is related to: SERVER-111381 Implement AllDatabasesChangeStreamShardTargeterImpl module (Open)