Core Server / SERVER-42783

Migrations don't wait for majority replication of cloned documents if there are no transfer mods


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.4.15, 3.6.4, 4.0.0, 4.2.0
    • Fix Version/s: 3.6.15, 4.3.1
    • Component/s: Sharding
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.2, v4.0, v3.6, v3.4
    • Steps To Reproduce:

      Run the following test that shows majority committed documents may not be in the recipient shard's majority committed snapshot after a migration:

      (function() {
      "use strict";
       
      // Set up a sharded cluster with two shards, two chunks, and one document in one of the chunks.
      const st = new ShardingTest({shards: 2, rs: {nodes: 2}, config: 1});
      const testDB = st.s.getDB("test");
       
      assert.commandWorked(testDB.foo.insert({_id: 1}, {writeConcern: {w: "majority"}}));
       
      st.ensurePrimaryShard("test", st.shard0.shardName);
      assert.commandWorked(st.s.adminCommand({enableSharding: "test"}));
      assert.commandWorked(st.s.adminCommand({shardCollection: "test.foo", key: {_id: 1}}));
      assert.commandWorked(st.s.adminCommand({split: "test.foo", middle: {_id: 0}}));
       
      // The document is in the majority committed snapshot.
      assert.eq(1, testDB.foo.find().readConcern("majority").itcount());
       
      // Advance a migration to the beginning of the cloning phase.
      assert.commandWorked(st.rs1.getPrimary().adminCommand(
          {configureFailPoint: "migrateThreadHangAtStep2", mode: "alwaysOn"}));
      TestData.toShard = st.shard1.shardName;
      const awaitMigration = startParallelShell(() => {
          jsTestLog("moving to " + TestData.toShard);
          assert.commandWorked(
              db.adminCommand({moveChunk: "test.foo", find: {_id: 1}, to: TestData.toShard}));
      }, st.s.port);
       
      // Sleep to let the migration reach the failpoint and allow any writes to become majority committed
      // before pausing replication.
      sleep(3000);
      st.rs1.awaitLastOpCommitted();
       
      // Disable replication on the recipient shard's secondary node, so the recipient shard's majority
      // commit point cannot advance.
      const destinationSec = st.rs1.getSecondary();
      assert.commandWorked(
          destinationSec.adminCommand({configureFailPoint: "rsSyncApplyStop", mode: "alwaysOn"}),
          "failed to enable fail point on secondary");
       
      // Allow the migration to begin cloning.
      assert.commandWorked(st.rs1.getPrimary().adminCommand(
          {configureFailPoint: "migrateThreadHangAtStep2", mode: "off"}));
       
      // The migration should finish cloning and commit without being able to advance the majority commit
      // point.
      awaitMigration();
       
      // The document is not in the recipient's majority committed snapshot, so the cloned document cannot
      // be found by a majority read and this assertion fails.
      assert.eq(
          1, testDB.foo.find().readConcern("majority").itcount(), "cloned doc not in majority snapshot!");
       
      assert.commandWorked(
          destinationSec.adminCommand({configureFailPoint: "rsSyncApplyStop", mode: "off"}),
          "failed to disable fail point on secondary");
       
      st.stop();
      })();
      

    • Sprint:
      Sharding 2019-08-26
    • Linked BF Score:
      23

      Description

      Before a chunk migration can commit, the shard receiving the chunk is supposed to wait for every document in the chunk to be majority committed, so that no data can be lost if there is a failover. Currently, the recipient shard waits for majority commitment of the later of two opTimes: the last opTime on the client of the thread driving the migration on the recipient at the end of the cloning phase, and the opTime of the latest transfer mod write.

      SERVER-32885 changed migration recipients to insert the documents received in the initial cloning phase on a separate thread, so the driving thread's last opTime at the end of the cloning phase is not the opTime of the insert of the latest cloned document. If there are no transfer mod writes (whose opTimes are correctly tracked and are guaranteed to be greater than the opTimes of the cloned-document inserts), the migration can commit without waiting for all cloned documents to be majority replicated.
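The faulty wait condition described above can be sketched in miniature. This is an illustrative JavaScript model, not the server's actual C++ code: opTimes are plain integers and every name here is hypothetical.

```javascript
// Hypothetical model of the recipient's pre-fix wait condition.

// opTime observed on the migration-driving thread's client at the end of
// cloning. After SERVER-32885, cloned-document inserts happen on a separate
// thread, so this can lag behind the opTime of the last cloned insert.
const drivingThreadLastOpTime = 5;

// opTime of the insert of the latest cloned document, performed by the
// separate inserter thread and thus invisible to the driving thread's client.
const lastClonedInsertOpTime = 8;

// opTimes of transfer mod writes; empty when there are no mods.
const transferModOpTimes = [];

// The recipient waits for this opTime to become majority committed before
// letting the migration commit.
function opTimeToWaitFor() {
    const lastTransferModOpTime =
        transferModOpTimes.length > 0 ? Math.max(...transferModOpTimes) : 0;
    return Math.max(drivingThreadLastOpTime, lastTransferModOpTime);
}

// With no transfer mods, the wait target (5) is behind the last cloned
// insert (8): the migration can commit before the cloned documents are
// majority replicated.
console.log(opTimeToWaitFor() < lastClonedInsertOpTime);  // → true
```

With at least one transfer mod, its opTime would exceed the cloned inserts' opTimes and the wait target would be correct, which is why the bug only manifests when there are no transfer mods.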

      This has at least the following implications for migrations that involve no transfer mods:

      • Majority reads without causal consistency may fail to return documents inserted on the same session with majority write concern, even without a failover (as in the repro in Steps To Reproduce).
      • A recipient shard stepdown after the migration commits may lead to a rollback of previously majority committed documents.
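The first implication can also be modeled in miniature: a majority read reflects only writes at or below the node's majority commit point, so while the recipient's commit point is pinned behind the cloned insert's opTime, the cloned document is invisible. Again an illustrative JavaScript sketch with integer opTimes and hypothetical names, not server code:

```javascript
// Illustrative model of majority-read visibility on the recipient shard.

// The secondary's replication is paused (rsSyncApplyStop in the repro),
// so the majority commit point is pinned before the cloned insert.
const majorityCommitPoint = 5;
const clonedInsertOpTime = 8;  // write applied on the primary only

// A majority read sees exactly the writes at or below the majority
// commit point.
function visibleToMajorityRead(writeOpTime, commitPoint) {
    return writeOpTime <= commitPoint;
}

// The cloned document is not in the majority committed snapshot, matching
// the failing assertion at the end of the repro.
console.log(visibleToMajorityRead(clonedInsertOpTime, majorityCommitPoint));
// → false

// Once the secondary catches up and the commit point passes the insert's
// opTime, the document becomes visible to majority reads again.
console.log(visibleToMajorityRead(clonedInsertOpTime, 9));
// → true
```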

      SERVER-32885 was backported to 3.4, so this may affect releases from 3.4 onward, although I haven't manually verified this.

      Attachments

      Activity

      People

      • Votes:
        0
      • Watchers:
        13

      Dates

      • Created:
      • Updated:
      • Resolved: