-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: 3.4.15, 3.6.4, 4.0.0, 4.2.0
-
Component/s: Sharding
-
Fully Compatible
-
ALL
-
v4.2, v4.0, v3.6, v3.4
-
-
Sharding 2019-08-26
-
23
Before a chunk migration can commit, the shard receiving the chunk is supposed to wait for every document in the chunk to be majority committed to ensure no data can be lost if there's a failover. Currently, the recipient shard will wait for the maximum of the last opTime on the client of the thread driving the migration on the recipient at the end of the cloning phase and the opTime of the latest transfer mod write to be majority committed.
SERVER-32885 changed migration recipients to insert documents received in the initial cloning phase on a separate thread, so the driving thread's last opTime at the end of the cloning phase will not be the opTime of the insert of the latest cloned document. If there are no transfer mod writes (the opTimes of which are correctly tracked and are guaranteed to be > the opTime of the inserts of any cloned documents), then the migration can commit without waiting for all cloned documents to be majority replicated.
This has at least the following implications for migrations that involve no transfer mods:
- Majority reads without causal consistency may fail to read documents inserted on the same session with majority write concern even without failovers (like in the attached repro).
- A recipient shard stepdown after the migration commits may lead to a rollback of previously majority committed documents.
SERVER-32885 was backported to 3.4, so this may be a problem from that release on, although I haven't manually verified this.