The range deleter waits for replication on two occasions:
- First, in Helpers::removeRange, using the moveChunk operation's write concern. This wait logs the time spent waiting for replication.
- Second, using a 'majority' write concern. This wait does not log anything.
This second majority wait is completely unnecessary. The migration recipient side can keep going without attempting a majority write until the very end, after all documents have been transferred.
As part of fixing this bug, we should consider the following:
- Before even accepting a migration request, the recipient shard should make a best-effort attempt to check how far behind it is relative to the rest of the replica set (perhaps by doing a majority write with some timeout) and, if that fails, not attempt the migration at all. This is the counterpart of SERVER-22876.
- If the migration was for an empty chunk and we didn't patch up any indexes, do not do any replication waits at all and enter the READY state immediately.
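
The two suggestions above could be sketched roughly as follows. This is only an illustration of the decision logic, not the server's actual code: the function names, the `majority_probe` callback, and the 5-second default timeout are all hypothetical and do not correspond to existing MongoDB APIs.

```python
from typing import Callable


def should_accept_migration(
    majority_probe: Callable[[float], bool],
    timeout_secs: float = 5.0,  # hypothetical default, not a real server setting
) -> bool:
    """Best-effort lag check run before accepting a migration request.

    `majority_probe` is a hypothetical callback that performs a no-op write
    with a 'majority' write concern and returns True if it was acknowledged
    within `timeout_secs`. A timeout is treated the same as a failure.
    """
    try:
        return majority_probe(timeout_secs)
    except TimeoutError:
        # The replica set is too far behind: refuse the migration up front.
        return False


def needs_replication_wait(docs_cloned: int, indexes_patched: bool) -> bool:
    """Decide whether the recipient has anything to wait for.

    If the chunk was empty and no indexes had to be created, nothing was
    written, so the recipient can enter the READY state immediately.
    """
    return docs_cloned > 0 or indexes_patched
```

With this split, the recipient would call `should_accept_migration` once when the request arrives, and `needs_replication_wait` once after cloning, so that a majority wait happens at most at the very end and only when something was actually written.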
Related to: SERVER-29807 RangeDeleter should log when its about to wait for majority replication (Closed)