During initial sync, index builds that are still in progress on the sync source during the collection cloning phase are started, but not completed, on the initial syncing node. If a conflicting operation is applied to a collection after cloning but before its index build is committed or aborted, we interrupt the index build with the understanding that it will be restarted or aborted when the corresponding index build commit or abort operation is processed.
The current list of conflicting operations, defined in SERVER-46659, includes collection and index drops as well as collection renames. However, we should consider adding index hide/unhide, which are replicated as collMod commands, to this list so that initial sync interrupts the index builds before applying the collMod command.
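As a rough illustration of the proposed classification, the Python sketch below decides whether a command oplog entry should interrupt in-progress index builds. It is a toy model, not server code: the function names, the `include_hide_unhide` flag, and the simplified entry shape (`op` and `o` fields only, with hide/unhide expressed as a collMod whose `index` document carries a `hidden` field) are assumptions made for this example.

```python
# Toy model only; names and entry shapes are illustrative assumptions.

# Commands treated as conflicting today: collection drops, index drops,
# and collection renames.
CONFLICTING_COMMANDS = ("drop", "dropIndexes", "renameCollection")


def is_hide_or_unhide(entry: dict) -> bool:
    """Return True for a collMod whose index document sets 'hidden'."""
    o = entry.get("o", {})
    index_spec = o.get("index")
    return "collMod" in o and isinstance(index_spec, dict) and "hidden" in index_spec


def should_interrupt_index_builds(entry: dict, include_hide_unhide: bool = True) -> bool:
    """Return True if applying this entry should first interrupt index builds."""
    if entry.get("op") != "c":  # only command entries can conflict in this model
        return False
    o = entry.get("o", {})
    if any(name in o for name in CONFLICTING_COMMANDS):
        return True
    # Proposed addition: treat index hide/unhide as conflicting too.
    return include_hide_unhide and is_hide_or_unhide(entry)


hide_a_1 = {"op": "c", "o": {"collMod": "coll", "index": {"name": "a_1", "hidden": True}}}
drop_a_1 = {"op": "c", "o": {"dropIndexes": "coll", "index": "a_1"}}
```

With `include_hide_unhide=False` the sketch mirrors the current list, so `hide_a_1` is not treated as conflicting; with the default it is.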
The rationale for interrupting index builds during initial sync for hide/unhide index operations can be illustrated by the following sequence of operations on the sync source:
On sync source,
- create index a_1
- hide index a_1 <--- startTimestamp for initial sync
- unhide index a_1
- drop index a_1
- create index a_1 <---- collection cloner starts index build for second a_1 index
On initial syncing node,
- clones collection and starts index build for a_1
- completes collection cloning, starts applying oplog entries from 'startTimestamp'
- applying collMod for hiding index fails with BackgroundOperationInProgressForCollection
- does not interrupt index build for a_1
- waits for index builds to complete for collection, which leads to the initial syncing node hanging.
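The sequence above can be replayed with a small Python toy model to show why the node hangs and how interrupting on hide/unhide would avoid it. Everything here is a simplification assumed for illustration: `replay`, the `(operation, index)` tuples, and the operation labels are hypothetical, and the wait on the in-progress build is modeled as an immediate "hangs" outcome because the build can only finish when a later commit entry is applied.

```python
# Toy replay of the scenario; not MongoDB server code.

def replay(oplog, building, interrupt_on_hide_unhide):
    """Apply (operation, index) pairs; return ("hangs" | "completes", steps)."""
    building = set(building)  # index builds started by the cloner, not yet finished
    steps = []
    for op, index in oplog:
        conflicts = op == "dropIndexes" or (
            interrupt_on_hide_unhide and op in ("hide", "unhide")
        )
        if conflicts and index in building:
            building.discard(index)
            steps.append(f"{op} {index}: interrupted in-progress build")
        if op in ("hide", "unhide") and index in building:
            # The collMod cannot proceed while the build is running, and the
            # build cannot finish until a later oplog entry commits it.
            steps.append(f"{op} {index}: BackgroundOperationInProgressForCollection")
            return "hangs", steps
        if op == "startIndexBuild":
            building.add(index)
        elif op == "commitIndexBuild":
            building.discard(index)
    return "completes", steps


# Oplog entries applied from 'startTimestamp' onward, in simplified form.
OPLOG = [
    ("hide", "a_1"),
    ("unhide", "a_1"),
    ("dropIndexes", "a_1"),
    ("startIndexBuild", "a_1"),
    ("commitIndexBuild", "a_1"),
]

current_outcome, _ = replay(OPLOG, {"a_1"}, interrupt_on_hide_unhide=False)
proposed_outcome, _ = replay(OPLOG, {"a_1"}, interrupt_on_hide_unhide=True)
```

In this model, `current_outcome` is `"hangs"` because the first hide is applied while the cloner's a_1 build is still running, whereas `proposed_outcome` is `"completes"` because the build is interrupted first and later restarted by the replayed index build entries.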