Description:
I'm not sure whether we should document this, but it has implications for upgrades. It's pretty deep in the weeds; see my latest public comment for details. Let me know if you have any questions.
Engineering Ticket Description:
The CollectionCloner, used in the first phase of initial sync, has code to interpret error codes from the getMore command as indicative of a collection drop:
https://github.com/mongodb/mongo/blob/bf58b1ab2abfb2a3ab7a86c154f9f5954ed6f98c/src/mongo/db/repl/collection_cloner.cpp#L576-L582
In 4.0 and older branches, it handles OperationFailed and CursorNotFound. However, a collection drop can also cause getMore to return QueryPlanKilled, so that error code should be handled as well.
As part of moving ClientCursor ownership to the global cursor manager, SERVER-37451 changed the server's behavior so that collection drops result in QueryPlanKilled rather than CursorNotFound. That change required an immediate fix in master to keep initial sync resilient to collection drops, and the fix shipped in 4.1.7 as part of SERVER-37451. The older branches still need the same fix; this ticket tracks that remaining backport work.
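As a rough illustration of the check described above, the sketch below models the "is this getMore error a collection drop?" decision as a plain function, once with the 4.0-era set of codes and once with QueryPlanKilled added. The `ErrorCode` enum, its `OtherError` member, and both function names are hypothetical stand-ins, not the server's actual types or the code at the linked lines; only the three error code names come from this ticket.

```cpp
// Hypothetical stand-in for the server's error codes. Only OperationFailed,
// CursorNotFound, and QueryPlanKilled come from the ticket; OtherError
// represents any code that should NOT be read as a collection drop.
enum class ErrorCode { OperationFailed, CursorNotFound, QueryPlanKilled, OtherError };

// Hypothetical sketch of the 4.0-era classification: only two codes are
// treated as evidence that the collection was dropped during cloning.
constexpr bool isCollectionDropErrorLegacy(ErrorCode code) {
    return code == ErrorCode::OperationFailed ||
           code == ErrorCode::CursorNotFound;
}

// Hypothetical sketch of the classification after the fix: QueryPlanKilled
// is also treated as a collection drop, so the cloner can tolerate it.
constexpr bool isCollectionDropError(ErrorCode code) {
    return code == ErrorCode::OperationFailed ||
           code == ErrorCode::CursorNotFound ||
           code == ErrorCode::QueryPlanKilled;
}
```

With these definitions, `isCollectionDropErrorLegacy(ErrorCode::QueryPlanKilled)` is false while `isCollectionDropError(ErrorCode::QueryPlanKilled)` is true, which is the gap the backport is meant to close.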