On step-up, and more specifically during drain mode, a thread that calls onShardVersionMismatch is spawned to recover any outstanding migrations.
The implementation of onShardVersionMismatch assumes that no other refresh can be running during drain mode, because user requests are not served. This assumption is incorrect: refreshes spawned earlier are not killed on step-down or step-up, because they run on a different thread than the command that spawned them.
As a result, the recovery on a primary node can join a refresh that started while the node was still a secondary, and the recovery is skipped.
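
The sequence described above can be modelled with a small single-threaded simulation. This is only an illustrative sketch in Python, not the server code: the Node and Refresh classes, their fields, and the rule that joining an in-flight refresh returns before recovery runs are all assumptions made for the example. Only the name onShardVersionMismatch (written here as on_shard_version_mismatch) comes from the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Refresh:
    """Hypothetical stand-in for a refresh running on its own thread."""
    started_as_primary: bool
    done: bool = False


@dataclass
class Node:
    """Hypothetical model of a shard node; not the real server classes."""
    is_primary: bool = False
    in_flight: Optional[Refresh] = None
    recovered_migrations: bool = False
    log: List[str] = field(default_factory=list)

    def start_refresh(self) -> None:
        # The refresh remembers the role the node had when it was spawned.
        self.in_flight = Refresh(started_as_primary=self.is_primary)
        self.log.append("start_refresh")

    def on_shard_version_mismatch(self) -> None:
        # Assumed behaviour: if a refresh is already running, join it
        # and return without doing any migration recovery.
        if self.in_flight is not None and not self.in_flight.done:
            self.log.append("join")
            return
        # Otherwise recover outstanding migrations (primary only in this
        # model) and start a fresh refresh.
        if self.is_primary:
            self.recovered_migrations = True
            self.log.append("recover")
        self.start_refresh()

    def step_up(self) -> None:
        # Step-up does not kill the in-flight refresh: it belongs to
        # another thread and survives the role change.
        self.is_primary = True
        self.on_shard_version_mismatch()


def simulate_stale_join() -> Node:
    """Refresh starts on a secondary, then the node steps up."""
    node = Node()
    node.start_refresh()  # spawned while the node is a secondary
    node.step_up()        # recovery thread joins the stale refresh
    return node


if __name__ == "__main__":
    node = simulate_stale_join()
    print(node.log, node.recovered_migrations)
```

In this model the step-up call finds the refresh that was spawned on the secondary still running, joins it, and returns with recovered_migrations still unset. Recovery only runs on a later call, once that stale refresh has completed.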