By design, rename collection participants were assumed to fail only because of stepdown or shutdown errors. The rationale was that promises would be invalidated before the PrimaryOnlyService (POS) instances were released, and a new primary would eventually resume those instances with a clean state.
However, there are scenarios in which recoverable "non-stepdown" errors can occur: the promises get invalidated, but the POS participant instances are not released. As a consequence, every retry follows the same flow: get the existing POS instance, check its promises, and fail again.
This can happen, for example, when an index build runs concurrently on the collection being renamed (participants get stuck with a BackgroundOperationInProgressForNamespace error).
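
The retry flow described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `Registry`, `ParticipantInstance`, `run_participant`, and both exception classes are hypothetical names invented for illustration and do not correspond to the server's actual classes. It shows how an instance that keeps an invalidated promise, without being released, makes every retry fail with the original error.

```python
class StepdownError(Exception):
    """Hypothetical stand-in for stepdown/shutdown errors."""


class RecoverableError(Exception):
    """Hypothetical stand-in for a non-stepdown error, e.g.
    BackgroundOperationInProgressForNamespace."""


class ParticipantInstance:
    """Hypothetical participant; `failure` models an invalidated promise."""

    def __init__(self):
        self.failure = None


class Registry:
    """Hypothetical registry of live instances, keyed by namespace."""

    def __init__(self):
        self._instances = {}

    def get_or_create(self, namespace):
        if namespace not in self._instances:
            self._instances[namespace] = ParticipantInstance()
        return self._instances[namespace]

    def release(self, namespace):
        self._instances.pop(namespace, None)


def run_participant(registry, namespace, work):
    instance = registry.get_or_create(namespace)
    # A retry finds the existing instance and checks its promise first.
    if instance.failure is not None:
        raise instance.failure
    try:
        result = work()
    except StepdownError as exc:
        # Expected path: invalidate the promise, then release the instance
        # so that a new primary starts from a clean state.
        instance.failure = exc
        registry.release(namespace)
        raise
    except RecoverableError as exc:
        # Problematic path: promise invalidated, instance NOT released.
        instance.failure = exc
        raise
    registry.release(namespace)
    return result
```

In this model, once `work` raises `RecoverableError`, later calls for the same namespace fail immediately, even if the blocking condition has gone away. Only an explicit `registry.release(namespace)`, which plays the role of an election here, lets a retry succeed.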
—
Workaround if users hit this bug: trigger an election on every shard that has a stuck rename participant.
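
One possible way to script the workaround is sketched below. It assumes each value in `shard_clients` is an object that behaves like a `pymongo.MongoClient` connected to the primary of an affected shard; the helper name `step_down_stuck_shards`, the mapping, and the 60-second default are illustrative assumptions, not part of the original report. `replSetStepDown` is the standard server command that asks a primary to step down, which normally leads to an election.

```python
def step_down_stuck_shards(shard_clients, step_down_secs=60):
    """Ask the primary of each listed shard to step down.

    shard_clients: dict mapping a shard name to a MongoClient-like object
    (assumed to be connected to that shard's current primary).
    Returns a dict mapping each shard name to a short status string.
    """
    results = {}
    for shard_name, client in shard_clients.items():
        try:
            # Equivalent to running {replSetStepDown: step_down_secs}
            # against the admin database.
            client.admin.command("replSetStepDown", step_down_secs)
            results[shard_name] = "stepdown requested"
        except Exception as exc:
            # Record the failure and continue with the remaining shards;
            # depending on the setup, the connection may drop on stepdown.
            results[shard_name] = f"error: {exc}"
    return results
```

Which shards actually have a stuck rename participant still has to be determined separately; the sketch only covers issuing the stepdown once those shards are known.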