Core Server / SERVER-63722

Rename collection participants get stuck upon errors different from stepdown/shutdown

    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v5.3, v5.2, v5.0
    • Sprint: Sharding EMEA 2022-02-21

      By design, rename collection participants were assumed to fail only on stepdown/shutdown errors. The rationale was that promises would be invalidated before the PrimaryOnlyService (POS) instances were released, and that new primaries would eventually resume those instances with a clean state.

      However, there are scenarios in which recoverable "non-stepdown" errors occur: the promises get invalidated, but the POS participant instances are not released. As a consequence, every retry follows the same flow: get the POS instance, check the promises, and fail again.

      This can happen, for example, when an index build runs concurrently on a collection being renamed (participants get stuck with a BackgroundOperationInProgressForNamespace error).

      Workaround if users hit this bug: trigger an election on every shard that has a stuck rename participant.
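
      The stuck-retry flow and the election workaround can be illustrated with a toy model. This is only a sketch in Python, not the server's C++ code: Registry, Participant, rename, and step_down_and_up are hypothetical names, and a concurrent.futures.Future stands in for the participant's promise.

```python
from concurrent.futures import Future


class BackgroundOperationInProgress(Exception):
    """Hypothetical stand-in for BackgroundOperationInProgressForNamespace."""


class Participant:
    """Toy rename participant holding one completion promise."""

    def __init__(self):
        self.promise = Future()

    def run(self, step):
        # A promise that is already settled is never re-armed, so a retry
        # on the same instance returns the stored outcome without running
        # `step` again.
        if self.promise.done():
            return self.promise.result()
        try:
            self.promise.set_result(step())
        except Exception as exc:
            self.promise.set_exception(exc)
        return self.promise.result()


class Registry:
    """Toy stand-in for the service that owns participant instances."""

    def __init__(self):
        self.instances = {}

    def get_or_create(self, key):
        return self.instances.setdefault(key, Participant())

    def step_down_and_up(self):
        # Models an election: in-memory instances are dropped and the new
        # primary starts again with fresh promises.
        self.instances.clear()


def rename(registry, key, step):
    # The instance is not released on failure, which is the modeled bug.
    return registry.get_or_create(key).run(step)


def failing_step():
    raise BackgroundOperationInProgress("index build in progress")


def ok_step():
    return "renamed"


registry = Registry()
outcomes = []
# First attempt hits the index build; the retry would succeed on its own,
# but it finds the same instance with an already failed promise.
for step in (failing_step, ok_step):
    try:
        outcomes.append(rename(registry, "db.coll", step))
    except BackgroundOperationInProgress:
        outcomes.append("stuck")

registry.step_down_and_up()  # workaround: trigger an election
outcomes.append(rename(registry, "db.coll", ok_step))
print(outcomes)
```

      In this model the second attempt fails even though its own step would succeed, because it reuses the instance whose promise already holds the error. Only clearing the registry, the analogue of an election, lets the rename go through.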

            pierlauro.sciarelli@mongodb.com Pierlauro Sciarelli