  1. Core Server
  2. SERVER-63722

Rename collection participants get stuck upon errors different from stepdown/shutdown

Details

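    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 5.2.0, 5.3.0-rc0, 5.0.5
    • Fix Version/s: 6.0.0-rc0, 5.0.7, 5.2.2, 5.3.0-rc2
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v5.3, v5.2, v5.0
    • Sprint: Sharding EMEA 2022-02-21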

    Description

      By design, rename collection participants were assumed to fail only because of stepdown or shutdown errors. The rationale was that promises would be invalidated before the POS (PrimaryOnlyService) instances were released, and that the new primary would eventually resume those instances with a clean state.

      However, there are scenarios in which recoverable "non-stepdown" errors can occur: the promises get invalidated but the POS participant instances are not released. As a consequence, every retry follows the same flow: get the existing POS instance, check its promises, and fail again.

      This can happen, for example, when an index build runs concurrently on a collection being renamed: participants get stuck with a BackgroundOperationInProgressForNamespace error.
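
      The retry flow described above can be illustrated with a small toy model. This is only a sketch of the failure pattern, not the server's implementation: `ToyParticipant`, `ToyRegistry`, `step_down`, and the exception class are hypothetical names, and a `Future` stands in for the participant's completion promise.

```python
from concurrent.futures import Future


class BackgroundOperationInProgress(Exception):
    """Hypothetical stand-in for a recoverable, non-stepdown error."""


class ToyParticipant:
    """Toy rename participant whose completion promise is settled at most once."""

    def __init__(self):
        self.done = Future()

    def run(self, work):
        # Only the first call executes the work; later calls just check the promise.
        if not self.done.done():
            try:
                self.done.set_result(work())
            except Exception as exc:
                self.done.set_exception(exc)
        # Returns the stored result, or re-raises the stored error.
        return self.done.result()


class ToyRegistry:
    """Toy instance registry: an instance lives until it is explicitly dropped."""

    def __init__(self):
        self._instances = {}

    def get_or_create(self, key):
        return self._instances.setdefault(key, ToyParticipant())

    def step_down(self):
        # A stepdown discards every instance, so the next lookup starts clean.
        self._instances.clear()


def blocked():
    raise BackgroundOperationInProgress("index build in progress")


def ok():
    return "renamed"


registry = ToyRegistry()
outcomes = []
# The second attempt would succeed on its own, but it reuses the failed instance.
for work in (blocked, ok):
    try:
        outcomes.append(registry.get_or_create("db.coll").run(work))
    except BackgroundOperationInProgress:
        outcomes.append("failed")

# After a simulated stepdown, a fresh instance is created and the retry succeeds.
registry.step_down()
outcomes.append(registry.get_or_create("db.coll").run(ok))
```

      In this model the second attempt fails even though the blocking condition is gone, because the failed instance was never released; only discarding the instance lets a retry make progress.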

      Workaround for users who hit this bug: trigger an election on every shard that has a stuck rename participant.
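
      One possible way to script the workaround is to run the `replSetStepDown` command against the primary of each affected shard. The sketch below assumes PyMongo `MongoClient` objects that are already connected to each shard's replica set; the helper name, the shard names, and the 60-second default are illustrative assumptions, not part of the ticket.

```python
def step_down_primaries(shard_clients, step_down_secs=60):
    """Ask the primary of each given shard to step down, forcing an election.

    shard_clients maps a shard name to a client object exposing
    `client.admin.command(...)`, for example a pymongo.MongoClient connected
    to that shard's replica set (an assumption of this sketch).
    """
    results = {}
    for shard_name, client in shard_clients.items():
        # replSetStepDown runs against the admin database; the value is the
        # number of seconds the node stays ineligible to become primary again.
        results[shard_name] = client.admin.command("replSetStepDown", step_down_secs)
    return results


# Hypothetical usage (requires pymongo and reachable shard replica sets):
#   from pymongo import MongoClient
#   clients = {"shard0": MongoClient("mongodb://<shard0-hosts>/?replicaSet=<name>")}
#   step_down_primaries(clients)
```

      Connection errors are not handled here; depending on the server and driver versions, a stepdown may drop the connection that issued the command, so a production script would likely need to catch that case.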

          People

            Assignee: Pierlauro Sciarelli (pierlauro.sciarelli@mongodb.com)
            Reporter: Pierlauro Sciarelli (pierlauro.sciarelli@mongodb.com)