Core Server / SERVER-29599

Balancer never relinquishes lock


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Works as Designed
    • Affects Version/s: 3.4.4
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Environment:
      3.4.4 sharded cluster with 18 shards, each a replica set of 1 primary, 1 replica, and 1 hidden replica; 3 config servers (CSRS) and 5 mongos.
    • Operating System:
      ALL
    • Steps To Reproduce:
      1. Stop the balancer
      2. Wait for the balancer to finish its migration and stop
      3. Check locks collection for the balancer lock
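
The three steps above could be scripted roughly as follows. This is a minimal sketch, not the reporter's actual procedure: `stopBalancerAndReadLock`, `maxPolls`, and `pause` are hypothetical names, and the shell helpers are passed in as arguments so the function does not depend on globals.

```javascript
// Sketch of the reproduction steps. Hypothetical helper, not part of MongoDB.
// shHelper: an object with the shell's sh.* balancer helpers
// configDb: the "config" database handle
// maxPolls: upper bound on how many times to poll (assumption, avoids hanging forever)
// pause:    a function that sleeps for the given number of milliseconds
function stopBalancerAndReadLock(shHelper, configDb, maxPolls, pause) {
  // Step 1: disable the balancer.
  shHelper.setBalancerState(false);

  // Step 2: wait until no balancing round is in progress, or give up.
  var polls = 0;
  var running = shHelper.isBalancerRunning();
  while (running && polls < maxPolls) {
    pause(1000);
    polls += 1;
    running = shHelper.isBalancerRunning();
  }

  // Step 3: read the balancer lock document from config.locks.
  var lock = configDb.locks.findOne({ _id: "balancer" });
  return {
    enabled: shHelper.getBalancerState(),
    running: running,
    polls: polls,
    lockState: lock ? lock.state : null
  };
}
```

In the mongo shell this would be called as `stopBalancerAndReadLock(sh, db.getSiblingDB("config"), 60, sleep)`. On the affected cluster the expectation from this report is `enabled: false`, `running: false`, and `lockState: 2`.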

      Let me know if you need more information to help reproduce. I'm not sure what you need now but I'm sure you'll need something.


      Description

      After upgrading our main mongo cluster from 3.2.12 to 3.4.4, we've noticed a weird behavior where the balancer never relinquishes its lock. Both sh.isBalancerRunning() and sh.getBalancerState() return false, but the balancer lock still shows a state of 2.
      Found using:

      db.getSiblingDB("config").locks.findOne({_id: "balancer"}).state
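
To compare the three signals mentioned above in one place, a snapshot helper along these lines may be useful. It is a sketch under assumptions: `balancerSnapshot` is a hypothetical name, and it assumes the `config.locks` document carries `who`, `when`, and `why` fields alongside `state`, with a state of 2 meaning the lock is held. If I recall the 3.4 documentation for `config.locks` correctly, the balancer lock is expected to stay at state 2 in 3.4, which would be consistent with the "Works as Designed" resolution; treat that as something to verify rather than a given.

```javascript
// Hypothetical helper: gathers balancer status and the lock document together.
// Assumes lock.state === 2 means "held" and that who/when/why exist on the document.
function balancerSnapshot(shHelper, configDb) {
  var enabled = shHelper.getBalancerState();
  var running = shHelper.isBalancerRunning();
  var lock = configDb.locks.findOne({ _id: "balancer" });
  var state = lock ? lock.state : null;
  return {
    enabled: enabled,
    running: running,
    lockState: state,
    who: lock ? lock.who : null,    // which process holds the lock
    when: lock ? lock.when : null,  // when the lock was taken
    why: lock ? lock.why : null,    // stated reason for taking it
    // The situation described in this report: lock held, balancer disabled and idle.
    heldWhileIdle: state === 2 && !enabled && !running
  };
}
```

In the mongo shell: `printjson(balancerSnapshot(sh, db.getSiblingDB("config")))`. The `who` and `when` values show which process holds the lock and since when, which helps tell a stale lock apart from one that is held deliberately.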

      I've checked the changelog collection and haven't found any evidence there that the balancer is still actually running.
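
The changelog check could look something like the sketch below. The exact query used for this report is not given, so this is an assumption: `recentMigrationEvents` is a hypothetical helper, and it assumes migration entries in `config.changelog` have a `what` value starting with `moveChunk` and a `time` timestamp.

```javascript
// Hypothetical helper: lists recent chunk-migration entries from config.changelog.
// Assumes entries have a `what` field (e.g. "moveChunk.start") and a `time` field.
function recentMigrationEvents(configDb, since, limit) {
  return configDb.changelog
    .find({ what: /^moveChunk/, time: { $gte: since } })
    .sort({ time: -1 })   // newest first
    .limit(limit)
    .toArray();
}
```

For example, `recentMigrationEvents(db.getSiblingDB("config"), new Date(Date.now() - 3600 * 1000), 20)` would list up to 20 migration events from the last hour. An empty result after the balancer was stopped supports the observation that no migration is still in flight.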

      We have also had a long-standing problem moving chunks in this cluster because index definitions differ between shards. We are blocked from repairing that by another bug with dropping indexes, which I'll log separately and link to this issue.

      We turn off the balancer every night to do some system maintenance. For now we have to free the balancer lock manually; otherwise the maintenance gets stuck waiting for the balancer to finish its migration.

      On a possibly related note, I've had to fix this balancer lock a few times in the past few days, so either some process on our end keeps re-enabling the balancer, or the lock keeps getting re-established on its own.

            People

            Assignee:
            kaloian.manassiev Kaloian Manassiev
            Reporter:
            glajchs Scott Glajch
            Votes:
            0
            Watchers:
            6
