Core Server / SERVER-7260

Balancer lock is not relinquished

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 2.0.7, 2.2.0
    • Component/s: Sharding
    • ALL

      Under certain conditions, the balancer lock may never be relinquished. One case appears to have occurred when the balancer was disabled during a chunk migration:

      mongos> db.locks.findOne({_id:"balancer"});
      {
              "_id" : "balancer",
              "process" : "r5.10gen.cc:27017:1349297686:1804289383",
              "state" : 2,
              "ts" : ObjectId("506cae1f13bf56db8d1b0856"),
              "when" : ISODate("2012-10-03T21:29:03.359Z"),
              "who" : "r5.10gen.cc:27017:1349297686:1804289383:Balancer:846930886",
              "why" : "doing balance round"
      }
      
      mongos> db.changelog.find().sort({$natural:-1}).limit(10).skip(10).pretty()
      {
              "_id" : "r5.10gen.cc-2012-10-03T21:30:05-17",
              "server" : "r5.10gen.cc",
              "clientAddr" : "127.0.0.1:57957",
              "time" : ISODate("2012-10-03T21:30:05.136Z"),
              "what" : "moveChunk.from",
              "ns" : "sh.test",
              "details" : {
                      "min" : {
                              "id" : "16540452295883480447516388304186410329865247257024"
                      },
                      "max" : {
                              "id" : "22754752024366413683521379069776306796548182491720"
                      },
                      "step1 of 6" : 0,
                      "step2 of 6" : 305,
                      "step3 of 6" : 378,
                      "step4 of 6" : 32007,
                      "step5 of 6" : 4542,
                      "step6 of 6" : 24280
              }
      }
      

      Note that the output above was taken 15 hours after the last moveChunk was logged to the config server. It is unclear whether the mongos process holding the lock was killed before it had a chance to release the lock.

      The net effect is that sh.isBalancerRunning() never returns false, even if the balancer is no longer running.
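
      Because sh.isBalancerRunning() cannot be trusted in this situation, one way to spot the condition is to compare the lock document against the most recent changelog entry. The following is only a minimal sketch: isBalancerLockStale is a hypothetical helper, not part of the mongo shell, and both the idle threshold and the reading of "state" : 2 as "lock held" are assumptions based on the documents shown above.

```javascript
// Hypothetical helper, not part of the mongo shell API.
// Field names ("state", "when") follow the config.locks document in this report.
var LOCK_HELD = 2; // assumption: state 2 means the lock is held, as in the report

function isBalancerLockStale(lockDoc, lastActivity, now, maxIdleMs) {
    if (!lockDoc || Number(lockDoc.state) !== LOCK_HELD) {
        return false; // no lock document, or the lock is not held
    }
    // Take the later of: lock acquisition time, last logged migration activity.
    var lastSeen = Math.max(
        lockDoc.when.getTime(),
        lastActivity ? lastActivity.getTime() : 0
    );
    // Stale if nothing has happened for longer than the allowed idle period.
    return (now.getTime() - lastSeen) > maxIdleMs;
}

// Possible use from a mongos shell (the one-hour threshold is an arbitrary choice):
//   var cfg  = db.getSiblingDB("config");
//   var lock = cfg.locks.findOne({_id: "balancer"});
//   var last = cfg.changelog.find().sort({$natural: -1}).limit(1).next();
//   isBalancerLockStale(lock, last.time, new Date(), 60 * 60 * 1000);
```

      With the values in this report (lock taken at 21:29:03, last moveChunk.from logged at 21:30:05, checked 15 hours later), such a check would flag the lock as stale. It only detects the condition; it does not release the lock.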

            Votes: 4
            Watchers: 11
