  1. Core Server
  2. SERVER-7161

Sharding will fail with a non-obvious error when the locks collection is not consistent

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor - P4
    • Resolution: Duplicate
    • Affects Version/s: 2.0.7
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels: None
    • Operating System: ALL

      Description

      If the locks collection is inconsistent across the three config servers, the balancer will fail to balance shards. When one config server has the lock entry but one or both of the others do not, a message similar to the following appears in the logs:

      [Balancer] caught exception while doing balance: distributed lock balancer/ip-<ip>:<port>:1347910582:1804289383 had errors communicating with individual server <server>:<port> :: caused by :: field not found, expected type 7

      Here "expected type 7" refers to BSON type 7 (ObjectId): an ObjectId field is missing from the lock entry in the locks collection on the affected config server.
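
      As a rough illustration of the check described above, the following sketch compares the balancer lock entry as read from each config server and reports which servers are missing the entry or its ObjectId field. It is not part of MongoDB. The helper name `find_lock_inconsistencies`, the server labels, and the field name `ts` are assumptions made for the example, and the dicts stand in for documents fetched separately from each config server's locks collection.

```python
# Hypothetical helper, not a MongoDB API. Each value in lock_docs stands in
# for the balancer lock entry read from one config server (None if absent).
# The field name "ts" for the ObjectId is an assumption for illustration.

def find_lock_inconsistencies(lock_docs, required_field="ts"):
    """Return {server: problem} for each server whose lock entry is
    missing or lacks the required field."""
    problems = {}
    for server, doc in lock_docs.items():
        if doc is None:
            problems[server] = "lock document missing"
        elif required_field not in doc:
            problems[server] = "field '%s' missing" % required_field
    return problems


if __name__ == "__main__":
    # Placeholder data: one healthy server, one without the entry,
    # one with an entry that lacks the ObjectId field.
    docs = {
        "config1": {"_id": "balancer", "ts": "objectid-placeholder"},
        "config2": None,
        "config3": {"_id": "balancer"},
    }
    for server, problem in sorted(find_lock_inconsistencies(docs).items()):
        print(server, "->", problem)
```

      An empty result would mean every config server returned a lock entry containing the field. Any other result points at the server(s) likely to trigger the "field not found" error.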

      The balancer lock will be forced when it times out, with the following messages:

      [Balancer] forcing lock 'balancer/ip-<ip>:<port>:1347690492:1804289383' because elapsed time 900364 > takeover time 900000
      [Balancer] warning: lock forcing balancer/ip-<ip>:<port>:1347690492:1804289383 inconsistent
      [Balancer] lock 'balancer/ip-<ip>:<port>:1347690492:1804289383' successfully forced

      These messages indicate that the lock is reported as both successfully forced and inconsistent. However, no shard balancing takes place.
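
      To spot this situation in a log file, a small parser along the following lines may help. It is only a sketch based on the three message formats quoted above. The function names are made up for the example, and the interpretation of the elapsed and takeover values as milliseconds is an assumption (900000 would then be 15 minutes).

```python
import re

# Patterns derived from the log lines quoted in this report; other server
# versions may word these messages differently.
FORCE_RE = re.compile(
    r"forcing lock '(?P<lock>[^']+)' because elapsed time (?P<elapsed>\d+) "
    r"> takeover time (?P<takeover>\d+)"
)
INCONSISTENT_RE = re.compile(r"warning: lock forcing (?P<lock>\S+) inconsistent")
FORCED_RE = re.compile(r"lock '(?P<lock>[^']+)' successfully forced")


def parse_force_line(line):
    """Extract lock name, elapsed time, and takeover time from a
    'forcing lock' message, or return None if the line does not match."""
    m = FORCE_RE.search(line)
    if m is None:
        return None
    elapsed = int(m.group("elapsed"))
    takeover = int(m.group("takeover"))
    return {
        "lock": m.group("lock"),
        "elapsed": elapsed,
        "takeover": takeover,
        "overrun": elapsed - takeover,
    }


def forced_but_inconsistent(lines):
    """Return the set of lock names reported as both inconsistent and
    successfully forced, the contradictory pair described above."""
    inconsistent, forced = set(), set()
    for line in lines:
        m = INCONSISTENT_RE.search(line)
        if m:
            inconsistent.add(m.group("lock"))
        m = FORCED_RE.search(line)
        if m:
            forced.add(m.group("lock"))
    return inconsistent & forced
```

      A non-empty result from `forced_but_inconsistent` suggests the lock was taken over while the config servers still disagreed, which is the case where balancing does not resume.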

      Potentially the message could be more obvious ("Lock not found"), or the lock should actually be forced successfully, as reported.

              People

              Assignee:
              greg_10gen Greg Studer
              Reporter:
              andre.defrere Andre de Frere
