Core Server / SERVER-14375

never ending "split failed Cause: the collection's metadata lock is taken"

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 2.6.1
    • Component/s: Sharding
    • Labels: None
    • ALL

      We are presplitting our chunks, which always worked fine until we upgraded to v2.6.1_linux_64bit. Since then we encounter never-ending "split failed Cause: the collection's metadata lock is taken" error messages.
      In the log of the mongod holding the chunk to be split we find:

      2014-06-27T16:42:57.797+0200 [LockPinger] cluster sx210:20020,sx176:20020,sx177:20020 pinged successfully at Fri Jun 27 16:42:57 2014 by distributed lock pinger 'sx210:20020,sx176:20020,sx177:20020/s484:27017:1403879511:112806737', sleeping for 30000ms
      2014-06-27T16:42:58.707+0200 [conn17] received splitChunk request: { splitChunk: "offerStore.offer", keyPattern: { _id: 1.0 }, min: { _id: 2929980021 }, max: { _id: MaxKey }, from: "offerStoreDE5", splitKeys: [ { _id: 2930480021 } ], shardId: "offerStore.offer-_id_2929980021", configdb: "sx210:20020,sx176:20020,sx177:20020" }
      2014-06-27T16:43:01.738+0200 [conn17] received splitChunk request: { splitChunk: "offerStore.offer", keyPattern: { _id: 1.0 }, min: { _id: 2929980021 }, max: { _id: MaxKey }, from: "offerStoreDE5", splitKeys: [ { _id: 2930480021 } ], shardId: "offerStore.offer-_id_2929980021", configdb: "sx210:20020,sx176:20020,sx177:20020" }
      2014-06-27T16:43:04.771+0200 [conn17] received splitChunk request: { splitChunk: "offerStore.offer", keyPattern: { _id: 1.0 }, min: { _id: 2929980021 }, max: { _id: MaxKey }, from: "offerStoreDE5", splitKeys: [ { _id: 2930480021 } ], shardId: "offerStore.offer-_id_2929980021", configdb: "sx210:20020,sx176:20020,sx177:20020" }
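
      For reference, presplitting like ours can be driven from the mongo shell via a mongos; the splitChunk requests in the log above are what the server generates from such calls. A minimal sketch, where the namespace and the starting key are taken from the log above, but the step size and loop bound are illustrative assumptions:

      ```javascript
      // Hypothetical sketch of a presplit loop, run in the mongo shell
      // connected to a mongos. Namespace and starting _id mirror the
      // splitChunk requests in the log; step and count are assumptions.
      var start = 2929980021;
      var step  = 500000;   // distance between split points (assumed)
      var count = 10;       // number of splits to attempt (assumed)

      for (var i = 1; i <= count; i++) {
          var splitPoint = start + i * step;
          // sh.splitAt() splits the chunk containing { _id: splitPoint }
          // exactly at that key; on our cluster this is where the
          // "metadata lock is taken" error surfaces.
          printjson(sh.splitAt("offerStore.offer", { _id: splitPoint }));
      }
      ```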
      

      This relates to SERVER-14047 (https://jira.mongodb.org/browse/SERVER-14047), where we learned that we have to shut down the whole cluster to clean up noTimeout cursors because they may block chunk moves. So we restarted the whole cluster (which is already quite painful!). We left all routers shut down and started only one router on a "private" port, so that only the application doing the presplit was connected to the cluster. Nevertheless, we received the same error messages as above!
      How is it possible that there is still a metadata lock? How can we unblock it? How can we proceed with our presplitting?
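
      As a cheaper check than a full restart, the number of open noTimeout cursors on each shard member can be read from serverStatus in the shell. A small sketch, assuming the 2.6 metrics layout:

      ```javascript
      // Run against each mongod (shard member) to see how many cursors are
      // open without an idle timeout; such cursors were implicated in
      // SERVER-14047 as blockers of chunk migrations.
      var cur = db.serverStatus().metrics.cursor;
      print("open total:     " + cur.open.total);
      print("open noTimeout: " + cur.open.noTimeout);
      print("timed out:      " + cur.timedOut);
      ```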

            Assignee:
            sam.kleinman Sam Kleinman (Inactive)
            Reporter:
            kay.agahd@idealo.de Kay Agahd
            Votes:
            0
            Watchers:
            9

              Created:
              Updated:
              Resolved: