Core Server / SERVER-14375

Never-ending "split failed Cause: the collection's metadata lock is taken"


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Done
    • Affects Version/s: 2.6.1
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Operating System:
      ALL

      Description

      We presplit our chunks, which always worked fine until we upgraded to v2.6.1_linux_64bit. Since then, we have been getting never-ending "split failed Cause: the collection's metadata lock is taken" error messages.
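For readers unfamiliar with presplitting, here is a minimal sketch of generating split points up front, assuming a numeric `_id` shard key like the one in the log below and a fixed step between split points. The step of 500000 is only inferred from the single split key in the log (2930480021 minus the chunk minimum 2929980021); `split_points` and `split_commands` are hypothetical helpers, not part of any MongoDB driver. They build documents in the shape of the `split` admin command (`{split: <namespace>, middle: <key>}`), which would then be sent through a mongos one at a time.

```python
from typing import Any, Dict, List


def split_points(start: int, stop: int, step: int) -> List[int]:
    """Return split keys start + step, start + 2*step, ... strictly below stop.

    Assumes a numeric shard key and a fixed step (hypothetical presplit scheme).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(start + step, stop, step))


def split_commands(namespace: str, start: int, stop: int, step: int) -> List[Dict[str, Any]]:
    """Build one `split` admin command document per split key.

    The command name must be the first key; Python 3.7+ dicts keep insertion order.
    """
    return [
        {"split": namespace, "middle": {"_id": key}}
        for key in split_points(start, stop, step)
    ]


if __name__ == "__main__":
    # Chunk minimum and step inferred from the log; the upper bound is illustrative.
    for cmd in split_commands("offerStore.offer", 2929980021, 2931980021, 500000):
        print(cmd)
```

With PyMongo, each document could be passed to `client.admin.command(cmd)` on a connection to the mongos. Sending one split per command means a failure such as the metadata-lock error is reported for a single split key rather than for a whole batch.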
      In the log of the mongod that holds the chunk to be split, we find:

      2014-06-27T16:42:57.797+0200 [LockPinger] cluster sx210:20020,sx176:20020,sx177:20020 pinged successfully at Fri Jun 27 16:42:57 2014 by distributed lock pinger 'sx210:20020,sx176:20020,sx177:20020/s484:27017:1403879511:112806737', sleeping for 30000ms
      2014-06-27T16:42:58.707+0200 [conn17] received splitChunk request: { splitChunk: "offerStore.offer", keyPattern: { _id: 1.0 }, min: { _id: 2929980021 }, max: { _id: MaxKey }, from: "offerStoreDE5", splitKeys: [ { _id: 2930480021 } ], shardId: "offerStore.offer-_id_2929980021", configdb: "sx210:20020,sx176:20020,sx177:20020" }
      2014-06-27T16:43:01.738+0200 [conn17] received splitChunk request: { splitChunk: "offerStore.offer", keyPattern: { _id: 1.0 }, min: { _id: 2929980021 }, max: { _id: MaxKey }, from: "offerStoreDE5", splitKeys: [ { _id: 2930480021 } ], shardId: "offerStore.offer-_id_2929980021", configdb: "sx210:20020,sx176:20020,sx177:20020" }
      2014-06-27T16:43:04.771+0200 [conn17] received splitChunk request: { splitChunk: "offerStore.offer", keyPattern: { _id: 1.0 }, min: { _id: 2929980021 }, max: { _id: MaxKey }, from: "offerStoreDE5", splitKeys: [ { _id: 2930480021 } ], shardId: "offerStore.offer-_id_2929980021", configdb: "sx210:20020,sx176:20020,sx177:20020" }
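The log above shows the identical splitChunk request for chunk `offerStore.offer-_id_2929980021` arriving roughly every three seconds, which suggests the same split is being retried rather than progressing. A minimal sketch for measuring this pattern in a mongod log, assuming the 2.6 line format quoted above; the regex and `split_retries` are hypothetical helpers written for this report, not MongoDB tooling.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List

# Assumed mongod 2.6 line format, as in the lines quoted in this report.
SPLIT_RE = re.compile(
    r'^(?P<ts>\S+) \[\w+\] received splitChunk request: .*?shardId: "(?P<chunk>[^"]+)"'
)


def split_retries(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Group splitChunk requests by chunk id and return the seconds between repeats."""
    times: Dict[str, List[datetime]] = defaultdict(list)
    for line in lines:
        m = SPLIT_RE.match(line.strip())
        if m:  # non-matching lines (e.g. LockPinger) are ignored
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
            times[m.group("chunk")].append(ts)
    # Gaps between consecutive requests for the same chunk.
    return {
        chunk: [(b - a).total_seconds() for a, b in zip(ts_list, ts_list[1:])]
        for chunk, ts_list in times.items()
    }
```

Fed the four log lines above, it would report one chunk with two gaps of about 3.03 seconds; the LockPinger line does not match and is skipped.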

      In the related issue https://jira.mongodb.org/browse/SERVER-14047, we learned that noTimeOut cursors may block chunk moves and that the whole cluster has to be shut down to clean them up, so we restarted the whole cluster (which is already quite painful!). We kept all routers shut down and started only one router on a "private" port, so that only the application doing the presplit was connected to the cluster. Nevertheless, we received the same error messages as above!
      How is it possible that the metadata lock is still taken? How can we release it? How can we proceed with our presplitting?
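As far as we understand, in 2.x the collection's "metadata lock" is the legacy distributed lock stored in `config.locks` under the collection's namespace (here `offerStore.offer`), with holder liveness tracked in `config.lockpings` (the `LockPinger` line in the log above). A minimal sketch for inspecting it, assuming the legacy document layout (`state` 0 means unlocked, non-zero means being taken or held; `process` and `why` describe the holder) and a caller-chosen staleness threshold. `describe_lock` is a hypothetical helper; the two documents would first have to be read from the config servers, e.g. with a driver's `find_one({"_id": "offerStore.offer"})` on `config.locks`.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


def describe_lock(
    lock: Optional[Dict[str, Any]],
    last_ping: Optional[datetime],
    now: datetime,
    stale_after: timedelta,
) -> str:
    """Summarise a config.locks document (assumed legacy 2.x layout).

    lock        -- the config.locks document for the namespace, or None if absent
    last_ping   -- the `ping` time from config.lockpings for lock["process"], if any
    stale_after -- hypothetical threshold after which a silent holder looks stale
    """
    if lock is None or lock.get("state", 0) == 0:
        return "unlocked"
    holder = lock.get("process", "?")
    why = lock.get("why", "")
    # A holder that no longer pings suggests an abandoned lock.
    if last_ping is None or now - last_ping > stale_after:
        return f"held by {holder} ({why}); holder not pinging, lock looks stale"
    return f"held by {holder} ({why}); holder still pinging"
```

The datetimes are expected to be naive UTC values, as PyMongo returns by default. The sketch only reads the lock state; it does not try to release the lock.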


            People

            • Votes: 0
              Watchers: 9
