-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: 2.6.1
-
Component/s: Sharding
-
ALL
We presplit our chunks, which always worked fine until we upgraded to v2.6.1_linux_64bit. Since then we have been getting never-ending "split failed Cause: the collection's metadata lock is taken" error messages.
In the log of the mongod holding the chunk to split we find:
2014-06-27T16:42:57.797+0200 [LockPinger] cluster sx210:20020,sx176:20020,sx177:20020 pinged successfully at Fri Jun 27 16:42:57 2014 by distributed lock pinger 'sx210:20020,sx176:20020,sx177:20020/s484:27017:1403879511:112806737', sleeping for 30000ms
2014-06-27T16:42:58.707+0200 [conn17] received splitChunk request: { splitChunk: "offerStore.offer", keyPattern: { _id: 1.0 }, min: { _id: 2929980021 }, max: { _id: MaxKey }, from: "offerStoreDE5", splitKeys: [ { _id: 2930480021 } ], shardId: "offerStore.offer-_id_2929980021", configdb: "sx210:20020,sx176:20020,sx177:20020" }
2014-06-27T16:43:01.738+0200 [conn17] received splitChunk request: { splitChunk: "offerStore.offer", keyPattern: { _id: 1.0 }, min: { _id: 2929980021 }, max: { _id: MaxKey }, from: "offerStoreDE5", splitKeys: [ { _id: 2930480021 } ], shardId: "offerStore.offer-_id_2929980021", configdb: "sx210:20020,sx176:20020,sx177:20020" }
2014-06-27T16:43:04.771+0200 [conn17] received splitChunk request: { splitChunk: "offerStore.offer", keyPattern: { _id: 1.0 }, min: { _id: 2929980021 }, max: { _id: MaxKey }, from: "offerStoreDE5", splitKeys: [ { _id: 2930480021 } ], shardId: "offerStore.offer-_id_2929980021", configdb: "sx210:20020,sx176:20020,sx177:20020" }
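The log excerpt above shows the same splitChunk request arriving repeatedly on conn17. As a rough illustration of how to quantify that from a mongod log, here is a minimal Python sketch that extracts the timestamps of the "received splitChunk request" entries and computes the gaps between them. The function names (`split_request_times`, `retry_gaps`) are hypothetical helpers, the request bodies in `SAMPLE_LOG` are abbreviated with `...`, and the regular expression assumes the 2.6-style timestamp format seen in this excerpt.

```python
import re
from datetime import datetime

# Start of a log entry as it appears in the excerpt: timestamp, then the
# thread name in brackets, e.g. "2014-06-27T16:42:58.707+0200 [conn17] ".
ENTRY_START = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}) \[(\w+)\] "
)

# Abbreviated copy of the excerpt; request bodies are shortened with "...".
SAMPLE_LOG = (
    "2014-06-27T16:42:57.797+0200 [LockPinger] cluster ... pinged successfully ... "
    "2014-06-27T16:42:58.707+0200 [conn17] received splitChunk request: { ... } "
    "2014-06-27T16:43:01.738+0200 [conn17] received splitChunk request: { ... } "
    "2014-06-27T16:43:04.771+0200 [conn17] received splitChunk request: { ... }"
)


def split_request_times(log_text):
    """Return the timestamps of all 'received splitChunk request' entries.

    Works whether the entries are on separate lines or fused into one line,
    because entries are located by their timestamp prefix, not by newlines.
    """
    times = []
    matches = list(ENTRY_START.finditer(log_text))
    for i, match in enumerate(matches):
        # The entry body runs until the next entry starts (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        body = log_text[match.end():end]
        if body.startswith("received splitChunk request"):
            times.append(
                datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f%z")
            )
    return times


def retry_gaps(times):
    """Return the gaps, in seconds, between consecutive timestamps."""
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(retry_gaps(split_request_times(SAMPLE_LOG)))  # [3.031, 3.033]
```

For this excerpt the requests are roughly three seconds apart, which suggests the caller retries the split at a fixed interval while the lock remains unavailable.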
In connection with https://jira.mongodb.org/browse/SERVER-14047, where we learned that we have to shut down the whole cluster to clean up noTimeOut cursors because they may block chunk moves, we restarted the whole cluster (which is already quite painful). We left all routers shut down and started only one router on a "private" port, so that only the application doing the presplit was connected to the cluster. Nevertheless, we received the same error messages as above.
How is it possible that a metadata lock is still held? How can we release it? How can we proceed with our presplitting?