[SERVER-9262] Mongo distributed locks aren't released => balancing stops working Created: 05/Apr/13  Updated: 11/Jul/16  Resolved: 13/Apr/13

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.2.3
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: James Blackburn Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Operating System: ALL
Participants:

 Description   

We have a shard that is nearly full, while other more recently added shards are nearly empty. I was wondering why a particularly large sharded collection is much larger on two shards than others:

c.mongoose.command('collStats', 'centaur')
Out[51]: 
{u'ns': u'mongoose.centaur',
 u'sharded': True,
 u'shards': {u'rs0': 
   u'size': 152076069392.0,
   u'storageSize': 154752093200.0},
  u'rs1': {
   u'size': 114195152272.0,
   u'storageSize': 116004875472.0},
  u'rs2': {
   u'size': 13189608896.0,
   u'storageSize': 13481529296.0,},
  u'rs3': {
   u'size': 11661895416.0,
   u'storageSize': 13594558416.0,},
  u'rs4': {
   u'size': 7642675920.0,
   u'storageSize': 7859703808.0},
 u'size': 298765401896.0,
 u'storageSize': 305692760192.0,
 u'totalIndexSize': 434202832}

Data is 10x the size on rs0/rs1 than 2,3,4

Looking at the config DB logs:

mongos> db.changelog.find({ns: "mongoose.centaur"}).sort({time: -1}).pretty()
{
        "_id" : "dlonapahls211.maninvestments.com-2013-04-04T09:30:32-8",
        "server" : "dlonapahls211.maninvestments.com",
        "clientAddr" : "10.220.54.3:47785",
        "time" : ISODate("2013-04-04T09:30:32.119Z"),
        "what" : "split",
        "ns" : "mongoose.centaur",
        "details" : {

The last trace for mongoose.centaur is 36 hours ago.

If I look in config.locks, I see:

mongos> db.locks.find({_id: "mongoose.centaur"}).pretty()
{
        "_id" : "mongoose.centaur",
        "process" : "dlonapahls211.maninvestments.com:27118:1365067417:1678333556",
        "state" : 0,
        "ts" : ObjectId("515d48375b3b8c1d49d2b605"),
        "when" : ISODate("2013-04-04T09:30:31.958Z"),
        "who" : "dlonapahls211.maninvestments.com:27118:1365067417:1678333556:conn575:189782212",
        "why" : "split-{ symbol: \"thingy\" }"
}

There are many 'old' - some from February, March - in this collection in state 0. It's not clear to me what state 0 is, and whether this is important...

On dlonapahls211's mongos log, for 2013-04-04T09:30:31.958 I see:

Thu Apr  4 09:29:10 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' acquired, ts : 515d39d60c29edf10f5bf0e2
Thu Apr  4 09:29:10 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' unlocked. 
Thu Apr  4 09:29:17 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' acquired, ts : 515d39dc0c29edf10f5bf0e3
Thu Apr  4 09:29:17 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' unlocked. 
Thu Apr  4 09:29:46 [LockPinger] cluster cn53:27117,cn54:27117,cn55:27117 pinged successfully at Thu Apr  4 09:29:46 2013 by distributed lock pinger 'cn53:27117,cn54:27117,cn55:27117/dlonapahls211.maninvestments.com:27119:1365027221:1804289383', sleeping for 30000ms
Thu Apr  4 09:30:11 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' acquired, ts : 515d3a130c29edf10f5bf0e4
Thu Apr  4 09:30:11 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' unlocked. 
Thu Apr  4 09:30:30 [Balancer] could not acquire lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' (another update won)
Thu Apr  4 09:30:30 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' was not acquired.
Thu Apr  4 09:30:42 [Balancer] could not acquire lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' (another update won)
Thu Apr  4 09:30:42 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' was not acquired.
Thu Apr  4 09:31:54 [Balancer] warning: distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383 did not propagate properly. :: caused by :: 8017 update not consistent  ns: config.locks query: { _id: "balancer", state: 0, ts: ObjectId('515d3a7ab07f340a2aafd3bf') } update: { $set: { state: 1, who: "dlonapahls211.maninvestments.com:27119:1365027221:1804289383:Balancer:846930886", process: "dlonapahls211.maninvestments.com:27119:1365027221:1804289383", when: new Date(1365064314764), why: "doing balance round", ts: ObjectId('515d3a7a0c29edf10f5bf0e7') } } gle1: { updatedExisting: true, n: 1, connectionId: 774, waited: 34, err: null, ok: 1.0 } gle2: { updatedExisting: false, n: 0, connectionId: 813, waited: 34, err: null, ok: 1.0 }
Thu Apr  4 09:31:54 [Balancer] lock update lost, lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' not propagated.
Thu Apr  4 09:31:54 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' was not acquired.
Thu Apr  4 09:32:17 [LockPinger] trying to delete 1 old lock entries for process dlonapahls211.maninvestments.com:27119:1365027221:1804289383
Thu Apr  4 09:32:17 [LockPinger] handled late remove of old distributed lock with ts 515d3a7a0c29edf10f5bf0e7
Thu Apr  4 09:32:36 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' acquired, ts : 515d3aa40c29edf10f5bf0e8
Thu Apr  4 09:32:37 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' unlocked. 

Any ideas on what the issue might be? How do I go about getting mongoose.centaur to be balanced again?

Is there any way to see which collections Mongo thinks are eligible for balancing - i.e. how can I tell if it will eventually balance vs. it will never balance?



 Comments   
Comment by kishore battula [ 30/Jul/13 ]

thank you Dan

Comment by Daniel Pasette (Inactive) [ 29/Jul/13 ]

kishore25kumar, to set the jumbo flag to false, you need to make a modification to the chunks collection in the config database directly from the mongos.

mongos> use config
mongos> db.chunks.update({jumbo:true}, {$set: { jumbo: false}}, false, true);

Comment by kishore battula [ 26/Jul/13 ]

can you please tell me how to set the jumbo flag to false

Comment by Stennie Steneker (Inactive) [ 13/Apr/13 ]

Hi James,

Thanks for confirming .. closing the issue.

Cheers,
Stephen

Comment by James Blackburn [ 12/Apr/13 ]

Wonderful, thanks Eliot and Stephen. Chunks balancing nicely now.

Cheers,
James

Comment by Eliot Horowitz (Inactive) [ 12/Apr/13 ]

If you look at the chunks collection, you'll probably see a "jumbo" flag.
Set that to false.

Comment by James Blackburn [ 12/Apr/13 ]

Magic! I hadn't realised that affected moving in addition to splitting.

I've bumped the limit to 2048 (to allow moving chunks around a bit). The balancer still seems to be ignoring the chunks (I've turned it off and on again, and flushRouterConfig). The moveChunk command I ran manually above seemed to work. So I could go through and move a bunch of chunks manually.

Comment by Eliot Horowitz (Inactive) [ 12/Apr/13 ]

You can change the chunk size in config.settings.

connect to mongos

use config
db.settings.find()

then update the chunk size to 256mb or larger.

Comment by James Blackburn [ 12/Apr/13 ]

Ah, got it. Got brave and tried to moveChunk manually. The chunks are too big. We added 2 shards first, then the remaining shards later:

mongos> db.runCommand({moveChunk: 'mongoose.centaur', find: {'symbol':'symbol'}, to: 'rs4'})
{
        "cause" : {
                "chunkTooBig" : true,
                "estimatedChunkSize" : 174746000,
                "errmsg" : "chunk too big to move",
                "ok" : 0
        },
        "ok" : 0,
        "errmsg" : "move failed"
}

166MB doesn't seem all that big - some of our data are 1GB (uncompressed) and we are following the recommendation of storing all the chunks on one shard (so a query hits one shard only by default). Is there any way to raise the moveChunk limit? Failing that I guess I can dump and re-insert the data.

Comment by Eliot Horowitz (Inactive) [ 11/Apr/13 ]

Can you attach a mongodump of the entire config database to SUPPORT-534?

Comment by James Blackburn [ 11/Apr/13 ]

Interestingly number of chunks has increased by a few, but still no balance for this collection:

        {  "_id" : "mongoose",  "partitioned" : true,  "primary" : "rs0" }
...
                mongoose.centaur chunks:
                                rs4     33
                                rs1     124
                                rs3     35
                                rs0     168
                                rs2     34
                        too many chunks to print, use verbose if you want to force print
...

Anything else I can look at to try to get to the bottom of why this collection doesn't balance?

Comment by James Blackburn [ 08/Apr/13 ]

Thanks both. Changelog uploaded to: SUPPORT-534

Comment by Stennie Steneker (Inactive) [ 07/Apr/13 ]

Hi James,

Can you create an issue in the Community Private project with the changelog zip attached:
https://jira.mongodb.org/browse/SUPPORT

Please also reference this SERVER issue in the description.

Correspondence and attachments in Community Private can only be viewed by yourself and 10gen support.

Thanks,
Stephen

Comment by James Blackburn [ 07/Apr/13 ]

Sure. Is there any way to send / attach the file privately?

Comment by Eliot Horowitz (Inactive) [ 07/Apr/13 ]

Can you send the contents of the changelog collection on the config servers?
A mongodump zipped of the whole thing would be fine.

Comment by James Blackburn [ 07/Apr/13 ]

Looks like nothing has happened in mongoose.centaur since yesterday, the chunk distribution is the same...

Is there anything else you wanted from sh.status()? Anything else I could look at?

Comment by James Blackburn [ 06/Apr/13 ]

The balancer does appear to be working. As I drop and re-insert data (as detailed in SERVER-9243). I see that the data is ending up well partitioned.

  • sh.status() does seem to show most DBs are well balanced. But For mongoose.centaur, I see that it isn't balanced:
    mongoose.centaur chunks:
    rs4 32
    rs1 122
    rs3 33
    rs0 165
    rs2 32
    too many chunks to print, use verbose if you want to force print
    mongoose.month chunks:
  • MMS says we have 126 mongos's. Though counting in my head, I think we have 102.
  • ntpd is running on all the nodes. Looping over them, the times seem identical (within a second or so)

 
For a database I dropped, enabled sharding, and restored, I see that the data ends up balanced well:
print_db_stats(db_name)
rs0: 
  Data: 0.745512861758G
  Storage: 0.897743225098G
  File: 3.9365234375G
rs1: 
  Data: 0.818145480007G
  Storage: 1.08492279053G
  File: 3.9365234375G
rs2: 
  Data: 0.560761421919G
  Storage: 0.586483001709G
  File: 1.9375G
rs3: 
  Data: 1.08872801065G
  Storage: 2.71416091919G
  File: 5.935546875G
rs4: 
  Data: 0.763759989291G
  Storage: 1.02611160278G
  File: 3.9365234375G
Total: 
  Data: 0.763759989291G
  Storage: 6.30942153931G
  File: 19.6826171875G

print_db_stats:

def print_db_stats(db_name):
    db = c[db_name]
    results = db.command('dbstats')
 
    for x in sorted(results['raw']):
        print x[:3] + ': '
        print "  Data: %sG" % (results['raw'][x]['dataSize'] / 1024. / 1024 / 1024)
        print "  Storage: %sG" % (results['raw'][x]['storageSize'] / 1024. / 1024 / 1024)
        print "  File: %sG" % (results['raw'][x]['fileSize'] / 1024. / 1024 / 1024)
    print "Total: "
    print "  Data: %sG" % (results['raw'][x]['dataSize'] / 1024. / 1024 / 1024)
    print "  Storage: %sG" % (results['storageSize'] / 1024. / 1024 / 1024)
    print "  File: %sG" % (results['fileSize'] / 1024. / 1024 / 1024)

I also notice, that as the balancer works, count() returns different answers (the right answer is 30076):

c[db_name].centaur.count()
Out[326]: 30076
c[db_name].centaur.count()
Out[327]: 30268
c[db_name].centaur.count()
Out[328]: 30448
c[db_name].centaur.count()
Out[330]: 30076
c[db_name].centaur.count()
Out[331]: 30076
c[db_name].centaur.count()
Out[332]: 30076
c[db_name].centaur.count()
Out[335]: 30448
c[db_name].centaur.count()
Out[336]: 30451
c[db_name].centaur.count()
Out[337]: 30268
c[db_name].centaur.count()
Out[338]: 30076
c[db_name].centaur.count()
Out[339]: 30076

Comment by Eliot Horowitz (Inactive) [ 06/Apr/13 ]

Can you send sh.status()

How many mongos do you have?

Can you check that the clocks are synchronized and running ntp on all the machines?

Comment by James Blackburn [ 06/Apr/13 ]

The balancer doesn't seem to be doing anything this morning, I notice errors like this in a mongos log:

Sat Apr  6 09:12:51 [Balancer] could not acquire lock 'balancer/dlonapahls240.maninvestments.com:27119:1365027159:1804289383' (another update won)
Sat Apr  6 09:12:51 [Balancer] distributed lock 'balancer/dlonapahls240.maninvestments.com:27119:1365027159:1804289383' was not acquired.
Sat Apr  6 09:12:55 [LockPinger] cluster cn53:27117,cn54:27117,cn55:27117 pinged successfully at Sat Apr  6 09:12:55 2013 by distributed lock pinger 'cn53:27117,cn54:27117,cn55:27117/dlonapahls240.maninvestments.com:27119:1365027159:1804289383', sleeping for 30000ms
Sat Apr  6 09:12:57 [Balancer] warning: distributed lock 'balancer/dlonapahls240.maninvestments.com:27119:1365027159:1804289383 did not propagate properly. :: caused by :: 8017 update not consistent  ns: config.locks query: { _id: "balancer", state: 0, ts: ObjectId('515fd8fc7ba325b3a176a37a') } update: { $set: { state: 1, who: "dlonapahls240.maninvestments.com:27119:1365027159:1804289383:Balancer:846930886", process: "dlonapahls240.maninvestments.com:27119:1365027159:1804289383", when: new Date(1365235977278), why: "doing balance round", ts: ObjectId('515fd90988a47d431b9ce044') } } gle1: { updatedExisting: false, n: 0, connectionId: 1249, waited: 0, err: null, ok: 1.0 } gle2: { updatedExisting: true, n: 1, connectionId: 2502, waited: 0, err: null, ok: 1.0 }
Sat Apr  6 09:12:57 [Balancer] lock update lost, lock 'balancer/dlonapahls240.maninvestments.com:27119:1365027159:1804289383' not propagated.
Sat Apr  6 09:12:57 [Balancer] distributed lock 'balancer/dlonapahls240.maninvestments.com:27119:1365027159:1804289383' was not acquired.
Sat Apr  6 09:13:03 [Balancer] could not acquire lock 'balancer/dlonapahls240.maninvestments.com:27119:1365027159:1804289383' (another update won)

Generated at Thu Feb 08 03:19:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.