Details
Type: Bug
Resolution: Done
Priority: Critical - P2
Fix Version/s: None
Affects Version/s: 2.2.3
Component/s: None
Operating System: ALL
Description
We have a shard that is nearly full, while other, more recently added shards are nearly empty. I was wondering why a particularly large sharded collection is much larger on two shards than on the others:
c.mongoose.command('collStats', 'centaur')

Out[51]:
{u'ns': u'mongoose.centaur',
 u'sharded': True,
 u'shards': {u'rs0': {u'size': 152076069392.0,
                      u'storageSize': 154752093200.0},
             u'rs1': {u'size': 114195152272.0,
                      u'storageSize': 116004875472.0},
             u'rs2': {u'size': 13189608896.0,
                      u'storageSize': 13481529296.0},
             u'rs3': {u'size': 11661895416.0,
                      u'storageSize': 13594558416.0},
             u'rs4': {u'size': 7642675920.0,
                      u'storageSize': 7859703808.0}},
 u'size': 298765401896.0,
 u'storageSize': 305692760192.0,
 u'totalIndexSize': 434202832}
The data is roughly 10x larger on rs0/rs1 than on rs2/rs3/rs4.
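To check whether the imbalance also shows up in chunk counts (which, as I understand it, is what the balancer actually compares), I can count chunks per shard in the config database. This is just a diagnostic sketch of mine, not output from the cluster above:
mongos> use config
mongos> db.chunks.distinct("shard", {ns: "mongoose.centaur"}).forEach(function (s) { print(s + ": " + db.chunks.count({ns: "mongoose.centaur", shard: s})); })  // chunk count of mongoose.centaur on each shard
If rs0/rs1 hold far more chunks than rs2/rs3/rs4, the balancer simply has not moved them; if the counts are roughly even, the extra data would have to be sitting in oversized chunks.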
Looking at the changelog collection in the config DB:
mongos> db.changelog.find({ns: "mongoose.centaur"}).sort({time: -1}).pretty()
{
        "_id" : "dlonapahls211.maninvestments.com-2013-04-04T09:30:32-8",
        "server" : "dlonapahls211.maninvestments.com",
        "clientAddr" : "10.220.54.3:47785",
        "time" : ISODate("2013-04-04T09:30:32.119Z"),
        "what" : "split",
        "ns" : "mongoose.centaur",
        "details" : {
The most recent changelog entry for mongoose.centaur is from 36 hours ago.
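Before digging further I also want to rule out the obvious: is the balancer even enabled and running? Assuming the usual shell helpers are available in 2.2, something like:
mongos> sh.getBalancerState()                  // false would mean balancing is disabled cluster-wide
mongos> sh.isBalancerRunning()                 // true while a balancing round is in progress
mongos> use config
mongos> db.settings.find({_id: "balancer"})    // a "stopped" : true document here also disables balancing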
If I look in config.locks, I see:
mongos> db.locks.find({_id: "mongoose.centaur"}).pretty()
{
        "_id" : "mongoose.centaur",
        "process" : "dlonapahls211.maninvestments.com:27118:1365067417:1678333556",
        "state" : 0,
        "ts" : ObjectId("515d48375b3b8c1d49d2b605"),
        "when" : ISODate("2013-04-04T09:30:31.958Z"),
        "who" : "dlonapahls211.maninvestments.com:27118:1365067417:1678333556:conn575:189782212",
        "why" : "split-{ symbol: \"thingy\" }"
}
There are many old entries in this collection - some from February and March - all in state 0. It's not clear to me what state 0 means, or whether that matters...
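My working assumption (which may well be wrong) is that state 0 just means the lock is not currently held, so the old state-0 documents are leftovers from completed splits/migrations rather than anything blocking the balancer. To see whether anything is actually holding a lock right now, I can query for non-zero states:
mongos> use config
mongos> db.locks.find({state: {$gt: 0}}).pretty()   // any document returned is a lock currently being taken or held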
In dlonapahls211's mongos log, around 2013-04-04T09:30:31.958, I see:
Thu Apr 4 09:29:10 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' acquired, ts : 515d39d60c29edf10f5bf0e2
Thu Apr 4 09:29:10 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' unlocked.
Thu Apr 4 09:29:17 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' acquired, ts : 515d39dc0c29edf10f5bf0e3
Thu Apr 4 09:29:17 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' unlocked.
Thu Apr 4 09:29:46 [LockPinger] cluster cn53:27117,cn54:27117,cn55:27117 pinged successfully at Thu Apr 4 09:29:46 2013 by distributed lock pinger 'cn53:27117,cn54:27117,cn55:27117/dlonapahls211.maninvestments.com:27119:1365027221:1804289383', sleeping for 30000ms
Thu Apr 4 09:30:11 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' acquired, ts : 515d3a130c29edf10f5bf0e4
Thu Apr 4 09:30:11 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' unlocked.
Thu Apr 4 09:30:30 [Balancer] could not acquire lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' (another update won)
Thu Apr 4 09:30:30 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' was not acquired.
Thu Apr 4 09:30:42 [Balancer] could not acquire lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' (another update won)
Thu Apr 4 09:30:42 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' was not acquired.
Thu Apr 4 09:31:54 [Balancer] warning: distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383 did not propagate properly. :: caused by :: 8017 update not consistent ns: config.locks query: { _id: "balancer", state: 0, ts: ObjectId('515d3a7ab07f340a2aafd3bf') } update: { $set: { state: 1, who: "dlonapahls211.maninvestments.com:27119:1365027221:1804289383:Balancer:846930886", process: "dlonapahls211.maninvestments.com:27119:1365027221:1804289383", when: new Date(1365064314764), why: "doing balance round", ts: ObjectId('515d3a7a0c29edf10f5bf0e7') } } gle1: { updatedExisting: true, n: 1, connectionId: 774, waited: 34, err: null, ok: 1.0 } gle2: { updatedExisting: false, n: 0, connectionId: 813, waited: 34, err: null, ok: 1.0 }
Thu Apr 4 09:31:54 [Balancer] lock update lost, lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' not propagated.
Thu Apr 4 09:31:54 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' was not acquired.
Thu Apr 4 09:32:17 [LockPinger] trying to delete 1 old lock entries for process dlonapahls211.maninvestments.com:27119:1365027221:1804289383
Thu Apr 4 09:32:17 [LockPinger] handled late remove of old distributed lock with ts 515d3a7a0c29edf10f5bf0e7
Thu Apr 4 09:32:36 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' acquired, ts : 515d3aa40c29edf10f5bf0e8
Thu Apr 4 09:32:37 [Balancer] distributed lock 'balancer/dlonapahls211.maninvestments.com:27119:1365027221:1804289383' unlocked.
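Reading the 'another update won' / 'lock update lost' lines above, my (unverified) guess is that several mongos processes are competing for the balancer lock, or that one of them is not writing consistently to all three config servers. To see which processes are still pinging the config servers, and what the balancer lock currently looks like, I can check:
mongos> use config
mongos> db.locks.find({_id: "balancer"}).pretty()       // current owner and state of the balancer lock
mongos> db.lockpings.find().sort({ping: -1}).limit(10)  // processes that have pinged the lock mechanism most recently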
Any ideas on what the issue might be? How do I go about getting mongoose.centaur to be balanced again?
Is there any way to see which collections MongoDB considers eligible for balancing - i.e. how can I tell whether this collection will eventually be balanced, or will never be balanced?
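For completeness, the one per-collection switch I know of (and I am not certain how it is exposed in 2.2) is a noBalance flag on the collection's document in config.collections, so I will also check that nothing has disabled balancing for just this namespace:
mongos> use config
mongos> db.collections.find({_id: "mongoose.centaur"}).pretty()   // a "noBalance" : true field would exclude this collection from balancing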