[SERVER-11694] Balancer not working Created: 14/Nov/13  Updated: 10/Dec/14  Resolved: 19/Mar/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.1
Fix Version/s: None

Type: Bug Priority: Blocker - P1
Reporter: Dharshan Rangegowda Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

I have a two-shard setup (shard0, shard1) on AWS. I added a new shard (shard2). However, the balancer is not able to move chunks to the new shard.

I see a bunch of "migration already in progress" errors in the log:
Thu Nov 14 00:29:50.406 [Balancer] ns: chat.chats_development going to move { _id: "chat.chats_development-target_pid_MinKey", lastmod: Timestamp 2000|2, lastmodEpoch: ObjectId('5282724e57add7686f91ac12'), ns: "chat.chats_development", min: { target_pid: MinKey }, max: { target_pid: -4611686018427387902 }, shard: "Shard-0" } from: Shard-0 to: Shard-2 tag []
Thu Nov 14 00:29:50.408 [mongosMain] connection accepted from 10.84.154.211:43907 #24556 (34 connections now open)
Thu Nov 14 00:29:50.408 [conn24556] end connection 10.84.154.211:43907 (33 connections now open)
Thu Nov 14 00:29:50.409 [Balancer] ns: chat.chats_staging going to move { _id: "chat.chats_staging-target_pid_MinKey", lastmod: Timestamp 2000|2, lastmodEpoch: ObjectId('5282725857add7686f91ac14'), ns: "chat.chats_staging", min: { target_pid: MinKey }, max: { target_pid: -4611686018427387902 }, shard: "Shard-0" } from: Shard-0 to: Shard-2 tag []
Thu Nov 14 00:29:50.409 [Balancer] moving chunk ns: chat.chats_production moving ( ns:chat.chats_productionshard: Shard-0:Shard-0/SG-m1largetest2-1522.servers.mongodirector.com:27017,SG-m1largetest2-1523.servers.mongodirector.com:27017lastmod: 3|8||000000000000000000000000min: { target_pid: -6914850372026113762 } max: { target_pid: -5776441268022973039 } ) Shard-0:Shard-0/SG-m1largetest2-1522.servers.mongodirector.com:27017,SG-m1largetest2-1523.servers.mongodirector.com:27017 -> Shard-2:Shard-2/SG-m1largetest2-1534.servers.mongodirector.com:27017,SG-m1largetest2-1535.servers.mongodirector.com:27017
Thu Nov 14 00:29:50.409 [mongosMain] connection accepted from 10.119.39.164:56401 #24557 (34 connections now open)
Thu Nov 14 00:29:50.411 [conn24557] end connection 10.119.39.164:56401 (33 connections now open)
Thu Nov 14 00:29:50.422 [mongosMain] connection accepted from 10.78.134.95:53117 #24558 (34 connections now open)
Thu Nov 14 00:29:50.422 [conn24558] end connection 10.78.134.95:53117 (33 connections now open)
Thu Nov 14 00:29:50.425 [mongosMain] connection accepted from 10.37.15.212:33571 #24559 (34 connections now open)
Thu Nov 14 00:29:50.426 [conn24559] end connection 10.37.15.212:33571 (33 connections now open)
Thu Nov 14 00:29:50.507 [Balancer] moveChunk result: { ok: 0.0, errmsg: "migration already in progress" }
Thu Nov 14 00:29:50.508 [Balancer] balancer move failed: { ok: 0.0, errmsg: "migration already in progress" } from: Shard-0 to: Shard-2 chunk: min: { target_pid: -6914850372026113762 } max: { target_pid: -5776441268022973039 }

Thu Nov 14 00:29:50.508 [Balancer] moving chunk ns: chat.chats_development moving ( ns:chat.chats_developmentshard: Shard-0:Shard-0/SG-m1largetest2-1522.servers.mongodirector.com:27017,SG-m1largetest2-1523.servers.mongodirector.com:27017lastmod: 2|2||000000000000000000000000min: { target_pid: MinKey } max: { target_pid: -4611686018427387902 } ) Shard-0:Shard-0/SG-m1largetest2-1522.servers.mongodirector.com:27017,SG-m1largetest2-1523.servers.mongodirector.com:27017 -> Shard-2:Shard-2/SG-m1largetest2-1534.servers.mongodirector.com:27017,SG-m1largetest2-1535.servers.mongodirector.com:27017
Thu Nov 14 00:29:50.575 [mongosMain] connection accepted from 10.98.227.148:37905 #24560 (34 connections now open)
Thu Nov 14 00:29:50.577 [conn24560] end connection 10.98.227.148:37905 (33 connections now open)
Thu Nov 14 00:29:50.577 [mongosMain] connection accepted from 10.12.191.175:34776 #24561 (34 connections now open)
Thu Nov 14 00:29:50.579 [conn24561] end connection 10.12.191.175:34776 (33 connections now open)
Thu Nov 14 00:29:50.613 [Balancer] moveChunk result: { ok: 0.0, errmsg: "migration already in progress" }
Thu Nov 14 00:29:50.614 [Balancer] balancer move failed: { ok: 0.0, errmsg: "migration already in progress" } from: Shard-0 to: Shard-2 chunk: min: { target_pid: MinKey } max: { target_pid: -4611686018427387902 }

Thu Nov 14 00:29:50.614 [Balancer] moving chunk ns: chat.chats_staging moving ( ns:chat.chats_stagingshard: Shard-0:Shard-0/SG-m1largetest2-1522.servers.mongodirector.com:27017,SG-m1largetest2-1523.servers.mongodirector.com:27017lastmod: 2|2||000000000000000000000000min: { target_pid: MinKey } max: { target_pid: -4611686018427387902 } ) Shard-0:Shard-0/SG-m1largetest2-1522.servers.mongodirector.com:27017,SG-m1largetest2-1523.servers.mongodirector.com:27017 -> Shard-2:Shard-2/SG-m1largetest2-1534.servers.mongodirector.com:27017,SG-m1largetest2-1535.servers.mongodirector.com:27017
Thu Nov 14 00:29:50.718 [Balancer] moveChunk result: { ok: 0.0, errmsg: "migration already in progress" }
Thu Nov 14 00:29:50.719 [Balancer] balancer move failed: { ok: 0.0, errmsg: "migration already in progress" } from: Shard-0 to: Shard-2 chunk: min: { target_pid: MinKey } max: { target_pid: -4611686018427387902 }

Here is the output of the locks collection:
mongos> db.locks.find();
{ "_id" : "configUpgrade", "process" : "ip-10-157-51-190:27017:1384231658:1804289383", "state" : 0, "ts" : ObjectId("5281b2eab4966fecfd6e633d"), "when" : ISODate("2013-11-12T04:47:38.349Z"), "who" : "ip-10-157-51-190:27017:1384231658:1804289383:mongosMain:846930886", "why" : "upgrading config database to new format v4" }
{ "_id" : "balancer", "process" : "ip-10-157-51-190:27017:1384387524:1804289383", "state" : 0, "ts" : ObjectId("52841a15bac893571dc4c0b7"), "when" : ISODate("2013-11-14T00:32:21.869Z"), "who" : "ip-10-157-51-190:27017:1384387524:1804289383:Balancer:846930886", "why" : "doing balance round" }
{ "_id" : "chat.chats_production", "process" : "ip-10-83-63-165:27017:1384280537:396106300", "state" : 2, "ts" : ObjectId("5283de1fd365b49a9f02596d"), "when" : ISODate("2013-11-13T20:16:31.192Z"), "who" : "ip-10-83-63-165:27017:1384280537:396106300:conn9740:778452206", "why" : "migrate-{ target_pid: -6914850372026113762 }" }
{ "_id" : "chat.chats_development", "process" : "ip-10-181-159-145:27017:1384280542:1450290529", "state" : 0, "ts" : ObjectId("5282725074d386569e867f12"), "when" : ISODate("2013-11-12T18:24:16.705Z"), "who" : "ip-10-181-159-145:27017:1384280542:1450290529:conn35:1960032425", "why" : "split-{ target_pid: 0 }" }
{ "_id" : "chat.chats_staging", "process" : "ip-10-181-159-145:27017:1384280542:1450290529", "state" : 0, "ts" : ObjectId("5282725a74d386569e867f15"), "when" : ISODate("2013-11-12T18:24:26.389Z"), "who" : "ip-10-181-159-145:27017:1384280542:1450290529:conn35:1960032425", "why" : "split-{ target_pid: 0 }" }

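One way to read the locks output is to pick out every document whose state is non-zero and check how long ago its "when" timestamp was set. The following is only a minimal sketch and was not part of the original report. It assumes that state 0 means unlocked and any other value means the lock is taken. The helper name findHeldLocks and the 30-minute threshold are hypothetical, and the sample documents are trimmed copies of three entries from the output above, with "when" as a plain Date.

```javascript
// Hypothetical helper: return locks that are not in state 0 and whose
// "when" timestamp is more than maxAgeMs older than "now".
function findHeldLocks(locks, now, maxAgeMs) {
    var held = [];
    for (var i = 0; i < locks.length; i++) {
        var lock = locks[i];
        if (lock.state === 0) {
            continue; // assumed to mean "unlocked"
        }
        var ageMs = now.getTime() - lock.when.getTime();
        if (ageMs > maxAgeMs) {
            held.push({
                _id: lock._id,
                state: lock.state,
                ageMinutes: Math.floor(ageMs / 60000),
                why: lock.why
            });
        }
    }
    return held;
}

// Trimmed copies of three documents from the locks output in this report.
var sampleLocks = [
    { _id: "balancer", state: 0, when: new Date("2013-11-14T00:32:21.869Z"), why: "doing balance round" },
    { _id: "chat.chats_production", state: 2, when: new Date("2013-11-13T20:16:31.192Z"), why: "migrate-{ target_pid: -6914850372026113762 }" },
    { _id: "chat.chats_development", state: 0, when: new Date("2013-11-12T18:24:16.705Z"), why: "split-{ target_pid: 0 }" }
];

// Use the balancer lock's timestamp as "now" so the example is reproducible.
var now = new Date("2013-11-14T00:32:21.869Z");
var stale = findHeldLocks(sampleLocks, now, 30 * 60 * 1000);
```

With this sample data, only the chat.chats_production "migrate-" lock is returned. It was taken at 20:16 on 13 Nov and was still in state 2 at the 00:32 balancer round on 14 Nov, a little over four hours later. In a live shell, the input could presumably come from db.getSiblingDB("config").locks.find().toArray().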


 Comments   
Comment by Stennie Steneker (Inactive) [ 19/Mar/14 ]

Hi Dharshan,

I'm going to close this issue due to inactivity.

I would also note that the SERVER project is intended for reporting bugs or feature suggestions for the MongoDB server.

For MongoDB-related support discussion, please post on the mongodb-user group (http://groups.google.com/group/mongodb-user) or ask specific questions on Stack Overflow / ServerFault.

Thanks,
Stephen

Comment by Stennie Steneker (Inactive) [ 16/Dec/13 ]

Hi Dharshan,

Are you still experiencing this issue with your balancer?

If so, to progress the investigation we will need the information requested earlier:

  • output of db.currentOp() from each primary
  • mongod logs from the original shard

Thanks,
Stephen

Comment by Eliot Horowitz (Inactive) [ 25/Nov/13 ]

Can you log in to each primary and send the output of db.currentOp()?

Also, if you could send the mongod logs from the original shard, that would be great.
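
When reading the db.currentOp() output for a stuck migration, it may help to narrow the "inprog" list down to chunk-migration commands. The following is only a sketch and was not part of the original thread. It assumes the command name (moveChunk or _recvChunkStart) appears as a key of each operation's "query" document. The helper name findMigrationOps is hypothetical, and the sample input is made up for illustration rather than taken from this cluster.

```javascript
// Assumption: a migration shows up in currentOp with the command name
// as a key of op.query.
var MIGRATION_COMMANDS = ["moveChunk", "_recvChunkStart"];

// Hypothetical helper: return a summary of in-progress operations whose
// query document names one of the migration commands above.
function findMigrationOps(currentOpResult) {
    var inprog = currentOpResult.inprog || [];
    var matches = [];
    for (var i = 0; i < inprog.length; i++) {
        var op = inprog[i];
        var query = op.query || {};
        for (var j = 0; j < MIGRATION_COMMANDS.length; j++) {
            var name = MIGRATION_COMMANDS[j];
            if (Object.prototype.hasOwnProperty.call(query, name)) {
                matches.push({ opid: op.opid, command: name, secs_running: op.secs_running });
                break;
            }
        }
    }
    return matches;
}

// Made-up input shaped like db.currentOp() output (illustration only).
var sampleCurrentOp = {
    inprog: [
        { opid: 101, op: "query", ns: "chat.chats_production", query: { target_pid: 42 }, secs_running: 0 },
        { opid: 102, op: "query", ns: "admin.$cmd", query: { moveChunk: "chat.chats_production" }, secs_running: 15000 }
    ]
};
var migrationOps = findMigrationOps(sampleCurrentOp);
```

On a primary, the same function could be called as findMigrationOps(db.currentOp()). A migration entry with a large secs_running would point at the operation to investigate in the mongod log.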
