[SERVER-933] move migrate handling to mongod WAS: temp oplog issues WAS: Assertion error when moving chunks Created: 09/Oct/09  Updated: 12/Jul/16  Resolved: 02/Jul/10

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 1.5.4

Type: Bug Priority: Major - P3
Reporter: Matthew Foemmel Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

mongodb 1.1.1, java driver 0.11, ubuntu 9.04


Participants:

 Description   

I've got an 8-machine mongodb cluster set up, with 2 mongod servers running on each machine (so 16 mongod processes total, plus a config server running on a 9th machine). I've enabled sharding on a collection, using the "time" field as the shard key (I initially tried sharding on the _id field, but when I did that the chunks would never split, i.e. no matter how much data I inserted, there was only ever one entry in config.chunks).
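
Roughly what I ran from the mongo shell against the mongos to set this up (recalled from memory, so treat the exact command/option names as a sketch rather than a paste of my session):

use admin
db.runCommand({ enablesharding: "test" })
db.runCommand({ shardcollection: "test.foo", key: { time: 1 } })
// inserts then go through mongos from the java driver, with "time" set to the current time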

When I start inserting data into this collection, I can see in config.chunks that additional chunks are being created on different machines throughout the cluster, as expected. But after a while, it stops creating chunks on new shards and just keeps creating chunks on the same shard over and over. From looking at the logs it seems like it's trying to move chunks and failing, but I'm inserting things in order (i.e. the "time" field is always the current time), so I'm not sure why it's even trying to move chunks (shouldn't it just create a new chunk and start inserting things there?).
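
For reference, this is roughly how I'm checking config.chunks from the mongos shell to see where chunks are landing (a sketch of the kind of query, not my exact session):

use config
// count chunks per shard for this namespace, to see whether new chunks keep piling onto the same shard
var counts = {};
db.chunks.find({ ns: "test.foo" }).forEach(function(c) { counts[c.shard] = (counts[c.shard] || 0) + 1; });
printjson(counts);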

Here's what I'm seeing in the mongos log:

Fri Oct 9 18:13:28 autosplitting test.foo size: 62662123 shard: shard ns:test.foo shard: mongod1:10001 min: { time: MinKey } max: { time: MaxKey }
...
Fri Oct 9 18:13:28 moving chunk (auto): shard ns:test.foo shard: mongod1:10001 min: { time: MinKey } max: { time: new Date(1255112014526) } to: mongod2.com:10000 #objcets: 0
Fri Oct 9 18:13:28 moving chunk ns: test.foo moving chunk: shard ns:test.foo shard: mongod1:10001 min: { time: MinKey } max: { time: new Date(1255112014526) } mongod1:10001 -> mongod2:10000

And after a while:

Fri Oct 9 18:28:32 Assertion: moveAndCommit failed: movechunk.start failed: { errmsg: "", ok: 0.0 }
Fri Oct 9 18:28:32 UserException: moveAndCommit failed: movechunk.start failed: { errmsg: "", ok: 0.0 }

 Comments   
Comment by auto [ 02/Jul/10 ]

Author:

{'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'}

Message: fix a migrate race condition SERVER-933
http://github.com/mongodb/mongo/commit/b232b213bae815783359354fdfff3c22ec5e4125

Comment by auto [ 02/Jul/10 ]

Author:

{'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'}

Message: finish first cut at migrate in mognod and lots of fixes SERVER-933
http://github.com/mongodb/mongo/commit/92e8ad86faaf3245893f309a7cae44dee9261108

Comment by Eliot Horowitz (Inactive) [ 14/Jun/10 ]

being in mongos doesn't handle unclean mongos shutdown

Comment by Eliot Horowitz (Inactive) [ 26/May/10 ]

can't reproduce exactly - but going to be re-working that code in 1.5.3

Comment by Eliot Horowitz (Inactive) [ 27/Apr/10 ]

not easily reproducible right now - will work on more in 1.5.2

Comment by Eliot Horowitz (Inactive) [ 09/Nov/09 ]

I think the root problem for these cases has been solved - but there is still work to do around this issue.

Comment by Eliot Horowitz (Inactive) [ 09/Oct/09 ]

Fri Oct 9 18:23:17 movechunk.start res: { errmsg: "collection already exists", ok: 0.0 }
...
Fri Oct 9 18:25:05 movechunk.start res: { errmsg: "logCollection failed: { errmsg: "Log already started for ns: test....", ok: 0.0 }
Comment by Eliot Horowitz (Inactive) [ 09/Oct/09 ]

can you send logs of mongod1 and mongod2 ?

it technically moves to a new shard b/c it's hard to know exactly what users are doing - it's just optimized so it moves 0 data
