[SERVER-933] move migrate handling to mongod WAS: temp oplog issues WAS: Assertion error when moving chunks Created: 09/Oct/09 Updated: 12/Jul/16 Resolved: 02/Jul/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 1.5.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matthew Foemmel | Assignee: | Eliot Horowitz (Inactive) |
| Resolution: | Done | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
mongodb 1.1.1, java driver 0.11, ubuntu 9.04 |
||
| Participants: |
| Description |
|
I've got an 8-machine mongodb cluster set up, with 2 mongodb servers running on each machine (so 16 mongod processes total, plus a config server running on a 9th machine). I've enabled sharding on a collection, using the "time" field as the key. (I initially tried sharding on the _id field, but when I did that the chunks would never split, i.e. no matter how much data I inserted, there was only ever one entry in config.chunks.)

When I start inserting data into this collection, I can see in config.chunks that additional chunks are being created on different machines throughout the cluster, as expected. But after a while, it stops creating chunks on new shards and just keeps creating chunks on the same shard over and over. From the logs it looks like it's trying to move chunks and failing, but I'm inserting things in order (i.e. the "time" field is always the current time), so I'm not sure why it's even trying to move chunks (shouldn't it just create a new chunk and start inserting things there?).

Here's what I'm seeing in the mongos log:

Fri Oct 9 18:13:28 autosplitting test.foo size: 62662123 shard: shard ns:test.foo shard: mongod1:10001 min: { time: MinKey } max: { time: MaxKey }... max: { time: new Date(1255112014526) } to: mongod2.com:10000 #objcets: 0 max: { time: new Date(1255112014526) } mongod1:10001 -> mongod2:10000

And after a while:

Fri Oct 9 18:28:32 Assertion: moveAndCommit failed: movechunk.start failed: { errmsg: "", ok: 0.0 }
Fri Oct 9 18:28:32 UserException: moveAndCommit failed: movechunk.start failed: { errmsg: "", ok: 0.0 } |
| Comments |
| Comment by auto [ 02/Jul/10 ] |
|
Author: {'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'} Message: fix a migrate race condition |
| Comment by auto [ 02/Jul/10 ] |
|
Author: {'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'} Message: finish first cut at migrate in mongod and lots of fixes |
| Comment by Eliot Horowitz (Inactive) [ 14/Jun/10 ] |
|
keeping the migrate handling in mongos doesn't handle an unclean mongos shutdown |
| Comment by Eliot Horowitz (Inactive) [ 26/May/10 ] |
|
can't reproduce exactly - but going to be re-working that code in 1.5.3 |
| Comment by Eliot Horowitz (Inactive) [ 27/Apr/10 ] |
|
not easily reproducible right now - will work on more in 1.5.2 |
| Comment by Eliot Horowitz (Inactive) [ 09/Nov/09 ] |
|
I think the root problem for these cases has been solved - but there is still work to do around this issue. |
| Comment by Eliot Horowitz (Inactive) [ 09/Oct/09 ] |
|
Fri Oct 9 18:23:17 movechunk.start res: { errmsg: "collection already exists", ok: 0.0 }... |
| Comment by Eliot Horowitz (Inactive) [ 09/Oct/09 ] |
|
can you send the logs of mongod1 and mongod2? it technically moves a new shard b/c it's hard to know exactly what users are doing - it's just optimized so it moves 0 data |