[SERVER-12862] moveChunk fails if donor has empty field name indexes Created: 24/Feb/14 Updated: 02/May/14 Resolved: 14/Mar/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.6.0-rc0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Valeri Karpov | Assignee: | Randolph Tan |
| Resolution: | Done | Votes: | 0 |
| Labels: | 26qa | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL | ||||||||||||
| Steps To Reproduce: | Run test_movechunk.js through either smoke.py or just a plain old shell. I've only tested this on OSX. |
||||||||||||
| Participants: | |||||||||||||
| Description |
|
See the attached scripts. The offending counts that cause the error below are in test_movechunk.js:L45-46. At a high level, the function in test_movechunk.js that causes this, `testDollarPrefix()`, creates 800 documents, { '$x' : i } for i from -400 to 400, tries to do a moveChunk from a 2.4 mongod to a 2.6 mongod and back, and asserts that the cluster and the individual shards have the correct document counts. The error text is as below:
Let me know if y'all need any more info. |
| Comments |
| Comment by Randolph Tan [ 20/Mar/14 ] | ||||||||||
|
The moveChunk was failing because it complains when it tries to copy the empty field index over, since such an index is disallowed in v2.6. We had a discussion here and decided not to fix this, but to warn users about it in our documentation. We should also encourage users to run the upgrade checker, since it will catch this issue. | ||||||||||
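To make the condition concrete, here is a minimal sketch of the kind of check that would flag an index with an empty field name before a migration is attempted. It is not the upgrade checker and not part of the attached scripts: `findEmptyFieldIndexes` is a hypothetical helper, and the sample index documents are made up, assuming only the usual `{ name, key }` shape of an index document.

```javascript
// Hypothetical helper (not a MongoDB API): returns the index documents whose
// key pattern contains an empty field name, e.g. { "": 1 }.
function findEmptyFieldIndexes(indexSpecs) {
    return indexSpecs.filter(function (spec) {
        return Object.keys(spec.key).some(function (field) {
            return field === "";
        });
    });
}

// Made-up sample data, assumed to follow the { name, key } index document shape.
var sampleIndexes = [
    { name: "_id_", key: { _id: 1 } },
    { name: "_1", key: { "": 1 } },
    { name: "a_1_b_1", key: { a: 1, b: 1 } }
];

// Names of the indexes that would need to be dropped or rebuilt.
var flaggedNames = findEmptyFieldIndexes(sampleIndexes).map(function (spec) {
    return spec.name;
});
```

With the sample data above, only the second index is flagged. The check deliberately covers nothing beyond the empty field name case discussed in this ticket.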
| Comment by Randolph Tan [ 19/Mar/14 ] | ||||||||||
|
valeri.karpov I can reproduce the failure. I had modified the script locally to run purely on 2.6 and forgot to change it back. I think the moveChunk should not fail. | ||||||||||
| Comment by Valeri Karpov [ 17/Mar/14 ] | ||||||||||
|
renctan You're right that removing the count queries against the individual shards fixes the stale version issue, and the setShardVersion explanation makes sense. Thanks for looking into that. One outstanding issue: the moveChunk command on L86 is still failing for me. Does it succeed for you? | ||||||||||
| Comment by Randolph Tan [ 14/Mar/14 ] | ||||||||||
|
Investigated the test further and realized that it was actually invalid. The stale config exception was actually coming from the count command (L90, don't forget to convert L86 to assert ok: 1) using the direct shard connection, totally bypassing mongos. It triggered the stale config exception because the connection has its version set at L72 and clients shouldn't be setting the version unless they are prepared to manage the connection versions like mongos. | ||||||||||
| Comment by Randolph Tan [ 11/Mar/14 ] | ||||||||||
|
Was able to reproduce the stale version error after running testEmptyFieldIndex(). Fails both on master and v2.4. | ||||||||||
| Comment by Valeri Karpov [ 11/Mar/14 ] | ||||||||||
|
Hmm I'm getting an assertion error when I make the changes you suggested, Randolph. Still getting the stale version error, on both OSX and Ubuntu 12.04. Are you running against HEAD? Because I'm getting this when running against RC1 with suggested changes...
| ||||||||||
| Comment by Randolph Tan [ 10/Mar/14 ] | ||||||||||
|
Was not able to reproduce the "stale version detected" error. However, the test script passes on 2.6 mongos and 2.6 mongod with some modifications: (1) L42 should assert ok: 1, and (2) L46-L47 should assert that docs are evenly split. It fails on 2.6 mongos and 2.4.9 mongod. This is because 2.4 mongod doesn't validate the document resulting from the $rename and ends up creating a doc with a field name prefixed with a $, which is illegal. 2.6 mongod disallows this, so it was able to move the chunks without any problems. | ||||||||||
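As an illustration of the validation that 2.4 skips here, the following sketch walks a document and reports every field name that starts with a `$`. It is a simplified stand-in, not the server's actual validation logic: `findDollarPrefixedFields` is a hypothetical helper, and it ignores any special cases the server may allow.

```javascript
// Hypothetical helper (not a MongoDB API): returns the dotted paths of all
// field names in `doc` that begin with "$", descending into nested
// objects and arrays.
function findDollarPrefixedFields(doc, prefix) {
    var found = [];
    Object.keys(doc).forEach(function (field) {
        var path = prefix ? prefix + "." + field : field;
        if (field.charAt(0) === "$") {
            found.push(path);
        }
        var value = doc[field];
        if (value !== null && typeof value === "object") {
            found = found.concat(findDollarPrefixedFields(value, path));
        }
    });
    return found;
}

// A document like the one the test ends up with after the $rename on 2.4.
var renamedDoc = { _id: 1, "$x": 5 };
var offending = findDollarPrefixedFields(renamedDoc);
```

Running a check of this kind over the documents in a chunk would point at the `$x` field described in the ticket before the migration is attempted.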
| Comment by Valeri Karpov [ 24/Feb/14 ] | ||||||||||
|
scotthernandez the issue is that the count command fails, which I believe qualifies as a user facing error. The log output is just there for completeness' sake. | ||||||||||
| Comment by Scott Hernandez (Inactive) [ 24/Feb/14 ] | ||||||||||
|
Is this a user facing error or just log stuff? If a user error, please post that specifically. |