[SERVER-12862] moveChunk fails if donor has empty field name indexes Created: 24/Feb/14  Updated: 02/May/14  Resolved: 14/Mar/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.6.0-rc0
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Valeri Karpov Assignee: Randolph Tan
Resolution: Done Votes: 0
Labels: 26qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File createShardedCollection.js     File test_movechunk.js    
Issue Links:
Depends
Related
related to DOCS-3126 add to compat moveChunk fail w empty ... Closed
Operating System: ALL
Steps To Reproduce:

Run test_movechunk.js through either smoke.py or just a plain old shell. I've only tested this on OSX.

Participants:

 Description   

See the attached scripts. The count commands that trigger the error below are at test_movechunk.js:L45-46.

At a high level, the function in test_movechunk.js that causes this, `testDollarPrefix()`, creates 800 documents, { '$x' : i } for i from -400 to 400, and tries to do a moveChunk from a 2.4 mongod to a 2.6 mongod and back. It then asserts that the cluster and the individual shards have the correct document count. The error text is below:

 m30001| 2014-02-24T12:27:55.196-0500 [migrateThread] warning: cannot remove pending chunk [{ _id: MinKey }, { _id: -300.0 }), this shard does not contain the chunk
 m30001| 2014-02-24T12:27:55.196-0500 [migrateThread] warning: cannot remove pending chunk [{ _id: MinKey }, { _id: -300.0 }), this shard does not contain the chunk
 m30000| Mon Feb 24 12:27:57.452 [conn1] assertion 13388 [test.emptyfield] shard version not ok in Client::Context: this shard contains versioned chunks for test.emptyfield, but no version set in request ( ns : test.emptyfield, received : 0|0||000000000000000000000000, wanted : 3|0||530b81115a14bccd6ce91c33, send ) ( ns : test.emptyfield, received : 0|0||000000000000000000000000, wanted : 3|0||530b81115a14bccd6ce91c33, send ) ns:test.$cmd query:{ count: "emptyfield", query: {}, fields: {} }
 m30000| Mon Feb 24 12:27:57.452 [conn1]  ntoskip:0 ntoreturn:-1
 m30000| Mon Feb 24 12:27:57.452 [conn1] stale version detected during query over test.$cmd : { $err: "[test.emptyfield] shard version not ok in Client::Context: this shard contains versioned chunks for test.emptyfield, but no version set in request ( n...", code: 13388, ns: "test.emptyfield", vReceived: Timestamp 0|0, vReceivedEpoch: ObjectId('000000000000000000000000'), vWanted: Timestamp 3000|0, vWantedEpoch: ObjectId('530b81115a14bccd6ce91c33') }
2014-02-24T12:27:57.454-0500 Error: stale config on lazy receive :: caused by :: $err: "[test.emptyfield] shard version not ok in Client::Context: this shard contains versioned chunks for test.emptyfield, but no version set in request ( n..." ( ns : test.emptyfield, received : 0|0||000000000000000000000000, wanted : 3|0||530b81115a14bccd6ce91c33, recv ) at src/mongo/shell/collection.js:55
failed to load: /Users/vkarpov/qa/QA/QA-418/test_movechunk.js

Let me know if y'all need any more info.



 Comments   
Comment by Randolph Tan [ 20/Mar/14 ]

The moveChunk was failing because the migration tries to copy the empty field name index over, and that index is disallowed in v2.6. We had a discussion here and decided not to fix this, but to warn users about it in our documentation. We should also encourage users to run the upgrade checker since it will catch this issue.
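
As an illustration of the kind of index that triggers this, here is a minimal sketch that scans a list of index specifications for an empty field name in the key pattern. It is not the upgrade checker and does not reproduce its rules. The helper name `findEmptyFieldIndexes` and the sample specs (including the index names) are made up for this example.

```javascript
// Hypothetical helper, not part of the server or the shell.
// Takes an array of index specs shaped like { key: {...}, name: "..." }
// and returns the names of indexes whose key pattern contains an
// empty field name, e.g. { "": 1 }.
function findEmptyFieldIndexes(indexSpecs) {
    return indexSpecs
        .filter(function (spec) {
            return Object.keys(spec.key).some(function (field) {
                return field === "";
            });
        })
        .map(function (spec) {
            return spec.name;
        });
}

// Made-up sample data for illustration only.
var sampleIndexes = [
    { v: 1, key: { _id: 1 }, name: "_id_", ns: "test.emptyfield" },
    { v: 1, key: { "": 1 }, name: "_1", ns: "test.emptyfield" }
];

var offending = findEmptyFieldIndexes(sampleIndexes);
```

In a shell session the same helper could be fed the result of `db.emptyfield.getIndexes()` on the donor shard before attempting the migration.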

Comment by Randolph Tan [ 19/Mar/14 ]

valeri.karpov I can reproduce the failure. I modified the script locally to run purely on 2.6 and I forgot to change it back. I think the moveChunk should not fail.

Comment by Valeri Karpov [ 17/Mar/14 ]

renctan You're right that removing the count queries against the individual shards fixes the stale version issue, and the setShardVersion explanation makes sense. Thanks for looking into that.

1 outstanding issue: the moveChunk command on L86 is still failing for me. Does it succeed for you?

Comment by Randolph Tan [ 14/Mar/14 ]

Investigated the test further and realized that it was actually invalid. The stale config exception was coming from the count command at L90 (don't forget to convert L86 to assert ok: 1), which uses the direct shard connection and bypasses mongos entirely. It triggered the stale config exception because the connection has its version set at L72, and clients shouldn't set the version unless they are prepared to manage connection versions the way mongos does.

Comment by Randolph Tan [ 11/Mar/14 ]

Was able to reproduce the stale version error after running testEmptyFieldIndex(). Fails both on master and v2.4.

Comment by Valeri Karpov [ 11/Mar/14 ]

Hmm I'm getting an assertion error when I make the changes you suggested, Randolph. Still getting the stale version error, on both OSX and Ubuntu 12.04. Are you running against HEAD? Because I'm getting this when running against RC1 with suggested changes...

 m30999| 2014-03-11T20:16:56.363+0000 [conn1] moveChunk result: { cause: { active: true, ns: "test.emptyfield", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: 100.0 }, shardKeyPattern: { _id: 1.0 }, state: "fail", errmsg: "", counts: { cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 }, ok: 1.0 }, ok: 0.0, errmsg: "data transfer error" }
assert: [0] != [1] are not equal : undefined
Error: [0] != [1] are not equal : undefined
    at Error (<anonymous>)
    at doassert (src/mongo/shell/assert.js:11:14)
    at Function.assert.eq (src/mongo/shell/assert.js:38:5)
    at testEmptyFieldIndex (test_movechunk.js:86:9)
    at test_movechunk.js:133:1
2014-03-11T20:16:56.367+0000 Error: [0] != [1] are not equal : undefined at src/mongo/shell/assert.js:13
failed to load: test_movechunk.js
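
The bare `[0] != [1]` assertion above hides the reason for the failure, which is present in the moveChunk result document. A minimal sketch of a helper that turns such a result into a readable message follows. The name `describeMoveChunkResult` is hypothetical, and the sample document is trimmed from the log line above.

```javascript
// Hypothetical helper, not part of the shell: builds a one-line summary
// of a moveChunk result document so a failing test can report why.
function describeMoveChunkResult(res) {
    if (res.ok === 1) {
        return "moveChunk ok";
    }
    var msg = "moveChunk failed: " + (res.errmsg || "no errmsg");
    // The nested "cause" document, when present, carries its own state.
    if (res.cause && res.cause.state) {
        msg += " (cause.state: " + res.cause.state + ")";
    }
    return msg;
}

// Trimmed copy of the result logged by mongos in this comment.
var loggedResult = {
    cause: { active: true, ns: "test.emptyfield", state: "fail", errmsg: "", ok: 1.0 },
    ok: 0.0,
    errmsg: "data transfer error"
};

var summary = describeMoveChunkResult(loggedResult);
// summary is "moveChunk failed: data transfer error (cause.state: fail)"
```

Passing a summary like this as the message argument of the assertion would make the test output state the error instead of `undefined`.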

Comment by Randolph Tan [ 10/Mar/14 ]

Was not able to reproduce the "stale version detected" error. However, the test script passes on 2.6 mongos with 2.6 mongod, given two modifications: (1) L42 should assert ok: 1, and (2) L46-L47 should assert that docs are evenly split. It fails on 2.6 mongos with 2.4.9 mongod. This is because 2.4 mongod doesn't validate the document resulting from the $rename and ends up creating a doc with a field name prefixed with a $, which is illegal. 2.6 mongod disallows this, so it was able to move the chunks without any problems.

Comment by Valeri Karpov [ 24/Feb/14 ]

scotthernandez the issue is that the count command fails, which I believe qualifies as a user facing error. The log output is just there for completeness' sake.

Comment by Scott Hernandez (Inactive) [ 24/Feb/14 ]

Is this a user facing error or just log stuff?

If a user error, please post that specifically.
