[SERVER-6902] MongoDB 2.2 and MongoDB 2.0 cannot be mixed in sharded cluster
Created: 30/Aug/12  Updated: 11/Jul/16  Resolved: 10/Sep/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.2.0
Fix Version/s: 2.2.1, 2.3.0

Type: Bug
Priority: Critical - P2
Reporter: Remon van Vliet
Assignee: Spencer Brody (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-8279 Warn or do not allow the balancer to ... Closed

 Description   

Contrary to what the changelog implies, we had failures after adding shards to one of our clusters. When the balancer attempted to move chunks hosted on 2.0 shards to a new 2.2 shard, it failed with the following error:

"moveChunk failed to engage TO-shard in the data transfer: migrate already in progress"

The situation does not recover after restarts; only downgrading back to 2.0 resolves it, and that works immediately. Please change your release changelog accordingly.



 Comments   
Comment by Spencer Brody (Inactive) [ 20/Mar/13 ]

I just posted an answer on the Google Groups thread, but yes, that should work.

Comment by Volodymyr Gren [ 20/Mar/13 ]

What about a case like this one?

https://groups.google.com/group/mongodb-user/browse_thread/thread/d05da9476804a95

Comment by auto [ 12/Sep/12 ]

Author: Spencer T Brody <spencer@10gen.com> (2012-09-05T15:26:06-07:00)

Message: Fix balancer when running mixed 2.2 and 2.0 shards. SERVER-6902

_recvChunkStart command on 2.2 expects a shardKeyPattern argument. 2.0 mongods
don't send that, which breaks migrations. This fixes this by assuming the shard
key has the same pattern as the range specifiers in Helpers::removeRange when
the shardKeyPattern isn't explicitly provided.
Branch: v2.2
https://github.com/mongodb/mongo/commit/6f5e9ad8d3ef5d7053b80748b96a3ca36cdae88e
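
The fallback described in the commit message above can be sketched roughly as follows. This is an illustrative Python model, not the server's C++ code. The function name `infer_shard_key_pattern`, the dict-shaped request, the `min`/`max` field names, and the assumption that every key field is ascending are all hypothetical; only the `shardKeyPattern` argument and the idea of reusing the range specifiers come from the commit message.

```python
def infer_shard_key_pattern(request):
    """Return the shard key pattern for a modelled _recvChunkStart request."""
    # A 2.2 sender includes the pattern explicitly; use it when present.
    pattern = request.get("shardKeyPattern")
    if pattern is not None:
        return dict(pattern)
    # A 2.0 sender omits it. Assume the key has the same fields as the
    # lower range bound ("min" is an assumed field name), each ascending.
    return {field: 1 for field in request["min"]}


# Hypothetical requests: one as a 2.0 shard might send it, one as 2.2 would.
legacy_request = {"min": {"user_id": 0}, "max": {"user_id": 100}}
modern_request = {
    "min": {"user_id": 0},
    "max": {"user_id": 100},
    "shardKeyPattern": {"user_id": 1},
}
```

Under these assumptions both requests resolve to the same pattern, which is why a 2.2 recipient could accept a migration started by a 2.0 donor once the fallback is in place.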

Comment by auto [ 07/Sep/12 ]

Author: Spencer T Brody <spencer@10gen.com> (2012-09-05T15:26:06-07:00)

Message: Fix balancer when running mixed 2.2 and 2.0 shards. SERVER-6902

_recvChunkStart command on 2.2 expects a shardKeyPattern argument. 2.0 mongods
don't send that, which breaks migrations. This fixes this by assuming the shard
key has the same pattern as the range specifiers in Helpers::removeRange when
the shardKeyPattern isn't explicitly provided.
Branch: master
https://github.com/mongodb/mongo/commit/8fbfada4f4c87f837f459dfdf9b2142615d41a61

Comment by Remon van Vliet [ 31/Aug/12 ]

It turns out the data is also recoverable by downgrading the mongos instances.

Comment by Remon van Vliet [ 31/Aug/12 ]

["shard0001", "shard0002", "shard0003", "shard0004"]
Aborted due to exception: #<Mongo::OperationFailure: Database command 'removeshard' failed: (ok: '0.0'; errmsg: 'Can't have more than one draining shard at a time').>

This used to work in 2.0 and clearly doesn't with a 2.2 mongos. It throws this error and still moves data, but doesn't update the metadata, so the data becomes inaccessible.
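
For readers hitting the same "more than one draining shard" error, one way to avoid it is to drain shards strictly one after another. The sketch below is a minimal illustration under stated assumptions: `drain_shards_sequentially` and its parameters are hypothetical names, and it relies on the `removeShard` reply carrying a `state` field that eventually reads `completed`. It does not address the metadata problem reported in this ticket.

```python
import time


def drain_shards_sequentially(run_admin_command, shard_names,
                              poll_seconds=0, max_polls=1000):
    """Drain shards one by one, starting the next only after the previous
    one reports state 'completed'."""
    drained = []
    for name in shard_names:
        for _ in range(max_polls):
            # Re-issuing removeShard for the same shard reports its progress.
            reply = run_admin_command("removeShard", name)
            if reply.get("state") == "completed":
                drained.append(name)
                break
            time.sleep(poll_seconds)
        else:
            # Give up rather than start draining a second shard concurrently.
            raise TimeoutError("shard %s did not finish draining" % name)
    return drained
```

With PyMongo, `run_admin_command` could plausibly be `client.admin.command`, which accepts a command name and value in this form; treat that wiring as an assumption to verify against your driver version.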

Comment by Remon van Vliet [ 31/Aug/12 ]

It seems there are also problems when 2.0 shards (all of them) are used with 2.2 mongos processes. We just had issues with removeShard invocations losing the shard data! I'm currently trying to recover it manually. If I find the issue I'll report back.

Comment by Remon van Vliet [ 30/Aug/12 ]

You're welcome. This should probably be converted to a bug.

Comment by auto [ 30/Aug/12 ]

Author: Sam Kleinman <samk@10gen.com> (2012-08-30T11:57:27-07:00)

Message: SERVER-6902: amending release notes
Branch: master
https://github.com/mongodb/docs/commit/67042ad9a226dcd16cfa226c21dc0eec97bbe5bc

Comment by Randolph Tan [ 30/Aug/12 ]

Hi,

Thanks for the bug report. We have successfully reproduced this issue. Based on our preliminary test, you need to upgrade the entire cluster (mongod shards, config servers and mongos) to be able to move chunks again. We are going to update our release notes accordingly as well as investigate the issue further.
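
Since the advice above depends on every component running the same release, a quick check for mixed versions can help before enabling the balancer. The following is a small sketch, assuming you have already collected a version string per host (for example from each process's build info); the function names and the host names in the example are hypothetical.

```python
def release_series(version):
    """Reduce a version string such as '2.2.0' to its release series, '2.2'."""
    major, minor = version.split(".")[:2]
    return major + "." + minor


def group_by_release_series(versions_by_host):
    """Group hosts by release series; more than one key means a mixed cluster."""
    groups = {}
    for host, version in versions_by_host.items():
        groups.setdefault(release_series(version), []).append(host)
    return groups


# Hypothetical inventory: one 2.0 shard, one 2.2 shard, one 2.2 mongos.
inventory = {
    "shard-a:27018": "2.0.7",
    "shard-b:27018": "2.2.0",
    "mongos-1:27017": "2.2.0",
}
groups = group_by_release_series(inventory)
is_mixed = len(groups) > 1
```

If `is_mixed` is true, the grouping shows which hosts still need to be upgraded (or downgraded) before chunk migrations are expected to work on an affected release.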

Comment by Remon van Vliet [ 30/Aug/12 ]

1) 6x 2.0 shards; we added 2x 2.2 shards and migrations immediately started failing. No replica sets.
2) We have 16 mongos for this cluster
3) no

Comment by Randolph Tan [ 30/Aug/12 ]

Hi,

I have a couple of questions:

1. What does the cluster topology look like? Are the 2.0 and 2.2 shards replica sets?
2. How many mongos do you have? And can you tell which version the mongos with the active balancer is running?
3. Is this running in an authenticated environment (i.e., --keyFile)?

Thanks!
