[SERVER-10653] unable to shard collection with collection x.y already sharded with 1 chunks error Created: 30/Aug/13  Updated: 10/Dec/14  Resolved: 02/Jan/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.5
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Oleg Rekutin Assignee: David Hows
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

3 shards with replica set per shard, 3 config servers, 3 mongos


Issue Links:
Duplicate
duplicates SERVER-5160 Handle all failed shardCollection com... Closed
Operating System: ALL
Participants:

 Description   

Although both coll.stats() and sh.status() report the collection as not sharded, attempting to shard it fails.

1)

db.ws9_User.stats()
{
  "sharded": false,
  "primary": "rs_shard2"
 ...
}

2)

sh.shardCollection('test3.ws9_User', { _id: 1})
{
  "code": 13449,
  "ok": 0,
  "errmsg": "exception: collection test3.ws9_User already sharded with 1 chunks"
}

Dropping the collection did not help; a subsequent sharding attempt hit the same error.

In contrast, sharding an already-sharded collection normally produces an "already sharded" message without "exception" in it:

sh.shardCollection('test3.ws9_Account', { _id: 1})
{
  "ok": 0,
  "errmsg": "already sharded"
}

Ended up working around the issue by doing "use config; db.chunks.remove({"ns":"test3.ws9_User"})" and restarting all mongos. However, I'm not sure if this is safe to do in a production dataset where we don't want to lose data (this is a test dataset that was OK to drop).

This happened on one of several collections that were created in the same way.

A possible trigger is that we had several machines talking to several mongos servers, inserting data into all of these collections. The code to set up sharding runs on a first-access, "collection does not exist" basis:

if (collection doesn't exist) { enableSharding(); }

So there were insertions taking place while sharding was being enabled. It's also possible that two parallel sharding requests were taking place.
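
For illustration, that first-access logic would look roughly like the following in the mongo shell (a sketch, assuming the application checked for the collection and then called enableSharding/shardCollection; the actual application code is not part of this report). Two clients running this concurrently could both pass the existence check and issue overlapping shardCollection commands:

// Sketch of the first-access setup path (mongo shell syntax, names from this report).
if (db.getSiblingDB('test3').getCollectionNames().indexOf('ws9_User') === -1) {
    sh.enableSharding('test3');
    sh.shardCollection('test3.ws9_User', { _id: 1 });
}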

I looked at https://github.com/mongodb/mongo/blob/v2.4/src/mongo/s/chunk.cpp#L1000 and there's a comment a bit above the place where the error 13449 is thrown:

        // TODO: Race condition if we shard the collection and insert data while we split across
        // the non-primary shard.

Could this be a manifestation of that race condition?



 Comments   
Comment by Asya Kamsky [ 02/Jan/14 ]

Duplicate of SERVER-5160

Comment by Asya Kamsky [ 02/Jan/14 ]

Oleg,

It is unlikely that your error is a result of the race condition that the comment refers to, because when you initially shard a collection the way you did, none of its chunks would be living on a non-primary shard.

What appears likely here is that the first time you attempted to create the collection, the operation failed partway through, leaving the metadata in an inconsistent state: the config.chunks collection was updated to insert the initial chunk for the collection, but config.collections had not been successfully updated.
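
A quick way to see that state from a mongos (a sketch; the namespace is taken from this ticket) is to compare the two metadata collections:

use config
db.collections.find({ "_id": "test3.ws9_User" })    // empty if shardCollection never finished registering the collection
db.chunks.find({ "ns": "test3.ws9_User" }).count()  // a non-zero count here, with no collections entry, matches the inconsistent state described above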

This suboptimal situation is already being tracked in SERVER-5160, so I'm going to resolve this ticket as a duplicate; you can watch SERVER-5160 to see when it gets resolved.

Asya

Comment by David Hows [ 21/Oct/13 ]

Hi Oleg,

It does sound like it could be a manifestation of that race condition, but I cannot be certain.

Have you been able to reproduce this yourself subsequently? If so, would you be able to outline the steps you took?

Additionally, were there any messages in the mongos logs indicating failures in the sharding process? From what I can gather, the initial chunk was created by the enableSharding command, but the information marking the collection as sharded was not created. I'd like to follow up on this.

Thanks,
David
