Type: Bug
Resolution: Duplicate
Priority: Minor - P4
Fix Version/s: None
Affects Version/s: 2.4.5
Component/s: Sharding
Labels: None
Environment: 3 shards with replica set per shard, 3 config servers, 3 mongos
Operating System: ALL
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Although both coll.stats() and sh.status() report a collection as not sharded, an attempt to shard that collection fails.

db.ws9_User.stats()
{
  "sharded": false,
  "primary": "rs_shard2"
 ...
}

sh.shardCollection('test3.ws9_User', { _id: 1})
{
  "code": 13449,
  "ok": 0,
  "errmsg": "exception: collection test3.ws9_User already sharded with 1 chunks"
}

Dropping the collection did not help; a subsequent sharding attempt failed with the same error.

In contrast, sharding an already-sharded collection normally produces an "already sharded" message without "exception" in it:

sh.shardCollection('test3.ws9_Account', { _id: 1})
{
  "ok": 0,
  "errmsg": "already sharded"
}

I ended up working around the issue by running "use config; db.chunks.remove({"ns":"test3.ws9_User"})" and restarting all mongos instances. However, I'm not sure whether this is safe on a production dataset where we can't afford to lose data (this was a test dataset that was OK to drop).
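Before removing anything from the config database, it may help to confirm which namespaces actually have leftover chunk documents. The following is a minimal, read-only sketch of that check, not a verified recovery procedure. The helper name findStaleChunkNamespaces is hypothetical, and the "_id" and "dropped" fields on config.collections documents are an assumption about the 2.4 config schema; the "ns" field on chunk documents is the one used in the workaround above.

```javascript
// Hypothetical helper (not a MongoDB API): given documents read from
// config.collections and config.chunks, return the namespaces that still
// have chunk documents but no live (non-dropped) collection entry.
// Assumption: collection docs look like { _id: "<db>.<coll>", dropped: <bool> }
// and chunk docs carry the namespace in an "ns" field.
function findStaleChunkNamespaces(collections, chunks) {
  var live = {};
  collections.forEach(function (c) {
    if (!c.dropped) {
      live[c._id] = true;
    }
  });

  var stale = {};
  chunks.forEach(function (chunk) {
    if (!live[chunk.ns]) {
      stale[chunk.ns] = true;
    }
  });

  // De-duplicated, sorted list of namespaces with leftover chunks.
  return Object.keys(stale).sort();
}
```

In the mongo shell, the two inputs could be fetched with db.getSiblingDB("config").collections.find().toArray() and db.getSiblingDB("config").chunks.find().toArray(). The function only reads its arguments, so it says nothing about whether deleting the reported chunk documents is safe.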

This happened on one of several collections that were created in the same way.

A possible trigger is that we had several machines talking to several mongos servers, all inserting data into these collections. The code that sets up sharding runs on first access, on a "collection does not exist" basis:

if (collection doesn't exist) { enableSharding(); }

So there were insertions taking place while sharding was being enabled. It's also possible that two parallel sharding requests may have been taking place.
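The check-then-act pattern described above can be illustrated with a small in-memory simulation. This is only a sketch of the suspected interleaving, assuming two clients both run the existence check before either one shards the collection. makeFakeCluster and simulateRace are hypothetical names, and the fake cluster is a stand-in that does not reproduce mongos behavior or error 13449.

```javascript
// Hypothetical in-memory stand-in for cluster metadata; not a MongoDB API.
function makeFakeCluster() {
  var sharded = {};
  return {
    collectionExists: function (ns) {
      return sharded[ns] === true;
    },
    shardCollection: function (ns) {
      if (sharded[ns]) {
        return { ok: 0, errmsg: "already sharded" };
      }
      sharded[ns] = true;
      return { ok: 1 };
    }
  };
}

// Two clients run "if (collection doesn't exist) shard it", interleaved so
// that both checks happen before either client acts.
function simulateRace(ns) {
  var cluster = makeFakeCluster();

  var clientASawMissing = !cluster.collectionExists(ns);
  var clientBSawMissing = !cluster.collectionExists(ns);

  var results = [];
  if (clientASawMissing) {
    results.push(cluster.shardCollection(ns));
  }
  if (clientBSawMissing) {
    // Client B's check is stale by now, so it issues a second request.
    results.push(cluster.shardCollection(ns));
  }
  return results;
}
```

Calling simulateRace("test3.ws9_User") returns two results: the first request succeeds and the second is rejected, which shows that the existence check alone does not prevent duplicate sharding requests.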

I looked at https://github.com/mongodb/mongo/blob/v2.4/src/mongo/s/chunk.cpp#L1000 and there's a comment a bit above the place where the error 13449 is thrown:

        // TODO: Race condition if we shard the collection and insert data while we split across
        // the non-primary shard.

Could this issue be a manifestation of that race condition?

Duplicates: SERVER-5160 Handle all failed shardCollection commands well (Closed)

Assignee: David Hows (Inactive)
Reporter: Oleg Rekutin
Participants: Asya Kamsky, David Hows, Oleg Rekutin
Votes: 0
Watchers: 7

Created: Aug 30 2013 06:28:46 PM UTC
Updated: Dec 10 2014 11:05:31 PM UTC
Resolved: Jan 02 2014 09:55:04 PM UTC
