- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Linux
ISSUE SUMMARY
In certain cases, the initial distribution of chunks for a hashed sharded collection to multiple shards can cause mongos to split at the same split point more than once, resulting in corrupted collection metadata on the shard (not visible in the config server). If the corrupted chunks in this collection are later migrated, the corrupted chunk data can propagate to the config server.
The corruption appears as empty chunks with the same "min" and "max" value. They can be seen via the getShardVersion command with the {fullMetadata : true} option, executed directly against the affected single mongod or replica set primary of the shard.
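As a rough illustration of the check described above, the sketch below filters a chunk list for entries whose bounds are identical. The helper name `findEmptyChunks` is hypothetical, and the sketch assumes that the `fullMetadata` output exposes the chunks as an array of `[min, max]` pairs under `metadata.chunks`; verify that shape against the actual command output before relying on it. It is written in plain ES5 style and compares bounds by their JSON form, which is a simplification.

```javascript
// Sketch only. In the mongo shell, connected directly to the shard's
// primary (not to mongos), the input would come from something like:
//   var res = db.adminCommand({ getShardVersion: "database.stats_archive_monthly",
//                               fullMetadata: true });
//   var suspects = findEmptyChunks(res.metadata.chunks);
// Assumption: res.metadata.chunks is an array of [min, max] pairs.

// Hypothetical helper: return the chunks whose min and max bounds are identical.
function findEmptyChunks(chunks) {
  return chunks.filter(function (chunk) {
    var min = chunk[0];
    var max = chunk[1];
    // Compare the serialized bounds; equal bounds mean the chunk covers no keys.
    return JSON.stringify(min) === JSON.stringify(max);
  });
}
```

A non-empty result would suggest that the shard's cached metadata contains a chunk produced by a duplicate split.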
USER IMPACT
This bug can corrupt the config metadata and in turn cause existing documents not to be returned correctly. 
WORKAROUNDS
If the corrupt metadata has not yet propagated to the config servers, the workaround is to step down or restart all primaries after sharding the collection on a hashed shard key. This will correctly reload metadata from the config server.
RESOLUTION
Prevent splitting on chunk boundaries to avoid the issue. 
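The idea behind the resolution can be sketched as follows. This is not the actual patch: `filterSplitPoints` is a hypothetical helper that drops any proposed split point that falls on (or outside) the chunk's own bounds, or that repeats an earlier split point. Bounds are modeled as BigInt values, because hashed shard key values are 64-bit integers.

```javascript
// Hypothetical helper, not the actual mongos code: keep only split points
// that lie strictly inside the chunk and have not been seen before.
function filterSplitPoints(chunkMin, chunkMax, splitPoints) {
  const seen = new Set();
  const result = [];
  for (const point of splitPoints) {
    // Splitting exactly on a boundary would create an empty chunk
    // whose min equals its max, so such points are skipped.
    if (point <= chunkMin || point >= chunkMax) continue;
    const key = point.toString();
    if (seen.has(key)) continue; // duplicate split point
    seen.add(key);
    result.push(point);
  }
  // Return the surviving points in ascending order.
  return result.sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
}
```

For example, `filterSplitPoints(0n, 100n, [0n, 50n, 50n, 100n, 25n])` keeps only `25n` and `50n`.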
AFFECTED VERSIONS
All recent production releases up to 2.6.0 are affected.
PATCHES
The patch is included in the 2.6.1 production release.
Original description
In certain cases, the initial distribution of chunks for a hashed sharded collection to multiple shards can create duplicate split points, resulting in invisible, empty chunks with the same "min" and "max" value in the collection metadata. These should not interfere with normal operation, but if chunks are later migrated in this collection, this may result in inconsistent metadata which must be manually fixed.
These empty chunks can be seen via the "getShardVersion" command with the "fullMetadata : true" option executed directly against the affected single mongod or replica set primary of the shard. The workaround is to step down or restart the single mongod or primary, which will correctly reload metadata from the config server.
Original Description:
After an unexpected reboot of the application server, I found that mongos had started to show errors when I tried to run show collections.
 mongos> show collections;
  Mon Feb  3 22:50:21.680 error: {
    "$err" : "error loading initial database config information :: caused by :: Couldn't load a valid config for database.stats_archive_monthly after 3 attempts. Please try again.",
    "code" : 13282
  } at src/mongo/shell/query.js:128
However, all mongo servers and mongo config servers were healthy and had no issues in their logs.
First, I tried rebooting each server in the cluster, with no success. The error still occurred.
Then, after briefly checking the mongo source, I found that this error can be caused by overlapping shard key ranges.
Looking at the shard information for the broken collection, I noticed this:
database.stats_archive_monthly
        shard key: { "a" : "hashed" }
        chunks:
            rs1 6
            rs0 6
        { "a" : { "$minKey" : 1 } } -->> { "a" : NumberLong("-7686143364045646500") } on : rs1 Timestamp(2, 0)
        { "a" : NumberLong("-7686143364045646500") } -->> { "a" : NumberLong("-6148914691236517200") } on : rs1 Timestamp(3, 0)
        { "a" : NumberLong("-6148914691236517200") } -->> { "a" : NumberLong("-4611686018427387900") } on : rs1 Timestamp(4, 0)
        { "a" : NumberLong("-4611686018427387900") } -->> { "a" : NumberLong("-3074457345618258600") } on : rs1 Timestamp(5, 0)
        { "a" : NumberLong("-3074457345618258600") } -->> { "a" : NumberLong("-1537228672809129300") } on : rs1 Timestamp(6, 0)
        { "a" : NumberLong("-1537228672809129300") } -->> { "a" : NumberLong(0) } on : rs1 Timestamp(7, 0)
        { "a" : NumberLong(0) } -->> { "a" : NumberLong("7686143364045646500") } on : rs0 Timestamp(7, 1)
        { "a" : NumberLong("1537228672809129300") } -->> { "a" : NumberLong("3074457345618258600") } on : rs0 Timestamp(1, 9)
        { "a" : NumberLong("3074457345618258600") } -->> { "a" : NumberLong("4611686018427387900") } on : rs0 Timestamp(1, 10)
        { "a" : NumberLong("4611686018427387900") } -->> { "a" : NumberLong("6148914691236517200") } on : rs0 Timestamp(1, 11)
        { "a" : NumberLong("6148914691236517200") } -->> { "a" : NumberLong("7686143364045646500") } on : rs0 Timestamp(1, 12)
        { "a" : NumberLong("7686143364045646500") } -->> { "a" : { "$maxKey" : 1 } } on : rs0 Timestamp(1, 13)
There is one range, with an upper bound of 7686143364045646500,
{ "a" : NumberLong(0) } -->> { "a" : NumberLong("7686143364045646500") } on : rs0 Timestamp(7, 1)
that overlaps the other shard key ranges on the first replica set (rs0).
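The overlap can also be checked mechanically. The sketch below is an offline illustration, not part of MongoDB: `findOverlaps`, `MIN_KEY`, and `MAX_KEY` are hypothetical names, and the chunk bounds are copied from the rs0 entries in the listing above as BigInt values (the sentinels stand in for `$minKey` and `$maxKey`). It sorts the chunks by lower bound and reports every chunk that starts before the widest range seen so far has ended.

```javascript
// Stand-ins for $minKey / $maxKey (an assumption of this sketch).
const MIN_KEY = -(2n ** 63n) - 1n;
const MAX_KEY = 2n ** 63n;

// Return pairs [earlier, later] of chunks whose key ranges overlap.
function findOverlaps(chunks) {
  const sorted = [...chunks].sort((x, y) => (x.min < y.min ? -1 : x.min > y.min ? 1 : 0));
  const overlaps = [];
  let widest = sorted[0]; // chunk with the largest max seen so far
  for (let i = 1; i < sorted.length; i++) {
    const cur = sorted[i];
    // A chunk that starts before the widest earlier chunk ends overlaps it.
    if (cur.min < widest.max) overlaps.push([widest, cur]);
    if (cur.max > widest.max) widest = cur;
  }
  return overlaps;
}

// rs0 chunk bounds as printed in the report.
const rs0Chunks = [
  { min: 0n, max: 7686143364045646500n },
  { min: 1537228672809129300n, max: 3074457345618258600n },
  { min: 3074457345618258600n, max: 4611686018427387900n },
  { min: 4611686018427387900n, max: 6148914691236517200n },
  { min: 6148914691236517200n, max: 7686143364045646500n },
  { min: 7686143364045646500n, max: MAX_KEY },
];

console.log(findOverlaps(rs0Chunks).length);
```

On this data the check flags four pairs, each involving the chunk that starts at 0. The final chunk, which begins exactly where that range ends, is not flagged.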
Some additional statistics: the first replica set contains 73 records, and the second replica set contains 0 records.
rs0:PRIMARY> db.stats_archive_monthly.count();
73
rs1:PRIMARY> db.stats_archive_monthly.count();
0
The only query that works with this collection is:
$mongo_db['stats_archive_monthly'].update( {a: account_id, l_id: location_id, t: time.truncate(interval())}, {'$set' => {u: data.to_i}}, upsert: true)
All data on the DB servers is correct. Since this is a staging environment and all documents have {"a" : 1}, they should all appear on only one shard.
Somehow the DB is now completely unusable unless it is fully restored.