Core Server / SERVER-12638

Initial sharding with hashed shard key can result in duplicate split points

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.6.1, 2.7.0
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels:
    • Environment: Linux

      Issue Status as of April 15, 2014

      ISSUE SUMMARY
      In certain cases, the initial distribution of chunks across multiple shards for a collection sharded on a hashed key can cause mongos to split at the same split point more than once, resulting in corrupted collection metadata on the shard (not visible in the config server). If the corrupted chunks in this collection are later migrated, the corrupted chunk data can propagate to the config servers.
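
      For reference, the scenario is triggered simply by sharding a collection on a hashed key and letting mongos perform the initial chunk distribution. A minimal sketch from the mongo shell, reusing the namespace from this report:

       mongos> sh.enableSharding("database")
       mongos> sh.shardCollection("database.stats_archive_monthly", { "a" : "hashed" })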

      These empty chunks can be seen via the getShardVersion command with the {fullMetadata : true} option, executed directly against the affected shard's standalone mongod or replica set primary.
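
      For example, the shard's cached metadata can be inspected with a command along these lines, run directly against the shard's mongod or replica set primary rather than through mongos (the namespace is the one from this report):

       rs0:PRIMARY> db.adminCommand({ getShardVersion : "database.stats_archive_monthly", fullMetadata : true })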

      USER IMPACT
      This bug can corrupt the config metadata and, in turn, cause queries not to return existing documents correctly.

      WORKAROUNDS
      If the corrupt metadata has not yet propagated to the config servers, the workaround is to step down or restart all shard primaries after sharding the collection on a hashed shard key. This forces each shard to correctly reload its metadata from the config servers.
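
      A minimal sketch of the step-down workaround, run on each shard's primary from the mongo shell (the 60-second step-down window is illustrative, not prescribed by this issue):

       rs0:PRIMARY> rs.stepDown(60)   // ask the primary to step down for up to 60 seconds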

      RESOLUTION
      The fix prevents splitting on existing chunk boundaries, which avoids the duplicate split points.

      AFFECTED VERSIONS
      All recent production releases up to 2.6.0 are affected.

      PATCHES
      The patch is included in the 2.6.1 production release.

      Original description

      In certain cases, the initial distribution of chunks across multiple shards for a collection sharded on a hashed key can create duplicate split points, resulting in invisible, empty chunks with the same "min" and "max" values in the collection metadata. These should not interfere with normal operation, but if chunks in this collection are later migrated, the result may be inconsistent metadata that must be fixed manually.

      These empty chunks can be seen via the "getShardMetadata" command with the "fullMetadata : true" option executed directly against the affected single mongod or replica set primary of the shard. The workaround is to stepdown or restart the single mongod or primary, which will correctly reload metadata from the config server.

      Original Description:

      After an unexpected reboot of the application server, I found that mongos started to show errors when I tried to run show collections.

       mongos> show collections;
        Mon Feb  3 22:50:21.680 error: {
          "$err" : "error loading initial database config information :: caused by :: Couldn't load a valid config for database.stats_archive_monthly after 3 attempts. Please try again.",
          "code" : 13282
        } at src/mongo/shell/query.js:128
      

      However, all mongo servers and mongo config servers were healthy and had no issues in their logs.

      First of all, I tried to reboot each of the servers in the cluster, with no success; the error still occurred.

      Then, after a quick check of the mongo source, I found that this error could be caused by overlapping shard key ranges.
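
      A hedged sketch of how such overlapping ranges can be checked from the config metadata, assuming a connection through mongos (the namespace is the affected collection from this report):

       mongos> use config
       mongos> db.chunks.find({ ns : "database.stats_archive_monthly" }, { _id : 0, min : 1, max : 1, shard : 1 }).sort({ min : 1 })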

      Looking into the shard information for the broken collection, I noticed this:

      database.stats_archive_monthly
              shard key: { "a" : "hashed" }
              chunks:
                  rs1 6
                  rs0 6
              { "a" : { "$minKey" : 1 } } -->> { "a" : NumberLong("-7686143364045646500") } on : rs1 Timestamp(2, 0)
              { "a" : NumberLong("-7686143364045646500") } -->> { "a" : NumberLong("-6148914691236517200") } on : rs1 Timestamp(3, 0)
              { "a" : NumberLong("-6148914691236517200") } -->> { "a" : NumberLong("-4611686018427387900") } on : rs1 Timestamp(4, 0)
              { "a" : NumberLong("-4611686018427387900") } -->> { "a" : NumberLong("-3074457345618258600") } on : rs1 Timestamp(5, 0)
              { "a" : NumberLong("-3074457345618258600") } -->> { "a" : NumberLong("-1537228672809129300") } on : rs1 Timestamp(6, 0)
              { "a" : NumberLong("-1537228672809129300") } -->> { "a" : NumberLong(0) } on : rs1 Timestamp(7, 0)
              { "a" : NumberLong(0) } -->> { "a" : NumberLong("7686143364045646500") } on : rs0 Timestamp(7, 1)
              { "a" : NumberLong("1537228672809129300") } -->> { "a" : NumberLong("3074457345618258600") } on : rs0 Timestamp(1, 9)
              { "a" : NumberLong("3074457345618258600") } -->> { "a" : NumberLong("4611686018427387900") } on : rs0 Timestamp(1, 10)
              { "a" : NumberLong("4611686018427387900") } -->> { "a" : NumberLong("6148914691236517200") } on : rs0 Timestamp(1, 11)
              { "a" : NumberLong("6148914691236517200") } -->> { "a" : NumberLong("7686143364045646500") } on : rs0 Timestamp(1, 12)
              { "a" : NumberLong("7686143364045646500") } -->> { "a" : { "$maxKey" : 1 } } on : rs0 Timestamp(1, 13)
      

      There is a range

      { "a" : NumberLong(0) } -->> { "a" : NumberLong("7686143364045646500") } on : rs0 Timestamp(7, 1)

      that overlaps all the shard key ranges of the first replica set.

      For some additional statistics: the first replica set contains 73 records, the second replica set contains 0 records.

      rs0:PRIMARY> db.stats_archive_monthly.count();
      73
      
      rs1:PRIMARY> db.stats_archive_monthly.count();
      0
      

      The only query that works with this collection is:

       $mongo_db['stats_archive_monthly'].update( {a: account_id, l_id: location_id, t: time.truncate(interval())}, {'$set' => {u: data.to_i}}, upsert: true)
      

      All data on the DB servers is correct. Since this is a staging environment and all documents have

      {"a" : 1}

      they should all appear on only one shard.

      Somehow the DB is now completely unusable unless it is fully restored.

        Attachments:
        1. mongos1-failedShardConfig.log
          214 kB
        2. shard-problem.txt
          36 kB
        3. 20140321_1540.rar
          71 kB
        4. SERVER-12638.js
          2 kB
        5. repro.js
          2 kB
        6. repro24.out.gz
          85 kB

            Assignee:
            randolph@mongodb.com Randolph Tan
            Reporter:
            NexoMichael Mikhail Kochegarov [X]
            Votes:
            2
            Watchers:
            18
