SERVER-17530: Concurrent shardCollection commands from two mongos processes result in inconsistent metadata

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.2.0
    • Affects Version/s: 2.4.6, 3.0.9
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      When two mongos processes issued a shardCollection command for the same collection in the same millisecond, the corresponding config.chunks documents for the collection went out of sync: the ObjectId value of the chunks' lastmodEpoch field differed between the config databases.

      Additionally, config.collections in every config database shows the collection with { dropped: true }, even though no drop command was ever issued for that collection.
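
      For reference, a minimal PyMongo sketch of the race; the mongos addresses and the enableSharding step are assumptions, while the namespace and shard key come from the logs below:

      import threading
      from pymongo import MongoClient

      # Placeholder mongos addresses; any two distinct mongos will do.
      MONGOS = ["mongos1.example.net:27017", "mongos2.example.net:27017"]

      def shard_via(host):
          try:
              # Same shardCollection command, routed through this mongos.
              MongoClient(host).admin.command(
                  "shardCollection",
                  "db.coll",
                  key={"region_name": 1, "sh_random_part": 1, "sh_session_id": 1},
                  unique=False,
              )
          except Exception as exc:
              print(f"{host}: {exc}")

      MongoClient(MONGOS[0]).admin.command("enableSharding", "db")

      # Fire both commands as close together as possible, approximating
      # the same-millisecond timing from the report.
      threads = [threading.Thread(target=shard_via, args=(h,)) for h in MONGOS]
      for t in threads:
          t.start()
      for t in threads:
          t.join()

      # Inspect the resulting metadata (one config server's view).
      config = MongoClient(MONGOS[0]).config
      print(config.collections.find_one({"_id": "db.coll"}))  # shows dropped: true
      for chunk in config.chunks.find({"ns": "db.coll"}):
          print(chunk["lastmodEpoch"])  # epoch may differ from the other config db's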

      One mongos succeeded, with caveats, and the other hit an assertion failure.
      The mongos that failed to shard the collection logged the following:

      Fri Mar  6 02:00:03.377 [conn82] warning: got invalid chunk version 1|0||54f90a23c9192f97e6c145f2 in document { _id: "db.coll-region_name_MinKeysh_random_part_MinKeysh_session_id_MinKey", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('54f90a23c9192f97e6c145f2'), ns: "db.coll", min: { region_name: MinKey, sh_random_part: MinKey, sh_session_id: MinKey }, max: { region_name: MaxKey, sh_random_part: MaxKey, sh_session_id: MaxKey }, shard: "server1" } when trying to load differing chunks at version 0|0||54f90a23c9f111a9ca020239
      Fri Mar  6 02:00:03.377 [conn82] warning: major change in chunk information found when reloading db.coll, previous version was 0|0||54f90a23c9f111a9ca020239
      Fri Mar  6 02:00:03.377 [conn82] ChunkManager: time to load chunks for db.coll: 0ms sequenceNumber: 322 version: 0|0||000000000000000000000000 based on: (empty)
      Fri Mar  6 02:00:03.377 [conn82] warning: no chunks found for collection db.coll, assuming unsharded
      Fri Mar  6 02:00:03.650 [conn82]   Assertion failure manager.get() src/mongo/s/config.cpp 183
      0x9b2106 0x97c2c1 0x8b3579 0x88166f 0x8f419a 0x88f30c 0x91a84e 0x8f2205 0x696c71 0x99e7f9 0x3d344079d1 0x3d33ce88fd
       /opt/mongodb/latest/bin/mongos(_ZN5mongo15printStackTraceERSo+0x26) [0x9b2106]
       /opt/mongodb/latest/bin/mongos(_ZN5mongo12verifyFailedEPKcS1_j+0xc1) [0x97c2c1]
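
      The assertion is verify(manager.get()) at src/mongo/s/config.cpp:183: this mongos expected epoch 54f90a23c9f111a9ca020239 for db.coll, found a chunk written under the other mongos' epoch 54f90a23c9192f97e6c145f2, discarded it as "differing", and was left believing the collection is unsharded while the caller still expected a chunk manager. A simplified illustration of that epoch check (Python for readability, not the mongos source):

      from bson import ObjectId

      def load_chunks(chunk_docs, expected_epoch):
          # Keep only chunks whose lastmodEpoch matches the epoch this
          # mongos recorded for the collection; mismatches are dropped
          # with the "invalid chunk version" warning seen above.
          valid = [d for d in chunk_docs if d["lastmodEpoch"] == expected_epoch]
          if not valid:
              # Mirrors "no chunks found for collection ..., assuming unsharded";
              # the caller then asserts because it expected a sharded collection.
              print("warning: no chunks found, assuming unsharded")
          return valid

      # The chunk on disk carries the other mongos' epoch:
      on_disk = [{
          "_id": "db.coll-region_name_MinKeysh_random_part_MinKeysh_session_id_MinKey",
          "lastmodEpoch": ObjectId("54f90a23c9192f97e6c145f2"),
      }]
      load_chunks(on_disk, expected_epoch=ObjectId("54f90a23c9f111a9ca020239"))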
      

      The other mongos generated this log snippet:

      Fri Mar  6 02:00:00.983 [conn142] warning: mongos collstats doesn't know about: systemFlags
      Fri Mar  6 02:00:00.983 [conn142] warning: mongos collstats doesn't know about: userFlags
      Fri Mar  6 02:00:03.100 [conn142] warning: mongos collstats doesn't know about: systemFlags
      Fri Mar  6 02:00:03.100 [conn142] warning: mongos collstats doesn't know about: userFlags
      Fri Mar  6 02:00:03.224 [conn142] CMD: shardcollection: { shardCollection: "db.coll", key: { region_name: 1.0, sh_random_part: 1.0, sh_session_id: 1.0 }, unique: false }
      Fri Mar  6 02:00:03.224 [conn142] enable sharding on: db.coll with shard key: { region_name: 1.0, sh_random_part: 1.0, sh_session_id: 1.0 }
      Fri Mar  6 02:00:03.225 [conn142] going to create 1 chunk(s) for: db.coll using new epoch 54f90a23c9192f97e6c145f2
      Fri Mar  6 02:00:03.377 [conn142] ChunkManager: time to load chunks for db.coll: 0ms sequenceNumber: 339 version: 1|0||54f90a23c9192f97e6c145f2 based on: (empty)
      Fri Mar  6 02:00:03.657 [conn142] warning: reloading full configuration for db, connection state indicates significant version changes
      Fri Mar  6 02:00:03.659 [conn142] ChunkManager: time to load chunks for db.coll_2015_03_03: 0ms sequenceNumber: 340 version: 2|3||54f596732517941189b1f2da based on: (empty)
      Fri Mar  6 02:00:03.666 [conn142] ChunkManager: time to load chunks for db.coll_2015_03_04: 7ms sequenceNumber: 341 version: 26|3||54f5967b2517941189b1f2db based on: (empty)
      Fri Mar  6 02:00:03.678 [conn142] ChunkManager: time to load chunks for db.coll_2015_03_05: 11ms sequenceNumber: 342 version: 26|17||54f66720f5067826eafb5971 based on: (empty)
      Fri Mar  6 02:00:03.683 [conn142] ChunkManager: time to load chunks for db.coll_2015_03_06: 4ms sequenceNumber: 343 version: 13|1||54f7b8a04ca5890cd8487bc0 based on: (empty)
      Fri Mar  6 02:00:05.054 [conn142] ChunkManager: time to load chunks for db.coll_2015_03_03: 0ms sequenceNumber: 344 version: 2|3||54f596732517941189b1f2da based on: (empty)
      Fri Mar  6 02:00:05.058 [conn142] ChunkManager: time to load chunks for db.coll_2015_03_04: 4ms sequenceNumber: 345 version: 26|3||54f5967b2517941189b1f2db based on: (empty)
      Fri Mar  6 02:00:05.070 [conn142] ChunkManager: time to load chunks for db.coll_2015_03_05: 11ms sequenceNumber: 346 version: 26|17||54f66720f5067826eafb5971 based on: (empty)
      Fri Mar  6 02:00:05.075 [conn142] ChunkManager: time to load chunks for db.coll_2015_03_06: 4ms sequenceNumber: 347 version: 13|1||54f7b8a04ca5890cd8487bc0 based on: (empty)
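
      Side note: once a mongos is left holding stale routing metadata like this, a full reload can also be forced explicitly with the flushRouterConfig admin command rather than waiting for the "reloading full configuration" path above; a minimal sketch (host is a placeholder):

      from pymongo import MongoClient

      # Drop this mongos' cached routing table; it reloads from the
      # config servers on the next operation.
      MongoClient("mongos2.example.net:27017").admin.command("flushRouterConfig")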
      

      It would probably be more useful if the second mongos returned an error along the lines of "sharding already in progress for this collection". The application developer could then handle the error and either continue or exit.
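
      If mongos did return such an error, application code could handle it along these lines; the error text matched below is the hypothetical message proposed above, not an existing server error:

      from pymongo import MongoClient
      from pymongo.errors import OperationFailure

      client = MongoClient("mongos1.example.net:27017")  # placeholder host
      try:
          client.admin.command(
              "shardCollection",
              "db.coll",
              key={"region_name": 1, "sh_random_part": 1, "sh_session_id": 1},
          )
      except OperationFailure as exc:
          # Hypothetical message proposed in this ticket, not a real server error.
          if "sharding already in progress" in str(exc):
              pass  # another mongos is sharding this collection; safe to continue
          else:
              raise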

            Assignee: Andy Schwerin (schwerin@mongodb.com)
            Reporter: Adam Schwartz (adam.schwartz@mongodb.com)
            Votes: 0
            Watchers: 13
