Core Server / SERVER-27078

Race in ShardRegistry initialization causes it to not update the config server connection string

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.4.1, 3.5.1
    • Affects Version/s: 3.4.0-rc3
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Steps To Reproduce:

      Build a custom mongos with this change: add a sleepsecs(5) right before this line:

      https://github.com/mongodb/mongo/blob/r3.4.0-rc2/src/mongo/s/client/shard_registry.cpp#L187

      1. Deploy a config server replica set with 3 members
      2. Run mongos, but only pass one of the members in the configdb parameter. Example:
        ./mongos --port 20005 --configdb test-configRS/ren-desktop:20001
        

      Logs like these will begin to show up:

      2016-11-16T17:10:16.199-0500 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Failed to connect to ren-desktop:20003 - ShardNotFound: No shard found for host: ren-desktop:20003
      2016-11-16T17:10:16.199-0500 D -        [shard registry reload] User Assertion: 70:could not get updated shard list from config server due to No shard found for host: ren-desktop:20003 src/mongo/s/client/shard_registry.cpp 325
      
    • Sprint: Sharding 2016-12-12
    • 0

      Description of race:

      1. Start mongos with just a single node specified in --configdb
      2. ShardRegistry::init() gets called.
      3. ShardFactory::createShard gets called for "config" and the initial config server seed string (ref).
      4. The ReplicaSetMonitor for the config server replica set gets created.
      5. The ReplicaSetMonitor (RSM) reload thread discovers the other nodes in the config server replica set (CSRS).
      6. RSM tries to update the ShardRegistry via the synchronous update hook. However, it fails to update because the "config" entry hasn't been created yet (ref).
      7. ShardRegistry populates the "config" shard entry (ref).
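
      The interleaving above can be replayed deterministically in a small sketch. This is not the mongos code: ToyShardRegistry, updateConnectionString, addShard, and replay are hypothetical names, and the host ren-desktop:20002 is assumed for illustration (only :20001 and :20003 appear in this ticket). It only shows why an update hook that requires an existing "config" entry loses the discovered members when it runs before the entry is created.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the registry: shard id -> connection string.
class ToyShardRegistry {
public:
    // Models the update hook in step 6: it can only modify an existing entry.
    bool updateConnectionString(const std::string& shardId, const std::string& connStr) {
        auto it = _shards.find(shardId);
        if (it == _shards.end()) {
            return false;  // entry not created yet, so the update is dropped
        }
        it->second = connStr;
        return true;
    }

    // Models step 7: the entry is created with whatever string the caller holds.
    void addShard(const std::string& shardId, const std::string& connStr) {
        _shards[shardId] = connStr;
    }

    // Returns the stored connection string, or an empty string if absent.
    std::string connectionString(const std::string& shardId) const {
        auto it = _shards.find(shardId);
        return it == _shards.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> _shards;
};

// Replays the two possible orderings and returns the final "config" string.
// entryCreatedFirst == false reproduces the order described in steps 5-7.
std::string replay(bool entryCreatedFirst) {
    const std::string seed = "test-configRS/ren-desktop:20001";
    // Assumed full member list; ren-desktop:20002 is illustrative only.
    const std::string discovered =
        "test-configRS/ren-desktop:20001,ren-desktop:20002,ren-desktop:20003";

    ToyShardRegistry registry;
    if (entryCreatedFirst) {
        registry.addShard("config", seed);                       // entry exists first
        registry.updateConnectionString("config", discovered);   // update applies
    } else {
        registry.updateConnectionString("config", discovered);   // step 6: dropped
        registry.addShard("config", seed);                       // step 7: stale seed
    }
    return registry.connectionString("config");
}
```

      Calling replay(false) leaves the registry holding only the seed string, which matches the symptom in the title; replay(true) ends with the full member list. The second ordering is shown only for contrast and is not a claim about how the issue was actually fixed.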

            Assignee:
            misha.tyulenev@mongodb.com Misha Tyulenev
            Reporter:
            randolph@mongodb.com Randolph Tan
            Votes:
            0
            Watchers:
            6

              Created:
              Updated:
              Resolved: