Core Server / SERVER-27078

Race in ShardRegistry initialization causes it to not update the config server connection string


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.4.0-rc3
    • Fix Version/s: 3.4.1, 3.5.1
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Completed:
    • Steps To Reproduce:
      Build a custom mongos with this change: add a sleepsecs(5) right before this line:

      https://github.com/mongodb/mongo/blob/r3.4.0-rc2/src/mongo/s/client/shard_registry.cpp#L187

      1. Deploy a config server replica set with 3 members.
      2. Run mongos, but pass only one of the members in the configdb parameter. Example:

        ./mongos --port 20005 --configdb test-configRS/ren-desktop:20001
        

      Logs like these will begin to show up:

      2016-11-16T17:10:16.199-0500 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Failed to connect to ren-desktop:20003 - ShardNotFound: No shard found for host: ren-desktop:20003
      2016-11-16T17:10:16.199-0500 D -        [shard registry reload] User Assertion: 70:could not get updated shard list from config server due to No shard found for host: ren-desktop:20003 src/mongo/s/client/shard_registry.cpp 325
      

    • Sprint:
      Sharding 2016-12-12
    • Linked BF Score:
      0

      Description

      Description of race:

      1. Start mongos with just a single node specified in --configdb
      2. ShardRegistry::init() gets called.
      3. ShardFactory::createShard gets called for "config" and the initial config server seed string (ref).
      4. The ReplicaSetMonitor (RSM) for the config server replica set (CSRS) gets created.
      5. The RSM reload thread discovers the other nodes in the CSRS.
      6. RSM tries to update the ShardRegistry via the synchronous update hook. However, it fails to update because the "config" entry hasn't been created yet (ref).
      7. ShardRegistry populates the "config" shard entry (ref), but only with the initial seed string from step 3, so the update attempted in step 6 is lost.
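
      The ordering problem in steps 6 and 7 can be illustrated with a minimal single-threaded sketch. This is not the actual ShardRegistry code: ToyShardRegistry, updateConnectionString, addShard, replayRace and replayEntryFirst are hypothetical names, and the host ren-desktop:20002 is an assumed third member (only 20001 and 20003 appear in this ticket). The sketch only assumes that the update hook can modify an entry that already exists and otherwise drops the update.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the registry; NOT MongoDB's real ShardRegistry.
class ToyShardRegistry {
public:
    // Models step 6: the hook can only update an entry that already exists.
    // Returns false (and drops the update) when the entry is missing.
    bool updateConnectionString(const std::string& shardId,
                                const std::string& connString) {
        auto it = _shards.find(shardId);
        if (it == _shards.end()) {
            return false;
        }
        it->second = connString;
        return true;
    }

    // Models step 7: create (or overwrite) the entry for a shard.
    void addShard(const std::string& shardId, const std::string& connString) {
        _shards[shardId] = connString;
    }

    // Returns the stored connection string, or "" if there is no entry.
    std::string get(const std::string& shardId) const {
        auto it = _shards.find(shardId);
        return it == _shards.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> _shards;
};

// Seed string from the repro command in this ticket.
const std::string kSeed = "test-configRS/ren-desktop:20001";
// Assumed full member list; ren-desktop:20002 is a guess for illustration.
const std::string kDiscovered =
    "test-configRS/ren-desktop:20001,ren-desktop:20002,ren-desktop:20003";

// Replays the racy ordering: the monitor's update arrives before the
// "config" entry exists, so the registry ends up holding only the seed.
std::string replayRace() {
    ToyShardRegistry registry;
    registry.updateConnectionString("config", kDiscovered);  // step 6: dropped
    registry.addShard("config", kSeed);                      // step 7
    return registry.get("config");
}

// For contrast: if the entry exists before the update arrives, the
// discovered member list is kept.
std::string replayEntryFirst() {
    ToyShardRegistry registry;
    registry.addShard("config", kSeed);
    registry.updateConnectionString("config", kDiscovered);
    return registry.get("config");
}
```

      In this sketch, replayRace() leaves the registry holding only the single-host seed string, while replayEntryFirst() keeps the full member list. In the real system the two events happen on different threads, which is why the sleepsecs(5) in the reproduction steps makes the bad ordering reliable.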
