  1. Core Server
  2. SERVER-16044

Add option: On startup mongos should continue trying if configdb is not yet running

    • Type: New Feature
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Sharding

      When a mongos starts and the specified config servers are not yet running, it logs an error message and exits immediately.

      Example:

      $ mongos --configdb 127.0.0.1:12345
      2014-11-10T08:02:53.371+0000 warning: running with 1 config server should be done only for testing purposes and is not recommended for production
      2014-11-10T08:02:53.373+0000 [mongosMain] MongoS version 2.6.5 starting: pid=5546 port=27017 64-bit host=hingo-sputnik (--help for usage)
      ...
      2014-11-10T08:02:53.373+0000 [mongosMain] options: { sharding: { configDB: "127.0.0.1:12345" } }
      2014-11-10T08:02:53.374+0000 [mongosMain] warning: Failed to connect to 127.0.0.1:12345, reason: errno:111 Connection refused
      2014-11-10T08:02:53.374+0000 [mongosMain] warning:  couldn't check dbhash on config server 127.0.0.1:12345 :: caused by :: 11002 socket exception [CONNECT_ERROR] server [127.0.0.1:12345] connection pool error: couldn't connect to server 127.0.0.1:12345 (127.0.0.1), connection attempt failed
      2014-11-10T08:02:53.374+0000 [mongosMain] warning: Failed to connect to 127.0.0.1:12345, reason: errno:111 Connection refused
      2014-11-10T08:02:53.374+0000 warning: Failed to connect to 127.0.0.1:12345, reason: errno:111 Connection refused
      2014-11-10T08:02:53.374+0000 warning:  couldn't check dbhash on config server 127.0.0.1:12345 :: caused by :: 11002 socket exception [CONNECT_ERROR] server [127.0.0.1:12345] connection pool error: couldn't connect to server 127.0.0.1:12345 (127.0.0.1), connection attempt failed
      2014-11-10T08:02:53.375+0000 [mongosMain] ERROR: error upgrading config database to v5 :: caused by :: could not load config version for upgrade :: caused by :: connection pool: connect failed 127.0.0.1:12345 : couldn't connect to server 127.0.0.1:12345 (127.0.0.1), connection attempt failed
      $
      

      The feature request is to add a command line option that causes mongos to keep retrying indefinitely:

      $ mongos --configdb 127.0.0.1:12345  --waitForConfigdb
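
As a rough illustration of the requested semantics (not of how mongos would implement them), the retry loop can be approximated from outside the process by a small wrapper that polls the config server port before launching mongos. The function name `wait_for_configdb` and its `timeout` and `interval` parameters are hypothetical and exist only for this sketch; `--waitForConfigdb` itself is the flag proposed in this ticket, not an existing option.

```python
import socket
import time


def wait_for_configdb(host, port, timeout=None, interval=1.0):
    """Block until a TCP connection to host:port succeeds.

    Returns True once the port accepts a connection, or False if
    `timeout` seconds elapse first. timeout=None retries indefinitely,
    which mirrors the behavior requested for the proposed option.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that something is listening;
            # it does not prove the config server is ready to serve requests.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, unreachable, or timed out: retry unless
            # the optional deadline has passed.
            if deadline is not None and time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A startup wrapper could call `wait_for_configdb("127.0.0.1", 12345)` and only then run the original `mongos --configdb 127.0.0.1:12345` command. This covers only the startup ordering problem; a plain TCP check says nothing about whether the config server is healthy.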
      

      This would simplify deployment orchestration in scenarios, or with tools, that cannot enforce the order in which the config servers and mongos are started. There are also scenarios, such as datacenter-wide power failures, in which all servers restart at the same time. Some of the mongos processes then start before the config servers and fail, after which sysadmins have to restart them manually.

      Note that the proposed behavior is how replica sets already work: For an existing replica set, it is possible to start all nodes at the same time, without a particular order, and they will just wait for a quorum to appear and then proceed to elect a primary.

      (There is an argument for an alternative proposal: instead of adding "wait for configdb" as an option, the default startup behavior could be changed to retry in this way. This avoids adding a config option, but it changes existing behavior. It seems quite possible, though, that changing the existing behavior in this case would not cause serious problems for anyone.)

            Assignee: Greg Studer (greg_10gen)
            Reporter: Henrik Ingo (henrik.ingo@mongodb.com, Inactive)
            Votes: 0
            Watchers: 8
