Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-20775

cluster not reachable while only one (of three) configserver was down

    • Type: Icon: Bug Bug
    • Resolution: Cannot Reproduce
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 2.6.4, 2.6.10
    • Component/s: Sharding
    • Labels:
      None
    • ALL

      We have one cluster consisting of 5 shards, each consisting of 3 physical replset members. 3 configservers and 3 routers (mongos) are running on 3 different VM's, called sx350, sx351, sx352. We have also 3 other VM's, called offerstore-en-router-01, offerstore-en-router-02 and offerstore-en-router-03 where we have installed 3 other router (mongos).
      One VM (sx352) went down at 7 o'clock, so its configserver and router crashed down as well.

      The problem is that no connections through mongos on offerstore-en-router-01, offerstore-en-router-02 and offerstore-en-router-03 were possible until sx352 went back round about 20 minutes later after it had crashed down!

      While sx352 was down, the mongoshell waited so long to connect (using auth) that I closed it before it came back. Without using --user and --password, the mongoshell could connect quickly but as soon as I entered db.auth("admin", "XXX"), the mongoshell blocked so I closed it after a few seconds.

      Do you know why one crashed configserver is able to compromise the access to the cluster through mongos, running on a different VM's, and how one can avoid this issue?
      Thanks!

            Assignee:
            ramon.fernandez@mongodb.com Ramon Fernandez Marina
            Reporter:
            kay.agahd@idealo.de Kay Agahd
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: