Core Server / SERVER-4997

Mongos not clearing stale connections

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical - P2
    • Affects Version/s: None
    • Component/s: Sharding
    • ALL

      We had the following issue on our production environment today:

      Due to a mistake, a mongod process had to be restarted. This caused the secondary member of the replica set to take over as primary.
      However, after the freshly restarted mongod came back up, another election was held and it was re-elected primary.

      From that point on, it was no longer possible to query a non-sharded DB that resides on the replica set that experienced the restart.
      Connecting to mongos and trying to query the database returned the following error in mongo shell:
      [code]
      mongos> db.collection.find()
      error: { "$err" : "socket exception", "code" : 9001 }
      [code]

      After manually retrying the query by repeating the command over and over (between 20 and 40 times) in the mongo shell, the situation eventually cleared up and queries worked normally again, both from the shell and from our application. Unfortunately, this process had to be repeated for every mongos instance in the cluster, of which there are six.
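To illustrate the manual workaround, here is a minimal sketch that simulates it in plain JavaScript. The helper names `retryUntilSuccess` and `makeFlakyQuery` are hypothetical and are not part of any MongoDB API. The simulation also assumes that each failed attempt discards exactly one stale pooled connection, which is our guess at what happened rather than confirmed mongos behaviour.

```javascript
// Hypothetical helper: call `op` until it stops throwing, up to maxAttempts.
// `op` stands in for a query sent through mongos.
function retryUntilSuccess(op, maxAttempts) {
  let lastError = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { result: op(), attempts: attempt };
    } catch (err) {
      lastError = err; // e.g. the "socket exception" (code 9001) from the report
    }
  }
  throw lastError;
}

// Simulated query against a pool holding `staleConnections` dead connections.
// ASSUMPTION: every failed attempt removes one stale connection from the pool.
function makeFlakyQuery(staleConnections) {
  let remaining = staleConnections;
  return function query() {
    if (remaining > 0) {
      remaining--;
      const err = new Error("socket exception");
      err.code = 9001;
      throw err;
    }
    return ["doc"];
  };
}

// With 25 stale connections, the 26th attempt is the first to succeed.
const outcome = retryUntilSuccess(makeFlakyQuery(25), 40);
console.log(outcome.attempts); // 26
```

If that assumption holds, it would explain why the number of retries we needed varied from one mongos to the next: each instance may have been holding a different number of stale connections.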

      It looks to me as if mongos does not check connections to the cluster's other members before using them.
      Is it possible to add that functionality?
      It wouldn't need to check before every use of the connection (though that behaviour might be desirable in some cases, the same way it works when connecting to SQL databases from Java using JDBC connection pools), but the administrator shouldn't have to sort through stale connections manually.
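To make the request concrete, here is a minimal sketch of the "validate on borrow" idea, loosely modelled on how JDBC connection pools can test a connection before handing it out. It is plain JavaScript and says nothing about how mongos is actually implemented. The class `ValidatingPool` and the `connect` and `isAlive` callbacks are hypothetical names chosen for illustration.

```javascript
// Hypothetical pool that checks an idle connection before handing it out.
// `connect` creates a connection; `isAlive` reports whether one is still usable.
class ValidatingPool {
  constructor(connect, isAlive) {
    this.connect = connect;
    this.isAlive = isAlive;
    this.idle = [];
    this.discarded = 0;
  }

  // Return a live connection, dropping stale idle ones along the way.
  borrow() {
    while (this.idle.length > 0) {
      const conn = this.idle.pop();
      if (this.isAlive(conn)) {
        return conn;
      }
      this.discarded++; // stale: drop it instead of handing it to the caller
    }
    return this.connect();
  }

  // Put a connection back into the idle list for reuse.
  release(conn) {
    this.idle.push(conn);
  }
}

// Simulation: a "generation" counter stands in for the server process.
let generation = 1;
const pool = new ValidatingPool(
  () => ({ generation }),
  (conn) => conn.generation === generation
);

const conns = [pool.borrow(), pool.borrow(), pool.borrow()];
conns.forEach((c) => pool.release(c));

generation = 2; // simulated restart: all three pooled connections are now stale

const fresh = pool.borrow();
console.log(fresh.generation, pool.discarded); // 2 3
```

In this sketch the caller never sees a stale connection: the three dead ones are discarded inside `borrow()` and a new one is opened. The cost is one liveness check per borrow, which is why a periodic check might be an acceptable compromise.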

      Or is it already there and we just haven't found the switch for it yet?

            Assignee:
            Unassigned
            Reporter:
            Christian Tonhäuser (christian.tonhaeuser@navteq.com)
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved: