  1. Core Server
  2. SERVER-34190

MongoDB process hangs after some random time

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.4.13, 3.4.14, 3.6.3
    • Component/s: Stability
    • Labels:
      None
    • ALL
      Not found yet.

      We have a problem with a MongoDB (3.6.3) PRIMARY server. After some time it gets into a state where it is still PRIMARY but no longer accepts connections. The problem is that it keeps the PRIMARY role, and because of that our app crashes. Restarting MongoDB on the PRIMARY server helps, and everything goes back to normal.

      We are hosting mongodb in Amazon on 3 Ubuntu m5.4xlarge instances with 3000 IOPS EBS volumes.

      During the crash we see ~30% more connections to MongoDB than usual, but they are still far below the limits and far below the fs.file-max setting, which is set to 6430188. No other metric looks suspicious: RAM, CPU, disk and network usage are at the same level as just before the crash and right after the restart of the PRIMARY. We have already migrated MongoDB from 3.4.14 to 3.6.3, and the problem still occurs every 1-2 days. We have also changed the priority of the PRIMARY server and moved this role to another host, so the problem is not tied to any specific machine.
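
One thing that may be worth ruling out alongside fs.file-max: that setting is a system-wide ceiling, while every process also has its own "Max open files" limit, which can be much lower. Below is a minimal sketch for comparing a process's open descriptors with its own soft limit on Linux. The helper names (soft_nofile_limit, open_fd_count, report_fd_usage) are hypothetical and not part of MongoDB, and the commented example assumes a single mongod process and that pidof is available.

```shell
#!/bin/sh
# Sketch: compare a process's open file descriptors with its own soft limit.
# fs.file-max is system-wide; the per-process "Max open files" limit is separate.

# Print the soft "Max open files" value from a file in /proc/<pid>/limits format.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Count the entries in a descriptor directory such as /proc/<pid>/fd.
open_fd_count() {
    ls "$1" | wc -l | tr -d ' '
}

# Report both numbers for one PID.
# Reading /proc/<pid>/fd of another user's process usually requires root.
report_fd_usage() {
    pid=$1
    echo "pid=$pid open=$(open_fd_count "/proc/$pid/fd") soft_limit=$(soft_nofile_limit "/proc/$pid/limits")"
}

# Example (assumption: exactly one mongod process is running):
# report_fd_usage "$(pidof mongod)"
```

If the open count is close to the soft limit while the server is unresponsive, the per-process limit is a more likely suspect than fs.file-max.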

      There is nothing interesting in the logs.

      Here is the output of some commands that we ran while the server was in the unresponsive state:

      $ mongo -u root -p pass --authenticationDatabase admin --eval 'rs.status()'
      MongoDB shell version v3.6.3
      connecting to: mongodb://127.0.0.1:27017
      MongoDB server version: 3.6.3
      2018-03-29T02:21:48.272+0000 E QUERY    [thread1] Error: network error while attempting to run command 'saslStart' on host '127.0.0.1:27017'  :
      DB.prototype._authOrThrow@src/mongo/shell/db.js:1608:20
      @(auth):6:1
      @(auth):1:2
      
      $ mongo -u root -p pass --authenticationDatabase admin --eval 'db.runCommand( { "connPoolStats" : 1 } )'
      MongoDB shell version v3.6.3
      connecting to: mongodb://127.0.0.1:27017
      MongoDB server version: 3.6.3
      2018-03-29T02:21:48.272+0000 E QUERY    [thread1] Error: network error while attempting to run command 'saslStart' on host '127.0.0.1:27017'  :
      DB.prototype._authOrThrow@src/mongo/shell/db.js:1608:20
      @(auth):6:1
      @(auth):1:2
      
      $  mongo -u root -p pass --authenticationDatabase admin --eval 'db.runCommand( { serverStatus: 1 } )'
      MongoDB shell version v3.6.3
      connecting to: mongodb://127.0.0.1:27017
      2018-03-29T02:21:48.382+0000 W NETWORK  [thread1] Failed to connect to 127.0.0.1:27017, in(checking socket for error after poll), reason: Connection refused
      2018-03-29T02:21:48.382+0000 E QUERY    [thread1] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed :
      connect@src/mongo/shell/mongo.js:251:13
      @(connect):1:6
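
The transcripts above fail in two different ways: the first two print the server version and then hit a network error during 'saslStart', while the third is refused at connect time. A small sketch for labelling saved shell output by failure type, so that repeated captures can be compared across incidents, is given here. classify_failure is a hypothetical helper, and it assumes each transcript has been saved to its own file.

```shell
#!/bin/sh
# Sketch: label a saved mongo shell transcript by where the failure happened.
# classify_failure is a hypothetical helper, not a MongoDB tool.
classify_failure() {
    if grep -q "Connection refused" "$1"; then
        # The TCP connection attempt itself was rejected.
        echo "tcp-refused"
    elif grep -q "network error while attempting to run command" "$1"; then
        # The shell connected, then a command failed on the wire.
        echo "dropped-after-connect"
    else
        echo "other"
    fi
}

# Example (assumption: transcripts were saved as rs_status.txt and so on):
# classify_failure rs_status.txt
```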
      

      Any idea what else we should check to debug it?

            Assignee:
            Kelsey Schubert (kelsey.schubert@mongodb.com)
            Reporter:
            Piotr Klimek (pklimek)
            Votes:
            0
            Watchers:
            6