- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: 3.4.13, 3.4.14, 3.6.3
- Component/s: Stability
We have a problem with the MongoDB (3.6.3) PRIMARY server. After some time it gets into a state where it is still PRIMARY but no longer accepts connections. The problem is that it keeps the PRIMARY role, and because of that our app crashes. Restarting MongoDB on the PRIMARY server helps, and everything returns to normal.
We host MongoDB on Amazon on 3 Ubuntu m5.4xlarge instances with 3000 IOPS EBS volumes.
During the crash we see about 30% more connections to MongoDB than usual, but the count is still far below the limits and far below the fs.file-max setting, which is set to 6430188. No other metric looks suspicious: RAM, CPU, disk and network usage are at the same level as just before the crash and right after the restart of the PRIMARY. We have already migrated MongoDB from 3.4.14 to 3.6.3, and the problem still occurs every 1-2 days. We have also changed the priority of the PRIMARY server and moved the role to another host, so the problem is not tied to any specific machine.
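Since fs.file-max is a system-wide ceiling, the per-process open-file limit of the mongod process may be the more relevant number to compare against. The following is only a minimal sketch, assuming a Linux host where /proc is available; the helper names `parse_soft_nofile` and `fd_usage` are hypothetical and not part of any MongoDB tooling.

```python
import os
import re


def parse_soft_nofile(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None when the line is missing or the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns are separated by runs of spaces:
            # name, soft limit, hard limit, units
            fields = re.split(r"\s{2,}", line.strip())
            soft = fields[1]
            return int(soft) if soft.isdigit() else None
    return None


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a process on Linux."""
    # Each entry in /proc/<pid>/fd is one open file descriptor
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as fh:
        soft_limit = parse_soft_nofile(fh.read())
    return open_fds, soft_limit
```

Calling `fd_usage(pid)` with the mongod PID while the server is unresponsive, and again after a restart, would show whether the descriptor count is approaching the per-process limit. Reading /proc/<pid>/fd for a process owned by another user normally requires root.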
There is nothing interesting in the logs.
Here is the output of some commands that we ran while the server was in the unresponsive state:
$ mongo -u root -p pass --authenticationDatabase admin --eval 'rs.status()'
MongoDB shell version v3.6.3
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.3
2018-03-29T02:21:48.272+0000 E QUERY [thread1] Error: network error while attempting to run command 'saslStart' on host '127.0.0.1:27017' :
DB.prototype._authOrThrow@src/mongo/shell/db.js:1608:20
@(auth):6:1
@(auth):1:2

$ mongo -u root -p pass --authenticationDatabase admin --eval 'db.runCommand( { "connPoolStats" : 1 } )'
MongoDB shell version v3.6.3
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.3
2018-03-29T02:21:48.272+0000 E QUERY [thread1] Error: network error while attempting to run command 'saslStart' on host '127.0.0.1:27017' :
DB.prototype._authOrThrow@src/mongo/shell/db.js:1608:20
@(auth):6:1
@(auth):1:2

$ mongo -u root -p pass --authenticationDatabase admin --eval 'db.runCommand( { serverStatus: 1 } )'
MongoDB shell version v3.6.3
connecting to: mongodb://127.0.0.1:27017
2018-03-29T02:21:48.382+0000 W NETWORK [thread1] Failed to connect to 127.0.0.1:27017, in(checking socket for error after poll), reason: Connection refused
2018-03-29T02:21:48.382+0000 E QUERY [thread1] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:251:13
@(connect):1:6
Any idea what else we should check to debug it?
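One thing that could be recorded during the next incident is whether the port refuses connections outright or accepts them and then stalls, since the outputs above show both behaviours. The sketch below is only an illustration, assuming plain TCP access to the mongod port; `probe_tcp` is a hypothetical helper, not a MongoDB API, and it checks the TCP level only, so an "accepted" result says nothing about whether authentication would succeed.

```python
import socket


def probe_tcp(host, port, timeout=2.0):
    """Try a TCP connect and classify the outcome as a short string."""
    try:
        # create_connection performs the full TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "accepted"
    except ConnectionRefusedError:
        # Nothing is listening on the port (or the connection was rejected)
        return "refused"
    except socket.timeout:
        # No answer within the timeout
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Running `probe_tcp("127.0.0.1", 27017)` in a loop with timestamps, on the PRIMARY itself and from another replica set member, would show when the state changes and whether it differs between local and remote clients.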
- depends on: SERVER-33445 Add signal handler to generate stack traces (Closed)
- duplicates: WT-3972 Allow more than 64K cursors to be open on a data source simultaneously (Closed)
