[SERVER-9155] Mongos process crashes on socket exception Created: 28/Mar/13  Updated: 10/Dec/14  Resolved: 28/Mar/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.4.1
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Nick Demyanchuk
Assignee: Unassigned
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 12.04 x86_64
Mongo 2.4.1


Participants:

Description

This happens when I start a third mongos process (the first and second start without any problems). All processes run on different EC2 instances.

mongos log:

Wed Mar 27 14:34:28.260 security key: VpexFQl8pY9smMMJz6Ekqf
Wed Mar 27 14:34:28.260 [mongosMain] MongoS version 2.4.1 starting: pid=4596 port=27017 64-bit host=mongo-1-0 (--help for usage)
Wed Mar 27 14:34:28.260 [mongosMain] git version: 1560959e9ce11a693be8b4d0d160d633eee75110
Wed Mar 27 14:34:28.260 [mongosMain] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
Wed Mar 27 14:34:28.260 [mongosMain] options: { configdb: "mongo-0-0:27019", fork: true, keyFile: "/etc/scalr/private.d/keys/mongodb", logpath: "/var/log/mongodb/mongodb.router.log", vvvvv: true }
Wed Mar 27 14:34:28.265 [mongosMain]  config string : mongo-0-0:27019
Wed Mar 27 14:34:28.265 [mongosMain] creating new connection to:mongo-0-0:27019
Wed Mar 27 14:34:28.266 BackgroundJob starting: ConnectBG
Wed Mar 27 14:34:30.704 [mongosMain] connected connection!
Wed Mar 27 14:34:30.704 [mongosMain] calling onCreate auth for mongo-0-0:27019
Wed Mar 27 14:34:30.708 [mongosMain] creating new connection to:mongo-0-0:27019
Wed Mar 27 14:34:30.708 BackgroundJob starting: ConnectBG
Wed Mar 27 14:34:30.708 BackgroundJob starting: CheckConfigServers
Wed Mar 27 14:34:44.385 [mongosMain] ERROR: error upgrading config database to v4 :: caused by :: could not load config version for upgrade :: caused by :: 11002 socket exception [6] server [mongo-0-0:27019] mongos connectionpool error: couldn't connect to server mongo-0-0:27019

Relevant part of the config server log:

Wed Mar 27 14:34:44.384 [initandlisten] connection accepted from 10.141.173.155:51988 #14 (11 connections now open)
Wed Mar 27 14:34:44.384 [conn14] SocketException: remote: 10.141.173.155:51988 error: 9001 socket exception [0] server [10.141.173.155:51988]
Wed Mar 27 14:34:44.384 [conn14] end connection 10.141.173.155:51988 (10 connections now open)
Wed Mar 27 14:34:44.399 [conn13] SocketException: remote: 10.141.173.155:51987 error: 9001 socket exception [0] server [10.141.173.155:51987]
Wed Mar 27 14:34:44.399 [conn13] end connection 10.141.173.155:51987 (9 connections now open)
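To make it easier to correlate these entries with the mongos log, the SocketException lines can be pulled out of the config server log with a small script. This is only a sketch: the regular expression is derived from the lines shown above, the function name find_socket_exceptions is made up for illustration, and other MongoDB versions may format these lines differently.

```python
import re

# Pattern based on the log lines quoted above, e.g.
# "Wed Mar 27 14:34:44.384 [conn14] SocketException: remote: 10.141.173.155:51988 error: 9001 ..."
# (assumption: other versions may use a different layout).
SOCKET_EXC = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<conn>conn\d+)\] SocketException: "
    r"remote: (?P<remote>\S+) error: (?P<code>\d+)"
)


def find_socket_exceptions(lines):
    """Return (timestamp, connection, remote, error_code) for each SocketException line."""
    hits = []
    for line in lines:
        match = SOCKET_EXC.match(line.strip())
        if match:
            hits.append(
                (match["ts"], match["conn"], match["remote"], int(match["code"]))
            )
    return hits
```

Passing an open log file to find_socket_exceptions yields one tuple per broken connection, which can then be compared against the timestamps in the mongos log.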



Comments
Comment by Scott Hernandez (Inactive) [ 28/Mar/13 ]

The error shows that the connection was severed by the network, somewhere between the two machines. Both the client (mongos) and the server (config server) logs show that the connection was broken rather than closed intentionally. You can also see that multiple connections had this problem. My guess is that something on the network is not configured in a stable way, or that one of those hosts needs checking. I would suggest looking at the system logs (/var/log/messages or syslog, or dmesg) for corresponding events.

Comment by Nick Demyanchuk [ 28/Mar/13 ]

The thing is that the connection is stable (these are Amazon EC2 instances in the same availability zone), and I was able to reproduce this twice. Every time it is the third mongos.
As you can see in the log, the connection was established, but a later socket exception killed the mongos process. Maybe mongos should try to reconnect?

Comment by Scott Hernandez (Inactive) [ 28/Mar/13 ]

The mongos must have stable communication with the config server in order to start. Any error during startup causes the mongos process to shut down. This is expected behavior.
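Since mongos shuts down on any startup error, one option is to check that the config server is reachable before launching it. The following is a minimal sketch, assuming a plain TCP connect is an acceptable reachability check; the function name wait_for_port and its defaults are illustrative and not part of MongoDB. It cannot rule out a connection that drops after it has been established, as in the log above.

```python
import socket
import time


def wait_for_port(host, port, attempts=5, delay=2.0, timeout=3.0):
    """Return True once a TCP connection to host:port succeeds, False if every attempt fails."""
    for attempt in range(attempts):
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Pause before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

For the setup in this report, the call would be wait_for_port("mongo-0-0", 27019), with mongos started only if it returns True.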

Comment by Nick Demyanchuk [ 28/Mar/13 ]

If I start mongos again, it no longer crashes.
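Because a second start succeeds, a wrapper that retries the launch can serve as a workaround. This is a rough sketch, not an official recommendation: start_with_retries is a hypothetical helper, and it assumes that the launched command exits with a non-zero status when startup fails (with fork enabled, that would be the parent process).

```python
import subprocess
import time


def start_with_retries(cmd, attempts=3, delay=5.0):
    """Run cmd until it exits with status 0.

    Returns the number of the attempt that succeeded, or 0 if all attempts failed.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        # Pause before retrying, but not after the final attempt.
        if attempt < attempts:
            time.sleep(delay)
    return 0
```

With the options from the log above, the command list would be ["mongos", "--configdb", "mongo-0-0:27019", "--fork", "--keyFile", "/etc/scalr/private.d/keys/mongodb", "--logpath", "/var/log/mongodb/mongodb.router.log"].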
