This is the mongos -vvvv log from one of the failed connections:

Thu Dec 26 01:10:54.088 [mongosMain] connection accepted from 127.0.0.1:52259 #67 (1 connection now open)
Thu Dec 26 01:10:54.088 [conn67] trying reconnect to localhost:27001
Thu Dec 26 01:10:54.088 BackgroundJob starting: ConnectBG
Thu Dec 26 01:10:54.088 [conn67] reconnect localhost:27001 ok
Thu Dec 26 01:10:54.302 [Balancer] Socket recv() timeout 127.0.0.1:27001
Thu Dec 26 01:10:54.302 [Balancer] SocketException: remote: 127.0.0.1:27001 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.0.1:27001]
Thu Dec 26 01:10:54.302 [Balancer] DBClientCursor::init call() failed
Thu Dec 26 01:10:54.302 [Balancer] User Assertion: 10276:DBClientBase::findN: transport error: localhost:27001 ns: local.$cmd query: { getnonce: 1 }
Thu Dec 26 01:10:54.302 [Balancer] query failed to: localhost:27001 exception: DBClientBase::findN: transport error: localhost:27001 ns: local.$cmd query: { getnonce: 1 }
Thu Dec 26 01:10:54.303 [Balancer] Refreshing MaxChunkSize: 64
Thu Dec 26 01:10:54.303 [Balancer] trying reconnect to localhost:27001
Thu Dec 26 01:10:54.303 BackgroundJob starting: ConnectBG
Thu Dec 26 01:10:54.303 [Balancer] reconnect localhost:27001 ok
Thu Dec 26 01:11:14.422 Socket recv() timeout 127.0.0.1:27001
Thu Dec 26 01:11:14.422 SocketException: remote: 127.0.0.1:27001 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.0.1:27001]
Thu Dec 26 01:11:14.422 DBClientCursor::init call() failed
Thu Dec 26 01:11:14.422 User Assertion: 10276:DBClientBase::findN: transport error: localhost:27001 ns: config.$cmd query: { dbhash: 1, collections: [ "chunks", "databases" ] }
Thu Dec 26 01:11:14.422 warning: couldn't check dbhash on config server localhost:27001 :: caused by :: 10276 DBClientBase::findN: transport error: localhost:27001 ns: config.$cmd query: { dbhash: 1, collections: [ "chunks", "databases" ] }
Thu Dec 26 01:11:15.390 [LockPinger] Socket recv() timeout 127.0.0.1:27001
Thu Dec 26 01:11:15.390 [LockPinger] SocketException: remote: 127.0.0.1:27001 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.0.1:27001]
Thu Dec 26 01:11:15.390 [LockPinger] DBClientCursor::init call() failed
Thu Dec 26 01:11:15.390 [LockPinger] User Assertion: 10276:DBClientBase::findN: transport error: localhost:27001 ns: local.$cmd query: { getnonce: 1 }
Thu Dec 26 01:11:15.442 [LockPinger] scoped connection to localhost:27001,localhost:27002,localhost:27003 not being returned to the pool
Thu Dec 26 01:11:15.442 [LockPinger] warning: distributed lock pinger 'localhost:27001,localhost:27002,localhost:27003/hingo-sputnik:27017:1388010584:1804289383' detected an exception while pinging. :: caused by :: SyncClusterConnection::udpate prepare failed: localhost:27001:10276 DBClientBase::findN: transport error: localhost:27001 ns: local.$cmd query: { getnonce: 1 }
Thu Dec 26 01:11:24.086 [conn67] Socket recv() timeout 127.0.0.1:27001
Thu Dec 26 01:11:24.086 [conn67] SocketException: remote: 127.0.0.1:27001 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.0.1:27001]
Thu Dec 26 01:11:24.086 [conn67] DBClientCursor::init call() failed
Thu Dec 26 01:11:24.086 [conn67] User Assertion: 10276:DBClientBase::findN: transport error: localhost:27001 ns: local.$cmd query: { getnonce: 1 }
Thu Dec 26 01:11:24.086 [conn67] query failed to: localhost:27001 exception: DBClientBase::findN: transport error: localhost:27001 ns: local.$cmd query: { getnonce: 1 }
Thu Dec 26 01:11:24.087 [conn67] Request::process begin ns: admin.$cmd msg id: 0 op: 2004 attempt: 0
Thu Dec 26 01:11:24.087 [conn67] single query: admin.$cmd { whatsmyuri: 1 } ntoreturn: 1 options : 0
Thu Dec 26 01:11:24.087 [conn67] Request::process end ns: admin.$cmd msg id: 0 op: 2004 attempt: 0 0ms
Thu Dec 26 01:11:24.089 [conn67] Request::process begin ns: test.$cmd msg id: 1 op: 2004 attempt: 0
Thu Dec 26 01:11:24.089 [conn67] single query: test.$cmd { getnonce: 1 } ntoreturn: 1 options : 0
Thu Dec 26 01:11:24.089 [conn67] Request::process end ns: test.$cmd msg id: 1 op: 2004 attempt: 0 0ms
Thu Dec 26 01:11:24.089 [conn67] Request::process begin ns: test.$cmd msg id: 2 op: 2004 attempt: 0
Thu Dec 26 01:11:24.089 [conn67] single query: test.$cmd { authenticate: 1, nonce: "28cd283a70c93ec3", user: "henrik", key: "ecec87f680f9fcfee7a8a80e77b715be" } ntoreturn: 1 options : 0
Thu Dec 26 01:11:24.089 [conn67] authenticate db: test { authenticate: 1, nonce: "28cd283a70c93ec3", user: "henrik", key: "ecec87f680f9fcfee7a8a80e77b715be" }
Thu Dec 26 01:11:24.090 [conn67] trying reconnect to localhost:27001
Thu Dec 26 01:11:24.091 BackgroundJob starting: ConnectBG
Thu Dec 26 01:11:24.091 [conn67] reconnect localhost:27001 ok
Thu Dec 26 01:11:24.302 [Balancer] Socket recv() timeout 127.0.0.1:27001
Thu Dec 26 01:11:24.302 [Balancer] SocketException: remote: 127.0.0.1:27001 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.0.1:27001]
Thu Dec 26 01:11:24.302 [Balancer] DBClientCursor::init call() failed
Thu Dec 26 01:11:24.302 [Balancer] User Assertion: 10276:DBClientBase::findN: transport error: localhost:27001 ns: local.$cmd query: { getnonce: 1 }
Thu Dec 26 01:11:24.302 [Balancer] query failed to: localhost:27001 exception: DBClientBase::findN: transport error: localhost:27001 ns: local.$cmd query: { getnonce: 1 }
Thu Dec 26 01:11:24.304 [Balancer] trying to acquire new distributed lock for balancer on localhost:27001,localhost:27002,localhost:27003 ( lock timeout : 900000, ping interval : 30000, process : hingo-sputnik:27017:1388010584:1804289383 )
Thu Dec 26 01:11:44.802 [PeriodicTask::Runner] task: DBConnectionPool-cleaner took: 0ms
Thu Dec 26 01:11:44.802 [PeriodicTask::Runner] task: DBConnectionPool-cleaner took: 0ms
Thu Dec 26 01:11:45.442 [LockPinger] distributed lock pinger 'localhost:27001,localhost:27002,localhost:27003/hingo-sputnik:27017:1388010584:1804289383' about to ping.
Thu Dec 26 01:11:45.442 [LockPinger] trying reconnect to localhost:27001
Thu Dec 26 01:11:45.442 BackgroundJob starting: ConnectBG
Thu Dec 26 01:11:45.443 [LockPinger] reconnect localhost:27001 ok
Thu Dec 26 01:11:54.090 [conn67] Socket recv() timeout 127.0.0.1:27001
Thu Dec 26 01:11:54.090 [conn67] SocketException: remote: 127.0.0.1:27001 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.0.1:27001]
Thu Dec 26 01:11:54.090 [conn67] DBClientCursor::init call() failed
Thu Dec 26 01:11:54.090 [conn67] User Assertion: 10276:DBClientBase::findN: transport error: localhost:27001 ns: local.$cmd query: { getnonce: 1 }
Thu Dec 26 01:11:54.090 [conn67] query failed to: localhost:27001 exception: DBClientBase::findN: transport error: localhost:27001 ns: local.$cmd query: { getnonce: 1 }
Thu Dec 26 01:11:54.092 [conn67] Request::process end ns: test.$cmd msg id: 2 op: 2004 attempt: 0 30002ms
Thu Dec 26 01:11:54.096 [conn67] Request::process begin ns: admin.$cmd msg id: 3 op: 2004 attempt: 0
Thu Dec 26 01:11:54.096 [conn67] single query: admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 } ntoreturn: -1 options : 0
Thu Dec 26 01:11:54.096 [conn67] Request::process end ns: admin.$cmd msg id: 3 op: 2004 attempt: 0 0ms
Thu Dec 26 01:11:54.098 [conn67] Request::process begin ns: test.foo msg id: 4 op: 2004 attempt: 0
Thu Dec 26 01:11:54.098 [conn67] shard query: test.foo {}
Thu Dec 26 01:11:54.098 [conn67] [pcursor] creating pcursor over QSpec { ns: "test.foo", n2skip: 0, n2return: -1, options: 0, query: {}, fields: {} } and CInfo { v_ns: "", filter: {} }
Thu Dec 26 01:11:54.098 [conn67] [pcursor] initializing over 1 shards required by [unsharded @ shard0000:localhost:27000]
Thu Dec 26 01:11:54.098 [conn67] [pcursor] initializing on shard shard0000:localhost:27000, current connection state is { state: {}, retryNext: false, init: false, finish: false, errored: false }
Thu Dec 26 01:11:54.099 [conn67] [pcursor] initialized query (lazily) on shard shard0000:localhost:27000, current connection state is { state: { conn: "localhost:27000", vinfo: "shard0000:localhost:27000", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: false, errored: false }
Thu Dec 26 01:11:54.099 [conn67] [pcursor] finishing over 1 shards
Thu Dec 26 01:11:54.099 [conn67] [pcursor] finishing on shard shard0000:localhost:27000, current connection state is { state: { conn: "localhost:27000", vinfo: "shard0000:localhost:27000", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: false, errored: false }
Thu Dec 26 01:11:54.099 [conn67] [pcursor] finished on shard shard0000:localhost:27000, current connection state is { state: { conn: "(done)", vinfo: "shard0000:localhost:27000", cursor: { _id: ObjectId('52bb5fb8d741e37cda2f195b'), v: "bar" }, count: 0, done: false }, retryNext: false, init: true, finish: true, errored: false }
Thu Dec 26 01:11:54.099 [conn67] Request::process end ns: test.foo msg id: 4 op: 2004 attempt: 0 0ms
Thu Dec 26 01:11:54.108 [conn67] Request::process begin ns: admin.$cmd msg id: 5 op: 2004 attempt: 0
Thu Dec 26 01:11:54.108 [conn67] single query: admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 } ntoreturn: -1 options : 0
Thu Dec 26 01:11:54.108 [conn67] Request::process end ns: admin.$cmd msg id: 5 op: 2004 attempt: 0 0ms
Thu Dec 26 01:11:54.110 [conn67] Socket recv() conn closed? 127.0.0.1:52259
Thu Dec 26 01:11:54.110 [conn67] SocketException: remote: 127.0.0.1:52259 error: 9001 socket exception [CLOSED] server [127.0.0.1:52259]
Thu Dec 26 01:11:54.110 [conn67] end connection 127.0.0.1:52259 (0 connections now open)