Socket Exception

XMLWordPrintableJSON

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Major - P3
    • None
    • Affects Version/s: 2.4.4
    • Component/s: Networking, Sharding
    • None
    • Linux
    • None
    • 3
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      There are usually SocketException happens, and when this error occurs, we may encount a poor performance of query and other options. The error log is as follow:

      Mon Jan 19 09:33:29.699 [LockPinger] cluster 10.11.200.9:9090,10.11.200.7:9090,10.11.200.8:9090 pinged successfully at Mon Jan 19 09:33:29 2015 by distributed lock pinger '10.11.200.9:9090,10.11.200.7:9090,10.11.200.8:9090/localhost.localdomain:6080:1420004034:1804289383', sleeping for 30000ms
      Mon Jan 19 09:33:30.754 [ReplicaSetMonitorWatcher] Socket recv() timeout  10.11.200.13:9009
      Mon Jan 19 09:33:30.754 [ReplicaSetMonitorWatcher] SocketException: remote: 10.11.200.13:9009 error: 9001 socket exception [3] server [10.11.200.13:9009] 
      Mon Jan 19 09:33:30.754 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed
      Mon Jan 19 09:33:40.776 [ReplicaSetMonitorWatcher] trying reconnect to 10.11.200.13:9009
      Mon Jan 19 09:33:40.777 [ReplicaSetMonitorWatcher] reconnect 10.11.200.13:9009 ok
      Mon Jan 19 09:33:41.375 [conn100558] end connection 10.11.220.47:45888 (500 connections now open)
      Mon Jan 19 09:33:45.776 [ReplicaSetMonitorWatcher] Socket recv() timeout  10.11.200.13:9009
      Mon Jan 19 09:33:45.776 [ReplicaSetMonitorWatcher] SocketException: remote: 10.11.200.13:9009 error: 9001 socket exception [3] server [10.11.200.13:9009] 
      Mon Jan 19 09:33:45.776 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed
      Mon Jan 19 09:33:56.767 [WriteBackListener-10.11.200.11:9009] DBClientCursor::init call() failed
      Mon Jan 19 09:33:56.767 [WriteBackListener-10.11.200.11:9009] Detected bad connection created at 1421607926129005 microSec, clearing pool for 10.11.200.11:9009
      Mon Jan 19 09:33:56.767 [WriteBackListener-10.11.200.11:9009] WriteBackListener exception : DBClientBase::findN: transport error: 10.11.200.11:9009 ns: admin.$cmd query: { writebacklisten: ObjectId('54a38a7b9304f1764a938034') }
      Mon Jan 19 09:34:00.394 [conn100563] got not master for: 10.11.200.11:9009
      Mon Jan 19 09:34:00.798 [ReplicaSetMonitorWatcher] Socket recv() timeout  10.11.200.12:9009
      Mon Jan 19 09:34:00.798 [ReplicaSetMonitorWatcher] SocketException: remote: 10.11.200.12:9009 error: 9001 socket exception [3] server [10.11.200.12:9009] 
      Mon Jan 19 09:34:00.798 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed
      Mon Jan 19 09:34:00.798 [ReplicaSetMonitorWatcher] Detected bad connection created at 1421596473823061 microSec, clearing pool for 10.11.200.12:9009
      Mon Jan 19 09:34:00.799 [ReplicaSetMonitorWatcher] trying reconnect to 10.11.200.13:9009
      Mon Jan 19 09:34:00.799 [ReplicaSetMonitorWatcher] reconnect 10.11.200.13:9009 ok
      Mon Jan 19 09:34:02.254 [conn100457] DBClientCursor::init lazy say() failed
      Mon Jan 19 09:34:02.254 [conn100343] DBClientCursor::init lazy say() failed
      Mon Jan 19 09:34:02.254 [conn86557] DBClientCursor::init call() failed
      Mon Jan 19 09:34:02.254 [conn100571] DBClientCursor::init lazy say() failed
      Mon Jan 19 09:34:02.254 [conn100569] DBClientCursor::init lazy say() failed
      Mon Jan 19 09:34:02.254 [conn100343] DBClientCursor::init message from say() was empty
      Mon Jan 19 09:34:02.255 [conn100569] DBClientCursor::init message from say() was empty
      Mon Jan 19 09:34:02.254 [conn100457] DBClientCursor::init message from say() was empty
      Mon Jan 19 09:34:02.254 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed
      Mon Jan 19 09:34:02.255 [conn100568] Detected bad connection created at 1421607555118835 microSec, clearing pool for 10.11.200.13:9009
      Mon Jan 19 09:34:02.255 [conn100569] got not master for: 10.11.200.13:9009
      Mon Jan 19 09:34:02.255 [conn86557] warning: could not autosplit collection single_msg.single_msg :: caused by :: 10276 DBClientBase::findN: transport error: 10.11.200.13:9009 ns: admin.$cmd query: { splitVector: "single_msg.single_msg", keyPattern: { _id: 1.0 }, min: { _id: 338721191280379031 }, max: { _id: 338721263261672676 }, maxChunkSizeBytes: 67108864, maxSplitPoints: 2, maxChunkObjects: 250000 }
      Mon Jan 19 09:34:02.255 [conn100342] DBClientCursor::init lazy say() failed
      Mon Jan 19 09:34:02.255 [conn100342] DBClientCursor::init message from say() was empty
      Mon Jan 19 09:34:02.255 [conn100342] got not master for: 10.11.200.13:9009
      Mon Jan 19 09:34:02.255 [conn100569] warning: exception when receiving data from set9:set9/10.11.200.11:9009,10.11.200.12:9009,10.11.200.13:9009, current connection state is { state: { conn: "set9/10.11.200.11:9009,10.11.200.12:9009,10.11.200.13:9009", vinfo: "mass_msg.mass_msg @ 50|9619||5453417ef0be311c3f688143", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: true, errored: false } :: caused by :: 15988 error querying server
      Mon Jan 19 09:34:02.255 [conn100457] got not master for: 10.11.200.13:9009
      Mon Jan 19 09:34:02.255 [conn86541] Socket say send() errno:104 Connection reset by peer 10.11.200.13:9009
      Mon Jan 19 09:34:02.255 [conn100342] warning: exception when receiving data from set9:set9/10.11.200.11:9009,10.11.200.12:9009,10.11.200.13:9009, current connection state is { state: { conn: "set9/10.11.200.11:9009,10.11.200.12:9009,10.11.200.13:9009", vinfo: "single_msg.single_msg @ 51|9499||545742f8383232076b9e2582", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: true, errored: false } :: caused by :: 15988 error querying server
      Mon Jan 19 09:34:02.255 [conn100571] DBClientCursor::init message from say() was empty
      Mon Jan 19 09:34:02.255 [conn100343] got not master for: 10.11.200.13:9009
      Mon Jan 19 09:34:02.255 [WriteBackListener-10.11.200.13:9009] DBClientCursor::init call() failed
      Mon Jan 19 09:34:02.255 [conn100563] Socket recv() errno:104 Connection reset by peer 10.11.200.11:9009
      Mon Jan 19 09:34:02.256 [conn86541] warning: exception during insert :: caused by :: 16460 error inserting 1 documents to shard set9:set9/10.11.200.11:9009,10.11.200.12:9009,10.11.200.13:9009 at version 50|9619||5453417ef0be311c3f688143 :: caused by :: socket exception [SEND_ERROR] for 10.11.200.13:9009
      Mon Jan 19 09:34:02.256 [conn100563] SocketException: remote: 10.11.200.11:9009 error: 9001 socket exception [1] server [10.11.200.11:9009] 
      Mon Jan 19 09:34:02.256 [conn100343] warning: exception when receiving data from set9:set9/10.11.200.11:9009,10.11.200.12:9009,10.11.200.13:9009, current connection state is { state: { conn: "set9/10.11.200.11:9009,10.11.200.12:9009,10.11.200.13:9009", vinfo: "mass_msg.mass_msg @ 50|9619||5453417ef0be311c3f688143", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: true, errored: false } :: caused by :: 15988 error querying server
      Mon Jan 19 09:34:02.255 [conn100457] warning: exception when receiving data from set9:set9/10.11.200.11:9009,10.11.200.12:9009,10.11.200.13:9009, current connection state is { state: { conn: "set9/10.11.200.11:9009,10.11.200.12:9009,10.11.200.13:9009", vinfo: "media_doc.media_doc @ 51|367||545341bbf0be311c3f688144", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: true, errored: false } :: caused by :: 15988 error querying server
      Mon Jan 19 09:34:02.256 [WriteBackListener-10.11.200.13:9009] WriteBackListener exception : DBClientBase::findN: transport error: 10.11.200.13:9009 ns: admin.$cmd query: { writebacklisten: ObjectId('54a38a7b9304f1764a938034') }
      Mon Jan 19 09:34:02.256 [conn100563] DBClientCursor::init call() failed
      Mon Jan 19 09:34:02.255 [conn100571] got not master for: 10.11.200.13:9009
      Mon Jan 19 09:34:02.256 [conn100571] warning: exception when receiving data from set9:set9/10.11.200.11:9009,10.11.200.12:9009,10.11.200.13:9009, current connection state is { state: { conn: "set9/10.11.200.11:9009,10.11.200.12:9009,10.11.200.13:9009", vinfo: "mass_msg.mass_msg @ 50|9619||5453417ef0be311c3f688143", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: true, errored: false } :: caused by :: 15988 error querying server
      Mon Jan 19 09:34:02.257 [conn100568] end connection 10.11.220.29:54661 (499 connections now open)
      Mon Jan 19 09:34:02.264 [mongosMain] connection accepted from 10.11.220.31:57357 #100572 (500 connections now open)
      Mon Jan 19 09:34:02.265 [conn100563] trying reconnect to 10.11.200.13:9009
      Mon Jan 19 09:34:02.265 [conn100563] reconnect 10.11.200.13:9009 ok
      Mon Jan 19 09:34:02.266 [conn100563] Socket recv() errno:104 Connection reset by peer 10.11.200.13:9009
      Mon Jan 19 09:34:02.266 [conn100563] SocketException: remote: 10.11.200.13:9009 error: 9001 socket exception [1] server [10.11.200.13:9009] 
      Mon Jan 19 09:34:02.266 [conn100563] DBClientCursor::init call() failed
      Mon Jan 19 09:34:02.266 [conn100563] Detected bad connection created at 1421607794297937 microSec, clearing pool for 10.11.200.13:9009
      Mon Jan 19 09:34:02.266 [conn100563] trying reconnect to 10.11.200.11:9009
      Mon Jan 19 09:34:02.267 [conn100563] reconnect 10.11.200.11:9009 ok
      Mon Jan 19 09:34:02.267 [conn100563] Socket recv() errno:104 Connection reset by peer 10.11.200.11:9009
      Mon Jan 19 09:34:02.267 [conn100563] SocketException: remote: 10.11.200.11:9009 error: 9001 socket exception [1] server [10.11.200.11:9009] 
      Mon Jan 19 09:34:02.268 [conn100563] DBClientCursor::init call() failed
      Mon Jan 19 09:34:02.268 [conn100563] Detected bad connection created at 1421607925189948 microSec, clearing pool for 10.11.200.11:9009
      Mon Jan 19 09:34:03.268 [conn100563] ReplicaSetMonitor::_checkConnection: 10.11.200.11:9009 { setName: "set9", ismaster: false, secondary: true, hosts: [ "10.11.200.11:9009", "10.11.200.13:9009", "10.11.200.12:9009" ], me: "10.11.200.11:9009", maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, localTime: new Date(1421631243269), ok: 1.0 }
      Mon Jan 19 09:34:03.272 [mongosMain] connection accepted from 10.11.220.29:56287 #100573 (501 connections now open)
      Mon Jan 19 09:34:08.268 [conn100563] Socket recv() timeout  10.11.200.12:9009
      Mon Jan 19 09:34:08.268 [conn100563] SocketException: remote: 10.11.200.12:9009 error: 9001 socket exception [3] server [10.11.200.12:9009] 
      Mon Jan 19 09:34:08.268 [conn100563] DBClientCursor::init call() failed
      Mon Jan 19 09:34:08.268 [conn100563] ReplicaSetMonitor::_checkConnection: caught exception 10.11.200.12:9009 DBClientBase::findN: transport error: 10.11.200.12:9009 ns: admin.$cmd query: { ismaster: 1 }
      Mon Jan 19 09:34:13.268 [conn100563] Socket recv() timeout  10.11.200.13:9009
      Mon Jan 19 09:34:13.268 [conn100563] SocketException: remote: 10.11.200.13:9009 error: 9001 socket exception [3] server [10.11.200.13:9009] 
      Mon Jan 19 09:34:13.268 [conn100563] DBClientCursor::init call() failed
      Mon Jan 19 09:34:13.268 [conn100563] ReplicaSetMonitor::_checkConnection: caught exception 10.11.200.13:9009 DBClientBase::findN: transport error: 10.11.200.13:9009 ns: admin.$cmd query: { ismaster: 1 }
      Mon Jan 19 09:34:13.269 [conn100573] Primary for replica set set9 changed to 10.11.200.11:9009
      Mon Jan 19 09:34:13.270 [conn100572] ReplicaSetMonitor::_checkConnection: 10.11.200.11:9009 { setName: "set9", ismaster: true, secondary: false, hosts: [ "10.11.200.11:9009", "10.11.200.13:9009", "10.11.200.12:9009" ], primary: "10.11.200.11:9009", me: "10.11.200.11:9009", maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, localTime: new Date(1421631253270), ok: 1.0 }
      Mon Jan 19 09:34:13.270 [conn86557] ReplicaSetMonitor::_checkConnection: 10.11.200.11:9009 { setName: "set9", ismaster: true, secondary: false, hosts: [ "10.11.200.11:9009", "10.11.200.13:9009", "10.11.200.12:9009" ], primary: "10.11.200.11:9009", me: "10.11.200.11:9009", maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, localTime: new Date(1421631253271), ok: 1.0 }
      Mon Jan 19 09:34:13.271 [conn100457] ReplicaSetMonitor::_checkConnection: 10.11.200.11:9009 { setName: "set9", ismaster: true, secondary: false, hosts: [ "10.11.200.11:9009", "10.11.200.13:9009", "10.11.200.12:9009" ], primary: "10.11.200.11:9009", me: "10.11.200.11:9009", maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, localTime: new Date(1421631253272), ok: 1.0 }
      Mon Jan 19 09:34:13.272 [conn100342] ReplicaSetMonitor::_checkConnection: 10.11.200.11:9009 { setName: "set9", ismaster: true, secondary: false, hosts: [ "10.11.200.11:9009", "10.11.200.13:9009", "10.11.200.12:9009" ], primary: "10.11.200.11:9009", me: "10.11.200.11:9009", maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, localTime: new Date(1421631253272), ok: 1.0 }
      Mon Jan 19 09:34:13.273 [conn86541] ReplicaSetMonitor::_checkConnection: 10.11.200.11:9009 { setName: "set9", ismaster: true, secondary: false, hosts: [ "10.11.200.11:9009", "10.11.200.13:9009", "10.11.200.12:9009" ], primary: "10.11.200.11:9009", me: "10.11.200.11:9009", maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, localTime: new Date(1421631253273), ok: 1.0 }
      Mon Jan 19 09:34:14.269 [conn100563] warning: No primary detected for set set9
      Mon Jan 19 09:34:14.269 [conn100563] warning: db exception when initializing on set9:set9/10.11.200.11:9009,10.11.200.12:9009,10.11.200.13:9009, current connection state is { state: { conn: "set9/10.11.200.11:9009,10.11.200.12:9009,10.11.200.13:9009", vinfo: "media_doc.media_doc @ 51|367||545341bbf0be311c3f688144", cursor: "(none)", count: 0, done: false }, retryNext: false, init: false, finish: false, errored: false } :: caused by :: 10009 ReplicaSetMonitor no master found for set: set9
      Mon Jan 19 09:34:14.279 [mongosMain] connection accepted from 10.11.220.24:52139 #100574 (502 connections now open)
      Mon Jan 19 09:34:14.300 [conn100574] Primary for replica set set9 changed to 10.11.200.11:9009
      Mon Jan 19 09:34:17.258 [ReplicaSetMonitorWatcher] Socket recv() timeout  10.11.200.15:9010
      Mon Jan 19 09:34:17.258 [ReplicaSetMonitorWatcher] SocketException: remote: 10.11.200.15:9010 error: 9001 socket exception [3] server [10.11.200.15:9010] 
      Mon Jan 19 09:34:17.258 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed
      Mon Jan 19 09:34:17.956 [conn100574] end connection 10.11.220.24:52139 (501 connections now open)
      Mon Jan 19 09:34:22.258 [ReplicaSetMonitorWatcher] Socket recv() timeout  10.11.200.16:9010
      Mon Jan 19 09:34:22.258 [ReplicaSetMonitorWatcher] SocketException: remote: 10.11.200.16:9010 error: 9001 socket exception [3] server [10.11.200.16:9010] 
      Mon Jan 19 09:34:22.258 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed
      Mon Jan 19 09:34:22.276 [ReplicaSetMonitorWatcher] trying reconnect to 10.11.200.12:9009
      Mon Jan 19 09:34:22.276 [ReplicaSetMonitorWatcher] reconnect 10.11.200.12:9009 ok
      Mon Jan 19 09:34:22.277 [ReplicaSetMonitorWatcher] trying reconnect to 10.11.200.13:9009
      Mon Jan 19 09:34:22.277 [ReplicaSetMonitorWatcher] reconnect 10.11.200.13:9009 ok
      Mon Jan 19 09:34:37.279 [ReplicaSetMonitorWatcher] Socket recv() timeout  10.11.200.14:9010
      Mon Jan 19 09:34:37.279 [ReplicaSetMonitorWatcher] SocketException: remote: 10.11.200.14:9010 error: 9001 socket exception [3] server [10.11.200.14:9010] 
      Mon Jan 19 09:34:37.279 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed
      Mon Jan 19 09:34:37.280 [ReplicaSetMonitorWatcher] trying reconnect to 10.11.200.14:9010
      Mon Jan 19 09:34:37.280 [ReplicaSetMonitorWatcher] reconnect 10.11.200.14:9010 ok
      Mon Jan 19 09:34:41.374 [mongosMain] connection accepted from 10.11.220.47:46474 #100575 (502 connections now open)
      Mon Jan 19 09:34:42.280 [ReplicaSetMonitorWatcher] Socket recv() timeout  10.11.200.14:9010
      Mon Jan 19 09:34:42.280 [ReplicaSetMonitorWatcher] SocketException: remote: 10.11.200.14:9010 error: 9001 socket exception [3] server [10.11.200.14:9010] 
      Mon Jan 19 09:34:42.280 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed
      Mon Jan 19 09:34:42.280 [ReplicaSetMonitorWatcher] trying reconnect to 10.11.200.15:9010
      Mon Jan 19 09:34:42.281 [ReplicaSetMonitorWatcher] reconnect 10.11.200.15:9010 ok
      Mon Jan 19 09:34:42.282 [ReplicaSetMonitorWatcher] trying reconnect to 10.11.200.14:9010
      Mon Jan 19 09:34:42.282 [ReplicaSetMonitorWatcher] reconnect 10.11.200.14:9010 ok
      Mon Jan 19 09:34:47.282 [ReplicaSetMonitorWatcher] Socket recv() timeout  10.11.200.14:9010
      Mon Jan 19 09:34:47.282 [ReplicaSetMonitorWatcher] SocketException: remote: 10.11.200.14:9010 error: 9001 socket exception [3] server [10.11.200.14:9010] 
      Mon Jan 19 09:34:47.283 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed
      Mon Jan 19 09:34:47.283 [ReplicaSetMonitorWatcher] trying reconnect to 10.11.200.16:9010
      Mon Jan 19 09:34:47.283 [ReplicaSetMonitorWatcher] reconnect 10.11.200.16:9010 ok
      Mon Jan 19 09:34:47.284 [conn140] trying reconnect to 10.11.200.14:9010
      Mon Jan 19 09:34:47.285 [conn140] reconnect 10.11.200.14:9010 ok
      Mon Jan 19 09:34:47.824 [conn100563] DBClientCursor::init lazy say() failed
      Mon Jan 19 09:34:47.824 [conn100563] DBClientCursor::init message from say() was empty
      Mon Jan 19 09:34:47.824 [conn100563] got not master for: 10.11.200.14:9010
      Mon Jan 19 09:34:47.824 [WriteBackListener-10.11.200.14:9010] DBClientCursor::init call() failed
      Mon Jan 19 09:34:47.824 [conn100573] DBClientCursor::init lazy say() failed
      Mon Jan 19 09:34:47.824 [conn100561] DBClientCursor::init lazy say() failed
      Mon Jan 19 09:34:47.825 [conn100573] DBClientCursor::init message from say() was empty
      Mon Jan 19 09:34:47.825 [conn100564] DBClientCursor::init lazy say() failed
      Mon Jan 19 09:34:47.825 [conn100342] DBClientCursor::init lazy say() failed
      Mon Jan 19 09:34:47.825 [conn100564] DBClientCursor::init message from say() was empty
      Mon Jan 19 09:34:47.825 [conn100457] DBClientCursor::init lazy say() failed
      Mon Jan 19 09:34:47.825 [conn100573] got not master for: 10.11.200.14:9010
      Mon Jan 19 09:34:47.825 [conn100561] DBClientCursor::init message from say() was empty
      Mon Jan 19 09:34:47.825 [conn140] DBClientCursor::init call() failed
      Mon Jan 19 09:34:47.825 [conn100342] DBClientCursor::init message from say() was empty
      Mon Jan 19 09:34:47.825 [conn100565] DBClientCursor::init lazy say() failed
      Mon Jan 19 09:34:47.825 [conn100564] got not master for: 10.11.200.14:9010
      Mon Jan 19 09:34:47.825 [conn100457] DBClientCursor::init message from say() was empty
      Mon Jan 19 09:34:47.825 [conn100457] got not master for: 10.11.200.14:9010
      Mon Jan 19 09:34:47.825 [conn100572] DBClientCursor::init lazy say() failed
      Mon Jan 19 09:34:47.825 [WriteBackListener-10.11.200.14:9010] Detected bad connection created at 1421605308573619 microSec, clearing pool for 10.11.200.14:9010
      Mon Jan 19 09:34:47.825 [conn100342] got not master for: 10.11.200.14:9010
      

              Assignee:
              Ramon Fernandez Marina
              Reporter:
              guoliang
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated:
                Resolved: