[SERVER-32010] SocketException: remote: 10.200.66.92:27019 error: 9001 socket exception
Created: 17/Nov/17  Updated: 26/Jan/18  Resolved: 27/Dec/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.0.12
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: hancang2000
Assignee: Mark Agarunov
Resolution: Incomplete
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Restarting the config server resolves the problem. While the problem is occurring, the config server has TCP connections in the "close_time" state (likely CLOSE_WAIT), and the mongos has connections in the FIN_WAIT2 state.
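
To check the TCP states described above, the output of `netstat -ant` on each host can be tallied per state for the config server port. The following is a minimal sketch, assuming Linux-style netstat columns (proto, recv-q, send-q, local address, foreign address, state). The function name and the client addresses in the sample are hypothetical and are not taken from the affected cluster.

```python
from collections import Counter


def count_tcp_states(netstat_output: str, port: int = 27019) -> Counter:
    """Count TCP states for connections whose local or remote port matches."""
    states = Counter()
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed column layout: proto recv-q send-q local foreign state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or remote.endswith(suffix):
            states[state] += 1
    return states


# Hypothetical sample; only 10.200.66.92:27019 comes from the ticket.
sample = """\
tcp        0      0 10.200.66.92:27019      10.200.64.10:51234      CLOSE_WAIT
tcp        0      0 10.200.66.92:27019      10.200.64.10:51240      ESTABLISHED
tcp        0      0 10.200.66.92:22         10.200.64.11:40000      ESTABLISHED
"""
print(count_tcp_states(sample))
```

A growing count of CLOSE_WAIT on the config server alongside FIN_WAIT2 on the mongos would be consistent with the config server not closing sockets that the mongos has already shut down, but that is an interpretation and not something confirmed in this ticket.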

Participants:

 Description   

Nov 17 16:21:43 [LockPinger] DBClientCursor::init call() failed
Nov 17 16:21:43 [LockPinger] scoped connection to 10.200.67.92:27019,10.200.65.92:27019,10.200.66.92:27019 not being returned to the pool
Nov 17 16:21:43 [LockPinger] distributed lock pinger '10.200.67.92:27019,10.200.65.92:27019,10.200.66.92:27019/g37-gs09-10021.i.nease.net:30000:1507709978:1804289383' detected an exception while pinging. :: caused by :: SyncClusterConnection write op failed: 10.200.66.92:27019 (10.200.66.92) failed: {} DBClientBase::findN: transport error: 10.200.66.92:27019 ns: admin.$cmd query: { getlasterror: 1, fsync: 1 }
Nov 17 16:22:02 Socket recv() timeout  10.200.66.92:27019
Nov 17 16:22:02 SocketException: remote: 10.200.66.92:27019 error: 9001 socket exception [RECV_TIMEOUT] server [10.200.66.92:27019]
Nov 17 16:22:02 DBClientCursor::init call() failed
Nov 17 16:22:02  couldn't check dbhash on config server 10.200.66.92:27019 :: caused by :: 10276 DBClientBase::findN: transport error: 10.200.66.92:27019 ns: config.$cmd query: { dbhash: 1, collections: [ "chunks", "databases", "collections", "shards", "version" ] }
Nov 17 16:22:18 [UserCacheInvalidator] SyncClusterConnection connecting to [10.200.67.92:27019]
Nov 17 16:22:18 [UserCacheInvalidator] SyncClusterConnection connecting to [10.200.65.92:27019]
Nov 17 16:22:18 [UserCacheInvalidator] SyncClusterConnection connecting to [10.200.66.92:27019]
Nov 17 16:22:45 [LockPinger] Socket recv() timeout  10.200.66.92:27019
Nov 17 16:22:45 [LockPinger] SocketException: remote: 10.200.66.92:27019 error: 9001 socket exception [RECV_TIMEOUT] server [10.200.66.92:27019]
Nov 17 16:22:45 [LockPinger] DBClientCursor::init call() failed
Nov 17 16:22:45 [LockPinger] scoped connection to 10.200.67.92:27019,10.200.65.92:27019,10.200.66.92:27019 not being returned to the pool
Nov 17 16:22:45 [LockPinger] distributed lock pinger '10.200.67.92:27019,10.200.65.92:27019,10.200.66.92:27019/g37-gs09-10021.i.nease.net:30000:1507709978:1804289383' detected an exception while pinging. :: caused by :: SyncClusterConnection write op failed: 10.200.66.92:27019 (10.200.66.92) failed: {} DBClientBase::findN: transport error: 10.200.66.92:27019 ns: admin.$cmd query: { getlasterror: 1, fsync: 1 }
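
To see which config server the timeouts concentrate on, the `[RECV_TIMEOUT] server [...]` entries in a log like the one above can be counted per host. This is a rough sketch that relies only on the line format visible in this excerpt; the function name is hypothetical and other mongos versions may log a different format.

```python
import re
from collections import Counter

# Matches the "[RECV_TIMEOUT] server [host:port]" suffix seen in the excerpt.
TIMEOUT_RE = re.compile(r"\[RECV_TIMEOUT\] server \[([^\]]+)\]")


def count_recv_timeouts(log_text: str) -> Counter:
    """Return the number of RECV_TIMEOUT socket exceptions per server."""
    return Counter(TIMEOUT_RE.findall(log_text))


# Three lines copied from the log excerpt in this ticket.
excerpt = """\
Nov 17 16:22:02 SocketException: remote: 10.200.66.92:27019 error: 9001 socket exception [RECV_TIMEOUT] server [10.200.66.92:27019]
Nov 17 16:22:18 [UserCacheInvalidator] SyncClusterConnection connecting to [10.200.67.92:27019]
Nov 17 16:22:45 [LockPinger] SocketException: remote: 10.200.66.92:27019 error: 9001 socket exception [RECV_TIMEOUT] server [10.200.66.92:27019]
"""
print(count_recv_timeouts(excerpt))
```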



 Comments   
Comment by Mark Agarunov [ 27/Dec/17 ]

Hello zhouhancang,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Thanks,
Mark

Comment by Mark Agarunov [ 01/Dec/17 ]

Hello zhouhancang,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the following:

  • The complete log files from all of the affected nodes
  • An archive (tar or zip) of the $dbpath/diagnostic.data directory from all affected mongod nodes.
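
One possible way to produce the requested archive is sketched below with Python's standard `tarfile` module. The function name and the output file name are hypothetical, and the dbpath must be replaced with the actual value for each mongod node.

```python
import tarfile
from pathlib import Path


def archive_diagnostic_data(dbpath: str, out_file: str) -> str:
    """Write a gzip-compressed tar of <dbpath>/diagnostic.data to out_file."""
    source = Path(dbpath) / "diagnostic.data"
    if not source.is_dir():
        raise FileNotFoundError(f"{source} does not exist")
    with tarfile.open(out_file, "w:gz") as tar:
        # Store entries under "diagnostic.data/" instead of the full path.
        tar.add(str(source), arcname="diagnostic.data")
    return out_file


# Example call (paths are placeholders):
# archive_diagnostic_data("/data/db", "diagnostic-data.tar.gz")
```

Write the output file somewhere outside the diagnostic.data directory so the archive does not try to include itself.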

Thanks,
Mark

Comment by Mark Agarunov [ 17/Nov/17 ]

Hello zhouhancang,

Thank you for the report. To get a better idea of what may be causing this, could you please provide the following:

  • The complete log files from all of the affected nodes
  • An archive (tar or zip) of the $dbpath/diagnostic.data directory from all affected mongod nodes.

This should give some insight into why you are experiencing this behavior.

Thanks,
Mark

Generated at Thu Feb 08 04:28:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.