[SERVER-7519] SocketException: remote: 10.81.221.12:54344 error: 9001
Created: 31/Oct/12  Updated: 07/Mar/13  Resolved: 07/Mar/13

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: 2.2.0
Fix Version/s: None

Type: Bug
Priority: Blocker - P1
Reporter: larry.loi
Assignee: Stennie Steneker (Inactive)
Resolution: Incomplete
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Sandbox. Red Hat Enterprise Linux, kernel 2.6.18-274.el5 x86_64
Hardware: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz x 2 for each VM (2 VMs in total for DB servers)


Attachments: File LOG.rar    
Operating System: Linux
Participants:

 Description   

We are running a performance load test in a sandbox environment. We have 4 shard servers running on 2 VMs with 2 CPUs and 8 GB of memory. We get "SocketException: remote: 10.81.221.12:54344 error: 9001" on and off, even after trying several versions of our DB and app servers. It is blocking our production plan at the moment.



 Comments   
Comment by Stennie Steneker (Inactive) [ 07/Mar/13 ]

Hi Larry,

We don't have any further information to investigate this issue, so I'm closing it as "Incomplete" since it has been idle for a few months now. If you are still experiencing this issue and have further information on keepalive settings (as per Randolph's last comment), please feel free to reopen.

Alternatively, if you were able to find a resolution, it would be useful to comment so that others may benefit from your fix.

Thanks,
Stephen

Comment by Randolph Tan [ 03/Jan/13 ]

Hi,

Here is a summary of my findings from your logs. The socket errors in your logs were caused by the sockets being closed on the other end (mongos). I also noticed 2 patterns in these errors:

1. The error happened close to the time the mongos restarted. This is expected because the process would have closed its sockets when it exited.
2. Right before the error, there was no activity for about 30 minutes or more in the thread handling the incoming connection from mongos. One possible explanation is that the socket was assumed to be dead because of inactivity and was therefore closed. Do you know the TCP keepalive settings of all the machines in this cluster? See http://www.mongodb.org/display/DOCS/Troubleshooting#Troubleshooting-Socketerrorsinshardedclustersandreplicasets
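
As a rough illustration of the keepalive question above, the following Python sketch reads the Linux kernel's keepalive idle time from /proc/sys/net/ipv4/tcp_keepalive_time and compares it with the roughly 30-minute idle window seen in the logs. The function names and the 30-minute default are assumptions chosen for illustration. They are not taken from the logs or from MongoDB itself, and the check only covers the kernel setting on the machine where it runs.

```python
from pathlib import Path

# Standard Linux location of the keepalive idle time, in seconds.
KEEPALIVE_PATH = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def read_keepalive_seconds(path=KEEPALIVE_PATH):
    """Return the keepalive idle time in seconds, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def keepalive_covers_idle(keepalive_s, idle_s=30 * 60):
    """Return True if a keepalive probe would be sent before idle_s seconds of inactivity.

    idle_s defaults to 30 minutes, the idle period observed before the errors
    (an assumption for illustration; adjust to the actual idle window).
    """
    return keepalive_s < idle_s


if __name__ == "__main__":
    value = read_keepalive_seconds()
    if value is None:
        print("Could not read tcp_keepalive_time (is this a Linux machine?)")
    elif keepalive_covers_idle(value):
        print(f"tcp_keepalive_time={value}s: probes start before 30 minutes of idle time")
    else:
        print(f"tcp_keepalive_time={value}s: no probes within 30 minutes of idle time")
```

Running this on each machine in the cluster would give the values asked for. With the common Linux default of 7200 seconds, no probe is sent during a 30-minute idle period, so a firewall or peer that drops idle connections sooner would see the socket as inactive.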

Comment by larry.loi [ 31/Oct/12 ]

Dear Eliot,

Thanks for your reply. The IP you mention is the application server, which has mongos installed. Our application connects to the local mongos and then performs DB operations. We have another 2 VMs as DB servers, with 2 shard servers running on each VM. I have also attached their log files here. Please help to check.

Comment by Eliot Horowitz (Inactive) [ 31/Oct/12 ]

What machine is 10.81.221.12?
We would need a lot more info to tell anything useful:
logs, topology, etc.
