[SERVER-3184] Need to specify timeout for replica set Created: 02/Jun/11  Updated: 29/Aug/11  Resolved: 04/Jun/11

Status: Closed
Project: Core Server
Component/s: Internal Client
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Alexey Guseynov Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File diff    
Backwards Compatibility: Major Change
Participants:

 Description   

We have run into the following issue in our software:
We have 3 servers running mongodb in a replica set configuration. One server runs our application and is loaded by testing software at a rate of 50 requests per second. We then configure iptables on the server running the primary mongo node to drop all packets to and from the other servers. This emulates network problems that may occur.
After a short period (I think 20 seconds) the secondary nodes detect that the link is broken and elect a new primary. The server side works fine, but our application does not. The C++ driver uses the default network timeout for its connections, which on our system is 15 minutes. When the network fails, our application still tries to make queries, and all available threads hang inside the C++ driver waiting for a network response. For the next 15 minutes our service does not respond because all threads in the thread pool are in use.
What we need is to set the timeout to about 15 seconds instead of 15 minutes.

The attached patch adds the ability to specify a timeout for a replica set. The patch breaks the ABI, but the API is backward compatible.
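
To illustrate the kind of bound being requested, here is a minimal sketch using plain POSIX socket options. It is not the attached patch and does not use the driver's API: the helper names setSocketTimeout and recvTimesOutOnSilentPeer are hypothetical, and a local socket pair with a silent peer stands in for the "black hole" produced by the iptables rule.

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper (not part of the MongoDB C++ driver): bound how long
// blocking send/recv calls on `fd` may wait. Returns true on success.
bool setSocketTimeout(int fd, double seconds) {
    timeval tv;
    tv.tv_sec = static_cast<time_t>(seconds);
    tv.tv_usec = static_cast<suseconds_t>((seconds - tv.tv_sec) * 1e6);
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0 &&
           setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == 0;
}

// Simulates a silent peer: the other end of a socket pair never writes.
// Returns true if recv() gave up with a timeout error instead of hanging.
bool recvTimesOutOnSilentPeer(double seconds) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return false;

    bool timedOut = false;
    if (setSocketTimeout(fds[0], seconds)) {
        char buf[1];
        ssize_t n = recv(fds[0], buf, sizeof(buf), 0);  // nothing will arrive
        int err = errno;
        timedOut = (n == -1) && (err == EAGAIN || err == EWOULDBLOCK);
    }
    close(fds[0]);
    close(fds[1]);
    return timedOut;
}
```

With a timeout of this kind set to roughly 15 seconds, a worker thread blocked on a dead connection would get an error back and could be returned to the pool, rather than waiting for the system default.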



 Comments   
Comment by Alexey Guseynov [ 08/Jun/11 ]

Actually, the described situation is not something that never occurs. Consider an application that eats all the RAM. As far as I know, mongo may eventually do this in an OpenVZ container. The system becomes very unresponsive, but it is still reachable. You can even ping it, because ping is handled at the kernel level, so you are not going to get "Destination unreachable". But mongo would not be able to process any requests.
Or you can have hardware problems with your hard drives. This may cause the filesystem to freeze, which in turn freezes all processes using that filesystem.

Comment by Alexey Guseynov [ 05/Jun/11 ]

Before implementing this feature I asked our server administrators group whether this situation is possible. They answered that it is, so we decided that we need to fix it. We don't want to maintain our own branch of the mongodb driver, so we would like to push this patch upstream.
Currently I don't see any downsides to this solution, so I don't understand why it shouldn't be implemented, even if the problem it fixes is quite rare.

Comment by Eliot Horowitz (Inactive) [ 04/Jun/11 ]

This is not a very standard test.
If there are network issues, you would normally get an error, not a black hole effect.

Generated at Thu Feb 08 03:02:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.