Type: Task
Resolution: Done
Fix Version/s: 2.4.0
Affects Version/s: None
Component/s: None
Labels:
- feature

Confidence Status:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Link:
None
Goal Name(s):
None

When a replica set is reconfigured (e.g. by forcing a member to become primary), the Mongo driver may raise a Mongo::OperationFailure error with the message "10054: not master". This happens because the master has changed, but the Mongo connection still points to the previous one. Reconnecting after this error seems to work, but only once the new primary has been elected, which can take some time.
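
Because the new primary only becomes reachable after the election finishes, a short fixed sleep combined with a small retry count can give up before the election completes. One alternative, not part of this proposal, is exponential backoff capped by a total wait budget. The sketch below is a hypothetical helper: `backoff_delays` and all of its default values are illustrative assumptions, not Mongoid or driver settings.

```ruby
# Hypothetical helper (not part of Mongoid or the mongo driver): builds an
# exponential backoff schedule whose cumulative wait never exceeds `budget`
# seconds. All defaults are illustrative assumptions.
def backoff_delays(base: 0.5, factor: 2.0, max_delay: 8.0, budget: 30.0)
  raise ArgumentError, "base must be positive" unless base.positive?
  raise ArgumentError, "factor must be >= 1" unless factor >= 1

  delays = []
  delay = base
  total = 0.0
  loop do
    step = [delay, max_delay].min       # cap each individual wait
    break if total + step > budget      # stop before exceeding the budget
    delays << step
    total += step
    delay *= factor
  end
  delays
end

p backoff_delays(budget: 10.0) # prints [0.5, 1.0, 2.0, 4.0]
```

A retry loop would sleep for each delay in turn and re-raise once the schedule is exhausted, so the total time spent waiting for an election is bounded by `budget` rather than by the retry count alone.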

While this could be handled by the application, it would make sense for Mongoid to handle this error and attempt to reconnect. In fact, Mongoid already does this in the Mongoid::Collections::Retry module, but it only rescues Mongo::ConnectionFailure. The complication is that Mongo::OperationFailure can also be raised with other error messages, signalling different kinds of errors, especially when using safe mode (you can check for it here).
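
Since Mongo::OperationFailure covers many unrelated errors, the retry logic needs a way to single out the "not master" case. A minimal sketch, assuming that matching on the message text is good enough: the "10054: not master" message comes from this issue, while the duplicate-key message is only an example of an unrelated failure. `not_master_error?` is a hypothetical helper, and the `Mongo` module here is a stand-in so the snippet runs without the mongo gem. Remove it when the real driver is loaded, since redefining its classes with a different superclass raises a TypeError.

```ruby
# Stand-in exception class so this sketch runs without the mongo gem.
# Remove when the real driver is loaded.
module Mongo
  class OperationFailure < StandardError; end
end

# Hypothetical predicate: an OperationFailure is treated as retryable only
# when its message says the server is no longer master. Matching on message
# text is a heuristic; every other OperationFailure is left alone.
def not_master_error?(error)
  error.is_a?(Mongo::OperationFailure) && error.message.match?(/not master/)
end

puts not_master_error?(Mongo::OperationFailure.new("10054: not master"))         # prints "true"
puts not_master_error?(Mongo::OperationFailure.new("E11000 duplicate key error")) # prints "false"
```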

My first attempt to solve this would be to add another rescue like this:

    def retry_on_connection_failure
      retries = 0
      begin
        yield
      rescue Mongo::ConnectionFailure => ex
        retries += 1
        raise ex if retries > Mongoid.max_retries_on_connection_failure
        Kernel.sleep(0.5)
        retry
      rescue Mongo::OperationFailure => ex
        if ex.message =~ /not master/
          # master has changed, retrying to connect
          retries += 1
          raise ex if retries > Mongoid.max_retries_on_connection_failure
          Kernel.sleep(0.5)
          retry
        else
          # some other Mongo::OperationFailure error, re-raising it
          raise ex
        end
      end
    end
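
The proposal can be exercised outside Mongoid with a small standalone harness. This sketch folds the two rescue clauses into one while keeping the same behaviour: connection failures and "not master" errors share a single retry counter, and any other OperationFailure is re-raised at once. `max_retries` stands in for `Mongoid.max_retries_on_connection_failure` and the injectable `sleeper` lambda stands in for `Kernel.sleep`; both parameters and the stub exception classes are assumptions made for testing, not Mongoid's API.

```ruby
# Stand-ins for the driver's exception classes so the sketch runs on its own.
# Remove these when the mongo gem is loaded.
module Mongo
  class ConnectionFailure < StandardError; end
  class OperationFailure < StandardError; end
end

# Hypothetical standalone version of the proposed retry logic.
# max_retries replaces Mongoid.max_retries_on_connection_failure and
# sleeper replaces Kernel.sleep so the behaviour can be tested quickly.
def retry_on_connection_failure(max_retries: 3, delay: 0.5, sleeper: ->(s) { Kernel.sleep(s) })
  retries = 0
  begin
    yield
  rescue Mongo::ConnectionFailure, Mongo::OperationFailure => ex
    # Any OperationFailure other than "not master" is re-raised immediately.
    raise if ex.is_a?(Mongo::OperationFailure) && ex.message !~ /not master/
    retries += 1
    raise if retries > max_retries # give up after max_retries retries
    sleeper.call(delay)
    retry # re-run the begin block, i.e. yield again
  end
end

# Usage: the block fails twice with "not master", then succeeds.
attempts = 0
result = retry_on_connection_failure(sleeper: ->(_s) {}) do
  attempts += 1
  raise Mongo::OperationFailure, "10054: not master" if attempts < 3
  :ok
end
p [result, attempts] # prints [:ok, 3]
```

Passing a no-op sleeper keeps the example instant; with the default, each retry waits 0.5 seconds, as in the original proposal.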

Any suggestions on this topic?

Assignee: Unassigned
Reporter: Vicente Mundim
Votes: 0
Watchers: 0

Created: Nov 10 2011 09:22:25 PM UTC
Updated: May 29 2015 02:10:21 PM UTC
Resolved: May 29 2015 02:10:21 PM UTC