- Type: Task
- Resolution: Done
- Affects Version/s: None
- Component/s: None
The retry code has logic for "not master" errors, but I'm pretty sure it won't actually work. It will just retry against the same non-master node several times and then give up: https://github.com/mongoid/mongoid/blob/master/lib/mongoid/collections/retry.rb#L33
We saw something like this tonight: a mostly correctly configured Mongoid setup didn't fail over after a somewhat janky stepdown. Both servers stayed available and the connections didn't drop, but the member Mongoid thought was primary suddenly became a secondary.
Am I interpreting this code right? It seems like the retry on exception should issue a reconnect, then retry.
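
To make the suggestion concrete, here is a rough sketch of the reconnect-then-retry behavior I would expect. This is not Mongoid's code: `NotMasterError`, `FakeConnection`, its `reconnect` method, and `with_reconnecting_retry` are all hypothetical names made up for illustration, and the fake connection only simulates a node that stays "not master" until a reconnect happens.

```ruby
# Hypothetical stand-in for the driver's "not master" failure.
class NotMasterError < StandardError; end

# Hypothetical connection that believes a stale node is primary.
# Writes fail with NotMasterError until reconnect is called.
class FakeConnection
  attr_reader :reconnects

  def initialize
    @stale = true
    @reconnects = 0
  end

  # Simulates re-discovering the current primary.
  def reconnect
    @reconnects += 1
    @stale = false
  end

  def insert(doc)
    raise NotMasterError, "not master" if @stale
    doc
  end
end

# Runs the block, and on NotMasterError reconnects before retrying,
# so the next attempt is not sent to the same non-master node.
# Re-raises once max_retries retries have been used up.
def with_reconnecting_retry(connection, max_retries: 3)
  attempts = 0
  begin
    yield connection
  rescue NotMasterError
    attempts += 1
    raise if attempts > max_retries
    connection.reconnect
    retry
  end
end

conn = FakeConnection.new
with_reconnecting_retry(conn) { |c| c.insert({ "a" => 1 }) }
```

Without the `connection.reconnect` call, every retry in this sketch would hit the same stale node and the error would be re-raised after the last attempt, which matches what we observed.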