-
Type: Bug
-
Resolution: Cannot Reproduce
-
Priority: Major - P3
-
Affects Version/s: 2.3.1
-
Component/s: Server Selection
-
None
-
Environment:Rails 4.2.7.1, Mongoid 5.1.4, MongoDB 2.6.12
After we perform a failover in our cluster (rs.stepDown) our app throws an error "not master and slaveOk=false". Looking closer, failover seems to happen normally (we have 9 separate replica sets and therefore clients configured)
```
D, 2016-11-03T11:36:31.299424 #9937 DEBUG – : MONGODB | IOError
D, 2016-11-03T11:36:31.406014 #9937 DEBUG – : MONGODB | IOError
D, 2016-11-03T11:36:31.410145 #9937 DEBUG – : MONGODB | IOError
D, 2016-11-03T11:36:32.053781 #9937 DEBUG – : MONGODB | IOError
D, 2016-11-03T11:36:32.055543 #9937 DEBUG – : MONGODB | IOError
D, 2016-11-03T11:36:32.360623 #9937 DEBUG – : MONGODB | IOError
D, 2016-11-03T11:36:32.464632 #9937 DEBUG – : MONGODB | IOError
D, 2016-11-03T11:36:33.145646 #9937 DEBUG – : MONGODB | IOError
D, 2016-11-03T11:36:33.147382 #9937 DEBUG – : MONGODB | IOError
D, 2016-11-03T11:36:41.172013 #9937 DEBUG – : MONGODB | Server 1234567890-a0.domain.com:51936 elected as primary in rs-1234567890.
D, 2016-11-03T11:36:41.586878 #9937 DEBUG – : MONGODB | Server 1234567890-a0.domain.com:51936 elected as primary in rs-1234567890.
D, 2016-11-03T11:36:41.599024 #9937 DEBUG – : MONGODB | Server 1234567890-a0.domain.com:51936 elected as primary in rs-1234567890.
D, 2016-11-03T11:36:41.949060 #9937 DEBUG – : MONGODB | Server 1234567890-a0.domain.com:51936 elected as primary in rs-1234567890.
D, 2016-11-03T11:36:42.090859 #9937 DEBUG – : MONGODB | Server 1234567890-a0.domain.com:51936 elected as primary in rs-1234567890.
D, 2016-11-03T11:36:42.312174 #9937 DEBUG – : MONGODB | Server 1234567890-a0.domain.com:51936 elected as primary in rs-1234567890.
D, 2016-11-03T11:36:42.707659 #9937 DEBUG – : MONGODB | Server 1234567890-a0.domain.com:51936 elected as primary in rs-1234567890.
D, 2016-11-03T11:36:43.195367 #9937 DEBUG – : MONGODB | Server 1234567890-a0.domain.com:51936 elected as primary in rs-1234567890.
D, 2016-11-03T11:36:43.198079 #9937 DEBUG – : MONGODB | Server 1234567890-a0.domain.com:51936 elected as primary in rs-1234567890.
```
However on seemingly random queries after the failover (maybe 5%?) we get the above mentioned error. The error occurs because the mongo driver is returning a stale replica set configuration during those certain queries from our app.
So if I throw in a `puts config` at https://github.com/mongodb/mongo-ruby-driver/blob/master/lib/mongo/server/description.rb#L443 I get a stream of statements showing that that on all 9 clients the correct primary is set to `1234567890-a0.domain.com:51936` which is correct:
```
{"setName"=>"rs-1234567890", "setVersion"=>6, "ismaster"=>false, "secondary"=>true, "hosts"=>["1234567890-a1.domain.com:51936", "1234567890-a0.domain.com:51936"], "arbiters"=>["1234567890-ar0.domain.com:51936"], "primary"=>"1234567890-a0.domain.com:51936", "me"=>"1234567890-a1.domain.com:51936", "maxBsonObjectSize"=>16777216, "maxMessageSizeBytes"=>48000000, "maxWriteBatchSize"=>1000, "localTime"=>2016-11-03 18:54:49 UTC, "maxWireVersion"=>2, "minWireVersion"=>0, "ok"=>1.0}```
However, if I hit our app, certain queries produce the following config object in the logs and try to hit the old primary (now secondary):
```
```
notice that `localTime` is significantly behind the time of the "good config" - no matter how many times I try to hit that endpoint, I will get get the same old primary/old config back from the mongo driver.