When reading from a sharded cluster via mongos with a specific read preference, mongos does not re-evaluate the preference as long as it is connected to a valid member. In certain circumstances, mongos can therefore read for prolonged periods from nodes that do not match the user's intention and expectation.
When the "secondaryPreferred" read preference is set, mongos connects to an available secondary on a new connection for reads. If no secondaries are available any longer, mongos correctly switches to the primary node. However, when a secondary node becomes available again, mongos does not switch back to reading from it. The connection stays pinned to the primary because under "secondaryPreferred" the primary is a valid read target, and no re-evaluation is carried out until the target becomes invalid or unreachable.
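The pinning behavior described above can be illustrated with a small toy model. This is only a sketch, not mongos code: the class names ToyReplicaSet and PinnedConnection, the node names, and the assumption that the primary is always reachable are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyReplicaSet:
    """Hypothetical stand-in for a replica set: one primary plus named secondaries."""
    primary: str
    secondaries: Dict[str, bool]  # node name -> currently available?

    def available_secondaries(self) -> List[str]:
        return [name for name, up in self.secondaries.items() if up]


@dataclass
class PinnedConnection:
    """Toy model of a connection that only reselects when its target is invalid."""
    rs: ToyReplicaSet
    target: Optional[str] = None

    def _is_valid(self, node: Optional[str]) -> bool:
        # Under "secondaryPreferred", the primary and any available
        # secondary both count as valid read targets.
        if node is None:
            return False
        if node == self.rs.primary:
            return True  # assumption: the primary is always reachable here
        return self.rs.secondaries.get(node, False)

    def read(self) -> str:
        # Node selection happens only when the current target is invalid.
        if not self._is_valid(self.target):
            secondaries = self.rs.available_secondaries()
            self.target = secondaries[0] if secondaries else self.rs.primary
        return self.target


rs = ToyReplicaSet(primary="primary", secondaries={"sec1": True})
conn = PinnedConnection(rs)

history = [conn.read()]        # a secondary is available
rs.secondaries["sec1"] = False
history.append(conn.read())    # secondary is down, fall back to the primary
rs.secondaries["sec1"] = True
history.append(conn.read())    # secondary is back, but the connection stays pinned
print(history)
```

In this model the third read still goes to the primary, because the primary never becomes an invalid target and so nothing triggers a new selection.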
Reads can go to primary nodes for prolonged periods even though the user specified a preference for secondary reads. Users may not even be aware of this if they do not closely monitor the state of their replica sets at all times. Depending on the application architecture, this can lead to degraded read and write throughput.
The only workaround is to forcibly unpin the connection by specifying a different readPreference on said connection.
All previous production releases are affected by this issue.
The fix is included in the 2.6.4 production release.
The following changes address the issue, but are not part of the 2.6 branch:
- Secondary connections are now drawn from the global pool.
- For mongos, the active ReplicaSet connection will release its secondary connection back to the pool after the end of the query/command. This also has a side effect of 'unpinning' the read preference settings. In other words, when this connection is reused again, the node selection will be evaluated again according to the read preference.
As these changes could not be backported to 2.6, a different fix was implemented specifically for 2.6: a new mongos server parameter, internalDBClientRSReselectNodePercentage, was introduced. It can be set to any value from 0 to 100 (the default is 0) and represents the probability, expressed as a percentage, that a replica set connection in mongos re-evaluates replica set node selection from scratch, regardless of whether the last-used secondary node is still compatible with the current read preference. Extra care should be taken, since reselecting a replica set node destroys the old connection and creates a new one. In extreme cases (for example, 100%), mongos can end up creating and destroying a connection for every read request.
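To get a feel for the trade-off, the following sketch estimates how often a connection would be torn down and recreated for a given percentage. It is a simplified model under stated assumptions, not the mongos implementation: the function name count_reselections is hypothetical, and it assumes one independent random draw per read request.

```python
import random


def count_reselections(reads: int, reselect_percentage: float, seed: int = 0) -> int:
    """Count how many of `reads` requests would trigger a full node reselection.

    Assumption: each read independently reselects with probability
    reselect_percentage / 100. Every reselection stands for one
    destroyed and one newly created connection.
    """
    if not 0 <= reselect_percentage <= 100:
        raise ValueError("reselect_percentage must be between 0 and 100")

    rng = random.Random(seed)  # seeded so that runs are repeatable
    reselections = 0
    for _ in range(reads):
        # rng.random() is in [0, 1), so 0 never reselects and 100 always does.
        if rng.random() * 100 < reselect_percentage:
            reselections += 1
    return reselections


for pct in (0, 10, 100):
    print(pct, count_reselections(1000, pct))
```

With a value of 0 the model never reselects, which corresponds to the pinned behavior described earlier; with 100 every read pays the cost of a new connection. Values in between trade how quickly reads return to a secondary against connection churn.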