[SERVER-35944] Session Pinning / Server Selection Algorithm Causes Blocking with Causal Consistency Created: 03/Jul/18 Updated: 17/Nov/23
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Simon Yarde | Assignee: | Backlog - Cluster Scalability |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Cluster Scalability |
| Sprint: | Sharding 2018-10-22 |
| Participants: | |
| Description |
The server selection algorithm directs reads to servers at random, which causes blocking when the reads are causally dependent and the selected server has not yet applied the operations they depend on.

Example: a web application can tolerate some staleness but must have predictably fast response times. It reads some state (RS) on which the next read (R) depends. The server selected for RS (S1) could also respond to R without delay, but a different server (S2) is instead selected at random; S2's replication lags behind S1, so the response to R is blocked until S2 catches up.

"Pinning" clients to servers has been deprecated, but client applications that need to distribute reads to secondaries while keeping predictable latencies under causal consistency could benefit from shorter-lived pinning scoped to sessions. Such "session pinning" can be achieved by the future work identified in the Max Staleness specification (below): rather than pinning a client to a particular server, a session becomes pinned to the set of eligible servers that can respond equivalently without blocking.

Applications may need to consider that while starting a new session with no initial last optime (read concern afterClusterTime) would allow selection from all servers regardless of staleness/lag, the servers with the least replication lag may be selected disproportionately often, because they satisfy the after-operation-time criteria of more sessions. |
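The pin-to-an-eligible-set idea described above can be sketched as a toy model (the names, the scalar stand-in for operation time, and the RTT-based tiebreak are all illustrative assumptions, not driver or server code):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Server:
    name: str
    last_optime: int   # last applied operation time (simplified to a scalar)
    rtt_ms: float      # round-trip time, used as a stand-in for "nearest"

def eligible_servers(servers: List[Server], after_cluster_time: int) -> List[Server]:
    """Servers that have already applied the session's afterClusterTime
    can answer a causally consistent read without blocking."""
    return [s for s in servers if s.last_optime >= after_cluster_time]

def select_server(servers: List[Server], after_cluster_time: int) -> Server:
    ready = eligible_servers(servers, after_cluster_time)
    # Pin the session to the non-blocking set; fall back to all servers
    # (accepting a blocking read) only if no server has caught up yet.
    pool = ready if ready else servers
    return min(pool, key=lambda s: s.rtt_ms)

servers = [
    Server("S1", last_optime=100, rtt_ms=5.0),  # up to date, slightly farther
    Server("S2", last_optime=90,  rtt_ms=2.0),  # nearer, but lagging
]
# R depends on RS answered at optime 100: S2 would block, so S1 is chosen
# even though S2 is nearer.
assert select_server(servers, after_cluster_time=100).name == "S1"
```

With a smaller afterClusterTime (say 90), both servers are eligible and the nearer S2 wins; the session's causal dependency, not raw proximity, decides who is in the pool.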
| Comments |
| Comment by Misha Tyulenev [ 19/Oct/18 ] |
I believe it's a good candidate for 4.2. The fix should go into the RSM server selection algorithm: give preference to servers that are already at the cluster time included in the afterClusterTime read concern when the read preference is nearest. This behavior may differ for secondary or secondaryPreferred. |
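The tweak to "nearest" selection proposed in this comment might look roughly like the following sketch (a toy model: `LOCAL_THRESHOLD_MS` and the scalar optime are illustrative stand-ins, not the actual RSM implementation):

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass
class Server:
    name: str
    last_optime: int  # last applied operation time (simplified to a scalar)
    rtt_ms: float

LOCAL_THRESHOLD_MS = 15.0  # latency window drivers use for "nearest"

def nearest_preferring_cluster_time(servers: List[Server],
                                    after_cluster_time: int) -> Server:
    # Standard "nearest": pick at random among servers within the latency
    # window of the fastest server.
    fastest = min(s.rtt_ms for s in servers)
    window = [s for s in servers if s.rtt_ms - fastest <= LOCAL_THRESHOLD_MS]
    # Proposed preference: within that window, favour servers that have
    # already applied afterClusterTime, so the read never has to wait.
    caught_up = [s for s in window if s.last_optime >= after_cluster_time]
    return random.choice(caught_up or window)

servers = [
    Server("S1", last_optime=100, rtt_ms=5.0),
    Server("S2", last_optime=90,  rtt_ms=2.0),
]
# Both servers fall inside the latency window, but only S1 has reached
# clusterTime 100, so it is preferred over the nearer-but-lagging S2.
assert nearest_preferring_cluster_time(servers, 100).name == "S1"
```

When every server in the window has caught up, the behavior degrades to ordinary randomized "nearest" selection, which is why the comment notes this may need to differ for secondary or secondaryPreferred.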