[SERVER-35944] Session Pinning / Server Selection Algorithm Causes Blocking with Causal Consistency Created: 03/Jul/18  Updated: 17/Nov/23

Status: Backlog
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Simon Yarde Assignee: Backlog - Cluster Scalability
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-36042 Server Selection Algorithm Causes Blo... Closed
Assigned Teams:
Cluster Scalability
Sprint: Sharding 2018-10-22

 Description   

The server selection algorithm directs reads to servers at random, which causes blocking when the reads are causally dependent and the selected server has not yet applied the operations they depend on.

Example: a web application can tolerate some staleness but must have predictably fast response times. It reads some state (RS) on which the next read (R) depends. The server selected for RS (S1) could also respond to R without delay, but instead a different server (S2), whose replication lags behind S1's, is selected at random, and the response blocks until S2 catches up.
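
For illustration, a minimal PyMongo sketch of the scenario above; the connection string, database, collection, and document fields are assumptions, not taken from this report:

    from pymongo import MongoClient, ReadPreference

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    coll = client.get_database(
        "app", read_preference=ReadPreference.SECONDARY
    ).orders

    with client.start_session(causal_consistency=True) as session:
        # Read RS: the driver selects a secondary (S1) at random and
        # records the operationTime it observes in the session.
        state = coll.find_one({"_id": "state"}, session=session)

        # Read R: server selection may pick a different secondary (S2).
        # S2 cannot answer until it has replicated past the session's
        # operationTime, so this read can block even though S1 could
        # have answered immediately.
        result = coll.find_one({"ref": state["ref"]}, session=session)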

"Pinning" for clients has been deprecated but client applications that need to distribute reads to secondaries and have predictable latencies under causal consistency could benefit from shorter-lived pinning in sessions.

"Session pinning" can be achieved by the future work identified in the Max Staleness specification (below).

"If a future spec allows applications to use readConcern "afterOptime" [also "afterClusterTime"], clients should prefer secondaries that have already replicated to that opTime, so reads do not block. This is an extension of the mongos logic for CSRS to applications."

Rather than pinning a client to a particular server, a session becomes pinned to a set of eligible servers that can respond equivalently without blocking.
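
A hypothetical sketch of the eligibility filter this implies; the Server type, the last_applied_optime field, and the integer opTime comparison are illustrative simplifications, not the server's actual data structures:

    from dataclasses import dataclass

    @dataclass
    class Server:
        address: str
        last_applied_optime: int  # simplified stand-in for a replication opTime

    def eligible_servers(servers, session_optime):
        """Return the set a session is 'pinned' to: servers that have
        already applied the session's last observed operation and so can
        answer an afterClusterTime read without blocking."""
        if session_optime is None:
            return list(servers)  # new session: every server is eligible
        return [s for s in servers if s.last_applied_optime >= session_optime]

    servers = [Server("s1:27017", 105), Server("s2:27017", 98)]
    print(eligible_servers(servers, session_optime=100))  # only s1 qualifies

The pin is to a set rather than a single server, so selection can still balance load across every member that happens to be caught up.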

Applications may also need to consider that, whilst starting a new session with no initial operation time (readConcern afterClusterTime) allows selection from all servers regardless of staleness/lag, the servers with the least replication lag may be selected disproportionately, because they meet the after-operation-time criterion of more sessions.



 Comments   
Comment by Misha Tyulenev [ 19/Oct/18 ]

I believe it's a good candidate for 4.2. The fix should go into the RSM server selection algorithm and give preference to servers that have already reached the cluster time included in the afterClusterTime read concern when the read preference is nearest. This behavior may differ for secondary or secondaryPreferred.
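
A hypothetical sketch of that preference for the nearest case; the names, the integer cluster-time comparison, and the round-trip-time tie-break are illustrative, not the actual RSM implementation:

    from dataclasses import dataclass

    @dataclass
    class Server:
        address: str
        last_applied_optime: int  # simplified stand-in for a cluster time
        rtt_ms: float

    def select_nearest(servers, after_cluster_time):
        """Prefer servers that have already reached afterClusterTime,
        breaking ties by round-trip time; lagging servers stay selectable
        but only as a last resort."""
        return sorted(
            servers,
            key=lambda s: (s.last_applied_optime < after_cluster_time, s.rtt_ms),
        )

    servers = [
        Server("s1:27017", 105, rtt_ms=12.0),
        Server("s2:27017", 98, rtt_ms=5.0),
    ]
    # s1 sorts first despite higher latency because it meets the cluster time.
    print(select_nearest(servers, after_cluster_time=100)[0].address)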
