[SERVER-4936] Server support for "maxStalenessMS" read preference option Created: 12/Feb/12  Updated: 22/Mar/17  Resolved: 27/Aug/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.3.12

Type: Improvement Priority: Major - P3
Reporter: Eliot Horowitz (Inactive) Assignee: Misha Tyulenev
Resolution: Done Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-8723 3.4: add maxStalenessSeconds to read ... Closed
is documented by DOCS-9544 Docs for SERVER-4936: Server support ... Closed
Duplicate
is duplicated by SERVER-12861 Introduce a maxStalenessMS option whe... Closed
is duplicated by SERVER-3346 MAX SLAVE LAG - Features to provide a... Closed
Related
related to SERVER-25842 Secondary started accepting queries b... Closed
related to SERVER-23892 Do periodic replicated writes every 1... Closed
related to SERVER-4935 Mark node Recovering when replication... Closed
related to SERVER-8858 Extend `isMaster` to return opTime an... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 18 (08/05/16), Sharding 2016-08-29, Sharding 2016-09-19
Participants:

 Description   

This would be meta information returned with the query results



 Comments   
Comment by A. Jesse Jiryu Davis [ 23/Jul/16 ]

I've renamed this ticket to "Server support for 'maxStalenessMS' read preference option" and set its fix version to MongoDB 3.3.11. Let's make this the central ticket for tracking server-side support for the maxStalenessMS feature.

The driver side of the feature is tracked in DRIVERS-293. The spec is in these two documents:

https://github.com/mongodb/specifications/tree/master/source/max-staleness
https://github.com/mongodb/specifications/blob/master/source/server-selection/server-selection.rst#maxstalenessms
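
As a rough illustration of what the spec describes, here is a minimal sketch (plain Python, heavily simplified from the driver server-selection spec) of how a client might estimate a secondary's staleness relative to the primary and filter eligible servers. The class and field names are illustrative, not a real driver API, and the heartbeat-frequency handling is simplified.

```python
from dataclasses import dataclass

@dataclass
class ServerView:
    last_update_time: float  # when the client last checked this server (seconds)
    last_write_date: float   # server-reported time of its last write (seconds)

def staleness_secs(secondary, primary, heartbeat_freq=10.0):
    # Estimate lag by comparing each node's (check time - last write) gap,
    # padded by the heartbeat interval, as the spec's formula does.
    return ((secondary.last_update_time - secondary.last_write_date)
            - (primary.last_update_time - primary.last_write_date)
            + heartbeat_freq)

def eligible(secondaries, primary, max_staleness_secs, heartbeat_freq=10.0):
    # Keep only secondaries whose estimated staleness is within the limit.
    return [s for s in secondaries
            if staleness_secs(s, primary, heartbeat_freq) <= max_staleness_secs]
```

Note that the limit ultimately shipped in the drivers as a seconds-granularity option (maxStalenessSeconds), which is why this sketch works in seconds.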

Comment by James Blackburn [ 15/Sep/15 ]

It's currently possible for a SECONDARY to be infinitely lagged w.r.t. the PRIMARY (in version 3.0.6). We saw an issue where large packets (>1500 bytes) were lost by the network: heartbeats still worked, but replication stopped. However, the SECONDARY never stopped servicing queries (CS-24224).

Ideally the SECONDARY should move to RECOVERING if it becomes too stale, or be configurable to fail altogether.

We have a reasonable tolerance for stale secondaries, but not for hours or days...

Comment by Henrik Ingo (Inactive) [ 12/Aug/14 ]

Actually, this is a broader topic than just knowing how delayed your secondary is. For example, if every operation returned the opTime (or the equivalent term+serial stamp in Raft), then the client could guarantee consistent reads (or at least detect inconsistent ones) by ensuring that the sequence of opTimes it receives is growing (to be precise, that no opTime is ever less than one it has already observed).

The common use case is "read your own writes": this could be achieved from a secondary by checking that the secondary's opTime is >= the opTime of the just-completed write.

Most likely a similar mechanism would be helpful if we want to implement automatic retry after failover. (The challenge there is knowing which ops need to be retried.) However, that may be a bit more complicated than just the server returning an opTime to the client. I'll elaborate on that elsewhere.
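
The "read your own writes" check described above can be sketched as follows. This is an illustration only, not a real driver API: opTimes are modeled as (term, timestamp) tuples, which compare lexicographically, and the hypothetical Session class tracks the highest opTime the client has observed.

```python
class Session:
    def __init__(self):
        # The highest opTime this client has seen so far.
        self.last_seen_optime = (0, 0)

    def observe(self, optime):
        # Record the opTime returned with a completed write (or read).
        # A monotonically growing sequence is the consistency guarantee.
        if optime < self.last_seen_optime:
            raise RuntimeError("causal order violated: opTime went backwards")
        self.last_seen_optime = optime

    def can_read_from(self, secondary_optime):
        # Safe to read our own writes only once the secondary has
        # replicated at least up to the last write we observed.
        return secondary_optime >= self.last_seen_optime
```

The same bookkeeping is what a retry-after-failover mechanism would need to extend: the session would additionally have to remember which operations past the last acknowledged opTime are still outstanding.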

Generated at Thu Feb 08 03:07:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.