Type: Improvement
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 1.8.1
Component/s: Replication
Labels:
- LAG
- replication
- slaveOk
Environment:
A collection of PRIMARY and SECONDARY servers, where the query load on the SECONDARY servers is high and the replication load is also very high.

This problem exists in all current Master-Slave servers that I use today. MongoDB has the exact same issues with Replication that I see in MySQL.

The situation is:

1) The Primary is adding/updating data at a rate that is near or at the MAX for the hardware.
2) A Secondary server is starting to LAG due to loads other than Replication. (Other Secondary servers may be OK at the same time.)
3) "Read" requests are allowed on the Secondary servers to provide more read performance.

Effect:

The Secondary starts to LAG and does not keep up with the flow of updates and inserts from the Primary. The LAG just grows and grows.

Solution:

Provide ways for the drivers and applications to "back off" and reduce the impact on the Secondary so it can "catch up". For example, select a Secondary that has lower LAG.
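To make the "select a Secondary that has lower LAG" suggestion concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Member` record, `lag_seconds`, and `pick_least_lagged` are hypothetical names and not part of any MongoDB driver, and LAG is assumed to be the difference between the Primary's last operation time and the Secondary's last applied operation time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    # Hypothetical view of one replica set member (not a driver type).
    name: str
    is_primary: bool
    last_op_time: float  # time of the last applied operation, in seconds


def lag_seconds(primary: Member, secondary: Member) -> float:
    # Assumed definition of LAG: how far the Secondary trails the Primary.
    return max(0.0, primary.last_op_time - secondary.last_op_time)


def pick_least_lagged(members: List[Member]) -> Optional[Member]:
    # Return the Secondary with the smallest LAG, or None when there is
    # no Primary to compare against or no Secondary to choose from.
    primary = next((m for m in members if m.is_primary), None)
    secondaries = [m for m in members if not m.is_primary]
    if primary is None or not secondaries:
        return None
    return min(secondaries, key=lambda s: lag_seconds(primary, s))
```

A driver that applied this on each read would steer traffic away from a Secondary that is falling behind, which gives that server room to catch up.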

Ideas:

1) Add to the slaveOk request a condition on how long the LAG is allowed to be before the request must be performed by some "other" server.
2) Do not allow requests to be served by servers whose LAG is over the MAX_SLAVE_LAG set in each server's conf file.
3) Push back on the Primary if a high load on ALL Secondary servers would stop replication for all known servers. This could be a Read-Only mode.
4) Stop allowing read requests to servers with high Loads or LAGs.
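The ideas above could be sketched roughly as follows. This is not an existing MongoDB or driver feature. The function names are hypothetical, `lags` is an assumed mapping of Secondary name to current LAG in seconds, and falling back to the Primary is just one possible reading of "some other server" in idea 1.

```python
from typing import Dict, List


def eligible_secondaries(lags: Dict[str, float], max_lag: float) -> List[str]:
    # Ideas 2 and 4: only Secondaries at or under the LAG limit may serve reads.
    return sorted(name for name, lag in lags.items() if lag <= max_lag)


def route_read(lags: Dict[str, float], max_lag: float,
               primary: str = "primary") -> str:
    # Idea 1: the request carries its own LAG limit. Use the least-lagged
    # eligible Secondary, otherwise send the read elsewhere (assumed here
    # to be the Primary).
    ok = eligible_secondaries(lags, max_lag)
    if ok:
        return min(ok, key=lambda name: lags[name])
    return primary


def should_throttle_writes(lags: Dict[str, float], max_lag: float) -> bool:
    # Idea 3: push back on the Primary only when every known Secondary
    # is over the limit.
    return bool(lags) and all(lag > max_lag for lag in lags.values())
```

For example, with a limit of 60 seconds, `route_read({"s1": 5, "s2": 120}, 60)` picks `s1`, while a set where every Secondary is over the limit falls back to the Primary and would also trigger the write push-back check.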

I know that the Root Cause of this issue is an overloaded cluster. What I am asking for is a nice, easy "push-back" from MongoDB and not a crash.

Today a MySQL Slave will also just stop replication if the OpLog runs too long/late. If the OpLog is very large, this situation can LAG for hours or days.

If the application is OK with a huge LAG, this needs to be allowed. But for applications that require a LAG of, say, "under 60 sec.", a MongoDB feature that helps provide that guarantee would be a great addition.

The cause of the LAG may also be a Secondary server that is used for backups and was not replicating for a moment due to a backup request. Today all of the Secondary servers get an equal number of requests.

Please call any time:
Cell: 916-202-1600
Skype: EdwardMGoldberg

Edward M. Goldberg
http://myCloudWatcher.com/
e.m.g.

duplicates

SERVER-4936 Server support for "maxStalenessMS" read preference option

Closed

is related to

SERVER-12861 Introduce a maxStalenessMS option when querying secondaries

Closed

SERVER-4935 Mark node Recovering when replication lag exceeds a configured threshold

Closed

Assignee: A. Jesse Jiryu Davis

Reporter: Edward M. Goldberg

Participants: A. Jesse Jiryu Davis, Andy Schwerin, Christian Ribe, Colin Howe, Edward M. Goldberg, hongyu.bi, Juho Mäkinen, Kevin Rice, Ramon Fernandez Marina, ygimantas Stauga

Votes: 17

Watchers: 26

Created: Jun 29 2011 12:16:29 AM UTC

Updated: Jul 23 2016 12:55:50 PM UTC

Resolved: Jul 11 2016 02:05:28 PM UTC
