[SERVER-3346] MAX SLAVE LAG - Features to provide a more stable Replication Set under high load Created: 29/Jun/11 Updated: 23/Jul/16 Resolved: 11/Jul/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 1.8.1 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Edward M. Goldberg | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Duplicate | Votes: | 17 |
| Labels: | LAG, replication, slaveOk | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
A collection of PRIMARY and SECONDARY servers, where the query load is high on the SECONDARY servers and the replication load is also very high. |
||
| Description |
|
This problem exists in all current Master-Slave servers that I use today. MongoDB has the exact same issues with replication that I see in MySQL.
The situation is:
1) The Primary is adding/updating data at a rate that is near or at the MAX for the hardware.
Effect: The Secondary starts to LAG and does not keep up with the flow of updates and inserts from the Primary. LAG just grows and grows.
Solution: Provide ways for the drivers and applications to "back off" and reduce the impact on the Secondary so it can "catch up". For example, select a Secondary that has lower LAG.
Ideas:
1) Add to the slaveOk request a condition on how long the LAG is allowed to be before the request must be performed by some "other" server.
I know that the Root Cause of this issue is an overloaded cluster. What I am asking for is a nice, easy "push-back" from MongoDB and not a crash. Today a MySQL Slave will also just stop replication if the OpLog runs too long/late. If the OpLog is very large, this situation can LAG for hours/days.
If the application is OK with a LAG that is huge, this needs to be allowed. But for applications that require a LAG of, say, "under 60 sec.", a MongoDB feature that helps provide that service would be great.
The cause of the LAG may also be a Secondary server that is used for backups and was not working for a moment due to a backup request. Today all of the Secondary servers get an equal number of requests.
Please call any time: Edward M. Goldberg |
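The idea of a lag limit on slaveOk reads could be sketched roughly as follows. This is only an illustration, not an existing driver feature: the function name `eligible_secondaries` and the sample hosts are hypothetical, and the member dictionaries are assumed to carry the `name`, `stateStr` and `optimeDate` fields reported by `replSetGetStatus`.

```python
from datetime import datetime, timedelta


def eligible_secondaries(members, max_lag_seconds):
    """Return the names of SECONDARY members whose lag is within the limit.

    `members` is assumed to look like the `members` array returned by
    replSetGetStatus (fields: name, stateStr, optimeDate).
    """
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    limit = timedelta(seconds=max_lag_seconds)
    names = []
    for member in members:
        if member["stateStr"] != "SECONDARY":
            continue
        # Lag is measured as the gap between the last operation applied
        # on the primary and the last one applied on this secondary.
        lag = primary["optimeDate"] - member["optimeDate"]
        if lag <= limit:
            names.append(member["name"])
    return names


# Hypothetical sample data for illustration only.
NOW = datetime(2011, 6, 29, 12, 0, 0)
SAMPLE_MEMBERS = [
    {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": NOW},
    {"name": "db2:27017", "stateStr": "SECONDARY",
     "optimeDate": NOW - timedelta(seconds=5)},
    {"name": "db3:27017", "stateStr": "SECONDARY",
     "optimeDate": NOW - timedelta(seconds=300)},
]
```

With a limit of 60 seconds, only `db2:27017` would be offered for reads in this sample; an application that tolerates a huge lag could simply pass a much larger limit.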
| Comments |
| Comment by Ramon Fernandez Marina [ 23/Jul/16 ] |
|
I have marked this ticket as a duplicate of Users interested in this feature can tune to Regards, |
| Comment by Andy Schwerin [ 11/Jul/16 ] |
|
I believe that this request is effectively duplicated by a combination of |
| Comment by Kevin Rice [ 01/Mar/16 ] |
|
This also applies to situations where the slave is running on slower hardware and batches are infrequent. This is okay if the rate of updates is not big and we are only using the slave as a "backup" server, or a read-assist server during the day, and the updates happen at night. BUT: if the rate of updates gets too large, the slaves can't catch up by the start of business in the morning, or perhaps ever. Having the mix of hardware would allow cheaper slave servers, with the tradeoff that the info might be up to time-lag seconds old. Once the time lag gets too large, though, there should be a throttle on updates so it all can catch up instead of falling over and dying. |
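One possible shape for such a throttle, sketched on the application side: the writer pauses between batches for a time that grows with how far the lag exceeds the allowed limit. The function name `write_delay_seconds`, the linear ramp and the 5-second cap are all assumptions made for illustration; nothing like this is built into the server.

```python
def write_delay_seconds(lag_seconds, max_lag_seconds, max_delay_seconds=5.0):
    """Return how long a writer should pause before its next batch.

    No delay while the lag is within the limit; beyond it, the delay
    grows linearly with the overshoot and is capped at max_delay_seconds.
    """
    if max_lag_seconds <= 0:
        raise ValueError("max_lag_seconds must be positive")
    if lag_seconds <= max_lag_seconds:
        return 0.0
    # Overshoot expressed as a fraction of the allowed lag.
    overshoot = (lag_seconds - max_lag_seconds) / max_lag_seconds
    return min(max_delay_seconds, overshoot * max_delay_seconds)
```

A batch loader could call this before each batch and sleep for the returned time, so that nightly updates slow down gradually instead of the slaves falling permanently behind.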
| Comment by hongyu.bi [ 21/Dec/15 ] |
|
+1 |
| Comment by Žygimantas Stauga [ 17/Dec/15 ] |
|
+1 |
| Comment by Juho Mäkinen [ 10/Sep/13 ] |
|
A slave can be lagging behind, for example, when it has been resurrected from backup. Currently a DBA needs to alter the configuration and mark the slave as hidden until it has caught up, a task which this would make obsolete. +1 |
| Comment by Colin Howe [ 08/May/12 ] |
|
This would be awesome. We are hosted on EC2 and sometimes our slaves get a little bit more behind than normal due to the hardware playing up. If that happens we want to stop the reads automatically. Currently, we're thinking of a watch process that changes the slave to 'hidden' if the lag goes over a certain value. |
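The decision step of such a watch process might look something like the sketch below. It is a guess at one workable approach, not a description of any existing tool: `hide_lagging_members` is a hypothetical helper, `lag_by_host` is assumed to hold lag in seconds for secondaries only, and the resulting document would still have to be applied with the `replSetReconfig` command on the primary.

```python
import copy


def hide_lagging_members(config, lag_by_host, max_lag_seconds):
    """Return (new_config, changed) with lagging members marked hidden.

    `config` is assumed to be a replica set configuration document
    (fields: _id, version, members[].host). The input is not modified.
    """
    new_config = copy.deepcopy(config)
    changed = False
    for member in new_config["members"]:
        lag = lag_by_host.get(member["host"])
        if lag is None or lag <= max_lag_seconds:
            continue
        if not member.get("hidden", False):
            member["hidden"] = True
            # Hidden members are required to have priority 0.
            member["priority"] = 0
            changed = True
    if changed:
        # A reconfiguration needs a higher version than the current one.
        new_config["version"] += 1
    return new_config, changed
```

The same loop run in reverse (clearing `hidden` once the lag drops back under the limit) would bring the member back into rotation; a real watch process would likely want some hysteresis between the two thresholds to avoid flapping.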
| Comment by Christian Ribe [ 03/Sep/11 ] |
|
Indeed, managing replication lag needs to be added. |