[SERVER-23392] Increase Replication Rollback (Data) Limit Created: 29/Mar/16 Updated: 08/Nov/22 Resolved: 17/Apr/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Alexander Komyagin | Assignee: | Judah Schvimer |
| Resolution: | Done | Votes: | 0 |
| Labels: | rollback-optional |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Sprint: | Repl 2018-05-07 |
| Participants: |
| Description |
|
The current rollback limit is 300MB, which is not practical for high-throughput systems that write hundreds of MB per second. We should consider raising the limit or removing it completely. |
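To make the scale of the problem concrete, here is a rough back-of-the-envelope sketch. It is illustrative arithmetic only, not MongoDB code: the function names are hypothetical, the 300MB and 30-minute figures are the ones quoted in this ticket, and it assumes a steady write rate where every written byte counts toward the rollback size.

```python
# Illustrative arithmetic only; these names are hypothetical, not MongoDB APIs.
ROLLBACK_SIZE_LIMIT_MB = 300     # size limit quoted in this ticket
ROLLBACK_TIME_LIMIT_S = 30 * 60  # 30-minute time limit quoted in the comments


def seconds_until_size_limit(write_rate_mb_per_s: float) -> float:
    """Seconds of divergent writes that fit under the size limit.

    Assumes a steady write rate and that all written data would need
    to be rolled back.
    """
    if write_rate_mb_per_s <= 0:
        raise ValueError("write rate must be positive")
    return ROLLBACK_SIZE_LIMIT_MB / write_rate_mb_per_s


def binding_limit(write_rate_mb_per_s: float) -> str:
    """Return which limit ("size" or "time") is reached first."""
    t = seconds_until_size_limit(write_rate_mb_per_s)
    return "size" if t < ROLLBACK_TIME_LIMIT_S else "time"
```

Under these assumptions, a node writing 100 MB per second would hit the size limit after about 3 seconds of divergence, long before the time limit becomes relevant.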
| Comments |
| Comment by Judah Schvimer [ 17/Apr/18 ] |
|
Yes, the limit is in rs_rollback.cpp, so it is specific to rollback via refetch. |
| Comment by Spencer Brody (Inactive) [ 17/Apr/18 ] |
|
judah@mongodb.com, can you confirm that there is no rollback data size limit with Rollback to Timestamp? Assuming this is indeed gone from 4.0, can you resolve this ticket and file a DOCS ticket to remove references to the 300MB data size limit from the docs? |
| Comment by Scott Hernandez (Inactive) [ 06/Apr/16 ] |
|
Given how rollback works, and the fact that we buffer the _ids of every rolled-back document, increasing the limit may cause problems unless we rewrite how rollback works. So it may not be as simple as changing the constants. |
| Comment by Eric Milkie [ 06/Apr/16 ] |
|
I think we should make the limits "1 TB" and "24 hours". |
| Comment by Daniel Pasette (Inactive) [ 29/Mar/16 ] |
|
If there's no real "reason" for a limit other than it being a slow/expensive process, it seems to me that recovering a node via rollback is always the right choice unless it takes longer than an initial sync would. The time limit seems like it should be included in any work we do here as well. Yes, the cluster may be in a degraded state, but at least there is the possibility that it would be able to self-heal. That said, the onus shouldn't be on Alex to choose a number here. |
| Comment by Alexander Komyagin [ 29/Mar/16 ] |
|
Personally, I think we should keep the time limit as a safety check [on our replication protocol logic]. However, the size limit should be increased. I don't want to name a random number, but perhaps 10GB would be enough. |
| Comment by Scott Hernandez (Inactive) [ 29/Mar/16 ] |
|
Alex, what limit do you think would be acceptable? Do you also want to change the time limit, which is currently no more than 30 minutes behind?