[SERVER-35941] Don't maintain full stable optime candidate list on secondaries in PV0 Created: 02/Jul/18 Updated: 29/Oct/23 Resolved: 22/Aug/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.6.5 |
| Fix Version/s: | 3.6.8 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | Tess Avitabile (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Repl 2018-08-27 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
In PV0, we do not propagate the necessary information to secondaries for them to advance their commit point. We still, however, keep around a list of stable optime candidates that we add to every time we update our lastApplied optime. Normally, we can purge timestamps from this list that are earlier than the stable timestamp (which is the latest timestamp in this list earlier than the commit point), since we don't need them any more. If the commit point never moves on secondaries, though, this list will never get purged, and it will grow without bound.

To avoid keeping around an unbounded amount of storage engine history in PV0, we already manually advance the storage engine's stable timestamp to whatever the lastApplied timestamp is. We should do something similar for the stable optime candidate list. For PV0 secondaries it should likely be sufficient to keep no stable optime candidates in this list. When we become a primary, we can then start adding optimes to this list, and purging them appropriately since we advance the commit point as a primary.

Upon protocol version upgrade from PV0 => PV1, we will start without any stable optime candidates. We can then start adding optime candidates when we start applying writes in PV1, and set a new stable timestamp as soon as we learn of a commit point later than one of our candidates.

This only applies to 3.6, which is where we first added the "stable optime candidate" list. Protocol version 0 is banned in versions >= 4.0 as of |
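The proposed behavior can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the server's implementation: the class `StableOptimeTracker`, its fields, and its method names are all hypothetical, optimes are reduced to plain integers, and the stable timestamp is taken to be the latest candidate at or before the commit point (the inclusive comparison is an assumption).

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StableOptimeTracker:
    """Toy model of the stable optime candidate list (all names hypothetical)."""

    protocol_version: int = 1
    is_primary: bool = False
    commit_point: Optional[int] = None
    stable_timestamp: Optional[int] = None
    candidates: List[int] = field(default_factory=list)  # kept sorted ascending

    def _tracks_candidates(self) -> bool:
        # A PV0 secondary never learns of a commit point, so under the
        # proposal it keeps no candidates at all.
        return self.protocol_version == 1 or self.is_primary

    def on_last_applied(self, ts: int) -> None:
        if not self._tracks_candidates():
            # PV0 secondary: the stable timestamp simply follows lastApplied.
            self.stable_timestamp = ts
            return
        bisect.insort(self.candidates, ts)
        self._recompute_stable()

    def on_commit_point(self, ts: int) -> None:
        self.commit_point = ts
        self._recompute_stable()

    def _recompute_stable(self) -> None:
        if self.commit_point is None:
            return
        # Number of candidates at or before the commit point.
        idx = bisect.bisect_right(self.candidates, self.commit_point)
        if idx == 0:
            return  # no candidate is covered by the commit point yet
        self.stable_timestamp = self.candidates[idx - 1]
        # Purge every candidate earlier than the new stable timestamp.
        del self.candidates[: idx - 1]


# PV0 secondary: the list stays empty no matter how many writes are applied.
pv0_secondary = StableOptimeTracker(protocol_version=0, is_primary=False)
for ts in range(1, 1001):
    pv0_secondary.on_last_applied(ts)

# PV1 node: candidates accumulate until a commit point allows a purge.
pv1_node = StableOptimeTracker(protocol_version=1)
for ts in (1, 2, 3, 4, 5):
    pv1_node.on_last_applied(ts)
pv1_node.on_commit_point(3)
```

In this model, removing the `_tracks_candidates` check reproduces the reported problem: a PV0 secondary would append a candidate for every applied write and never purge any, because `commit_point` stays `None`.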
| Comments |
| Comment by Githook User [ 22/Aug/18 ] |
|
Author: {'name': 'Tess Avitabile', 'email': 'tess.avitabile@mongodb.com', 'username': 'tessavitabile'} Message: |
| Comment by Alyson Cabral (Inactive) [ 07/Aug/18 ] |
|
I agree that we should do it. I also didn't realize this would impact everyone with the PV0/3.6 combination, but it's not super urgent. PV0 is deprecated in 3.6 and the clear fix is to upgrade to PV1. |
| Comment by Tess Avitabile (Inactive) [ 07/Aug/18 ] |
|
Thanks for calling attention back to this. I had missed the fact that this will always happen in a 3.6 PV0 set. If we do this work, then users would still need to upgrade minor versions to address the problem, which might not be easier than upgrading protocol version. But at least then future upgrades to 3.6 would have the fix, so it's probably a good idea to do this work. alyson.cabral, what do you think? |
| Comment by William Schultz (Inactive) [ 06/Aug/18 ] |
|
tess.avitabile Just to double check, you are ok with the resolution of this ticket as "won't fix"? spencer pointed out that it effectively makes 3.6 + PV0 unusable, since this bug causes memory usage to grow without bound on PV0 secondaries. |
| Comment by Gregory McKeon (Inactive) [ 05/Jul/18 ] |
|
The solution is to upgrade from PV0 to PV1, which is required in 3.6 before upgrading to 4.0. |