[SERVER-30577] Clear list of stable timestamp candidates on Rollback and Initial Sync Created: 09/Aug/17 Updated: 30/Oct/23 Resolved: 21/Nov/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.0-rc5 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | William Schultz (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | bkp |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v3.6 |
| Sprint: | Repl 2017-11-13, Repl 2017-12-04 |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
| Comments |
| Comment by Githook User [ 21/Nov/17 ] | |
|
Author: William Schultz (will62794) <william.schultz@mongodb.com>
Message: (cherry picked from commit 30de2f7c46a9aa0914fe91cba2075b244e9b516b) | |
| Comment by Githook User [ 21/Nov/17 ] | |
|
Author: William Schultz (will62794) <william.schultz@mongodb.com>
Message: | |
| Comment by Eric Milkie [ 09/Nov/17 ] | |
|
Thanks for that analysis; I’m now confident that your proposal is comprehensive. | |
| Comment by William Schultz (Inactive) [ 09/Nov/17 ] | |
|
That is what I was thinking about, but I don't think there are. The following are the states where our data might be inconsistent:
RECOVERING after rollback seems like the only place where we might be updating our applied optimes, and therefore updating the stable timestamp, while in an inconsistent state. It seems that recovery after shutdown doesn't set the applied optime until it finishes recovering from the oplog, and initial sync is handled specially by the way we call the StorageInterface::setInitialDataTimestamp method. I think that if we check the condition
that will be a sufficient (albeit not elegant) way to decide whether we should avoid updating the stable timestamp in ReplicationCoordinatorImpl::setMyLastAppliedOpTime.
| |
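To make that idea concrete, here is a minimal, self-contained sketch of such a guard. It assumes a hypothetical "recovering after rollback" flag; all names below are invented for illustration and this is not the server's actual code.

```cpp
// Sketch only: skip stable-timestamp bookkeeping when the applied optime is advanced
// while the node's data is not yet consistent (e.g. applying toward minValid in
// RECOVERING after a rollback via refetch). The flag and function names are invented.
#include <set>

using Timestamp = unsigned long long;  // simplified stand-in for an optime's timestamp

struct ReplState {
    bool inRecoveryAfterRollback = false;  // hypothetical flag for the condition above
    std::set<Timestamp> stableCandidates;  // timestamps eligible to become stable
};

// Called whenever this node advances its last applied optime.
void setMyLastAppliedOpTime(ReplState& state, Timestamp appliedTs) {
    if (state.inRecoveryAfterRollback) {
        // Data is inconsistent until we reach minValid, so this timestamp must not
        // become a stable-timestamp candidate (and must not move the stable timestamp).
        return;
    }
    state.stableCandidates.insert(appliedTs);
}
```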
| Comment by Eric Milkie [ 09/Nov/17 ] | |
|
That sounds okay to me. Are there other RECOVERING states where the data is not consistent? | |
| Comment by William Schultz (Inactive) [ 09/Nov/17 ] | |
|
So, for 3.6, I think there are two things we need to do: 1. Upon leaving ROLLBACK state, clear the list of stable optime candidates so that there are no optimes in the set that would have been rolled back (they are optimes that no longer represent valid states for this node, since we rolled back the ops). milkie, does this seem sufficient? | |
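A minimal sketch of that first item, clearing the candidate set on the transition out of ROLLBACK. The types and function below are illustrative only, not the server's implementation.

```cpp
// Sketch only, not the actual server code: on leaving ROLLBACK, drop every
// stable-timestamp candidate gathered so far, because some of them may correspond
// to operations that were just rolled back and no longer describe a valid state
// of this node.
#include <set>

enum class MemberState { SECONDARY, ROLLBACK, RECOVERING };

struct ReplState {
    MemberState memberState = MemberState::SECONDARY;
    std::set<unsigned long long> stableCandidates;
};

void setMemberState(ReplState& state, MemberState newState) {
    if (state.memberState == MemberState::ROLLBACK && newState != MemberState::ROLLBACK) {
        // Leaving ROLLBACK: none of the previously recorded candidates are safe to keep.
        state.stableCandidates.clear();
    }
    state.memberState = newState;
}
```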
| Comment by Githook User [ 08/Nov/17 ] | |
|
Author: William Schultz (will62794) <william.schultz@mongodb.com>
Message: Revert " This reverts commit 68035d80f3b382e591fa381ee44550d920a6d432. | |
| Comment by Eric Milkie [ 08/Nov/17 ] | |
|
Yes, we'll need to be careful, as we use stable timestamps to set read concern majority values as well as to set the oldest_timestamp. The timestamps should only be point-in-time values at which all of the data is consistent. | |
| Comment by William Schultz (Inactive) [ 08/Nov/17 ] | |
|
I am wondering if we need to be careful about setting the stable timestamp to a timestamp at which the database state is inconsistent in 3.6. The one scenario I am thinking about is after a rollback (via refetch), when we enter RECOVERING and apply operations from the sync source until we reach minValid. There is nothing preventing us from setting our applied optime forward during this phase (even though we disallow external reads), so we could be adding optime candidates to our set and potentially setting the stable timestamp to one of these (inconsistent) optimes during RECOVERING. Maybe, during this phase, we need to disallow checkpoints or otherwise refrain from updating the stable timestamp. I am not sure whether this would actually be a real problem; it depends on how the stable timestamp is used by the storage layer in 3.6. | |
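As context for this scenario, here is a small self-contained model of how applied optimes become stable-timestamp candidates, assuming the stable timestamp is chosen as the greatest candidate at or below the commit point. This is an illustration only, not the server's code.

```cpp
// Illustrative model only: every applied optime is recorded as a candidate, and the
// stable timestamp is assumed to be the greatest candidate at or below the commit
// point. If optimes applied during post-rollback RECOVERING enter the candidate set,
// one of them can be selected even though it reflects an inconsistent state.
#include <iterator>
#include <optional>
#include <set>

using Timestamp = unsigned long long;

class StableTimestampModel {
public:
    void onAppliedOpTime(Timestamp ts) {        // candidate added on every applied optime
        _candidates.insert(ts);
        _recompute();
    }
    void onCommitPointAdvanced(Timestamp ts) {  // commit point learned via replication
        _commitPoint = ts;
        _recompute();
    }
    std::optional<Timestamp> stableTimestamp() const { return _stable; }

private:
    void _recompute() {
        // Greatest candidate <= commit point, if any.
        auto it = _candidates.upper_bound(_commitPoint);
        if (it != _candidates.begin())
            _stable = *std::prev(it);
    }

    Timestamp _commitPoint = 0;
    std::set<Timestamp> _candidates;
    std::optional<Timestamp> _stable;
};
```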
| Comment by Githook User [ 08/Nov/17 ] | |
|
Author: William Schultz (will62794) <william.schultz@mongodb.com>
Message: | |
| Comment by William Schultz (Inactive) [ 07/Nov/17 ] | |
|
Clearing the stable optime candidate list on initial sync attempts seems to have been addressed by Eric's work. It can be seen in the ReplicationCoordinatorImpl::resetMyLastOpTimes function, which is called on a new initial sync attempt. | |
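A minimal sketch of that initial-sync path, assuming a hypothetical reset step that also discards stable-timestamp candidates; the names are invented and this does not claim to mirror resetMyLastOpTimes exactly.

```cpp
// Sketch only (invented names): when a new initial sync attempt starts, the node's
// notion of its last applied/durable optimes is reset, and any stable-timestamp
// candidates accumulated beforehand are discarded along with them.
#include <set>

struct ReplState {
    unsigned long long lastAppliedTs = 0;
    unsigned long long lastDurableTs = 0;
    std::set<unsigned long long> stableCandidates;
};

// Roughly what a "reset my last optimes" step would need to cover for this ticket.
void resetForNewInitialSyncAttempt(ReplState& state) {
    state.lastAppliedTs = 0;
    state.lastDurableTs = 0;
    state.stableCandidates.clear();  // no pre-initial-sync timestamps may become stable
}
```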
| Comment by Spencer Brody (Inactive) [ 26/Sep/17 ] | |
|
This is likely to fall out of work that milkie is doing already. |