[SERVER-31039] set minvalid term to -1 on pv1 downgrade
| Created: | 11/Sep/17 | Updated: | 26/Oct/17 | Resolved: | 26/Oct/17 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Siyuan Zhou |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.4, v3.2 |
| Sprint: | Repl 2017-10-02, Repl 2017-10-23, Repl 2017-11-13 |
| Participants: |
| Case: | (copied to CRM) |
| Description |
|
When we upgrade to pv1 after previously downgrading, the node's term is reset to 0. If the term stored in minValid is greater than 0, the node will not be able to find a sync source. Setting the minValid term to -1 on downgrade avoids this. |
| Comments |
| Comment by Siyuan Zhou [ 26/Oct/17 ] |
|
Marking this as a dup to |
| Comment by Siyuan Zhou [ 24/Oct/17 ] |
|
After a PV downgrade to PV0, minValid is likewise left at an optime that never exists. However, the logic for choosing a new sync source, which compares the last fetched optime against minValid, ignores the term if either of them has a term of -1, so the required optime isn't checked on the sync source. As a result, the problem rarely appears after a PV downgrade unless the node is in an inconsistent state while choosing a new sync source. If PV is upgraded again, the last fetched optime has a valid term, and comparing it to a nonexistent optime triggers this problem. |
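For illustration only, the comparison described above can be modeled with a small sketch. This is not the server's code: the names `OpTime`, `UNINITIALIZED_TERM`, and `reached_required` are hypothetical, and falling back to a timestamp-only comparison when a term is -1 is an assumption about what "ignores the term" means.

```python
from typing import NamedTuple

# Assumed sentinel meaning "this optime carries no term" (as under PV0).
UNINITIALIZED_TERM = -1


class OpTime(NamedTuple):
    """Simplified stand-in for a replication optime (hypothetical model)."""
    timestamp: int
    term: int


def reached_required(last_fetched: OpTime, required: OpTime) -> bool:
    """Return True if last_fetched is at or past the required optime.

    Assumption: when either term is uninitialized, the term is ignored and
    only timestamps are compared; otherwise the term is compared first.
    """
    if UNINITIALIZED_TERM in (last_fetched.term, required.term):
        return last_fetched.timestamp >= required.timestamp
    return (last_fetched.term, last_fetched.timestamp) >= (
        required.term,
        required.timestamp,
    )


# minValid written before the downgrade, still carrying an old term.
stale_min_valid = OpTime(timestamp=100, term=5)
# An optime fetched after re-upgrading to pv1, where the term restarts at 0.
after_upgrade = OpTime(timestamp=200, term=0)

# The old term 5 can never be matched by term 0, so minValid looks unreachable.
print(reached_required(after_upgrade, stale_min_valid))
# With the minValid term set to -1, the term is ignored.
print(reached_required(after_upgrade, OpTime(100, UNINITIALIZED_TERM)))
```

Under these assumptions, an optime with a restarted term can never satisfy a minValid that kept a higher term, whereas a minValid term of -1 removes the term from the comparison.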
| Comment by Judah Schvimer [ 19/Oct/17 ] |
|
I uploaded Spencer's repro attached to HELP-4911. As far as I remember, we do need to backport it and it was not caused by changes in 3.6. |
| Comment by Judah Schvimer [ 11/Sep/17 ] |
|
yes, my mistake. |
| Comment by Spencer Brody (Inactive) [ 11/Sep/17 ] |
|
Shouldn't this actually be done on protocolVersion downgrade? |