[SERVER-31805] rollbackViaRefetchNoUUID fails if rollback occurs during upgrade Created: 02/Nov/17 Updated: 30/Oct/23 Resolved: 14/Nov/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.0-rc5, 3.7.1 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Judah Schvimer |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | bkp |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v3.6 |
| Sprint: | Repl 2017-11-13, Repl 2017-12-04 |
| Linked BF Score: | 0 |
| Description |
|
We allow rollbacks during upgrade since they run with the old rollback algorithm. rollbackViaRefetchNoUUID cannot resync collections with UUIDs (see |
| Comments |
| Comment by Githook User [ 14/Nov/17 ] |
|
Author: Judah Schvimer (judahschvimer, judah@mongodb.com)
Message: (cherry picked from commit aa8b6f7657450d537cc14a77371dcd8742018a28) |
| Comment by Githook User [ 14/Nov/17 ] |
|
Author: {'name': 'Judah Schvimer', 'username': 'judahschvimer', 'email': 'judah@mongodb.com'}Message: |
| Comment by Githook User [ 10/Nov/17 ] |
|
Author: {'name': 'Judah Schvimer', 'username': 'judahschvimer', 'email': 'judah@mongodb.com'}Message: |
| Comment by Judah Schvimer [ 10/Nov/17 ] |
|
To summarize a discussion with schwerin, the above plan does not solve the case of rolling back during a downgrade. In that case the rollback node will be rolling back a collMod that removes a UUID. The rollback node will not have a UUID and the sync source will, but there will be no collMod during recovery to add the UUID back. To fix this, if a rollback node sees that the sync source has a UUID but it does not, it will check whether it is in the process of downgrading. If so, it will take the UUID from the sync source. There are three ways the UUID can be mismatched: |
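The downgrade check described earlier in this comment can be sketched roughly as below. This is an illustration only, not the server's implementation: `CollectionInfo`, `resolve_uuid_after_rollback`, and the `is_downgrading` flag are hypothetical names, and the real fix lives in the server's C++ rollback code.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class CollectionInfo:
    """Hypothetical stand-in for a collection's catalog metadata."""
    namespace: str
    uuid: Optional[UUID] = None


def resolve_uuid_after_rollback(
    local: CollectionInfo,
    source: CollectionInfo,
    is_downgrading: bool,
) -> Optional[UUID]:
    """Return the UUID the rollback node should hold for this collection.

    Assumption: only the single case from the comment is modeled, where the
    sync source has a UUID, the rollback node does not, and the rollback
    node is mid-downgrade. Every other case keeps the local value.
    """
    if local.uuid is None and source.uuid is not None and is_downgrading:
        # No collMod will be replayed during recovery to restore the UUID,
        # so adopt the sync source's UUID directly.
        return source.uuid
    return local.uuid


# Usage: a rollback node without a UUID adopts the sync source's UUID
# only while it is downgrading.
source = CollectionInfo("test.coll", uuid4())
local = CollectionInfo("test.coll", None)
adopted = resolve_uuid_after_rollback(local, source, is_downgrading=True)
kept = resolve_uuid_after_rollback(local, source, is_downgrading=False)
```

The sketch deliberately leaves the other mismatch cases untouched, since their handling is not spelled out here.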
| Comment by Githook User [ 09/Nov/17 ] |
|
Author: {'name': 'Judah Schvimer', 'username': 'judahschvimer', 'email': 'judah@mongodb.com'}Message: |
| Comment by Githook User [ 08/Nov/17 ] |
|
Author: {'name': 'Judah Schvimer', 'username': 'judahschvimer', 'email': 'judah@mongodb.com'}Message: |
| Comment by Gregory McKeon (Inactive) [ 07/Nov/17 ] |
|
judah.schvimer can this be brought into sprint? |
| Comment by Judah Schvimer [ 02/Nov/17 ] |
|
To summarize our discussion, one proposed solution is to fix rollbackViaRefetchNoUUID in two ways: |
| Comment by Andy Schwerin [ 02/Nov/17 ] |
|
We'll need to spend some time on a solution that doesn't require rollback to fail during fCV upgrade. Upgrade could run for a while, and some node is going to lose an election during one somewhere. |
| Comment by Judah Schvimer [ 02/Nov/17 ] |
|
After discussion with william.schultz, this looks serious. It seems like we should fail rollback in the following cases: if we are in a targetVersion when we begin rollback; possibly if we roll back a change to the fCV document (though there will be no UUID changes in the rollback, so it's probably safe); and if the sync source does any operation on the fCV document (if the sync source is mid-upgrade/downgrade at the common point, then the rolling back node will be too and will fail earlier). While this is potentially a coarser-grained solution than is required, it is difficult to think through all of the different cases we could be in, and we have little test coverage of them. |
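The conditions proposed in this comment can be written as a single predicate. This is a sketch under stated assumptions, not the server's code: the function name, its parameters, and the `strict` switch (which models the "possibly" case of rolling back an fCV document change) are all hypothetical.

```python
def should_fail_rollback(
    in_target_version_at_start: bool,
    rolled_back_fcv_change: bool,
    sync_source_touched_fcv: bool,
    strict: bool = False,
) -> bool:
    """Decide whether rollback should be refused (hypothetical helper).

    in_target_version_at_start: the node was mid-upgrade/downgrade
        (a targetVersion was set) when rollback began.
    rolled_back_fcv_change: the rollback would undo a write to the
        fCV document.
    sync_source_touched_fcv: the sync source applied any operation to
        the fCV document after the common point.
    strict: assumption; also fail on a rolled-back fCV change, which the
        comment considers probably safe to allow.
    """
    if in_target_version_at_start:
        return True
    if sync_source_touched_fcv:
        return True
    return strict and rolled_back_fcv_change


# Usage: a node that starts rollback mid-upgrade is refused outright.
mid_upgrade = should_fail_rollback(True, False, False)
plain_rollback = should_fail_rollback(False, False, False)
```

Keeping the check this coarse mirrors the comment's trade-off: it may reject rollbacks that would have been safe, in exchange for not having to reason about every upgrade and downgrade interleaving.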
| Comment by Judah Schvimer [ 02/Nov/17 ] |
|
rollbackViaRefetchNoUUID also appears not to handle resyncing UUIDs when it resyncs collection metadata. I think this could lead to bugs where UUIDs are mismatched between nodes in a replica set. |