[SERVER-50071] Change rollback common point checks in rollback_impl.cpp from invariants to fasserts Created: 03/Aug/20 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Ali Mir | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | neweng | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Replication |
| Sprint: | Repl 2020-10-19, Repl 2020-11-02 |
| Participants: |
| Description |
|
In rollback_impl.cpp, we verify that the rollback common point is greater than both the replication commit point and the committed snapshot optime. We currently validate this with invariants. However, users may encounter situations where the common point is not greater than the replication commit point and/or the committed snapshot optime, and the invariant will then fail. According to the wiki page on assertions, it may be better to use uasserts here, since users can plausibly run into this condition. |
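As a rough illustration of the proposed change (the ticket title suggests fassert; the description above mentions uassert as an alternative), here is a self-contained toy sketch, not the actual rollback_impl.cpp or assert_util code: a bare invariant aborts with only the failed expression and file/line, while an fassert aborts with a dedicated, greppable error code that a test or a user reading the logs can match on. The macro names, variable names, and the error code 51234 are all placeholders.

```cpp
// Toy stand-ins for mongo::invariant and mongo::fassert; simplified for illustration only.
#include <cstdio>
#include <cstdlib>

#define TOY_INVARIANT(expr)                                                  \
    do {                                                                     \
        if (!(expr)) {                                                       \
            std::fprintf(stderr, "Invariant failure %s %s:%d\n", #expr,      \
                         __FILE__, __LINE__);                                \
            std::abort();                                                    \
        }                                                                    \
    } while (0)

#define TOY_FASSERT(code, expr)                                              \
    do {                                                                     \
        if (!(expr)) {                                                       \
            std::fprintf(stderr, "Fatal assertion %d at %s:%d\n", (code),    \
                         __FILE__, __LINE__);                                \
            std::abort();                                                    \
        }                                                                    \
    } while (0)

int main() {
    // Illustrative values only: the common point found during rollback is behind
    // the node's commit point, i.e. a majority-committed write would be rolled back.
    long long commonPointTs = 10;
    long long lastCommittedTs = 20;

    // Current behavior described in the ticket: a bare invariant, no dedicated code.
    // TOY_INVARIANT(commonPointTs >= lastCommittedTs);

    // Proposed behavior: an fassert (or uassert) with a unique code, so the failure
    // carries more context and a test can assert on that specific code.
    TOY_FASSERT(51234, commonPointTs >= lastCommittedTs);  // 51234 is a placeholder
    return 0;
}
```

With a unique code, a test such as the one discussed in the comments below could expect that specific code rather than a generic abort.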
| Comments |
| Comment by Siyuan Zhou [ 30/Oct/20 ] |
|
I just realized the test in question explicitly tests the failure case and expects the crash. The original proposal of changing the invariant to an fassert makes sense to me. I agree it's better to give a bit more context on the crash. However, this seems like a low-priority issue. I'd suggest marking this with neweng and putting it into the backlog. |
| Comment by Ali Mir [ 20/Oct/20 ] |
|
Upon further investigation, it looks like node3 is resync'd and thus goes through initial sync. This explains how we can sync node3 from node2 with the shorter oplog - node3 starts fresh and syncs up to node2's oplog, and thus still loses the second write. Thanks to samy.lanka, we know that we cannot sync from a node with a shorter oplog. Instead, a syncing node will select the older node as a candidate, but not actually sync until the older node has advanced its optime. I believe syncing nodes will attempt to select a new node as a sync source candidate, as described here. I think this scenario still exemplifies the situation in |
| Comment by Siyuan Zhou [ 15/Oct/20 ] |
|
Thanks for the investigation! The diagram made it pretty easy to follow.
I don't see why node3, with more oplog entries, could sync from node2, which has a shorter oplog. If this happens, it seems like a problem to me. Could you please investigate that? |
| Comment by Ali Mir [ 15/Oct/20 ] |
|
The test expects node1 to exit with MongoRunner.EXIT_ABORT here. If we were to change these invariants to fasserts, we could pass in a unique error code and expect the node in the test to fail with that error code, which would be cleaner. However, as mentioned in |
| Comment by Ali Mir [ 15/Oct/20 ] |
|
It appears the invariant failure is occurring in a very similar scenario to
Node 2 stops server replication, and we issue a write to the primary with {w: majority}. The oplog appears as:
Then, we disconnect the primary, and change node3's sync source to node2. The second write is then lost on node3. We then step down node1, and step up node2. The oplog appears as:
As node1 attempts to rejoin the replica set, it will need to roll back a majority-committed write (write 2), and the node will hit the invariant failure because the rollback common point is behind its commit point. |
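To make the failing comparison concrete, the sketch below (again not MongoDB source; the simplified OpTime struct and the term/timestamp values are made up) models the end state described above, where node1's common point with node2 is older than node1's own commit point:

```cpp
#include <iostream>
#include <tuple>

// Simplified stand-in for repl::OpTime, ordered by (term, timestamp).
struct OpTime {
    long long term;
    long long ts;
    bool operator<(const OpTime& rhs) const {
        return std::tie(term, ts) < std::tie(rhs.term, rhs.ts);
    }
};

int main() {
    // Write 1 replicated everywhere; write 2 majority-committed via node1 and node3,
    // but later absent from the new primary's (node2's) history.
    OpTime commonPoint{1, 1};   // last entry node1 shares with node2 (write 1)
    OpTime commitPoint{1, 2};   // node1's last committed optime (write 2)

    // Rollback would have to undo write 2, a majority-committed write, so a
    // "common point not behind the commit point" check cannot hold and the node aborts.
    std::cout << std::boolalpha
              << "common point behind commit point: " << (commonPoint < commitPoint)
              << "\n";  // prints true in this scenario
    return 0;
}
```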
| Comment by Siyuan Zhou [ 21/Aug/20 ] |
|
ali.mir, what was the original issue where you found this problem? |