[SERVER-50071] Change rollback common point checks in rollback_impl.cpp from invariants to fasserts Created: 03/Aug/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Ali Mir Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Replication
Sprint: Repl 2020-10-19, Repl 2020-11-02
Participants:

 Description   

In rollback_impl.cpp, we verify that the rollback common point is greater than both the replication commit point and the committed snapshot optime. We currently use invariants to validate this. However, users may encounter situations where the common point is not greater than the replication commit point and/or the committed snapshot optime, and these invariants will fail. According to the wiki page on assertions, it may be better to use uasserts, since users can plausibly hit this condition.
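
For illustration only, here is a minimal sketch of the kind of check described above, written against MongoDB's assertion utilities in mongo/util/assert_util.h. The function and variable names are hypothetical, and this is not the actual rollback_impl.cpp code:

#include "mongo/db/repl/optime.h"
#include "mongo/util/assert_util.h"

namespace mongo {
namespace repl {

// Hypothetical helper mirroring the common-point checks described above.
// Today each check is an invariant(), so a violation aborts the process with
// a generic invariant-failure message and no dedicated error code.
void checkCommonPoint(const OpTime& commonPoint,
                      const OpTime& lastCommittedOpTime,
                      const OpTime& committedSnapshotOpTime) {
    invariant(commonPoint >= lastCommittedOpTime);
    invariant(commonPoint >= committedSnapshotOpTime);
}

}  // namespace repl
}  // namespace mongo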



 Comments   
Comment by Siyuan Zhou [ 30/Oct/20 ]

I just realized the test in question explicitly tests the failure case and expects the crash. The original proposal of changing the invariant to an fassert makes sense to me. I agree it's better to give a bit more context on the crash. However, this seems like a low-priority issue. I'd suggest marking this with neweng and putting it into the backlog.

Comment by Ali Mir [ 20/Oct/20 ]

Upon further investigation, it looks like node3 is resynced and thus goes through initial sync. This explains how node3 can sync from node2 despite node2's shorter oplog: node3 starts fresh and syncs up to node2's oplog, and thus still loses the second write.

Thanks to samy.lanka, we know that we cannot sync from a node with a shorter oplog. Instead, a syncing node will select the older node as a candidate, but will not actually sync from it until that node has advanced its optime. I believe syncing nodes will attempt to select a new sync source candidate, as described here.

I think this scenario still exemplifies the situation in SERVER-39626.

Comment by Siyuan Zhou [ 15/Oct/20 ]

Thanks for the investigation! The diagram made it pretty easy to follow.

"Then, we disconnect the primary, and change node3's sync source to node2."

I don't see why node3, with more oplog entries, could sync from node2, which has a shorter oplog. If this happens, it seems like a problem to me. Could you please investigate that?

Comment by Ali Mir [ 15/Oct/20 ]

The test expects node1 to exit with MongoRunner.EXIT_ABORT here. If we were to change these invariants to fasserts, we could pass in a unique error code and expect the node in the test to fail with that error code, which would be cleaner. However, as mentioned in SERVER-39626, this bug has never surfaced in the field, and this is the first time we've encountered it in testing. Considering those factors, I'm not sure it's worth doing this ticket. However, if we plan to do it, I would suggest marking this as a neweng ticket.
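
Continuing the hypothetical sketch from the ticket description, the proposed change might look roughly like the following; the error codes are illustrative placeholders, not codes reserved in the codebase:

// Hypothetical: replace the invariants with fasserts that carry unique codes,
// so a test can look for a specific fatal-assertion code instead of only
// MongoRunner.EXIT_ABORT. 4584200 and 4584201 are placeholder codes.
fassert(4584200, commonPoint >= lastCommittedOpTime);
fassert(4584201, commonPoint >= committedSnapshotOpTime);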

Comment by Ali Mir [ 15/Oct/20 ]

It appears the invariant failure occurs in a scenario very similar to SERVER-39626. First, we have a three-node replica set, and we insert a document with {w: 3}. As a result, the oplog appears as:

node1: [1] (primary)
node2: [1]
node3: [1]

We then stop server replication on node2 and issue a write to the primary with {w: majority}. The oplog appears as:

node1: [1] [2] (primary)
node2: [1]
node3: [1] [2]

Then, we disconnect the primary, and change node3's sync source to node2. The second write is then lost on node3. We then step down node1 and step up node2. The oplog appears as:

node1: [1] [2]
---------------
node2: [1] (primary) 
node3: [1]

As node1 attempts to rejoin the replica set, it will attempt to roll back a majority-committed write (write 2), and the node will hit the invariant failure because the rollback common point is behind its commit point.

Comment by Siyuan Zhou [ 21/Aug/20 ]

ali.mir, what was the original issue where you found this problem?
