[SERVER-61977] Concurrent rollback and stepUp can cause a node to fetch from a timestamp before lastApplied once it has stepped down.
Created: 10/Dec/21  Updated: 29/Oct/23  Resolved: 10/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.3.0, 4.4.13, 5.0.7

Type: Bug
Priority: Major - P3
Reporter: Jason Chan
Assignee: Moustafa Maher
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File test.js
Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.2, v5.1, v5.0, v4.4, v4.2, v4.0
Sprint: Replication 2022-01-10
Participants:
Linked BF Score: 15

Comments
Comment by Githook User [ 03/Feb/22 ]

Author:

{'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}

Message: SERVER-61977 Concurrent rollback and stepUp can cause a node to fetch from a timestamp before lastApplied once it has stepped down
Branch: v4.4
https://github.com/mongodb/mongo/commit/4eb86bdf03bf18c269c1d03211a5382d272ab7ac

Comment by Githook User [ 02/Feb/22 ]

Author:

{'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}

Message: SERVER-61977 Concurrent rollback and stepUp can cause a node to fetch from a timestamp before lastApplied once it has stepped down
Branch: v5.0
https://github.com/mongodb/mongo/commit/97af173f161f19013b52f9986bccfa1ed420e707

Comment by Githook User [ 10/Jan/22 ]

Author:

{'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}

Message: SERVER-61977 Concurrent rollback and stepUp can cause a node to fetch from a timestamp before lastApplied once it has stepped down
Branch: master
https://github.com/mongodb/mongo/commit/992e538d26427798a661e0ab37dcebfc89a3fba1

Comment by Wenbin Zhu [ 13/Dec/21 ]

I think that in step 3, when Node B wins the election, it stops BackgroundSync. When execution switches back to the rollback path, rollback stops BackgroundSync again, even though it is already stopped, and then restarts it. That is incorrect, because bgSync should not be running once the node has become a writable primary. If that is the case, there could be a quick fix: add a sanity check and a boolean return value to BackgroundSync::stop, so that it returns false when bgSync is already stopped, and have the caller skip the restart in that case.
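
The idea in this comment could be sketched roughly as follows. This is only an illustration of the proposal, not the change that landed in the linked commits. `BackgroundSyncSketch` and `runRollbackSketch` are hypothetical stand-ins that model nothing but the running/stopped flag; the real `BackgroundSync` class and the rollback code path have different signatures and much more state.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for BackgroundSync: models only the running flag.
class BackgroundSyncSketch {
public:
    // Marks the fetcher as running.
    void start() {
        std::lock_guard<std::mutex> lk(_mutex);
        _running = true;
    }

    // Returns true only if this call moved the fetcher from running to
    // stopped. Returns false if it was already stopped (for example,
    // because stepUp stopped it first).
    bool stop() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (!_running) {
            return false;
        }
        _running = false;
        return true;
    }

    bool isRunning() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _running;
    }

private:
    mutable std::mutex _mutex;
    bool _running = false;
};

// Hypothetical rollback wrapper: restart bgSync only if this caller was the
// one that stopped it. Returns whether bgSync was restarted.
template <typename RollbackFn>
bool runRollbackSketch(BackgroundSyncSketch& bgSync, RollbackFn&& doRollback) {
    const bool stoppedByUs = bgSync.stop();
    doRollback();
    if (stoppedByUs) {
        bgSync.start();
    }
    return stoppedByUs;
}
```

With this shape, if stepUp has already called `stop()` by the time rollback reaches its own `stop()` call, rollback sees `false` and leaves bgSync stopped. The sketch still leaves a window between the `stop()` call and the restart, so an actual fix would probably also need to re-check the member state under the replication mutex before restarting.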
