[SERVER-66529] The oplog manager thread updating the oplogReadTimestamp can race with a cappedTruncateAfter operation directly updating the oplogReadTimestamp Created: 17/May/22  Updated: 29/Oct/23  Resolved: 14/Jun/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.1, 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0
Sprint: Execution Team 2022-06-27
Participants:
Linked BF Score: 0

 Description   

TL;DR: the periodic thread updating the oplogReadTimestamp doesn't have sufficient mutex coverage to avoid immediately setting the oplogReadTimestamp forward again after cappedTruncateAfter does a direct oplogReadTimestamp update backwards, if the timing is just right.

 -----------------------------------------------------------------
This is my explanation from the test failure ticket:

RecordStore::cappedTruncateAfter has special logic to update the oplogReadTimestamp if it's the record store for the oplog collection. Meanwhile, there's a thread that periodically updates the oplogReadTimestamp. Of note in the thread's logic, it releases the mutex protecting oplogReadTimestamp writes/reads while fetching the WT all_durable timestamp. So here's what I propose happened:
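For illustration, a minimal sketch of the racy shape described above (the class, member, and function names here are illustrative stand-ins, not the actual server symbols): the mutex is dropped while the all_durable timestamp is fetched, so a concurrent cappedTruncateAfter can move the oplogReadTimestamp backwards in that window, and the periodic thread then re-applies its stale value.

// Hypothetical sketch of the pre-fix periodic update; names are illustrative,
// not the actual MongoDB symbols.
#include <cstdint>
#include <mutex>

class OplogVisibilitySketch {
public:
    void periodicUpdateOplogReadTimestamp() {
        std::unique_lock<std::mutex> lk(_oplogReadTimestampMutex);
        lk.unlock();  // Mutex released while querying WT's all_durable...

        std::uint64_t allDurable = fetchAllDurableTimestamp();  // e.g. T(5,30)

        // ...so a cappedTruncateAfter can run here, truncate the oplog, and
        // set the oplogReadTimestamp back (e.g. to T(5,3)).

        lk.lock();
        _oplogReadTimestamp = allDurable;  // Stale value moves visibility forward again.
    }

    void cappedTruncateAfterSetsVisibility(std::uint64_t truncatePoint) {
        std::lock_guard<std::mutex> lk(_oplogReadTimestampMutex);
        _oplogReadTimestamp = truncatePoint;  // Direct backwards update.
    }

private:
    std::uint64_t fetchAllDurableTimestamp() {
        return 0;  // Placeholder; the real code queries WT's all_durable timestamp.
    }

    std::mutex _oplogReadTimestampMutex;
    std::uint64_t _oplogReadTimestamp{0};
};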

1. The oplogReadTimestamp is T(5,30)
2. PeriodicThread fetches the all_durable timestamp T(5,30)
3. Op1 truncates the oplog back to T(5,1), deleting T(5,20) & T(5,30)
4. Op1 then sets the oplogReadTimestamp to T(5,3)
5. PeriodicThread then moves the oplogReadTimestamp forward to T(5,30)

So in theory, any internal operation truncating the oplog while the server is up and running (not during startup or rollback) could cause this race, if such code exists anywhere. Startup and rollback both restart the storage engine, resetting the all_durable timestamp, and so do not have this issue with oplog truncation.
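The eventual fix (per the commit messages below) is for the periodic thread to keep the mutex held across both the all_durable fetch and the visibility update, so the backwards update from cappedTruncateAfter cannot interleave. A minimal sketch of that tightened shape, reworking the illustrative method from the sketch above:

// Hypothetical post-fix shape of the periodic update from the sketch above:
// the unlock/relock around the all_durable fetch is gone, so the backwards
// update from cappedTruncateAfter can no longer slip in between.
void periodicUpdateOplogReadTimestamp() {
    std::lock_guard<std::mutex> lk(_oplogReadTimestampMutex);

    // The all_durable timestamp is fetched while the mutex is still held.
    std::uint64_t allDurable = fetchAllDurableTimestamp();
    _oplogReadTimestamp = allDurable;
}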



 Comments   
Comment by Githook User [ 20/Jul/22 ]

Author:

{'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}

Message: SERVER-66529 OplogManager must not release mutex, so as to avoid wrongly forwarding the oplogReadTimestamp after a cappedTruncateAfter operation sets it back.

(cherry picked from commit d9e643fe4dfb7da4ab68d6929477613f244ad361)
Branch: v6.0
https://github.com/mongodb/mongo/commit/4e85654b613ab91f9507cd97997db5fa62c3076e

Comment by Githook User [ 14/Jun/22 ]

Author:

{'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}

Message: SERVER-66529 OplogManager must not release mutex, so as to avoid wrongly forwarding the oplogReadTimestamp after a cappedTruncateAfter operation sets it back.
Branch: master
https://github.com/mongodb/mongo/commit/d9e643fe4dfb7da4ab68d6929477613f244ad361

Comment by Dianna Hohensee (Inactive) [ 18/May/22 ]

We should test what happens to the WT all_durable timestamp when the oplog is truncated. I'm not sure whether it unwinds, in which case the unit test isn't going to work as designed.
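In case it helps frame that investigation, here is a rough standalone probe, assuming a local WiredTiger build; the table name, timestamps, and overall flow are made up for illustration and only loosely mimic an oplog truncate. It commits a few records at increasing commit timestamps, reads all_durable, removes the newest record in a timestamped transaction, and reads all_durable again to see whether it moves back.

// Hypothetical probe of all_durable behaviour around a truncate-like delete.
// Assumes a WiredTiger build is available; error handling is omitted for brevity.
#include <wiredtiger.h>
#include <cstdio>

int main() {
    WT_CONNECTION* conn = nullptr;
    WT_SESSION* session = nullptr;
    WT_CURSOR* cursor = nullptr;
    char buf[64];

    wiredtiger_open(".", nullptr, "create", &conn);
    conn->open_session(conn, nullptr, nullptr, &session);
    session->create(session, "table:fake_oplog", "key_format=q,value_format=S");
    session->open_cursor(session, "table:fake_oplog", nullptr, nullptr, &cursor);

    // Insert keys 1..3 at commit timestamps 0x10, 0x20, 0x30 (hex strings).
    const char* commitTs[] = {"commit_timestamp=10", "commit_timestamp=20",
                              "commit_timestamp=30"};
    for (int64_t i = 0; i < 3; ++i) {
        session->begin_transaction(session, nullptr);
        cursor->set_key(cursor, i + 1);
        cursor->set_value(cursor, "entry");
        cursor->insert(cursor);
        session->timestamp_transaction(session, commitTs[i]);
        session->commit_transaction(session, nullptr);
    }

    conn->query_timestamp(conn, buf, "get=all_durable");
    std::printf("all_durable before delete: %s\n", buf);

    // Remove the newest record, loosely mimicking a truncate-after.
    session->begin_transaction(session, nullptr);
    cursor->set_key(cursor, (int64_t)3);
    cursor->remove(cursor);
    session->timestamp_transaction(session, "commit_timestamp=40");
    session->commit_transaction(session, nullptr);

    conn->query_timestamp(conn, buf, "get=all_durable");
    std::printf("all_durable after delete: %s\n", buf);

    conn->close(conn, nullptr);
    return 0;
}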
