Core Server / SERVER-66529

The oplog manager thread updating the oplogReadTimestamp can race with a cappedTruncateAfter operation directly updating the oplogReadTimestamp

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.0.1, 6.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Fully Compatible
    • ALL
    • v6.0
    • Execution Team 2022-06-27
    • 0

      TL;DR: the periodic thread that updates the oplogReadTimestamp does not hold the mutex for long enough. If the timing is just right, it can set the oplogReadTimestamp forward again immediately after cappedTruncateAfter has directly set it backwards.

      This is my explanation from the test failure ticket:

      RecordStore::cappedTruncateAfter has special logic to update the oplogReadTimestamp if it's the record store for the oplog collection. Meanwhile, there's a thread that periodically updates the oplogReadTimestamp. Of note in the thread's logic, it releases the mutex protecting oplogReadTimestamp writes/reads while fetching the WT all_durable timestamp. So here's what I propose happened:

      1. The oplogReadTimestamp is T(5,30)
      2. PeriodicThread fetches the all_durable timestamp T(5,30)
      3. Op1 truncates the oplog back to T(5,1), deleting T(5,20) & T(5,30)
      4. Op1 then sets the oplogReadTimestamp to T(5,3)
      5. PeriodicThread then moves the oplogReadTimestamp forward to T(5,30), the now-stale all_durable value it fetched in step 2
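
The interleaving above can be replayed deterministically in a small model. This is only a sketch in Python, not the server's C++ code: `OplogState`, `periodic_fetch`, `periodic_publish` and `capped_truncate_after` are hypothetical names, timestamps are modelled as `(seconds, increment)` tuples, and the WT all_durable timestamp is a plain attribute. The one property it tries to capture is that the periodic thread reads all_durable without holding the mutex and only takes the mutex later to publish the value.

```python
import threading


class OplogState:
    """Toy stand-in for the oplog visibility state (hypothetical, not server code)."""

    def __init__(self, all_durable, oplog_read_ts):
        self.mutex = threading.Lock()        # protects oplog_read_ts
        self.all_durable = all_durable       # stand-in for the WT all_durable timestamp
        self.oplog_read_ts = oplog_read_ts


def periodic_fetch(state):
    # Step 2: the periodic thread reads all_durable WITHOUT holding the mutex.
    return state.all_durable


def periodic_publish(state, fetched):
    # Step 5: the periodic thread takes the mutex and publishes the value it
    # fetched earlier, with no check that the value is still current.
    with state.mutex:
        state.oplog_read_ts = fetched


def capped_truncate_after(state, new_read_ts):
    # Steps 3-4: the truncate path sets oplog_read_ts backwards under the mutex.
    # all_durable is left untouched because the storage engine is not restarted.
    with state.mutex:
        state.oplog_read_ts = new_read_ts


def replay_race():
    state = OplogState(all_durable=(5, 30), oplog_read_ts=(5, 30))  # step 1
    fetched = periodic_fetch(state)                                 # step 2
    capped_truncate_after(state, (5, 3))                            # steps 3-4
    periodic_publish(state, fetched)                                # step 5
    return state.oplog_read_ts


print(replay_race())  # (5, 30): the backwards update from step 4 is lost
```

Every write to `oplog_read_ts` happens under the mutex, yet the final value is still wrong: the fetch in step 2 and the publish in step 5 are not covered by a single critical section, so the truncate can slip in between them.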

       So in theory, any internal operation that truncates the oplog while the server is up and running (not during startup or rollback) could cause this race, if such code exists anywhere. Startup and rollback both restart the storage engine, which resets the all_durable timestamp, so they do not have this issue with oplog truncation.

            dianna.hohensee@mongodb.com Dianna Hohensee (Inactive)