[SERVER-50955] oplog_rollover.js pauses the OplogCapMaintainerThread until truncation is needed Created: 15/Sep/20  Updated: 29/Oct/23  Resolved: 25/Sep/20

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.8.0, 4.4.2, 4.2.15

Type: Bug Priority: Major - P3
Reporter: Gregory Wlodarek Assignee: Gregory Wlodarek
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2
Sprint: Execution Team 2020-10-05
Participants:
Linked BF Score: 14

 Description   

There was a build failure related to this test where oplog truncation was running concurrently while inserting a record that should cause the oplog to rollover.

Looking at the insertions the test performed we have the following oplog entries (some fields not shown due to the test using a projection):

[ 
 { "op" : "i", "ns" : "test.foo", "ts" : Timestamp(1598009744, 3), "t" : NumberLong(1) },
 { "op" : "i", "ns" : "test.foo", "ts" : Timestamp(1598009745, 1), "t" : NumberLong(1) }, 
 { "op" : "i", "ns" : "test.foo", "ts" : Timestamp(1598009747, 1), "t" : NumberLong(1) } 
]

The oplog truncation thread was truncating the oplog between RecordId's 0 and 6863399589169332000.
These have RecordId's 6863399589169332227, 6863399593464299521 and 6863399602054234113 respectively.
All of these have a RecordId higher then what the oplog truncation method was truncating, and so none of these oplog entries were truncated when it was expected for the first oplog entry to be truncated.

The third record inserted that was supposed to roll over the oplog failed to create a stone, causing the test to hang as the OplogCapMaintainerThread saw nothing to reclaim.

The test waits until there are two oplog entries remaining, but there were always three oplog entries in this run.

From my observation, based on when the oplog truncation thread was running and when the third record was inserted, I think we tried to create a new oplog stone while oplog truncation was running. The oplog truncation thread can hold a mutex for a short amount of time when calling either peekOldestStoneIfNeeded() or popOldestStone() in the reclaimOplog() function.

During this time, the third record insertion tried to create a new oplog stone but because of the possibility of the mutex being held by the oplog truncation thread, we returned early.

This without a doubt is a transient issue as the subsequent insertions would try to create the oplog stone. But this test does not perform any other insertions and expects that oplog stone to be created no matter what.



 Comments   
Comment by Githook User [ 17/May/21 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-50955 oplog_rollover.js pauses the OplogCapMaintainerThread until truncation is needed

(cherry picked from commit 297a44e948fae079039882c88f6191984b58990a)
Branch: v4.2
https://github.com/mongodb/mongo/commit/b6a36a0bc447a09b8273fceff1e6a35b584953e6

Comment by Githook User [ 28/Sep/20 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-50955 Add 'multiversion_incompatible' tag to 'oplog_rollover.js'
Branch: v4.4
https://github.com/mongodb/mongo/commit/5d7f31ea981602c4cf3471c48a42652adb12c5b3

Comment by Githook User [ 25/Sep/20 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-50955 oplog_rollover.js pauses the OplogCapMaintainerThread until truncation is needed

(cherry picked from commit 45cbc24908900b708659188dea11af4d14f3ded1)
Branch: v4.4
https://github.com/mongodb/mongo/commit/297a44e948fae079039882c88f6191984b58990a

Comment by Githook User [ 25/Sep/20 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-50955 oplog_rollover.js pauses the OplogCapMaintainerThread until truncation is needed
Branch: master
https://github.com/mongodb/mongo/commit/45cbc24908900b708659188dea11af4d14f3ded1

Comment by Gregory Wlodarek [ 15/Sep/20 ]

An idea here could be to pause the oplog truncation thread using a fail point before performing the third insertion that tries to create the oplog stone.

Generated at Thu Feb 08 05:24:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.