[SERVER-50955] oplog_rollover.js pauses the OplogCapMaintainerThread until truncation is needed Created: 15/Sep/20 Updated: 29/Oct/23 Resolved: 25/Sep/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.8.0, 4.4.2, 4.2.15 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Gregory Wlodarek | Assignee: | Gregory Wlodarek |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Operating System: | ALL | ||||||||||||
| Backport Requested: |
v4.4, v4.2
|
||||||||||||
| Sprint: | Execution Team 2020-10-05 | ||||||||||||
| Participants: | |||||||||||||
| Linked BF Score: | 14 | ||||||||||||
| Description |
|
There was a build failure related to this test where oplog truncation was running concurrently while inserting a record that should cause the oplog to rollover. Looking at the insertions the test performed we have the following oplog entries (some fields not shown due to the test using a projection):
The oplog truncation thread was truncating the oplog between RecordId's 0 and 6863399589169332000. The third record inserted that was supposed to roll over the oplog failed to create a stone, causing the test to hang as the OplogCapMaintainerThread saw nothing to reclaim. The test waits until there are two oplog entries remaining, but there were always three oplog entries in this run. From my observation, based on when the oplog truncation thread was running and when the third record was inserted, I think we tried to create a new oplog stone while oplog truncation was running. The oplog truncation thread can hold a mutex for a short amount of time when calling either peekOldestStoneIfNeeded() or popOldestStone() in the reclaimOplog() function. During this time, the third record insertion tried to create a new oplog stone but because of the possibility of the mutex being held by the oplog truncation thread, we returned early. This without a doubt is a transient issue as the subsequent insertions would try to create the oplog stone. But this test does not perform any other insertions and expects that oplog stone to be created no matter what. |
| Comments |
| Comment by Githook User [ 17/May/21 ] |
|
Author: {'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}Message: (cherry picked from commit 297a44e948fae079039882c88f6191984b58990a) |
| Comment by Githook User [ 28/Sep/20 ] |
|
Author: {'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}Message: |
| Comment by Githook User [ 25/Sep/20 ] |
|
Author: {'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}Message: (cherry picked from commit 45cbc24908900b708659188dea11af4d14f3ded1) |
| Comment by Githook User [ 25/Sep/20 ] |
|
Author: {'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}Message: |
| Comment by Gregory Wlodarek [ 15/Sep/20 ] |
|
An idea here could be to pause the oplog truncation thread using a fail point before performing the third insertion that tries to create the oplog stone. |