[SERVER-51002] Fix jstests/noPassthrough/read_concern_snapshot_yielding.js random failure issue Created: 17/Sep/20  Updated: 29/Oct/23  Resolved: 18/Sep/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.8.0

Type: Bug Priority: Major - P3
Reporter: Mindaugas Malinauskas Assignee: Mindaugas Malinauskas
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Query 2020-09-21, Query 2020-10-05
Participants:
Linked BF Score: 22

 Description   

As described in BF-18570, jstests/noPassthrough/read_concern_snapshot_yielding.js intermittently hangs and times out with the following message:

assert.soon failed: () => {
    results = adminDB.aggregate([{$currentOp: options}, {$match: filter}]).toArray();
    return results.length > 0;
} : Failed to find a matching op for filter: {
    "$and" : [
        {
            "ns" : "test.coll"
        },
        {
            "op" : "update"
        },
        {
            "$or" : [
                {
                    "failpointMsg" : "setInterruptOnlyPlansCheckForInterruptHang"
                },
                {
                    "msg" : "setInterruptOnlyPlansCheckForInterruptHang"
                }
            ]
        }
    ]
}in currentOp output: [ ]

The root cause is that the trapping mechanism of the setInterruptOnlyPlansCheckForInterruptHang fail point is vulnerable to uneven CPU scheduling. The test jstests/noPassthrough/read_concern_snapshot_yielding.js (https://github.com/mongodb/mongo/blob/fd8e132ebe4d544a5c99d81fffa2ffb8fcb3f841/jstests/noPassthrough/read_concern_snapshot_yielding.js#L31) assumes that commands will yield on the second try, but a command can actually yield on the first try if the 10 ms time window has already closed (https://github.com/mongodb/mongo/blob/fd8e132ebe4d544a5c99d81fffa2ffb8fcb3f841/src/mongo/util/elapsed_tracker.cpp#L47). When that happens, the command that starts the transaction blocks on the setInterruptOnlyPlansCheckForInterruptHang fail point, which the test does not expect. The thread then never reaches the point where it is expected to block on the fail point, and the main test thread times out.
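
The race can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: ToyElapsedTracker and firstYieldingCheck are hypothetical names, the tracker is a simplified "hit count or time window" check rather than the server's actual elapsed_tracker.cpp code, and the values 2 hits and 10 ms are taken from the description above. A fake clock stands in for real scheduling delays.

```javascript
// Simplified, hypothetical model of a "hits or elapsed time" tracker.
// It is NOT the server's implementation; names and structure are illustrative.
class ToyElapsedTracker {
    constructor(nowFn, hitsBetweenMarks, msBetweenMarks) {
        this.nowFn = nowFn;
        this.hitsBetweenMarks = hitsBetweenMarks;
        this.msBetweenMarks = msBetweenMarks;
        this.hits = 0;
        this.last = nowFn();
    }

    // Returns true when either the hit count or the time window is reached,
    // and resets both counters in that case.
    intervalHasElapsed() {
        const now = this.nowFn();
        this.hits += 1;
        if (this.hits >= this.hitsBetweenMarks || now - this.last > this.msBetweenMarks) {
            this.hits = 0;
            this.last = now;
            return true;
        }
        return false;
    }
}

// Returns the 1-based index of the first check that reports "elapsed",
// given the fake delay (in ms) that precedes each check, or -1 if none does.
function firstYieldingCheck(delaysMs, hitsBetweenMarks = 2, msBetweenMarks = 10) {
    let fakeNow = 0;
    const tracker = new ToyElapsedTracker(() => fakeNow, hitsBetweenMarks, msBetweenMarks);
    for (let i = 0; i < delaysMs.length; i++) {
        fakeNow += delaysMs[i];
        if (tracker.intervalHasElapsed()) {
            return i + 1;
        }
    }
    return -1;
}

// Evenly scheduled thread: the second check triggers, as the test assumes.
const evenScheduling = firstYieldingCheck([1, 1]);
// Thread descheduled for 15 ms before its first check: the first check triggers.
const unevenScheduling = firstYieldingCheck([15, 1]);
```

In this model the only difference between the two runs is the delay before the first check, which is consistent with the failure appearing only occasionally on loaded hosts.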



 Comments   
Comment by Githook User [ 18/Sep/20 ]

Author: Mindaugas Malinauskas <mindaugas.malinauskas@mongodb.com>

Message: SERVER-51002 Fix jstests/noPassthrough/read_concern_snapshot_yielding.js random failure issue
Branch: master
https://github.com/mongodb/mongo/commit/3a2dc8cb509c9445e5904f4dae46d83f0fe5122d

Generated at Thu Feb 08 05:24:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.