As described in BF-18570, jstests/noPassthrough/read_concern_snapshot_yielding.js intermittently hangs and times out with the following message:
assert.soon failed: () => {
    results = adminDB.aggregate([{$currentOp: options}, {$match: filter}]).toArray();
    return results.length > 0;
} : Failed to find a matching op for filter: {
    "$and" : [
        {
            "ns" : "test.coll"
        },
        {
            "op" : "update"
        },
        {
            "$or" : [
                {
                    "failpointMsg" : "setInterruptOnlyPlansCheckForInterruptHang"
                },
                {
                    "msg" : "setInterruptOnlyPlansCheckForInterruptHang"
                }
            ]
        }
    ]
}in currentOp output: [ ]

The root cause is that the trapping mechanism of the setInterruptOnlyPlansCheckForInterruptHang fail point is sensitive to uneven CPU scheduling. The test jstests/noPassthrough/read_concern_snapshot_yielding.js (https://github.com/mongodb/mongo/blob/fd8e132ebe4d544a5c99d81fffa2ffb8fcb3f841/jstests/noPassthrough/read_concern_snapshot_yielding.js#L31) assumes that commands will yield on the second try, but a command can actually yield on the first try if the 10ms time window has already closed (https://github.com/mongodb/mongo/blob/fd8e132ebe4d544a5c99d81fffa2ffb8fcb3f841/src/mongo/util/elapsed_tracker.cpp#L47). When that happens, the command that starts the transaction blocks on the setInterruptOnlyPlansCheckForInterruptHang fail point, which the test does not expect. The thread then never reaches the point where it is expected to block on the fail point, and the main test thread times out.
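
The timing sensitivity described above can be illustrated with a small simulation. This is a minimal sketch, not the server's actual ElapsedTracker code: the class IntervalTracker, the helper firstFiringCheck, and the threshold of 2 checks are hypothetical names and values chosen for illustration, and the sketch assumes the tracker fires when either a check count or a time interval is reached. The clock is simulated so the result is deterministic.

```javascript
// Hypothetical, simplified interval tracker (assumption: fires when EITHER
// the number of checks OR the elapsed time reaches its threshold).
class IntervalTracker {
    constructor(hitsBetweenMarks, msBetweenMarks, nowFn) {
        this.hitsBetweenMarks = hitsBetweenMarks;
        this.msBetweenMarks = msBetweenMarks;
        this.nowFn = nowFn;  // injected clock, so the simulation is deterministic
        this.hits = 0;
        this.last = nowFn();
    }

    intervalHit() {
        const now = this.nowFn();
        this.hits += 1;
        const countReached = this.hits >= this.hitsBetweenMarks;
        const timeReached = now - this.last >= this.msBetweenMarks;
        if (countReached || timeReached) {
            // Reset both conditions once the tracker fires.
            this.hits = 0;
            this.last = now;
            return true;
        }
        return false;
    }
}

// Returns the 1-based index of the check on which the tracker first fires,
// given the simulated delay in milliseconds before each check, or -1 if it
// never fires.
function firstFiringCheck(delaysMs) {
    let clock = 0;
    // Illustrative thresholds: 2 checks, 10ms window.
    const tracker = new IntervalTracker(2, 10, () => clock);
    for (let i = 0; i < delaysMs.length; i++) {
        clock += delaysMs[i];
        if (tracker.intervalHit()) {
            return i + 1;
        }
    }
    return -1;
}

// Evenly scheduled thread: the count condition fires on the second check.
const evenlyScheduled = firstFiringCheck([1, 1]);
// Thread descheduled for 15ms before its first check: the time condition
// fires on the first check instead.
const descheduled = firstFiringCheck([15, 1]);
```

Under these assumptions, a thread that is scheduled promptly yields on the second check, as the test expects, while a thread that loses the CPU for longer than the window yields on the first check, which shifts the point at which the fail point traps the operation.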