Details
-
New Feature
-
Resolution: Duplicate
-
Major - P3
-
Developer Tools
Description
Frequently, a test must do something like:
- enable a failpoint,
- wait for the code to reach the point where it checks whether the failpoint is enabled,
- continue.
Today we do this by manually adding a log line to the C++ code like:
    if (MONGO_unlikely(rsSyncApplyStop.shouldFail())) {
        log() << "Oplog Applier - rsSyncApplyStop fail point enabled. Blocking until fail "
                 "point is disabled.";
        /* ... snip ... */
    }
Then in JavaScript we use the checkLog mechanism to wait for that log message to appear. This approach is prone to several kinds of error:
- We don't apply this pattern everywhere we need it, which causes race conditions like SERVER-43703
- We don't apply this pattern in Python test fixtures, because we haven't implemented checkLog in Python yet
- We forget to clear the log before enabling the failpoint, so we see an old log message and continue immediately instead of waiting for the new log message
- We change a log message because we didn't know a test depended on it
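The stale-message failure in the list above can be reproduced in a tiny model. This is only a sketch: `FakeLog`, `kMsg`, and `checkPassesBeforeFailPointIsReached` are hypothetical names invented for illustration, and the substring scan merely stands in for what checkLog does against the real server log.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory log; a stand-in for the server log that a test scans.
class FakeLog {
public:
    void append(const std::string& line) { _lines.push_back(line); }
    void clear() { _lines.clear(); }

    // True if any logged line contains the given text.
    bool contains(const std::string& needle) const {
        for (const auto& line : _lines) {
            if (line.find(needle) != std::string::npos) return true;
        }
        return false;
    }

private:
    std::vector<std::string> _lines;
};

const std::string kMsg = "rsSyncApplyStop fail point enabled";

// Models a test whose earlier phase already hit the fail point and logged kMsg.
// Returns whether the log check passes even though the server has NOT yet
// reached the fail point again.
bool checkPassesBeforeFailPointIsReached(bool clearLogFirst) {
    FakeLog log;
    log.append(kMsg);  // message left over from the earlier phase
    if (clearLogFirst) log.clear();
    // The test enables the fail point here; the server has not reached it yet.
    return log.contains(kMsg);
}
```

Without the clear, the check passes immediately on the old message, so the test continues before the server is actually blocked. With the clear, the check correctly fails until a new message arrives.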
It would be awesome if we had a better mechanism for synchronizing tests with failpoints. I propose something approximately like:
    if (MONGO_unlikely(rsSyncApplyStop.shouldFail())) {
        rsSyncApplyStop.signal(); /* new method */
        /* ... snip ... */
    }
Then in jsTests:
    db.adminCommand({
        configureFailPoint: 'rsSyncApplyStop',
        mode: 'alwaysOn',
        wait: true /* new parameter */
    });
If "wait" is true, the command first changes the failpoint's mode, then waits for the next call to signal() before returning. If the mechanism were this convenient I think we would use it more consistently and with fewer mistakes.
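One way the proposed semantics could work is sketched below with a mutex, a condition variable, and a signal counter. This is not MongoDB's FailPoint class: `SignalingFailPoint`, `enableAndWait`, and the demo function are hypothetical names, and `enableAndWait` only models what `configureFailPoint` with `wait: true` is proposed to do. The counter is read under the same lock that enables the failpoint, so a `signal()` that arrives right after the mode change cannot be missed.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical model of a fail point with the proposed signal()/wait semantics.
class SignalingFailPoint {
public:
    bool shouldFail() const { return _enabled.load(); }

    // Server code calls this after it has observed the enabled fail point.
    void signal() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            ++_signalCount;
        }
        _cv.notify_all();
    }

    // Models configureFailPoint with mode: 'alwaysOn' and wait: true:
    // change the mode first, then block until the next signal().
    void enableAndWait() {
        std::unique_lock<std::mutex> lk(_mutex);
        const std::uint64_t before = _signalCount;
        _enabled.store(true);
        _cv.wait(lk, [&] { return _signalCount > before; });
    }

    void disable() { _enabled.store(false); }

private:
    std::atomic<bool> _enabled{false};
    std::mutex _mutex;
    std::condition_variable _cv;
    std::uint64_t _signalCount = 0;
};

// Returns true if the worker had reached the fail point by the time
// enableAndWait() returned.
bool waitReturnsAfterWorkerReachesFailPoint() {
    SignalingFailPoint fp;
    assert(!fp.shouldFail());  // the fail point starts disabled
    std::atomic<bool> reached{false};

    std::thread worker([&] {
        while (!fp.shouldFail()) std::this_thread::yield();
        reached.store(true);
        fp.signal();
        // Block until the fail point is disabled, as the real code would.
        while (fp.shouldFail()) std::this_thread::yield();
    });

    fp.enableAndWait();
    const bool sawWorker = reached.load();
    fp.disable();
    worker.join();
    return sawWorker;
}
```

Because the test blocks on the signal rather than on log output, nothing depends on the log's contents or on the wording of a message.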
Issue Links
- duplicates SERVER-42308 Improve synchronization between two fail points (Backlog)