- Type: New Feature
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Internal Code
Frequently, tests must do something like:
- enable a failpoint,
- wait for the code to reach the point where it checks if the failpoint is enabled,
- continue the test.
Today we do this by manually adding a log line to the C++ code like:
if (MONGO_unlikely(rsSyncApplyStop.shouldFail())) {
    log() << "Oplog Applier - rsSyncApplyStop fail point enabled. Blocking until fail "
             "point is disabled.";
    /* ... snip ... */
}
Then in JavaScript we use the checkLog mechanism to wait for that log message to appear. This is prone to various errors:
- We don't apply this pattern everywhere we need it, which causes race conditions like SERVER-43703
- We don't apply this pattern in Python test fixtures, because we haven't implemented checkLog in Python yet
- We forget to clear the log before enabling the failpoint, so we see an old log message and continue immediately instead of waiting for the new log message
- We change a log message because we didn't know a test depended on it
It would be awesome if we had a better mechanism for synchronizing tests with failpoints. I propose something approximately like:
if (MONGO_unlikely(rsSyncApplyStop.shouldFail())) {
    rsSyncApplyStop.signal(); /* new method */
    /* ... snip ... */
}
Then in jsTests:
db.adminCommand({
    configureFailPoint: 'rsSyncApplyStop',
    mode: 'alwaysOn',
    wait: true /* new parameter */
});
If "wait" is true, the command first changes the failpoint's mode, then waits for the next call to signal() before returning. If the mechanism were this convenient I think we would use it more consistently and with fewer mistakes.
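To make the intended semantics concrete, here is a minimal sketch of how "change the mode, then wait for the next signal()" could work. It is an illustration only: SketchFailPoint, enableAndWait, disable, and runDemo are hypothetical names, not the real FailPoint class or the configureFailPoint command. It assumes a signal counter guarded by a mutex and a condition variable, so that signals sent before the fail point was enabled cannot satisfy the wait (the analogue of seeing an old log message).

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a fail point; NOT MongoDB's FailPoint class.
class SketchFailPoint {
public:
    // Server side: true while the fail point is enabled.
    bool shouldFail() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _enabled;
    }

    // Server side: announce that the code reached the fail point.
    void signal() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            ++_signalCount;
        }
        _cv.notify_all();
    }

    // Test side: models the proposed "wait: true" behavior. Enables the
    // fail point, then blocks until the next signal() call. Returns false
    // if no signal arrives before the timeout.
    bool enableAndWait(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        _enabled = true;
        // Signals counted before this point are stale and must not count.
        const std::uint64_t seen = _signalCount;
        return _cv.wait_for(lk, timeout, [&] { return _signalCount > seen; });
    }

    // Test side: turn the fail point off again.
    void disable() {
        std::lock_guard<std::mutex> lk(_mutex);
        _enabled = false;
    }

private:
    mutable std::mutex _mutex;
    std::condition_variable _cv;
    bool _enabled = false;
    std::uint64_t _signalCount = 0;
};

// Hypothetical demo: a fake "server" thread polls the fail point and
// signals once it observes that the fail point is enabled.
bool runDemo() {
    SketchFailPoint fp;
    fp.signal();  // stale signal from an earlier run; must be ignored

    std::atomic<bool> done{false};
    std::thread server([&] {
        while (!done.load()) {
            if (fp.shouldFail()) {
                fp.signal();
                // Real code would block here until the fail point is disabled.
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });

    const bool reached = fp.enableAndWait(std::chrono::seconds(5));
    fp.disable();
    done = true;
    server.join();
    return reached;
}
```

Because the mode change and the counter snapshot happen under the same lock, there is no window in which the server could signal after the fail point is enabled but before the test starts waiting. A test would call runDemo() (or enableAndWait directly) and continue only once it returns true.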
- duplicates SERVER-42308 Improve synchronization between two fail points
- Backlog