Type: New Feature
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Internal Code
Labels:
None

Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Frequently, tests must do something like:

1. Enable a failpoint.
2. Wait for the server code to reach the point where it checks whether the failpoint is enabled.
3. Continue the test.

Today we do this by manually adding a log line to the C++ code like:

        if (MONGO_unlikely(rsSyncApplyStop.shouldFail())) {
            log() << "Oplog Applier - rsSyncApplyStop fail point enabled. Blocking until fail "
                     "point is disabled.";
            /* ... snip ... */
        }

Then, in JavaScript, we use the checkLog mechanism to wait for that log message to appear. This approach is prone to several kinds of errors:

We don't apply this pattern everywhere we need it, which causes race conditions like SERVER-43703.
We don't apply this pattern in Python test fixtures, because we haven't implemented checkLog in Python yet.
We forget to clear the log before enabling the failpoint, so the test sees an old log message and continues immediately instead of waiting for the new one.
We change a log message without knowing that a test depends on it.

It would be much better to have a dedicated mechanism for synchronizing tests with failpoints. I propose something along these lines:

        if (MONGO_unlikely(rsSyncApplyStop.shouldFail())) {
            rsSyncApplyStop.signal(); /* new method */
            /* ... snip ... */
        }

Then in jsTests:

        db.adminCommand({
            configureFailPoint: 'rsSyncApplyStop',
            mode: 'alwaysOn',
            wait: true /* new parameter */
        });

If "wait" is true, the command first changes the failpoint's mode, then waits for the next call to signal() (that is, a call made after the mode change) before returning. If the mechanism were this convenient, I think we would use it more consistently and with fewer mistakes.
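The proposed wait/signal() semantics can be sketched in standalone C++. This sketch assumes a mutex and condition-variable implementation. SignalingFailPoint, enable(), signalCount() and runExample() are hypothetical names invented for illustration; they are not MongoDB's actual FailPoint API. The key point is that the waiting side snapshots a signal counter in the same critical section that enables the fail point. As a result, only a signal() issued after the mode change can release it, which rules out the stale-log-message mistake described above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// HYPOTHETICAL sketch of the proposed semantics; this is not MongoDB's real
// FailPoint class, and every name here is invented for illustration.
class SignalingFailPoint {
public:
    // Server side: the instrumented code polls this, like shouldFail().
    bool shouldFail() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _enabled;
    }

    // Server side: the proposed signal(), called once the code has reached
    // the fail point and observed that it is enabled.
    void signal() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            ++_signalCount;
        }
        _cv.notify_all();
    }

    // Test side: models configureFailPoint with the proposed `wait` parameter.
    // The counter is snapshotted in the same critical section that enables
    // the fail point, so only a signal() issued after the mode change can
    // release the caller; an old signal can never satisfy the wait.
    void enable(bool wait) {
        std::unique_lock<std::mutex> lk(_mutex);
        _enabled = true;
        const std::uint64_t startCount = _signalCount;
        if (wait) {
            _cv.wait(lk, [&] { return _signalCount > startCount; });
        }
    }

    void disable() {
        std::lock_guard<std::mutex> lk(_mutex);
        _enabled = false;
    }

    std::uint64_t signalCount() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _signalCount;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _enabled = false;
    std::uint64_t _signalCount = 0;
};

// Usage: a worker thread stands in for the server code and the calling
// thread stands in for the test. Returns the signal count observed when
// enable(true) returns.
std::uint64_t runExample() {
    SignalingFailPoint fp;
    std::atomic<bool> stop{false};

    std::thread worker([&] {
        while (!stop.load()) {
            if (fp.shouldFail()) {
                fp.signal();
                // Block until the test disables the fail point, the way
                // rsSyncApplyStop blocks its thread while it is enabled.
                while (fp.shouldFail()) {
                    std::this_thread::sleep_for(std::chrono::milliseconds(1));
                }
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });

    fp.enable(/*wait=*/true);  // returns only after the worker calls signal()
    const std::uint64_t observed = fp.signalCount();
    assert(observed >= 1);
    // ... a real test would inspect server state here while the worker waits ...
    fp.disable();
    stop.store(true);
    worker.join();
    return observed;
}
```

The design relies on a counter rather than a boolean "signaled" flag, because a flag left over from an earlier run would let the test continue immediately. The condition variable lets the test block without polling.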

duplicates

SERVER-42308 Improve synchronization between two fail points

Backlog

Assignee:: DO NOT USE - Backlog - Dev Tools
Reporter:: A. Jesse Jiryu Davis
Participants:: A. Jesse Jiryu Davis, DO NOT USE - Backlog - Dev Tools, Esha Maharishi, Will Schultz
Votes:: 0
Watchers:: 3

Created:: Oct 01 2019 07:13:43 PM UTC
Updated:: Feb 22 2024 03:42:07 PM UTC
Resolved:: Oct 01 2019 08:25:38 PM UTC