[SERVER-43748] Convenient synchronization between tests and failpoints Created: 01/Oct/19  Updated: 06/Dec/22  Resolved: 01/Oct/19

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: A. Jesse Jiryu Davis Assignee: DO NOT USE - Backlog - Dev Tools
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-42308 Improve synchronization between two f... Backlog
Assigned Teams:
Developer Tools
Participants:

 Description   

Frequently, tests must do something like:

  • enable a failpoint,
  • wait for the code to reach the point where it checks if the failpoint is enabled,
  • continue the test.

Today we do this by manually adding a log line to the C++ code like:

        if (MONGO_unlikely(rsSyncApplyStop.shouldFail())) {
            log() << "Oplog Applier - rsSyncApplyStop fail point enabled. Blocking until fail "
                     "point is disabled.";
            /* ... snip ... */
        }

Then in JavaScript we use the checkLog mechanism to wait for that log message to appear. This approach is error-prone in several ways:

  • We don't apply this pattern everywhere we need it, which causes race conditions like SERVER-43703
  • We don't apply this pattern in Python test fixtures, because we haven't implemented checkLog in Python yet
  • We forget to clear the log before enabling the failpoint, so we see an old log message and continue immediately instead of waiting for the new log message
  • We change a log message because we didn't know a test depended on it

It would be awesome if we had a better mechanism for synchronizing tests with failpoints. I propose something approximately like:

        if (MONGO_unlikely(rsSyncApplyStop.shouldFail())) {
            rsSyncApplyStop.signal(); /* new method */
            /* ... snip ... */
        }

Then in jsTests:

        db.adminCommand({
            configureFailPoint: 'rsSyncApplyStop',
            mode: 'alwaysOn',
            wait: true /* new parameter */
        });

If "wait" is true, the command first changes the failpoint's mode, then waits for the next call to signal() before returning. If the mechanism were this convenient, I think we would use it more consistently and with fewer mistakes.
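
The semantics described above could be sketched roughly as follows, using a mutex and a condition variable. This is not the server's FailPoint class: SketchFailPoint, enableAndWait, signalCount and demoEnableAndWait are hypothetical names for this illustration, and enableAndWait only models what configureFailPoint with "wait: true" might do.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a failpoint with the proposed signal() method.
class SketchFailPoint {
public:
    // Server side: true while the failpoint is enabled.
    bool shouldFail() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _enabled;
    }

    // Server side: announce that the guarded code path was reached.
    void signal() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            ++_signals;
        }
        _cv.notify_all();
    }

    // Test side: change the mode first, then block until the next signal().
    // The baseline is read under the same lock, so earlier signals are ignored.
    void enableAndWait() {
        std::unique_lock<std::mutex> lk(_mutex);
        _enabled = true;
        const std::uint64_t baseline = _signals;
        _cv.wait(lk, [&] { return _signals > baseline; });
    }

    std::uint64_t signalCount() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _signals;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _enabled = false;
    std::uint64_t _signals = 0;
};

// A background thread plays the server loop; the caller plays the test.
bool demoEnableAndWait() {
    SketchFailPoint fp;
    std::atomic<bool> stop{false};
    std::thread server([&] {
        while (!stop.load()) {
            if (fp.shouldFail()) {
                fp.signal();
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });
    fp.enableAndWait();  // returns only after the server thread has signalled
    const bool reached = fp.signalCount() >= 1;
    stop.store(true);
    server.join();
    return reached;
}
```

Because the wait compares against a counter captured at enable time, there is no log to clear and no message text for a test to depend on.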



 Comments   
Comment by Esha Maharishi (Inactive) [ 02/Oct/19 ]

Yep, also it would be nice if you could wait for the failpoint to be hit separately from hitting it, but I am hugely in support of adding better synchronization around this

– Edit –

Oh, didn't see it had already been closed as a dupe.

Comment by William Schultz (Inactive) [ 01/Oct/19 ]

Possibly similar to SERVER-42308, SERVER-39165.

Generated at Thu Feb 08 05:03:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.