Core Server / SERVER-43748

Convenient synchronization between tests and failpoints

    • Type: New Feature
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Internal Code

      Frequently, tests must do something like:

      • enable a failpoint,
      • wait for the code to reach the point where it checks whether the failpoint is enabled,
      • then continue the test.

      Today we do this by manually adding a log line to the C++ code like:

              if (MONGO_unlikely(rsSyncApplyStop.shouldFail())) {
                  log() << "Oplog Applier - rsSyncApplyStop fail point enabled. Blocking until fail "
                           "point is disabled.";
                  /* ... snip ... */
              }
      

      Then in JavaScript we use the checkLog mechanism to wait for that log message to appear. This approach is error-prone in several ways:

      • We don't apply this pattern everywhere we need it, which causes race conditions like SERVER-43703
      • We don't apply this pattern in Python test fixtures, because we haven't implemented checkLog in Python yet
      • We forget to clear the log before enabling the failpoint, so we see an old log message and continue immediately instead of waiting for the new log message
      • We change a log message because we didn't know a test depended on it
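
      The stale-message pitfall in the list above can be illustrated with a toy in-memory log. This is only a sketch: Log, containsAnywhere, and containsSince are hypothetical names for illustration, not the server's logging API or the checkLog helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for the server log (an assumption, not the real logging API).
using Log = std::vector<std::string>;

// Naive check: true if the message appears anywhere in the log,
// including lines written before the failpoint was enabled.
bool containsAnywhere(const Log& log, const std::string& msg) {
    return std::any_of(log.begin(), log.end(), [&](const std::string& line) {
        return line.find(msg) != std::string::npos;
    });
}

// Safer check: only considers lines appended at or after index `start`,
// where `start` is the log size recorded just before enabling the failpoint.
bool containsSince(const Log& log, std::size_t start, const std::string& msg) {
    for (std::size_t i = start; i < log.size(); ++i) {
        if (log[i].find(msg) != std::string::npos) {
            return true;
        }
    }
    return false;
}
```

      With an old "fail point enabled" line already in the log, containsAnywhere reports success immediately, while containsSince stays false until a new line is appended after the recorded position.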

      It would be awesome if we had a better mechanism for synchronizing tests with failpoints. I propose something approximately like:

              if (MONGO_unlikely(rsSyncApplyStop.shouldFail())) {
                  rsSyncApplyStop.signal(); /* new method */
                  /* ... snip ... */
              }
      

      Then in jsTests:

              db.adminCommand({
                  configureFailPoint: 'rsSyncApplyStop',
                  mode: 'alwaysOn',
                  wait: true /* new parameter */
              });
      

      If "wait" is true, the command first changes the failpoint's mode, then waits for the next call to signal() before returning. If the mechanism were this convenient, I think we would use it more consistently and with fewer mistakes.

            Assignee:
            backlog-server-devtools DO NOT USE - Backlog - Dev Tools
            Reporter:
            jesse@mongodb.com A. Jesse Jiryu Davis
            Votes:
            0
            Watchers:
            3
