Core Server / SERVER-43748

Convenient synchronization between tests and failpoints

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Internal Code
    • Labels:
      None

      Description

      Frequently, tests must do something like:

      • enable a failpoint,
      • wait for the code to reach the point where it checks whether the failpoint is enabled,
      • continue the test.

      Today we do this by manually adding a log line to the C++ code like:

              if (MONGO_unlikely(rsSyncApplyStop.shouldFail())) {
                  log() << "Oplog Applier - rsSyncApplyStop fail point enabled. Blocking until fail "
                           "point is disabled.";
                  /* ... snip ... */
              }
      

      Then in JavaScript we use the checkLog mechanism to wait for that log message to appear. This is prone to various errors:

      • We don't apply this pattern everywhere we need it, which causes race conditions like SERVER-43703
      • We don't apply this pattern in Python test fixtures, because we haven't implemented checkLog in Python yet
      • We forget to clear the log before enabling the failpoint, so we see an old log message and continue immediately instead of waiting for the new log message
      • We change a log message because we didn't know a test depended on it

      It would be awesome if we had a better mechanism for synchronizing tests with failpoints. I propose something approximately like:

              if (MONGO_unlikely(rsSyncApplyStop.shouldFail())) {
                  rsSyncApplyStop.signal(); /* new method */
                  /* ... snip ... */
              }
      

      Then in jsTests:

              db.adminCommand({
                  configureFailPoint: 'rsSyncApplyStop',
                  mode: 'alwaysOn',
                  wait: true /* new parameter */
              });
      

      If "wait" is true, the command first changes the failpoint's mode, then waits for the next call to signal() before returning. If the mechanism were this convenient I think we would use it more consistently and with fewer mistakes.
