Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-28251

Fix race condition in index_killop.js

    XMLWordPrintable

Details

    • Fully Compatible
    • v3.4
    • Storage 2017-11-13
    • 0

    Description

      index_killop.js uses the 'hangAfterStartingIndexBuild' fail point in order to freeze an index build in-flight and allow for the build to be killed via killop. Execution of the killop depends on a call to 'checkForInterrupt()' from within the fail point block to allow the killop to execute. As implemented it is possible for index_killop.js to fail in the following scenario:

      1. The hangAfterStartingIndexBuild failpoint is activated
      2. Index build starts, entry appears in curop
      3. The test detects the running index build and kills the op (which flags it for kill at next interrupt check).
      4. The test deactivates the failpoint
      5. the index build hasn't yet reached the failpoint code yet, so it actually does not perform the failpoint log, the sleep, or, critically, the checkForInterrupt call.
      6. The index build completes successfully, because there are no further checkForInterrupt calls.

      To fix we could:

      1. Call 'checkForInterrupt()' on each loop of the 'hangAfterStartingIndexBuild' failpoint.
      2. Move disabling of the 'hangAfterStartingIndexBuild' failpoint (in index_killop.js) to after confirmation that the index build has stopped. This move requires the above change, as otherwise the killop is blocked waiting for a checkForInterrupt call.

      As part of this work we should look to reimplement the 'hangAfterStartingIndexBuild' failpoint to be part of the insert loop, taking advantage of the checkForInterrupt() call performed there. Removing the failpoint-only call allows us to more closely test real-world behavior.

      Steps to reproduce:
      1. Force the server thread to sleep for 5 seconds before the failpoint checkpoint to simulate the situation when the test runs 'faster' than the server. In this case, checkForInterrupt won't get executed because the failpoint definitely gets deactivated by the test in this 5-second period.

      + sleepmillis(5000);
       
      if (MONGO_FAIL_POINT(hangAfterStartingIndexBuild)) {
          // Need the index build to hang before the progress meter is marked as finished so we can
          // reliably check that the index build has actually started in js tests.
          while (MONGO_FAIL_POINT(hangAfterStartingIndexBuild)) {
              log() << "Hanging index build due to 'hangAfterStartingIndexBuild' failpoint";
              sleepmillis(1000);
          }
       
          // Check for interrupt to allow for killop prior to index build completion.
          _opCtx->checkForInterrupt();
      }
      

      Attachments

        Activity

          People

            xiangyu.yao@mongodb.com Xiangyu Yao (Inactive)
            james.wahlin@mongodb.com James Wahlin
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: