Type: Improvement
Resolution: Done
Priority: Major - P3
Fix Version/s: 3.4.18, 3.6.0-rc4
Affects Version/s: None
Component/s: Index Maintenance
Labels:
- neweng

Backwards Compatibility:
Fully Compatible
Backport Requested:

v3.4
Sprint:
Storage 2017-11-13
Linked BF Score:
0
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

index_killop.js uses the 'hangAfterStartingIndexBuild' fail point to freeze an index build in-flight so that the build can be killed via killop. The killop only takes effect when the index build calls 'checkForInterrupt()', and that call is made from within the fail point block. As implemented, index_killop.js can fail in the following scenario:

1. The hangAfterStartingIndexBuild failpoint is activated.
2. The index build starts and its entry appears in curop.
3. The test detects the running index build and kills the op (which flags it for kill at the next interrupt check).
4. The test deactivates the failpoint.
5. The index build has not yet reached the failpoint code, so it never performs the failpoint log, the sleep, or, critically, the checkForInterrupt call.
6. The index build completes successfully, because there are no further checkForInterrupt calls.
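
The scenario above can be modelled in a few lines of single-threaded C++. This is only a sketch under stated assumptions: `FakeOpCtx`, `FakeFailPoint`, and `buildIndexOriginal` are hypothetical stand-ins, not server types, and the fail point is modelled as a counter of how many more checks see it enabled before the test turns it off.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the operation context: killop only sets a flag,
// and the operation notices it the next time it calls checkForInterrupt().
struct FakeOpCtx {
    bool killPending = false;
    void markKilled() { killPending = true; }
    void checkForInterrupt() const {
        if (killPending) throw std::runtime_error("operation was interrupted");
    }
};

// Hypothetical fail point that reports "enabled" for a fixed number of checks
// and afterwards behaves as if the test had deactivated it.
struct FakeFailPoint {
    int activeChecks;
    explicit FakeFailPoint(int n) : activeChecks(n) { assert(n >= 0); }
    bool shouldFail() {
        if (activeChecks > 0) {
            --activeChecks;
            return true;
        }
        return false;
    }
};

// Same shape as the fail point block in this ticket: the interrupt check is
// reached only if the fail point was still enabled when the build got here.
// Returns true if the build ran to completion; throws if it was interrupted.
bool buildIndexOriginal(FakeOpCtx& opCtx, FakeFailPoint& hang) {
    if (hang.shouldFail()) {
        while (hang.shouldFail()) {
            // The real code logs and sleeps for one second here.
        }
        opCtx.checkForInterrupt();
    }
    return true;
}
```

In this model, a build with a pending kill and `FakeFailPoint(0)` (the test already deactivated the fail point) returns `true`, so the kill is lost. The same build with `FakeFailPoint(3)` throws from `checkForInterrupt()`.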

To fix this we could:

1. Call 'checkForInterrupt()' on each iteration of the 'hangAfterStartingIndexBuild' failpoint loop.
2. Move the disabling of the 'hangAfterStartingIndexBuild' failpoint (in index_killop.js) to after the test has confirmed that the index build has stopped. This move requires the above change, as otherwise the killop is blocked waiting for a checkForInterrupt call.
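
A minimal sketch of the first proposal, assuming a simplified model rather than the server's real classes: `FakeOpCtx` and `hangWithInterruptChecks` are hypothetical names, and the `whileSleeping` hook replaces `sleepmillis(1000)` so that a caller can simulate the test thread issuing killop while the build hangs.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for the operation context: killop sets a flag that is
// observed by the next checkForInterrupt() call.
struct FakeOpCtx {
    std::atomic<bool> killPending{false};
    void markKilled() { killPending.store(true); }
    void checkForInterrupt() const {
        if (killPending.load()) throw std::runtime_error("operation was interrupted");
    }
};

// Proposed loop shape: check for interrupt on every iteration, so a kill is
// honoured while the fail point is still enabled. Returns the number of
// completed iterations if the fail point is switched off without a kill.
int hangWithInterruptChecks(FakeOpCtx& opCtx,
                            const std::atomic<bool>& failPointEnabled,
                            const std::function<void(int)>& whileSleeping) {
    assert(whileSleeping && "the sleep hook must be callable");
    int iterations = 0;
    while (failPointEnabled.load()) {
        opCtx.checkForInterrupt();    // throws as soon as a kill is pending
        whileSleeping(iterations++);  // stands in for the log line and the sleep
    }
    return iterations;
}
```

With this shape the fail point can stay enabled until the test has confirmed that the build stopped: a kill issued during any sleep is picked up at the top of the next iteration, without relying on the fail point being disabled first.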

As part of this work we should look to reimplement the 'hangAfterStartingIndexBuild' failpoint to be part of the insert loop, taking advantage of the checkForInterrupt() call performed there. Removing the failpoint-only call allows us to more closely test real-world behavior.

Steps to reproduce:
1. Force the server thread to sleep for 5 seconds before it reaches the failpoint check, simulating a test that runs faster than the server. checkForInterrupt is then never executed, because the test deactivates the failpoint within that 5-second window.

+ sleepmillis(5000);

if (MONGO_FAIL_POINT(hangAfterStartingIndexBuild)) {
    // Need the index build to hang before the progress meter is marked as finished so we can
    // reliably check that the index build has actually started in js tests.
    while (MONGO_FAIL_POINT(hangAfterStartingIndexBuild)) {
        log() << "Hanging index build due to 'hangAfterStartingIndexBuild' failpoint";
        sleepmillis(1000);
    }

    // Check for interrupt to allow for killop prior to index build completion.
    _opCtx->checkForInterrupt();
}

Assignee: Xiangyu Yao (Inactive)
Reporter: James Wahlin
Participants: Githook User, James Wahlin, Xiangyu Yao
Votes: 0
Watchers: 4

Created: Mar 08 2017 07:27:24 PM UTC
Updated: Sep 11 2018 08:40:08 PM UTC
Resolved: Nov 10 2017 09:15:42 PM UTC
Confidence Status Last Update: 09/Nov/17 3:08 PM
