Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: WT11.2.0, 7.2.0-rc0, 7.1.0-rc3, 7.0.3, 6.0.12, 5.0.23
Affects Version/s: None
Component/s: None
Labels:
- code-quality
- stability

Sprint:
TheMoon-StorEng - 2023-09-19
Story Points:
None
Case:

Backport Requested:

v7.0, v6.0, v5.0, v4.4, v4.2

Here is the relevant code portion from __wt_cache_eviction_worker, the function that the application thread calls when it is forced to evict:

..
for (initial_progress = cache->eviction_progress;; ret = 0) {
..

        /* Evict a page. */
        switch (ret = __evict_page(session, false)) {
        case 0:
            if (busy)
                goto err;
        /* FALLTHROUGH */
        case EBUSY:
            break;
        ....
        }
        /* Stop if we've exceeded the time out. */
        if (time_start != 0 && cache_max_wait_us != 0) {
            time_stop = __wt_clock(session);
            if (session->cache_wait_us + WT_CLOCKDIFF_US(time_stop, time_start) > cache_max_wait_us)
                goto err;
        }
}

err:
    if (time_start != 0) {
        time_stop = __wt_clock(session);
        elapsed = WT_CLOCKDIFF_US(time_stop, time_start);
        WT_STAT_CONN_INCRV(session, application_cache_time, elapsed);
        WT_STAT_SESSION_INCRV(session, cache_time, elapsed);
        session->cache_wait_us += elapsed;
        if (cache_max_wait_us != 0 && session->cache_wait_us > cache_max_wait_us) {
            WT_TRET(__wt_txn_rollback_required(session, WT_TXN_ROLLBACK_REASON_CACHE_OVERFLOW));
            --cache->evict_aggressive_score;
            WT_STAT_CONN_INCR(session, cache_timed_out_ops);
            __wt_verbose_notice(session, WT_VERB_TRANSACTION, "%s", session->txn->rollback_reason);
        }
    }

The for loop resets the return value (ret = 0) at the end of every iteration, so in general an EBUSY from an eviction attempt does not leak to the caller. This holds even when every attempt to evict a page returns EBUSY before the call returns. This is the desired behavior: callers of __wt_cache_eviction_worker might not check for EBUSY, and a caller like page_in could then return EBUSY to the application. We do not want an application call to fail because eviction did not succeed.
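The reset behavior can be illustrated with a small stand-alone model. This is only a sketch, not WiredTiger code: evict_page_always_busy, evict_loop_model and the attempt counter used as the exit condition are hypothetical stand-ins for the elided parts of the real loop.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for __evict_page: every attempt finds the page busy. */
static int
evict_page_always_busy(void)
{
    return (EBUSY);
}

/*
 * Simplified model of the eviction loop. The "ret = 0" in the for-increment runs after
 * every iteration, so an EBUSY from one attempt is cleared before the next exit check.
 */
static int
evict_loop_model(int max_attempts)
{
    int attempts, ret;

    assert(max_attempts > 0);
    attempts = 0;
    for (ret = 0;; ret = 0) {
        /* Assumed exit check at the top of the loop (the real checks are elided). */
        if (attempts >= max_attempts)
            break;
        ++attempts;

        switch (ret = evict_page_always_busy()) {
        case 0:
        case EBUSY:
            break;
        default:
            goto err;
        }
    }

err:
    return (ret);
}
```

In this model, evict_loop_model(3) returns 0 even though all three attempts reported EBUSY, because the loop only exits through the top-of-loop check, which runs after the reset.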

However, the check at the end of the loop stops eviction once the application thread has waited past its timeout. That exit uses goto err, which skips the ret = 0 reset in the for statement, so if the last eviction attempt failed with EBUSY, the error is not suppressed. This is a bug: the application call fails instead of going ahead with the operation.
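A minimal model of the timeout exit, again only a sketch: evict_with_timeout_model is a hypothetical name, and an attempt counter stands in for the clock comparison against cache_max_wait_us.

```c
#include <assert.h>
#include <errno.h>

/*
 * Simplified model of the timeout path. The goto jumps straight to the err label, so the
 * "ret = 0" in the for-increment never runs for the final iteration.
 */
static int
evict_with_timeout_model(int timeout_after_attempts)
{
    int attempts, ret;

    assert(timeout_after_attempts > 0);
    attempts = 0;
    for (ret = 0;; ret = 0) {
        ++attempts;
        ret = EBUSY; /* Stand-in for a __evict_page call that finds the page busy. */

        /* Stand-in for the "exceeded the time out" check at the bottom of the loop. */
        if (attempts >= timeout_after_attempts)
            goto err; /* Skips the reset, so ret is still EBUSY. */
    }

err:
    return (ret);
}
```

Here evict_with_timeout_model(2) returns EBUSY: the first iteration's EBUSY is cleared by the reset, but the second iteration leaves through goto err with the error still set.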

On the other hand, if an application thread can't get space in the cache and needs to fail, it should result in a WT_ROLLBACK error. Look at the code past the err: label, specifically the following line:

            WT_TRET(__wt_txn_rollback_required(session, WT_TXN_ROLLBACK_REASON_CACHE_OVERFLOW));

If the last eviction attempt before the timeout failed with EBUSY and we decide that the transaction needs to be rolled back, we end up returning EBUSY instead of WT_ROLLBACK. This is because WT_TRET keeps the error that is already set rather than replacing it with the new one. In this case EBUSY is also returned with a rollback reason set, which is an inconsistent API return.
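The interaction with WT_TRET can be sketched as follows. MODEL_TRET is a simplified model that only captures the "keep the error already set" behavior described above, not the full WiredTiger macro; MODEL_ROLLBACK is a placeholder value, and rollback_required_model and timeout_path_model are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_ROLLBACK (-1) /* Placeholder for WT_ROLLBACK, not the real value. */

/* Simplified model of WT_TRET: record the new error only if no error is already set. */
#define MODEL_TRET(call)            \
    do {                            \
        int tret_ = (call);         \
        if (tret_ != 0 && ret == 0) \
            ret = tret_;            \
    } while (0)

/* Stand-in for __wt_txn_rollback_required, which reports that a rollback is needed. */
static int
rollback_required_model(void)
{
    return (MODEL_ROLLBACK);
}

/* Model of the code past the err: label once the timeout has been exceeded. */
static int
timeout_path_model(int last_evict_ret)
{
    int ret;

    assert(last_evict_ret == 0 || last_evict_ret == EBUSY);
    ret = last_evict_ret;
    MODEL_TRET(rollback_required_model());
    return (ret);
}
```

With this model, timeout_path_model(0) returns the rollback code, while timeout_path_model(EBUSY) still returns EBUSY even though the rollback path has run.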

Assignee: Sulabh Mahajan
Reporter: Sulabh Mahajan
Votes: 0
Watchers: 5

Created: Sep 06 2023 04:26:04 AM UTC
Updated: Nov 13 2023 03:18:33 AM UTC
Resolved: Sep 13 2023 06:20:32 AM UTC
