Type: Bug
Resolution: Fixed
Priority: Minor - P4
Fix Version/s: WT10.0.0, 4.4.1, 4.7.0
Affects Version/s: None
Component/s: None
Labels: wt-rtm
Sprint: None
Story Points: 5

We typically open and close a history store cursor through a matching pair of calls:

    __wt_hs_cursor(session, &session_flags, &is_owner);
    /* ... use the history store cursor ... */
    __wt_hs_cursor_close(session, session_flags, is_owner);

These pairs can be nested, and __wt_hs_cursor() only calls __wt_hs_cursor_open() to open a cursor if the session doesn't already have one (i.e., the first time it is called).
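To make the nesting rule concrete, here is a minimal, self-contained C model of the pattern. It is a sketch under stated assumptions, not WiredTiger code: `struct m_session`, `m_hs_cursor_open()`, `m_hs_cursor()`, `m_hs_cursor_close()` and the `M_*` flag bits are hypothetical stand-ins for the session, `__wt_hs_cursor_open()`, `__wt_hs_cursor()`, `__wt_hs_cursor_close()` and the real session flags. The model assumes that only the outermost call opens the cursor (reporting `is_owner == true`) and that only the matching owner close closes it and restores the saved flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; these are NOT WiredTiger's real definitions. */
#define M_HS_CURSOR 0x1u    /* session holds a cached HS cursor */
#define M_IGNORE_CACHE 0x2u /* stands in for WT_SESSION_IGNORE_CACHE_SIZE */

struct m_session {
    uint32_t flags;
    int open_cursors; /* HS cursors currently open on this session */
};

/* Stand-in for __wt_hs_cursor_open(): opens a cursor unconditionally. */
static void
m_hs_cursor_open(struct m_session *s)
{
    s->flags |= M_HS_CURSOR;
    s->open_cursors++;
}

/* Stand-in for __wt_hs_cursor(): opens a cursor only on the outermost call. */
static void
m_hs_cursor(struct m_session *s, uint32_t *saved_flags, bool *is_owner)
{
    *saved_flags = s->flags & M_IGNORE_CACHE; /* remember the caller's state */
    *is_owner = false;
    if (!(s->flags & M_HS_CURSOR)) {
        m_hs_cursor_open(s);
        *is_owner = true;
    }
    s->flags |= M_IGNORE_CACHE; /* set on every call, owner or not */
}

/* Stand-in for __wt_hs_cursor_close(): only the owner closes and restores. */
static void
m_hs_cursor_close(struct m_session *s, uint32_t saved_flags, bool is_owner)
{
    if (!is_owner)
        return; /* nested call: leave cleanup to the outer owner */
    assert(s->flags & M_HS_CURSOR); /* an owner must have an open cursor */
    s->flags &= ~(M_HS_CURSOR | M_IGNORE_CACHE);
    s->flags |= saved_flags;
    s->open_cursors--;
}
```

Under this model, two nested pairs open exactly one cursor: the inner `m_hs_cursor()` reports `is_owner == false`, its close is a no-op, and only the outer close releases the cursor and clears the cache-size flag.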

The problem is that thread_group_resize() doesn't follow this paradigm: it calls __wt_hs_cursor_open() directly to open an HS cursor for each new thread. I couldn't find a corresponding place where this cursor is closed. In practice, this only happens when we start the eviction server threads. MongoDB never changes the number of eviction servers, so this leak would only show up at shutdown, where it should be harmless.

This has another side effect in the eviction server code. Shortly after the HS cursor is opened, the new eviction server threads enter __wt_evict_thread_run() and execute:

    /*
     * Cache a history store cursor to avoid deadlock: if an eviction thread marks a file
     * busy and then opens a different file (in this case, the HS file), it can deadlock with a
     * thread waiting for the first file to drain from the eviction queue. See WT-5946 for details.
     */
    if (!F_ISSET(conn, WT_CONN_IN_MEMORY)) {
        session_flags = 0; /* [-Werror=maybe-uninitialized] */
        WT_RET(__wt_hs_cursor(session, &session_flags, &is_owner));
        WT_RET(__wt_hs_cursor_close(session, session_flags, is_owner));
    }

A side effect here is that WT_SESSION_IGNORE_CACHE_SIZE is added to the session flags and never cleared. Normally that flag is cleared by the outermost call to __wt_hs_cursor_close(). That doesn't happen here: because the session already holds the HS cursor opened in thread_group_resize(), these calls behave as if they were nested inside another pair of calls to the same functions, so neither of them clears the flag.
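The failure can be reproduced in a small hypothetical model (not WiredTiger's code; all `m_*`/`M_*` names are illustrative, and `m_evict_thread_start()` is an invented helper that compresses the two steps from this ticket into one function). If the cursor is first opened directly, as thread_group_resize() does, the later get/close pair in the eviction thread never becomes the owner, so the cache-size flag stays set and the cursor stays open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; these are NOT WiredTiger's real definitions. */
#define M_HS_CURSOR 0x1u    /* session holds an HS cursor */
#define M_IGNORE_CACHE 0x2u /* stands in for WT_SESSION_IGNORE_CACHE_SIZE */

struct m_session {
    uint32_t flags;
    int open_cursors;
};

/* Stand-in for __wt_hs_cursor_open(): opens unconditionally. */
static void
m_hs_cursor_open(struct m_session *s)
{
    s->flags |= M_HS_CURSOR;
    s->open_cursors++;
}

/* Stand-in for __wt_hs_cursor(): opens only if no cursor is cached. */
static void
m_hs_cursor(struct m_session *s, uint32_t *saved_flags, bool *is_owner)
{
    *saved_flags = s->flags & M_IGNORE_CACHE;
    *is_owner = false;
    if (!(s->flags & M_HS_CURSOR)) {
        m_hs_cursor_open(s);
        *is_owner = true;
    }
    s->flags |= M_IGNORE_CACHE; /* set whether or not we own the cursor */
}

/* Stand-in for __wt_hs_cursor_close(): non-owners do nothing. */
static void
m_hs_cursor_close(struct m_session *s, uint32_t saved_flags, bool is_owner)
{
    if (!is_owner)
        return;
    assert(s->flags & M_HS_CURSOR);
    s->flags &= ~(M_HS_CURSOR | M_IGNORE_CACHE);
    s->flags |= saved_flags;
    s->open_cursors--;
}

/*
 * Hypothetical helper modeling a new eviction thread starting up. If pre_open
 * is true, the cursor is first opened directly (as thread_group_resize() does).
 */
static void
m_evict_thread_start(struct m_session *s, bool pre_open)
{
    uint32_t saved;
    bool owner;

    if (pre_open)
        m_hs_cursor_open(s); /* no matching close anywhere */
    /* The get/close pair executed by __wt_evict_thread_run(). */
    m_hs_cursor(s, &saved, &owner);
    m_hs_cursor_close(s, saved, owner);
}
```

With `pre_open == true` the session ends with the cache-size flag set and one cursor still open; with `pre_open == false` both are cleaned up. In this model, dropping the direct open is enough to restore the expected behavior.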

The obvious caveat is that I may be missing something in the code. But this seems odd, and I wanted to document it before I forget it.

UPDATE:

Based on a comment/suggestion by alexander.gorrod, I've expanded the scope of this ticket. In addition to fixing the immediate problems described above, we should eliminate the need for nested calls to __wt_hs_cursor() and __wt_hs_cursor_close(). Once nothing in the code relies on that functionality, we should change those functions to disallow nesting. That way we avoid unexpected surprises in the future, for example, a nested use of the HS cursor resetting or repositioning it out from under another user further down the call stack.
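The proposed end state, where nesting is disallowed, could look something like the sketch below. This is one possible policy, not a description of the actual fix: the function names and flag bits are hypothetical, and reporting a nested open as `EBUSY` (rather than, say, a diagnostic assertion) is an assumption. Because every successful open owns the cursor, the `is_owner` out-parameter is no longer needed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits; these are NOT WiredTiger's real definitions. */
#define M_HS_CURSOR 0x1u    /* session holds an HS cursor */
#define M_IGNORE_CACHE 0x2u /* stands in for WT_SESSION_IGNORE_CACHE_SIZE */

struct m_session {
    uint32_t flags;
};

/* Non-nesting open: refuses if the session already holds an HS cursor. */
static int
m_hs_cursor_open_exclusive(struct m_session *s, uint32_t *saved_flags)
{
    if (s->flags & M_HS_CURSOR)
        return (EBUSY); /* nested use is treated as a caller bug */
    *saved_flags = s->flags & M_IGNORE_CACHE;
    s->flags |= M_HS_CURSOR | M_IGNORE_CACHE;
    return (0);
}

/* Every close must pair with exactly one successful open. */
static void
m_hs_cursor_close_exclusive(struct m_session *s, uint32_t saved_flags)
{
    assert(s->flags & M_HS_CURSOR);
    s->flags &= ~(M_HS_CURSOR | M_IGNORE_CACHE);
    s->flags |= saved_flags;
}
```

A second open on the same session fails immediately, so in this model a nested user can no longer silently reset or reposition a cursor that an outer caller is still using.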

Assignee: Keith Smith
Reporter: Keith Smith
Votes: 0
Watchers: 5

Created: May 20 2020 12:20:08 AM UTC
Updated: Oct 29 2023 04:43:36 PM UTC
Resolved: Aug 06 2020 06:22:49 PM UTC
