Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.0.3, 4.1.2
Affects Version/s: None
Component/s: Storage
Labels:
- neweng

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Sprint:
Storage NYC 2018-08-13
Linked BF Score:
3
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

~~SERVER-34798~~ requires every Client to be destroyed before the ServiceContext is destroyed. However, WiredTigerCheckpointThread destroys its client asynchronously, which can race with the main thread because of this code in background.cpp:

    {
        // It is illegal to access any state owned by this BackgroundJob after leaving this
        // scope, with the exception of the call to 'delete this' below.
        stdx::unique_lock<stdx::mutex> l(_status->mutex);
        _status->state = Done;
        _status->done.notify_all();
    }

    if (selfDelete)
        delete this;
}

The state is set to "Done" while the thread is still running, so the thread_local client has not yet been destroyed at that point. Setting the state to "Done" and notifying unblocks the main thread, which can then proceed all the way to the destructor of ServiceContext. As a result, the WiredTigerCheckpointThread's client can be destroyed by its own thread after the main thread has already destroyed the ServiceContext.
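The ordering problem can be illustrated with a small standalone sketch. It assumes nothing about the real MongoDB classes: `FakeClient`, `jobBody`, and the event log are hypothetical stand-ins. It only shows that a thread_local object is destroyed at thread exit, which is after the thread function has already reported that it is done.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins; none of these names come from the MongoDB source.
std::mutex gLogMutex;
std::vector<std::string> gEvents;

void record(const std::string& what) {
    std::lock_guard<std::mutex> lk(gLogMutex);
    gEvents.push_back(what);
}

struct FakeClient {
    ~FakeClient() { record("client destroyed"); }
    void touch() {}
};

// Models a background job whose client lives in thread-local storage.
void jobBody() {
    thread_local FakeClient client;  // destroyed only when the thread exits
    client.touch();
    // Models setting the state to Done and notifying waiters.
    record("done signaled");
}

// Runs the job on its own thread and returns the recorded event order.
std::vector<std::string> runJobAndCollectEvents() {
    {
        std::lock_guard<std::mutex> lk(gLogMutex);
        gEvents.clear();
    }
    std::thread worker(jobBody);
    worker.join();  // thread-local destructors have run once join() returns
    std::lock_guard<std::mutex> lk(gLogMutex);
    assert(!gEvents.empty());
    return gEvents;
}
```

In this sketch "done signaled" is always recorded before "client destroyed". Anything that waits only for the done signal, rather than for the thread to exit, can therefore run ahead of the client's destruction.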

BF-10032 can be reproduced by adding a long sleep here.

The fix should be similar to ~~SERVER-35985~~: add an ON_BLOCK_EXIT in the run() function of WiredTigerCheckpointThread.
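A minimal sketch of the idea, under stated assumptions: `ScopeGuard` is a hypothetical stand-in for ON_BLOCK_EXIT, and `FakeClient`, `FakeServiceContext`, and `tlClient` are illustrative names, not the real MongoDB types. It also assumes the guard's job is to destroy the thread's client, which the ticket does not spell out. The inner scope plays the role of run(), so the guard fires before the done flag is set.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for ON_BLOCK_EXIT: runs a callback at scope exit.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> fn) : _fn(std::move(fn)) {}
    ~ScopeGuard() { _fn(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> _fn;
};

struct FakeServiceContext {
    bool alive = true;
};

struct FakeClient {
    FakeClient(FakeServiceContext* c, bool* out) : ctx(c), sawLiveContext(out) {}
    // Records whether the context was still alive when the client was destroyed.
    ~FakeClient() { *sawLiveContext = ctx->alive; }
    FakeServiceContext* ctx;
    bool* sawLiveContext;
};

thread_local std::unique_ptr<FakeClient> tlClient;

// Returns true if the client was destroyed while the context was still alive.
bool runCheckpointJobWithGuard() {
    FakeServiceContext ctx;
    bool sawLiveContext = false;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::thread worker([&] {
        {   // Models run(): create the client and guard its destruction.
            tlClient.reset(new FakeClient(&ctx, &sawLiveContext));
            ScopeGuard destroyClient([] { tlClient.reset(); });
            assert(tlClient != nullptr);  // checkpoint work would go here
        }   // guard fires: the client is destroyed before "Done" is set
        {
            std::lock_guard<std::mutex> lk(m);
            done = true;
        }
        cv.notify_all();
    });

    {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return done; });
    }
    ctx.alive = false;  // main thread "destroys" the service context
    worker.join();
    return sawLiveContext;
}
```

Because the guard runs when the run()-like scope exits, the client is gone before the main thread is unblocked, so `runCheckpointJobWithGuard()` returns true regardless of how the threads are scheduled.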

We should also check other BackgroundJobs that create clients in their run() functions.

is related to

SERVER-35985 sessions_test and sharding_catalog_manager_test don't destroy all Clients before destroying the ServiceContext

Closed

SERVER-34798 Replace subclasses of ServiceContext with decorations and flexible initialization code

Closed

SERVER-36473 Make a dedicated RAII class to manage Client lifetime

Closed

Assignee: Xiangyu Yao (Inactive)
Reporter: Xiangyu Yao (Inactive)
Participants: Andy Schwerin, Githook User, Xiangyu Yao
Votes: 0
Watchers: 4

Created: Aug 01 2018 07:18:26 PM UTC
Updated: Oct 29 2023 10:29:22 PM UTC
Resolved: Aug 06 2018 10:39:12 PM UTC
Confidence Status Last Update: 03/Aug/18 3:06 PM
