[SERVER-36400] Explicitly destroy the client on exiting the run body of each BackgroundJob Created: 01/Aug/18  Updated: 29/Oct/23  Resolved: 06/Aug/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.0.3, 4.1.2

Type: Bug Priority: Major - P3
Reporter: Xiangyu Yao (Inactive) Assignee: Xiangyu Yao (Inactive)
Resolution: Fixed Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-35985 sessions_test and sharding_catalog_ma... Closed
is related to SERVER-34798 Replace subclasses of ServiceContext ... Closed
is related to SERVER-36473 Make a dedicated RAII class to manage... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Storage NYC 2018-08-13
Participants:
Linked BF Score: 3

 Description   

SERVER-34798 requires all the clients to be destroyed before the destruction of ServiceContext. However, WiredTigerCheckpointThread destroys its client asynchronously and could have a race condition with the main thread because in background.cpp:

    {
        // It is illegal to access any state owned by this BackgroundJob after leaving this
        // scope, with the exception of the call to 'delete this' below.
        stdx::unique_lock<stdx::mutex> l(_status->mutex);
        _status->state = Done;
        _status->done.notify_all();
    }
 
    if (selfDelete)
        delete this;
}

We set the state to be "Done" before the thread_local client gets destroyed because the thread is still running. But setting the state to be "Done" and notifying would unblock the main thread which could go all the way to the destructor of ServiceContext. Therefore, we could have a situation where the client of WTCheckpointThread gets destroyed by its thread after ServiceContext gets destroyed by main thread.

The way to reproduce BF-10032 is adding a big sleep here.

The fix should be similar to SERVER-35985: Add a ON_BLOCK_EXIT in the run() function of WTCheckPointThread

We should check other BackgroundJobs which create clients in their run() function.



 Comments   
Comment by Githook User [ 13/Sep/18 ]

Author:

{'name': 'Henrik Edin', 'email': 'henrik.edin@mongodb.com', 'username': 'henrikedin'}

Message: SERVER-34798 Remove ServiceContext subclasses and use new ServiceContext in every unit test.

This patch does several loosely related and surprisingly hard to separate things.

1.) Make the ServiceContext class final

2.) Create a mechanism, called ConstructorActions, for running methods on
ServiceContexts immediately after they're built and immediately before they're
destroyed.

3.) Introduce / improve test fixture base classes for tests, giving them fresh
ServiceContext instances for each test case. There is one fixture for tests that
need a storage engine and another for those that do not.

4.) Make several remaining global variables SC decorations in support of (3)

5.) Replace many MONGO_INITIALIZERS that access getGlobalServiceContext with the
new constructor-actions system, which is needed for (3.)

6.) Fix up tests to use the fixtures from (3) and fix tests that silently used
different service contexts in together in a technically illegal fashion that now
breaks.

7.) Utilize (2) as necessary to simplify initialization of new ServiceContexts,
simplifying the fixtures in (3).

(cherry picked from commit d520be0814492c262515cf0a5d62a127ace70dce)

SERVER-35985 Destroy clients started in other threads.

(cherry picked from commit 9a68eb0cc65a93233b4ff5746330f9eb77de9b90)

SERVER-36258 Construct ServiceContext after mongo initializers complete.

(cherry picked from commit bfe170e49b1dc10b2badde45bc13c057a2f8ab61)

SERVER-36400 Explicitly destroy the client on exiting run() of each BackgroundJob

(cherry picked from commit b079e4713d897b5541c2804386025817ec720800)

SERVER-36351 Fix so ServiceContextMongoDTest removes the temp directory in its destructor.

(cherry picked from commit 4c16f0f336f4db77034e8aa594bbd4a5bac3f40c)

SERVER-36347 Fix parse_zone_info.py after ServiceContext refactor.

(cherry picked from commit c9d4204b6243e5eee6fe0b5e2c34d02af9ac5edb)
Branch: v4.0
https://github.com/mongodb/mongo/commit/58feaec9c55629ba253b1ff013736eb8b8e9c79d

Comment by Githook User [ 06/Aug/18 ]

Author:

{'username': 'xy24', 'name': 'Xiangyu Yao', 'email': 'xiangyu.yao@mongodb.com'}

Message: SERVER-36400 Explicitly destroy the client on exiting run() of each BackgroundJob
Branch: master
https://github.com/mongodb/mongo/commit/b079e4713d897b5541c2804386025817ec720800

Comment by Xiangyu Yao (Inactive) [ 06/Aug/18 ]

I've created SERVER-36473 to do this work.

Comment by Andy Schwerin [ 06/Aug/18 ]

xiangyu.yao, yeah, I think that's probably the best choice. I'd like to see if we can make Client::initThread private / accessible only to ScopedClient, and maybe get rid of Client::initThreadIfNotAlready, to further improve the situation, but maybe that's a longer term improvement. People do a lot of copy-paste / coding-by-example, so eliminating the bad pattern would be ideal, if it's not more than a few hours work.

Comment by Xiangyu Yao (Inactive) [ 06/Aug/18 ]

schwerin Is it something like:

/**
 * RAII-style Client helper.
 */
class ScopedClient {
public:
    explicit ScopedClient(StringData desc) {
        Client::initThread(desc);
    }   
    ~ScopedClient() {
        Client::destroy();
    }   
};

Comment by Andy Schwerin [ 06/Aug/18 ]

Perhaps it's time to make a dedicated RAII class to manage Client lifetime on threads. I have ideas.

Generated at Thu Feb 08 04:42:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.