[SERVER-36007] Attempting to check out an already checked out session leads to self-deadlock Created: 07/Jul/18  Updated: 29/Oct/23  Resolved: 08/Aug/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.1.2

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Jinny Byun
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Gantt Dependency
has to be done before SERVER-36306 Add FSM workloads that run transactio... Closed
Problem/Incident
is caused by SERVER-35173 Add autocommit value to transaction s... Closed
Related
related to SERVER-36736 Avoid doing work in Client lock when ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

python buildscripts/resmoke.py --suites=no_server already_checked_out_session.js

already_checked_out_session.js

(function() {
    "use strict";
 
    load("jstests/libs/parallelTester.js");
 
    const rst = new ReplSetTest({nodes: 1});
    rst.startSet();
    rst.initiate();
 
    const primary = rst.getPrimary();
    const db = primary.getDB("test");
 
    function doInsertWithSession(host, lsid, txnNumber) {
        try {
            const conn = new Mongo(host);
            const db = conn.getDB("test");
            assert.commandWorked(db.runCommand({
                insert: "mycoll",
                documents: [{_id: txnNumber}],
                lsid: {id: eval(lsid)},
                txnNumber: NumberLong(txnNumber),
            }));
            return {ok: 1};
        } catch (e) {
            return {ok: 0, error: e.toString(), stack: e.stack};
        }
    }
 
    let thread1;
    let thread2;
 
    assert.commandWorked(db.fsyncLock());
    try {
        // JavaScript objects backed by C++ objects (e.g. BSON values) do not serialize correctly
        // when passed through the ScopedThread constructor. To work around this behavior, we
        // instead pass a stringified form of the JavaScript object through the ScopedThread
        // constructor and use eval() to rehydrate it.
        const lsid = UUID();
        thread1 = new ScopedThread(doInsertWithSession, primary.host, tojson(lsid), 0);
        thread1.start();
 
        assert.soon(
            () => {
                const ops = db.currentOp({"command.insert": "mycoll", waitingForLock: true});
                return ops.inprog.length === 1;
            },
            () => {
                return "insert operation was never found to be waiting for a lock: " +
                    tojson(db.currentOp());
            });
 
        thread2 = new ScopedThread(doInsertWithSession, primary.host, tojson(lsid), 1);
        thread2.start();
 
        // XXX: Wait a little bit for thread2 to have sent its insert command to the server or we
        // otherwise won't trigger the bug.
        sleep(5000);
    } finally {
        // We run the fsyncUnlock command in a finally block to avoid leaving the server fsyncLock'd
        // if the test were to fail.
        assert.commandWorked(db.fsyncUnlock());
    }
 
    thread1.join();
    thread2.join();
 
    assert.commandWorked(thread1.returnData());
    assert.commandWorked(thread2.returnData());
 
    rst.stopSet();
})();

Sprint: Repl 2018-07-30, Repl 2018-08-13
Participants:
Linked BF Score: 58

 Description   

The changes from 1447252 as part of SERVER-35173 made it so we acquire the Client::_lock before calling SessionCatalog::checkOutSession(). However, if

pred = [&sri]() { return !sri->checkedOut; }

isn't immediately satisfied, then we'll attempt to acquire the Client::_lock a second time while already holding it.

auto& checkedOutSession = operationSessionDecoration(opCtx);
if (!checkedOutSession) {
    auto sessionTransactionTable = SessionCatalog::get(opCtx);
    // We acquire a Client lock here to guard the construction of this session so that
    // references to this session are safe to use while the lock is held.
    stdx::lock_guard<Client> lk(*opCtx->getClient());
    checkedOutSession.emplace(sessionTransactionTable->checkOutSession(opCtx));

https://github.com/mongodb/mongo/blob/026f69dbf4f98e91b499bde5cb4ce73c332e9549/src/mongo/db/session_catalog.cpp#L257-L260

// Wait until the session is no longer checked out
opCtx->waitForConditionOrInterrupt(
    sri->availableCondVar, ul, [&sri]() { return !sri->checkedOut; });

https://github.com/mongodb/mongo/blob/026f69dbf4f98e91b499bde5cb4ce73c332e9549/src/mongo/db/session_catalog.cpp#L132-L134

Note: The changes from SERVER-35173 exist only on the master branch and this issue therefore does not affect 4.0 or earlier branches.



 Comments   
Comment by Githook User [ 16/Aug/18 ]

Author:

{'username': 'jinichu', 'email': 'jinnybyun@gmail.com', 'name': 'jinichu'}

Message: SERVER-36007 Add JS test to test that checking out an already checked out session doesn't lead to a self-deadlock

(cherry picked from commit 1a32566e3173a309b109f1b53d5fe7dfe6cc704b)
Branch: v4.0
https://github.com/mongodb/mongo/commit/beb1cebf667ea63b3d899500e6d32c2b90aff0f2

Comment by Githook User [ 15/Aug/18 ]

Author:

{'name': 'jinichu', 'email': 'jinnybyun@gmail.com', 'username': 'jinichu'}

Message: SERVER-36007 Add JS test to test that checking out an already checked out session doesn't lead to a self-deadlock
Branch: master
https://github.com/mongodb/mongo/commit/1a32566e3173a309b109f1b53d5fe7dfe6cc704b

Comment by Githook User [ 15/Aug/18 ]

Author:

{'username': 'jinichu', 'email': 'jinnybyun@gmail.com', 'name': 'jinichu'}

Message: SERVER-36007 Fix self-deadlock caused by checking out an already checked out session in SessionCatalog

(cherry picked from commit e3a69aae7d5a45a61ea7796efdeb77b4392597e4)
Branch: v4.0
https://github.com/mongodb/mongo/commit/d95af15bd7894bd572e505e2397fcbb39d877d72

Comment by Tess Avitabile (Inactive) [ 10/Aug/18 ]

jinny.byun, can BF-9860 be closed now?

Comment by Githook User [ 08/Aug/18 ]

Author:

{'username': 'jinichu', 'name': 'jinichu', 'email': 'jinnybyun@gmail.com'}

Message: SERVER-36007 Fixed self-deadlock caused by checking out an already checked out session in SessionCatalog
Branch: master
https://github.com/mongodb/mongo/commit/e3a69aae7d5a45a61ea7796efdeb77b4392597e4

Comment by William Schultz (Inactive) [ 07/Jul/18 ]

Thanks for triaging this max.hirschhorn. We added the client lock acquisitions inside OperationContextSession to make sure that references to this object were safe to access while the associated Client lock was held (for the sake of accessing the Session object from currentOp). Right now, we guard both the construction and destruction of the object with a Client mutex. Technically, we were only intending to guard the modification of the decoration, not actually the session check out. Hopefully this can be fixed by simply moving the lock acquisition to after the session is checked out and before we modify operationSessionDecoration. tess.avitabile

Generated at Thu Feb 08 04:41:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.