[SERVER-41278] FSM killSession helper should not kill sessions being run by background hooks
Created: 22/May/19  Updated: 29/Oct/23  Resolved: 07/Aug/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.3.1, 4.2.6

Type: Bug
Priority: Major - P3
Reporter: Gregory Wlodarek
Assignee: Jack Mulrow
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-40183 Create kill_sessions version of multi... Closed
is related to SERVER-44473 Disable implicit sessions for the bac... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.2
Sprint: Sharding 2019-07-15, Sharding 2019-07-29, Sharding 2019-08-12
Participants:
Linked BF Score: 14

 Description   

In SERVER-40183, the FSM kill session helper, which kills a random session from the config.system.sessions collection, was introduced. A drawback of killing a random session is that it can inadvertently kill sessions being run by background hooks, such as the checkReplDbhashBackgroundThread hook, which runs periodically in the background during the test.

In BF-13152, the session used by the checkReplDbhashBackgroundThread hook was chosen to be killed by the FSM test. This caused the hook to fail, which in turn caused the test to fail.
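
The failure mode can be illustrated with a minimal sketch in plain JavaScript. It is not the actual FSM helper: the session list, the `pickSessionToKill` name, and the injectable `random` parameter are all hypothetical stand-ins, and the only filter applied is the one mentioned in the comments below (a workload avoids killing its own session).

```javascript
// Hypothetical stand-in for the documents in config.system.sessions.
const sessions = [
    {id: "fsm-worker-1"},
    {id: "fsm-worker-2"},
    {id: "dbhash-hook"},  // session used by a background hook
];

// Hypothetical helper: picks a session to kill, skipping only the caller's own.
// `random` is injectable so the selection can be made deterministic in a demo.
function pickSessionToKill(allSessions, ownId, random = Math.random) {
    const candidates = allSessions.filter((s) => s.id !== ownId);
    if (candidates.length === 0) {
        return null;
    }
    return candidates[Math.floor(random() * candidates.length)];
}

// Nothing distinguishes the hook's session from any other candidate, so a
// random draw that lands on the last candidate selects it.
const victim = pickSessionToKill(sessions, "fsm-worker-1", () => 0.99);
console.log(victim.id);
```

Because the selection has no notion of who owns a session, any fix on the killing side would need extra bookkeeping that this sketch does not attempt.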



 Comments   
Comment by Kelly Lewis [ 06/Apr/20 ]

Hi jack.mulrow, with this ticket and BACKPORT-5032 complete, are you able to close BF-13481 and BF-13152?

Comment by Githook User [ 06/Apr/20 ]

Author: Jack Mulrow (jsmulrow) <jack.mulrow@mongodb.com>

Message: SERVER-41278 FSM dbhash background check shouldn't use sessions outside of error retry loop

(cherry picked from commit 5a385bb97b9af3d2c02996bc25c121198e1d9d54)
Branch: v4.2
https://github.com/mongodb/mongo/commit/f49bac6306ed96f3cb0db2e1895d706710eeb8a3

Comment by Githook User [ 07/Aug/19 ]

Author: Jack Mulrow (jsmulrow) <jack.mulrow@mongodb.com>

Message: SERVER-41278 FSM dbhash background check shouldn't use sessions outside of error retry loop
Branch: master
https://github.com/mongodb/mongo/commit/5a385bb97b9af3d2c02996bc25c121198e1d9d54

Comment by Max Hirschhorn [ 23/May/19 ]

Other parts of the run_check_repl_dbhash_background.js hook were intended to be compatible with the snapshot_read_kill_operations.js FSM workload and FSM workloads which use the killSession() helper function.

const isTransientError = (e) => {
    // It is possible for the ReplSetTest#getHashesUsingSessions() function to be
    // interrupted due to active sessions being killed by a test running concurrently.
    // We treat this as a transient error and simply retry running the dbHash check.
    //
    // Note that unlike auto_retry_transaction.js, we do not treat CursorKilled or
    // CursorNotFound error responses as transient errors because the
    // run_check_repl_dbhash_background.js hook would only establish a cursor via
    // ReplSetTest#getCollectionDiffUsingSessions() upon detecting a dbHash mismatch. It
    // is presumed to still be useful to know that a bug exists even if we cannot get more
    // diagnostics for it.
    if (e.code === ErrorCodes.Interrupted) {
        hasTransientError = true;
    }
 
    ...
};

I don't believe there is a good way for FSM workloads to restrict the logical sessions they should be able to kill. (The only basic check they do is to avoid killing themselves.) I think it'd be easier to have the run_check_repl_dbhash_background.js hook disable implicit sessions so the logic outside of the retry-on-Interrupted error responses loop isn't impacted by concurrent FSM workloads.
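
As a rough illustration of the retry-on-Interrupted idea, here is a self-contained sketch in plain JavaScript rather than the hook's real code. The `retryOnInterrupted` helper, the `maxAttempts` parameter, and the simulated `flakyDbHashCheck` are hypothetical; the `ErrorCodes` object is a local stand-in for the shell's table, using the server's numeric code for Interrupted (11601).

```javascript
// Local stand-in for the shell's ErrorCodes table.
const ErrorCodes = {Interrupted: 11601};

// Hypothetical helper: runs `check` and retries only when it fails with
// Interrupted, i.e. when a concurrent workload killed the session in use.
// Any other error, or an Interrupted error on the last attempt, is rethrown.
function retryOnInterrupted(check, maxAttempts = 5) {
    for (let attempt = 1; attempt <= maxAttempts; ++attempt) {
        try {
            return {result: check(), attempts: attempt};
        } catch (e) {
            if (e.code !== ErrorCodes.Interrupted || attempt === maxAttempts) {
                throw e;
            }
        }
    }
    throw new Error("maxAttempts must be at least 1");
}

// Simulated dbHash check whose session is killed on the first two attempts.
let calls = 0;
function flakyDbHashCheck() {
    calls += 1;
    if (calls <= 2) {
        const err = new Error("operation was interrupted");
        err.code = ErrorCodes.Interrupted;
        throw err;
    }
    return "hashes match";
}

const outcome = retryOnInterrupted(flakyDbHashCheck);
console.log(outcome.result, outcome.attempts);
```

Only work performed inside `check` is protected this way. Anything that uses a session outside the loop can still be interrupted, which is why the comment above suggests keeping such logic off sessions entirely.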

Generated at Thu Feb 08 04:57:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.