[SERVER-43079] failpoint triggered by LogicalSessionCacheRefresh Created: 29/Aug/19  Updated: 29/Oct/23  Resolved: 02/Oct/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.3.1, 4.2.2, 4.0.14

Type: Bug Priority: Major - P3
Reporter: Jeffrey Yemin Assignee: Misha Tyulenev
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-44815 Adding arbiter to PSS deployment usin... Closed
is depended on by SERVER-44816 LogicalSessionCacheReap segfault Closed
Duplicate
duplicates SERVER-44815 Adding arbiter to PSS deployment usin... Closed
is duplicated by SERVER-42922 failCommand + closeConnection can der... Closed
is duplicated by SERVER-43080 Server seg faults in logical session ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2, v4.0
Sprint: Sharding 2019-09-09, Sharding 2019-09-23, Sharding 2019-10-07
Participants:
Linked BF Score: 28

 Description   

A LogicalSessionCacheRefresh operation is triggering a failpoint configured by the drivers' retryable reads tests.

Here's a log of the failpoint being configured:

2019-08-28T23:50:02.549+0000 W  COMMAND  [conn770] failpoint: failCommand set to: { mode: 3, data: { failCommands: [ "find" ], errorCode: 91 } }
2019-08-28T23:50:02.549+0000 I  COMMAND  [conn770] command admin.$cmd command: configureFailPoint { configureFailPoint: "failCommand", mode: { times: 1 }, data: { failCommands: [ "find" ], errorCode: 91 }, $db: "admin", $clusterTime: { clusterTime: Timestamp(1567036202, 335), signature: { hash: BinData(0, 57D9AC22A95A2C0F6B3FC988600F0EBDAE1A398B), keyId: 6730367942157926402 } }, lsid: { id: UUID("bbcf418f-9ebf-4399-815e-063007855163") } } numYields:0 reslen:163 locks:{} protocol:op_msg 0ms
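
For reference, the command document in the log above can be rebuilt as a plain Python dict. This is a minimal sketch: the helper name `build_fail_command` is hypothetical, and only the field names and values visible in the log are used.

```python
def build_fail_command(fail_commands, error_code, times=1):
    """Build a configureFailPoint document like the one logged by conn770.

    The command name must be the first key; Python dicts preserve
    insertion order, so the literal below keeps it first.
    """
    return {
        "configureFailPoint": "failCommand",
        "mode": {"times": times},
        "data": {
            "failCommands": list(fail_commands),
            "errorCode": error_code,
        },
    }


# Same settings as in the log: fail one 'find' with error code 91.
cmd = build_fail_command(["find"], 91)
```

With PyMongo, a document like `cmd` would typically be sent through `client.admin.command(cmd)` against a server that has test commands enabled; that step is omitted here because it needs a running server.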

And here's a log of the failpoint being triggered:

2019-08-28T23:50:02.618+0000 I  COMMAND  [LogicalSessionCacheRefresh] Failing command 'find' via 'failCommand' failpoint. Action: returning error code 91.
2019-08-28T23:50:02.618+0000 D1 -        [LogicalSessionCacheRefresh] User Assertion: ShutdownInProgress: Failing command due to 'failCommand' failpoint src/mongo/db/commands.cpp 534
2019-08-28T23:50:02.619+0000 D1 COMMAND  [LogicalSessionCacheRefresh] assertion while executing command 'find' on database 'config' with arguments '{ find: "system.sessions", filter: { _id: { $in: [ { id: UUID("896c9ac8-e5c4-4205-85c0-506610731f71"), uid: BinData(0, 225D5B72D8EF7D8440D09A4B518E1795ED84E9BC3E28CD58A4CAF8C79DAA1A01) }, { id: UUID("bcc718a3-c2ab-44f2-98f9-2bbcf323d8fa"), uid: BinData(0, 225D5B72D8EF7D8440D09A4B518E1795ED84E9BC3E28CD58A4CAF8C79DAA1A01) }, { id: UUID("574be726-cac4-41ca-bd71-c770413c8863"), uid: BinData(0, 225D5B72D8EF7D8440D09A4B518E1795ED84E9BC3E28CD58A4CAF8C79DAA1A01) }, { id: UUID("b25e9f15-87d5-467a-8ccf-de5d88ae4d24"), uid: BinData(0, 225D5B72D8EF7D8440D09A4B518E1795ED84E9BC3E28CD58A4CAF8C79DAA1A01) }
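
When diagnosing a flaky test, it may help to check which thread actually consumed the failpoint. The sketch below parses the "Failing command" line shown above. The function name `failpoint_hit` is hypothetical, and treating only `connN` contexts as client connections is an assumption based on the log lines in this ticket.

```python
import re

# Matches lines such as:
#   ... [LogicalSessionCacheRefresh] Failing command 'find' via 'failCommand' failpoint. ...
FAIL_LINE = re.compile(
    r"\[(?P<context>[^\]]+)\] Failing command '(?P<command>[^']+)' "
    r"via 'failCommand' failpoint"
)


def failpoint_hit(log_line):
    """Return details of a failCommand hit, or None if the line is not one."""
    match = FAIL_LINE.search(log_line)
    if match is None:
        return None
    context = match.group("context")
    return {
        "context": context,
        "command": match.group("command"),
        # Assumption: contexts named conn<N> are client connections.
        "from_client_connection": re.fullmatch(r"conn\d+", context) is not None,
    }


line = (
    "2019-08-28T23:50:02.618+0000 I  COMMAND  [LogicalSessionCacheRefresh] "
    "Failing command 'find' via 'failCommand' failpoint. "
    "Action: returning error code 91."
)
hit = failpoint_hit(line)
```

Here `hit["from_client_connection"]` is false, which is the signature of this bug: the failpoint was consumed by an internal thread rather than by the test's own connection.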

Driver tests expect that, by default, internal commands like this do not trigger failpoints.
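
The expected behavior can be modeled as a small predicate. This is an illustrative sketch only, not the server's implementation: the class `FailCommandConfig`, the method `should_fail`, and the `is_internal_client` flag are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailCommandConfig:
    """Toy model of a failCommand failpoint in { times: N } mode."""

    fail_commands: List[str]
    error_code: int
    times: int = 1

    def should_fail(self, command_name: str, is_internal_client: bool) -> bool:
        # Expected behavior: internal operations never trigger the
        # failpoint and never consume one of its remaining activations.
        if is_internal_client:
            return False
        if self.times <= 0 or command_name not in self.fail_commands:
            return False
        self.times -= 1
        return True


cfg = FailCommandConfig(fail_commands=["find"], error_code=91, times=1)
internal_result = cfg.should_fail("find", is_internal_client=True)
driver_result = cfg.should_fail("find", is_internal_client=False)
```

In this model the internal `find` leaves the single activation untouched, so the driver's own `find` is the one that fails, which is what the retryable reads tests rely on.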

This is a significant bug for drivers because tests fail seemingly at random, depending on the timing of the background processes running in the server, which makes the failures difficult to debug.



 Comments   
Comment by Githook User [ 12/Feb/20 ]

Author:

{'name': 'Oleg Pudeyev', 'username': 'p-mongo', 'email': '39304720+p-mongo@users.noreply.github.com'}

Message: remove SERVER-43079 workaround (#1689)

Co-authored-by: Oleg Pudeyev <p@users.noreply.github.com>
Branch: master
https://github.com/mongodb/mongo-ruby-driver/commit/a7ccbb58d20156389f86d8a15d36e9db464f205c

Comment by Githook User [ 17/Oct/19 ]

Author:

{'name': 'Misha Tyulenev', 'email': 'misha.tyulenev@10gen.com'}

Message: SERVER-43079 ignore internal connections in failCommand failpoint

(cherry picked from commit 0507c3ba15231cb60fe319fa4f18f65ec1cbd4f3)
Branch: v4.2
https://github.com/mongodb/mongo/commit/ed8a4f24d9d2816112054d28d48c69a0dc97fc00

Comment by Githook User [ 17/Oct/19 ]

Author:

{'name': 'Misha Tyulenev', 'email': 'misha.tyulenev@10gen.com'}

Message: SERVER-43079 ignore internal connections in failCommand failpoint
Branch: v4.0
https://github.com/mongodb/mongo/commit/b74e7e62d879ccd4143b297d5931e8e0d5bc3fd8

Comment by Jeffrey Yemin [ 03/Oct/19 ]

Drivers test all the way back to MongoDB 2.6, so really as far back as is feasible. But I leave it to you to decide how costly the backports are and weigh that in the decision.

Comment by Misha Tyulenev [ 03/Oct/19 ]

jeff.yemin please clarify which releases it needs to be backported to.

Comment by Jeffrey Yemin [ 03/Oct/19 ]

Yes, please.

Comment by Misha Tyulenev [ 02/Oct/19 ]

jeff.yemin it should be fixed now in master. Please indicate if you need a backport.

Comment by Githook User [ 02/Oct/19 ]

Author:

{'name': 'Misha Tyulenev', 'email': 'misha.tyulenev@10gen.com'}

Message: SERVER-43079 ignore internal connections in failCommand failpoint
Branch: master
https://github.com/mongodb/mongo/commit/0507c3ba15231cb60fe319fa4f18f65ec1cbd4f3
