[SERVER-67722] Shard cursor is not killed on MaxTimeMSExpired Created: 30/Jun/22  Updated: 29/Oct/23  Resolved: 22/Dec/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.3.0-rc0, 6.0.5

Type: Bug Priority: Major - P3
Reporter: Romans Kasperovics Assignee: Romans Kasperovics
Resolution: Fixed Votes: 0
Labels: greenerbuild
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Duplicate
    is duplicated by SERVER-69274 .maxTimeMs leaks shard cursor (Closed)
Problem/Incident
    causes SERVER-72798 Only auto-cleanup cursors that have b... (Closed)
Related
    is related to SERVER-43155 Queries which exceed maxTimeMS may re... (Closed)
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.2, v6.0
Steps To Reproduce:

The test max_time_ms_does_not_leak_shard_cursor.js fails sporadically, so we may need to find a way to reproduce the problem reliably.

Sprint: QE 2022-10-31, QE 2022-11-14, QE 2022-11-28, QE 2022-12-12, QE 2022-12-26
Participants:
Linked BF Score: 0

 Description   

In SERVER-62710 I discovered two reasons why mongos might not explicitly kill a shard cursor, and I added max_time_ms_does_not_leak_shard_cursor.js to test for them. Unfortunately, the test fails regularly, though sporadically (BF-24684), on all build variants.

This means there are a few more reasons why a shard cursor might remain alive. In some BFGs, the shard response contains the MaxTimeMSExpired error, but the shard cursor is not deleted. In others, the shard returns NetworkInterfaceExceededTimeLimit instead of MaxTimeMSExpired.

Possible solution: the remaining time is transferred from the cursor to the opCtx, so perhaps the timeout should be removed from the opCtx at the end of getMore.
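To make the idea concrete, here is a toy model in plain JavaScript (runnable in Node, not the mongo shell). Everything in it is hypothetical: ToyCursor, ToyOpCtx and runGetMore are illustrative names, not server APIs, and time is a simulated clock in milliseconds. It shows how a deadline still attached to the opCtx after the batch can make a later interrupt check fail with MaxTimeMSExpired, and how clearing the deadline at the end of the batch avoids that.

```javascript
// Toy model only: ToyCursor, ToyOpCtx and runGetMore are hypothetical names.
class ToyCursor {
  constructor(id, maxTimeMs) {
    this.id = id;
    this.remainingMs = maxTimeMs; // time budget carried between getMores
  }
}

class ToyOpCtx {
  constructor() {
    this.deadline = null; // null means "no time limit on this operation"
  }

  checkForInterrupt(now) {
    if (this.deadline !== null && now >= this.deadline) {
      const err = new Error("operation exceeded time limit");
      err.codeName = "MaxTimeMSExpired";
      throw err;
    }
  }
}

// Simulates one getMore on a simulated clock (ms).
// batchMs: time spent producing the batch (the getMore inner body).
// postBatchMs: time spent afterwards, e.g. before the cursor is unpinned.
function runGetMore(cursor, batchMs, postBatchMs, clearDeadlineAfterBatch) {
  const opCtx = new ToyOpCtx();
  let clock = 0;

  // The remaining time moves from the cursor to the opCtx.
  opCtx.deadline = clock + cursor.remainingMs;

  // Inner getMore body.
  clock += batchMs;
  opCtx.checkForInterrupt(clock);
  cursor.remainingMs = opCtx.deadline - clock; // unused time goes back to the cursor

  // The proposed change: drop the time limit once the batch is done.
  if (clearDeadlineAfterBatch) {
    opCtx.deadline = null;
  }

  // Post-batch step that still checks for interrupt.
  clock += postBatchMs;
  opCtx.checkForInterrupt(clock);
  return cursor.remainingMs;
}
```

In this model, runGetMore(new ToyCursor(1, 100), 40, 80, false) throws an error whose codeName is MaxTimeMSExpired even though the batch itself finished in time, while the same call with clearDeadlineAfterBatch set to true returns 60, the unused budget. One consequence of this design choice is that time spent after the batch is not charged to the cursor.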



 Comments   
Comment by Githook User [ 25/Jan/23 ]

Author: Romans Kasperovics (romanskas) <romans.kasperovics@mongodb.com>

Message: SERVER-67722 Clean-up shard cursors on MaxTimeMSExpired
Branch: v6.0
https://github.com/mongodb/mongo/commit/7f5b43a5f8900ef0e29d762818a07290fa99c98d

Comment by Githook User [ 22/Dec/22 ]

Author: Romans Kasperovics (romanskas) <romans.kasperovics@mongodb.com>

Message: SERVER-67722 Add special handling for MaxTimeMSExpired in shard cursor lifecycle
Branch: master
https://github.com/mongodb/mongo/commit/f293955ab3c651cf79217fcf3e9c5fbd4b2ce541

Comment by Romans Kasperovics [ 05/Sep/22 ]

As expected, MaxTimeMSExpired can be thrown outside the inner body of getMore on a shard. In this case, the shard returns MaxTimeMSExpired, but the cursor remains alive. Here is a way to reproduce it using the waitBeforeUnpinningOrDeletingCursorAfterGetMoreBatch failpoint:

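// Assumes the surrounding test defines a ShardingTest `st`, a collection `coll`
// named `collName`, and the helpers getIdleCursors() and assertNoIdleCursors().
load("jstests/libs/fail_point_util.js");  // provides configureFailPoint()

// Small batches with a short time limit make the cursor issue getMores.
const curs = coll.find().batchSize(2).maxTimeMS(100);
assert.eq(getIdleCursors(st.shard0, collName).length, 0);
// Pause each getMore on shard0 after its batch, before the cursor is unpinned or
// deleted; with shouldCheckForInterrupt the wait can fail with MaxTimeMSExpired.
const fp = configureFailPoint(st.shard0,
                              "waitBeforeUnpinningOrDeletingCursorAfterGetMoreBatch",
                              {shouldCheckForInterrupt: true},
                              "alwaysOn");
assert.throwsWithCode(() => {
    curs.itcount();
}, ErrorCodes.MaxTimeMSExpired);
fp.off();
// Without the fix, the shard cursor is still alive here and this assertion fails.
assertNoIdleCursors(st.shard0, collName);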
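The leak can be modeled the same way. In the toy sketch below (plain JavaScript, runnable in Node; ToyCursorManager, maxTimeError and toyGetMore are hypothetical names, not server code), it is assumed that an error raised inside the getMore inner body deletes the cursor, while work done after the batch, such as the failpoint wait, runs outside that handling. The cleanUpOnMaxTime flag stands in for special handling of MaxTimeMSExpired anywhere in getMore.

```javascript
// Toy model only: none of these names exist in the server.
class ToyCursorManager {
  constructor() {
    this.cursors = new Map(); // cursor id -> { pinned }
  }

  register(id) {
    this.cursors.set(id, { pinned: false });
  }

  // Cursors that still exist but are not in use, i.e. what a leak looks like.
  idleCursorIds() {
    return [...this.cursors].filter(([, c]) => !c.pinned).map(([id]) => id);
  }
}

function maxTimeError() {
  const err = new Error("operation exceeded time limit");
  err.codeName = "MaxTimeMSExpired";
  return err;
}

// batchFn: the getMore inner body. afterBatchFn: work after the batch
// (e.g. waiting in a failpoint) that may still throw MaxTimeMSExpired.
function toyGetMore(mgr, id, batchFn, afterBatchFn, cleanUpOnMaxTime) {
  mgr.cursors.get(id).pinned = true;
  try {
    try {
      batchFn();
    } catch (e) {
      mgr.cursors.delete(id); // assumed: errors in the inner body destroy the cursor
      throw e;
    }
    afterBatchFn(); // an error here bypasses the inner-body cleanup
  } catch (e) {
    if (cleanUpOnMaxTime && e.codeName === "MaxTimeMSExpired") {
      mgr.cursors.delete(id); // never leave the cursor behind on MaxTimeMSExpired
    }
    throw e;
  } finally {
    if (mgr.cursors.has(id)) {
      mgr.cursors.get(id).pinned = false; // a surviving cursor goes back to idle
    }
  }
}
```

With cleanUpOnMaxTime set to false, an afterBatchFn that throws maxTimeError() leaves cursor 1 in idleCursorIds(), mirroring the failing assertNoIdleCursors in the reproduction; with it set to true, no idle cursors remain. Deleting an already-deleted cursor is a no-op on a Map, so the flag is also safe for errors from the inner body.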

Generated at Thu Feb 08 06:08:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.