[SERVER-74421] Propagate maxTimeMS to mongot for $search queries Created: 27/Feb/23  Updated: 26/Dec/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Oren Ovadia Assignee: Backlog - Query Integration
Resolution: Unresolved Votes: 0
Labels: qi-search
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-63765 Pass batchSize zero to mongot during ... Backlog
Assigned Teams:
Query Integration
Participants:

 Description   

Mongot will be able to reduce the amount of work it does if it is aware of the maxTimeMS users specify on $search queries.

For instance, mongot can discard a query before or during execution once the timeout has elapsed.
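A minimal sketch of the idea, assuming mongod forwards the *remaining* time budget rather than the original value (the mongot command shape and field names here are assumptions, not the actual protocol):

```python
# Hypothetical sketch: mongod computes how much of the user's maxTimeMS
# budget is left and attaches it to the command sent to mongot, so mongot
# can discard the query once the budget hits zero. All field names are
# illustrative assumptions.

def build_mongot_search_cmd(search_stage, user_max_time_ms, elapsed_ms):
    """Attach the remaining time budget to a (hypothetical) mongot command."""
    cmd = {
        "search": search_stage.get("index", "default"),
        "query": search_stage,
    }
    if user_max_time_ms is not None:
        # Forward only what is left of the user's budget.
        cmd["maxTimeMS"] = max(0, user_max_time_ms - elapsed_ms)
    return cmd

# 500 ms user budget, 120 ms already spent in mongod -> 380 ms forwarded.
cmd = build_mongot_search_cmd(
    {"text": {"query": "coffee", "path": "title"}}, 500, 120
)
assert cmd["maxTimeMS"] == 380
```

With this, a query that arrives at mongot with a budget of 0 can be rejected before any work is scheduled.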

 



 Comments   
Comment by Nicholas Zolnierz [ 04/May/23 ]

Sending back to QI triage as we are not planning to do this in PM-2802

Comment by Nicholas Zolnierz [ 02/Mar/23 ]

Small note: If we end up implementing SERVER-63765 (which gets us more in line with the way sharded aggregations work between mongos and shards), then there shouldn't be an issue with the first batch because ideally mongot doesn't do any work until a cursor is established.

Comment by Oren Ovadia [ 28/Feb/23 ]

Kevin, good point about killCursors not working for the initial batch. That's a problem given that we are moving toward serving as many search queries as possible in the first batch.

Comment by Xiaobo Zhou [ 28/Feb/23 ]

According to the MongoDB gRPC protocol:

Clients SHOULD enforce timeouts client side by closing the stream after a deadline has been reached.

Mongot can kill the running cursor after the stream is cancelled by the client.

For mongorpc, mongod may terminate the TCP connection instead.
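The quoted guidance can be sketched as follows, assuming the client arms a timer that cancels the stream at the deadline and the server treats cancellation as its signal to kill the cursor (FakeStream is a stand-in for a real gRPC stream, not an actual API):

```python
# Sketch of client-side deadline enforcement per the quoted gRPC guidance:
# the client closes/cancels the stream when the deadline is reached, and
# the server (mongot) kills the running cursor on seeing the cancellation.
import threading


class FakeStream:
    """Stand-in for a gRPC stream; only models cancellation."""
    def __init__(self):
        self.cancelled = threading.Event()

    def cancel(self):
        self.cancelled.set()


def enforce_deadline(stream, deadline_s):
    """Client side: cancel the stream once the deadline passes."""
    timer = threading.Timer(deadline_s, stream.cancel)
    timer.start()
    return timer


stream = FakeStream()
enforce_deadline(stream, 0.05)            # 50 ms deadline
stream.cancelled.wait(timeout=1)          # server side would now kill the cursor
assert stream.cancelled.is_set()
```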

Comment by Kevin Rosendahl [ 27/Feb/23 ]

Definitely agree we should improve the handling of killCursors.

However, I'm not sure that will be sufficient: you need a cursorId in order to issue a killCursors, so effectively you can only prevent getMores from doing more work; you can't cancel an outstanding initial search command. There may be other ways to handle this in theory at the network layer, but simply letting mongot cancel work it knows is useless would definitely be the most elegant, and as Oren mentioned, it's a very useful signal for other purposes as well. cc xiaobo.zhou@mongodb.com

Comment by Oren Ovadia [ 27/Feb/23 ]

From our discussion:
nicholas.zolnierz@mongodb.com said:

mongod respects the maxTimeMS on the overall aggregation, and will kill the mongot cursor appropriately. Does that not work or did you have something else in mind?
.....
There are some situations where mongos or a driver can issue a killCursors

 

Good point Nick.

Today mongot does not know how to terminate an executing or queued query. We should fix that; I'll open a ticket for it on mongot. We should also evaluate how this behaves.

There is a difference, from mongot's perspective, between learning the maxTimeMS when the query (or getMore) is received and only having killCursors called later: with the timeout up front, mongot can be smarter about whether to put timeout guardrails on a query at all, and about how often to check them. This may also apply to returning partial results, if we ever want to support that. That said, we will still want better support for killCursors, both for cases where it is called explicitly and for cases like hedged reads, where we want to reduce redundant work but may not want to enforce timeouts up front.

In the decoupled architecture, where a search query may involve more than one RPC, this timeout will be forwarded down the RPC call stack and self-enforced by each process, which makes enforcing deadlines simple (FYI: kevin.rosendahl@mongodb.com, not sure I captured your intent on this point, feel free to correct me on this one).
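The deadline-forwarding idea above can be sketched as each hop computing the remaining budget against a shared deadline and refusing to start work once it is exhausted (function names and structure are illustrative assumptions, not the actual decoupled-architecture design):

```python
# Sketch of deadline propagation down an RPC call stack: each process
# self-enforces the forwarded deadline by checking the remaining budget
# before (and during) its own work. Names are hypothetical.
import time


def remaining_budget_ms(deadline):
    """Milliseconds left until the absolute (monotonic) deadline."""
    return max(0.0, (deadline - time.monotonic()) * 1000)


def call_downstream(deadline, work_ms):
    """One hop: refuse to start if the deadline has already passed."""
    if remaining_budget_ms(deadline) <= 0:
        raise TimeoutError("deadline exceeded before RPC dispatched")
    time.sleep(work_ms / 1000)          # simulate the hop's own work
    return "ok"


deadline = time.monotonic() + 0.2       # 200 ms total budget for the query
assert call_downstream(deadline, 50) == "ok"   # first hop fits in the budget
```

Because every hop checks the same absolute deadline, a downstream process can drop a query that is already doomed instead of doing work whose result will be discarded.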

Generated at Thu Feb 08 06:27:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.