[SERVER-74421] Propagate maxTimeMS to mongot for $search queries Created: 27/Feb/23 Updated: 26/Dec/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Oren Ovadia | Assignee: | Backlog - Query Integration |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | qi-search | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Assigned Teams: | Query Integration |
| Participants: | |
| Description |
|
Mongot will be able to reduce the amount of work it does if it is aware of maxTimeMS users specify in $search queries. For instance, mongot can discard a query before or during execution if the timeout has passed.
|
| Comments |
| Comment by Nicholas Zolnierz [ 04/May/23 ] |
|
Sending back to QI triage as we are not planning to do this in PM-2802 |
| Comment by Nicholas Zolnierz [ 02/Mar/23 ] |
|
Small note: If we end up implementing SERVER-63765 (which gets us more in line with the way sharded aggregations work between mongos and shards), then there shouldn't be an issue with the first batch because ideally mongot doesn't do any work until a cursor is established. |
| Comment by Oren Ovadia [ 28/Feb/23 ] |
|
Kevin, good point about killCursors not working for the initial batch. That's a problem, given that we are moving toward serving as many search queries as possible in their first batch. |
| Comment by Xiaobo Zhou [ 28/Feb/23 ] |
|
According to the MongoDB gRPC protocol: "Clients SHOULD enforce timeouts client side by closing the stream after a deadline has been reached." Mongot can kill the running cursor after the stream is cancelled by the client. For mongorpc, mongod may terminate the TCP connection instead. |
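The client-side enforcement described above can be sketched roughly as follows. This is a stdlib-only stand-in, not real gRPC code: `FakeStream`, `read_with_deadline`, and `DeadlineExceeded` are invented names, and `stream` is any iterator with a `close()` method playing the role of a gRPC response stream:

```python
import time

class DeadlineExceeded(Exception):
    pass

class FakeStream:
    """Toy stand-in for a server response stream."""
    def __init__(self, batches):
        self._batches = iter(batches)
        self.closed = False
    def __iter__(self):
        return self
    def __next__(self):
        return next(self._batches)
    def close(self):
        self.closed = True

def read_with_deadline(stream, deadline: float):
    """Drain a response stream, closing it once the client-side deadline passes.

    `deadline` is an absolute time.monotonic() value. Per the protocol quoted
    above, the client enforces the timeout by closing the stream; the server
    side can then kill the running cursor.
    """
    try:
        for batch in stream:
            if time.monotonic() >= deadline:
                stream.close()
                raise DeadlineExceeded("maxTimeMS exceeded; stream closed")
            yield batch
    finally:
        stream.close()

# Deadline far in the future: all batches arrive, stream is closed at the end.
s = FakeStream([1, 2, 3])
assert list(read_with_deadline(s, time.monotonic() + 10)) == [1, 2, 3]
assert s.closed

# Deadline already passed: the client closes the stream and surfaces an error.
s2 = FakeStream([1, 2, 3])
try:
    list(read_with_deadline(s2, time.monotonic() - 1))
    raise AssertionError("expected DeadlineExceeded")
except DeadlineExceeded:
    assert s2.closed
```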
| Comment by Kevin Rosendahl [ 27/Feb/23 ] |
|
Definitely agree we should improve the handling of killCursors. However, I'm not sure that will be sufficient: you need a cursorId in order to issue a killCursors, so effectively you can only prevent getMores from doing more work and can't cancel an outstanding initial search command. There may be other ways to handle this at the network layer in theory, but simply letting mongot cancel work it knows is useless would definitely be the most elegant, and as Oren mentioned, it's a very useful signal for other purposes as well. cc xiaobo.zhou@mongodb.com |
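The cursorId limitation above follows directly from the shape of the killCursors command. A small sketch (the helper name is invented; the command document shape matches the documented `{ killCursors: <collection>, cursors: [...] }` form):

```python
def kill_cursors_cmd(collection: str, cursor_ids: list[int]) -> dict:
    """Build a killCursors command document.

    killCursors targets established cursor ids. Until the initial search
    command returns a cursorId, there is nothing to reference here, so this
    mechanism can only cancel subsequent getMores, not the first command.
    """
    if not cursor_ids:
        raise ValueError("killCursors requires at least one cursor id")
    return {"killCursors": collection, "cursors": cursor_ids}

assert kill_cursors_cmd("coll", [12345]) == {
    "killCursors": "coll",
    "cursors": [12345],
}
```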
| Comment by Oren Ovadia [ 27/Feb/23 ] |
|
From our discussion:
Good point, Nick. Today mongot does not know how to terminate an executing or queued query; we should fix that, and I'll open a ticket for it on mongot. We should also evaluate how this behaves. From mongot's perspective, there is a difference between learning the maxTimeMS when the query (or getMore) is received and only finding out when killCursors is called: having the information up front lets mongot decide whether to put timeout guardrails on a query at all, and how often to check them. This may also apply to returning partial results, if we ever want to support that. That said, we will still want better support for killCursors, both for cases where it is explicitly called and for cases like hedged reads, where we want to reduce redundant work but may not want to enforce timeouts up front. In the decoupled architecture, where a search query may involve more than one RPC, this timeout will be forwarded down the RPC call stack and self-enforced by each process, which makes enforcing deadlines simple (FYI kevin.rosendahl@mongodb.com, not sure I captured your intent on this point, feel free to correct me on this one). |