[SERVER-31578] Add a query parameter to limit the time an operation runs in the server Created: 16/Oct/17 Updated: 06/Dec/22 Resolved: 20/Oct/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Conchi Bueno | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Query
|
||||||||
| Participants: | |||||||||
| Case: | (copied to CRM) | ||||||||
| Description |
|
Add a new query parameter, similar to maxTimeMS, that limits how long an operation runs in the database from the moment it reaches the server. The limit should include time spent blocked and act as a hard timeout for the operation. |
| Comments |
| Comment by Asya Kamsky [ 03/Nov/17 ] |
|
It sounds like there's a consensus that this configuration/feature must live at the client/driver level. Since the SERVER project tracks features in the server, it's not appropriate to re-open this ticket, but we should open a feature request/tracking ticket in Jira, possibly in the DRIVERS project. We can link it here for related discussion and context. |
| Comment by Naga Mayakuntla [ 20/Oct/17 ] |
|
Cassandra example : |
| Comment by Charlie Swanson [ 20/Oct/17 ] |
|
Can you provide an example? |
| Comment by Naga Mayakuntla [ 20/Oct/17 ] |
|
When I said other databases support this, I meant they support it from the client's point of view. |
| Comment by Charlie Swanson [ 20/Oct/17 ] |
|
nmayakuntla In general I don't see how we or any other database can provide a guarantee that we will return within X milliseconds, since any arbitrary system call or mutex acquisition might stall indefinitely. Additionally, there could be an arbitrary delay in the network returning to your application, so any kind of operation timeout cannot provide a guarantee to the application that it will receive a response within that time frame. This maxTimeMS mechanism was not designed to provide that behavior, and I don't think it's really possible to achieve. If you'd like we could repurpose this ticket into one whose purpose is to periodically check for interrupt while yielded, which would alleviate the particular scenario described here. |
| Comment by Naga Mayakuntla [ 20/Oct/17 ] |
|
1. Isn't maxTimeMS unreliable from the application's point of view? If acquiring a lock takes, say, 400 milliseconds while the application has only 200 milliseconds to fetch the data, how would that work for the application? 2. A socket timeout cannot be an option: in an application with several use cases/flows, and therefore different SLAs for different queries/flows/use cases, how can a socket timeout be used to get reliable response times for each flow? Can you set a socket timeout at the operation level? Couchbase, Memcache, and Cassandra support operation-level timeouts, which help improve application resiliency when the database or network has issues. |
| Comment by Ian Whalen (Inactive) [ 20/Oct/17 ] |
|
As Charlie described above, we don't believe that this is different from the existing maxTimeMS option. |
| Comment by Asya Kamsky [ 20/Oct/17 ] |
|
I wonder whether this could be handled on the driver side, though: if the driver were timing each request it sent, it could return a timeout error to the client, knowing that the op will eventually be killed. |
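A minimal sketch of the client-side idea, under stated assumptions: `call_with_deadline`, `ClientTimeout`, and the simulated `slow_operation` are hypothetical names for illustration and are not part of any MongoDB driver API. The wrapper stops waiting after the caller's deadline and raises an error; it does not cancel the underlying operation, which in the scenario discussed here would still have to be killed on the server (for example, by maxTimeMS).

```python
import concurrent.futures
import time


class ClientTimeout(Exception):
    """Raised when the caller's deadline passes before a result arrives."""


# Hypothetical shared worker pool standing in for a driver's I/O threads.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_deadline(operation, timeout_s):
    """Run operation() in the pool and wait at most timeout_s seconds.

    On timeout the caller gets ClientTimeout immediately, but the
    operation itself keeps running in the background until it finishes
    or is killed by some other mechanism.
    """
    future = _pool.submit(operation)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise ClientTimeout(f"no response within {timeout_s} s") from None


def slow_operation():
    """Simulated request that takes longer than the caller can wait."""
    time.sleep(0.5)
    return "late result"


def fast_operation():
    """Simulated request that completes immediately."""
    return "ok"
```

Because the deadline is passed per call, each flow can use its own budget, e.g. `call_with_deadline(fast_operation, 0.2)` for one query and a longer value for another. The trade-off is that timed-out work continues to consume server resources until it is interrupted there.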
| Comment by Charlie Swanson [ 16/Oct/17 ] |
|
It looks like the implication is that maxTimeMS does not include time spent waiting for locks. This is not accurate. The timer tracking maxTimeMS does not stop while the operation is yielded or otherwise waiting for locks. However, it is possible an operation can exceed the maxTimeMS because the operation only periodically checks for interrupts. For a query, this most commonly happens while yielding. So the following scenario could happen:
So I think what this request is really asking for is to eliminate the behavior of 'periodic interrupt checking', which happens somewhat at the developer's intuition/discretion, and instead somehow ensure that the operation is killed 'exactly' when the deadline hits. I'm not sure how feasible this goal is... Conceivably any thread could be blocked indefinitely, whether waiting for a lock or waiting for the OS to return from a system call. The above series of events could also happen if some other operation (allocating memory, reading from disk, etc.) stalls for a number of milliseconds, so we'll never be able to abide by the deadline perfectly. We could potentially make more of an effort to periodically check for interrupts while we are waiting for something that we know might take a while, like acquiring a lock, though I don't think we could ever really offer a guarantee that we will take no more than X milliseconds and get back to you with a response within that amount of time. |
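The effect of periodic interrupt checking can be illustrated with a small simulation. This is only a sketch, not the server's implementation: `FakeClock`, `MaxTimeExpired`, and `run_operation` are hypothetical names, and the step durations are made-up numbers. The deadline timer keeps running during every step, but the operation only notices it at the check points between steps, so a single long stall lets it overshoot.

```python
class FakeClock:
    """Deterministic millisecond clock, so the example needs no real sleeping."""

    def __init__(self):
        self.now_ms = 0

    def advance(self, ms):
        self.now_ms += ms


class MaxTimeExpired(Exception):
    """Raised at a check point once the deadline has already passed."""

    def __init__(self, elapsed_ms):
        super().__init__(f"operation killed after {elapsed_ms} ms")
        self.elapsed_ms = elapsed_ms


def run_operation(clock, deadline_ms, step_costs_ms):
    """Run the steps in order, checking the deadline only between steps.

    Each entry of step_costs_ms is how long one uninterruptible step
    takes (work, a yield, a lock wait, a slow system call, ...).
    Returns the total elapsed time if no check point sees an expired deadline.
    """
    start = clock.now_ms
    for cost in step_costs_ms:
        # Check point: the only place where the deadline is noticed.
        elapsed = clock.now_ms - start
        if elapsed >= deadline_ms:
            raise MaxTimeExpired(elapsed)
        # The step runs to completion; the timer keeps counting meanwhile.
        clock.advance(cost)
    return clock.now_ms - start
```

With a 200 ms deadline and steps of 50, 400, and 50 ms, `run_operation(FakeClock(), 200, [50, 400, 50])` raises `MaxTimeExpired` with `elapsed_ms == 450`: the deadline passed during the 400 ms step, but the operation was only interrupted at the next check point. Adding check points inside long waits shrinks the overshoot but cannot remove it entirely.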