[SERVER-31578] Add a query parameter to limit the time an operation runs in the server Created: 16/Oct/17  Updated: 06/Dec/22  Resolved: 20/Oct/17

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Conchi Bueno Assignee: Backlog - Query Team (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to JAVA-2795 Add a query parameter to limit the ti... Closed
Assigned Teams:
Query
Participants:
Case:

 Description   

Add a new query parameter, similar to maxTimeMS, that limits how long an operation runs in the database from the moment it reaches the server. It would include blocking time and act as a hard timeout for the operation.



 Comments   
Comment by Asya Kamsky [ 03/Nov/17 ]

It sounds like there's a consensus that this configuration/feature must live at the client/driver level. Since the SERVER project tracks features in the server, it's not appropriate to re-open this ticket, but we should open a feature request/tracking ticket in Jira, possibly in the DRIVERS project.

We can link it here for related discussion and context.

Comment by Naga Mayakuntla [ 20/Oct/17 ]

Cassandra example (Java driver; setReadTimeoutMillis sets a per-statement read timeout, overriding the cluster-wide default):

session.execute(
    new SimpleStatement("TRUNCATE tmp").setReadTimeoutMillis(100));

Comment by Charlie Swanson [ 20/Oct/17 ]

Can you provide an example?

Comment by Naga Mayakuntla [ 20/Oct/17 ]

When I said other databases support this, I meant they support it from the client's point of view.

Comment by Charlie Swanson [ 20/Oct/17 ]

nmayakuntla In general I don't see how we or any other database can provide a guarantee that we will return within X milliseconds, since any arbitrary system call or mutex acquisition might stall indefinitely. Additionally, there could be an arbitrary delay in the network returning to your application, so any kind of operation timeout cannot provide a guarantee to the application that it will receive a response within that time frame. This maxTimeMS mechanism was not designed to provide that behavior, and I don't think it's really possible to achieve.

If you'd like, we could repurpose this ticket to track periodically checking for interrupts while yielded, which would alleviate the particular scenario described here.

Comment by Naga Mayakuntla [ 20/Oct/17 ]

1. Isn't maxTimeMS unreliable from the application's point of view? If acquiring a lock takes, say, 400 milliseconds while the application has only 200 milliseconds to fetch the data, how would that work for the application?

2. A socket timeout cannot be an option: in an application with several use cases/flows, and therefore different SLAs for different queries/flows/use cases, how can a socket timeout be used to get reliable response times for each flow? Can you set a socket timeout at the operation level?

Couchbase, Memcached, and Cassandra support operation-level timeouts, and they help improve application resiliency when the database or network has issues.
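To illustrate the point about per-flow SLAs, here is a minimal client-side sketch (the query helper and flow names are hypothetical, not any driver's API): each call carries its own deadline, which a single connection-wide socket timeout cannot express.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PerOperationTimeoutDemo {
    // Hypothetical query helper; the name and shape are illustrative only.
    static CompletableFuture<String> query(String flow, long workMs) {
        return CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(workMs); } catch (InterruptedException ignored) { }
            return flow + " ok";
        });
    }

    public static void main(String[] args) throws Exception {
        // Batch flow tolerates a 2-second SLA and completes in time.
        String batch = query("batch", 300)
                .orTimeout(2, TimeUnit.SECONDS).get();
        System.out.println(batch);

        // Interactive flow has a 100ms SLA and misses it, so the caller
        // sees a timeout instead of waiting on the slow operation.
        try {
            query("interactive", 500)
                    .orTimeout(100, TimeUnit.MILLISECONDS).get();
        } catch (Exception e) {
            System.out.println("interactive flow timed out: "
                    + (e.getCause() instanceof TimeoutException));
        }
    }
}
```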

Comment by Ian Whalen (Inactive) [ 20/Oct/17 ]

As Charlie described above, we don't believe that this is different from the existing maxTimeMS option.

Comment by Asya Kamsky [ 20/Oct/17 ]

I wonder if this could be handled on the driver side, though - if the driver timed each request it sent, it could return a timeout error to the client, knowing the operation will eventually be killed on the server.
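A minimal sketch of that driver-side idea (purely illustrative - this is not any driver's actual implementation): the client stops waiting after its own deadline and surfaces a timeout error, while the server-side operation keeps running until its own maxTimeMS reaps it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ClientSideTimeoutDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Simulate a request whose round trip takes ~500ms.
        Future<String> op = pool.submit(() -> {
            Thread.sleep(500);
            return "result";
        });

        try {
            // The client only waits 100ms before giving up.
            System.out.println(op.get(100, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Return a timeout error to the application immediately;
            // the (simulated) server-side operation is still running
            // and would be killed by its own maxTimeMS.
            System.out.println("operation timed out on the client");
        } finally {
            pool.shutdownNow();
        }
    }
}
```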

Comment by Charlie Swanson [ 16/Oct/17 ]

It looks like the implication is that maxTimeMS does not include time spent waiting for locks. This is not accurate. The timer tracking maxTimeMS does not stop while the operation is yielded or otherwise waiting for locks. However, it is possible an operation can exceed the maxTimeMS because the operation only periodically checks for interrupts. For a query, this most commonly happens while yielding. So the following scenario could happen:

  1. Operation decides it is time to yield.
  2. Operation checks for interrupt, is at 150ms with a 200ms maxTimeMS.
  3. Operation decides the timeout has not expired, so yields all locks and tries to re-acquire them before proceeding.
  4. For some reason it takes 500ms to acquire the locks again.
  5. The operation resumes control, maybe does some more work, or maybe immediately checks for interrupt - in either case it will soon realize its maxTimeMS has expired, and return an error to the client, having spent approximately 650ms processing, despite the 200ms 'budget'.

So I think what this request is really asking for is to eliminate the behavior of 'periodic interrupt checking', where checks happen at points chosen at the developer's discretion, and instead somehow ensure that the operation is killed 'exactly' when the deadline hits. I'm not sure how feasible this goal is... Conceivably any thread could be blocked indefinitely, whether waiting for a lock or waiting for the OS to return from a system call. The above series of events could also happen if some other operation (allocating memory, reading from disk, etc.) stalls for a number of milliseconds, so we'll never be able to abide by the deadline perfectly. We could potentially make more of an effort to periodically check for interrupts while we are waiting for something that we know might take a while, like acquiring a lock, though I don't think we could ever offer a guarantee that we will take no more than X milliseconds and get back to you with a response within that amount of time.
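The five-step scenario above can be reproduced with a toy simulation (this is not server code; the class and message names are illustrative): the deadline is only enforced at explicit interrupt checks, so a stall between checks pushes the operation well past its 200ms budget before the expiry is noticed.

```java
public class PeriodicInterruptCheckDemo {
    public static void main(String[] args) throws InterruptedException {
        final long maxTimeMs = 200;
        final long start = System.currentTimeMillis();
        final long deadline = start + maxTimeMs;

        // Steps 1-2: at ~150ms the operation decides to yield and checks
        // for interrupt; the deadline has not expired, so it proceeds.
        Thread.sleep(150);
        if (System.currentTimeMillis() >= deadline) {
            throw new IllegalStateException("MaxTimeMSExpired");
        }

        // Steps 3-4: all locks are yielded; reacquiring them stalls for
        // ~500ms, during which no interrupt check runs.
        Thread.sleep(500);

        // Step 5: the next interrupt check finally notices the expired
        // deadline, roughly 650ms in despite the 200ms budget.
        long elapsed = System.currentTimeMillis() - start;
        if (System.currentTimeMillis() >= deadline) {
            System.out.println("MaxTimeMSExpired after ~" + elapsed
                    + "ms (budget " + maxTimeMs + "ms)");
        }
    }
}
```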

Generated at Thu Feb 08 04:27:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.