[SERVER-15815] maxTimeMS on initial tailable cursor query and getMore. Created: 27/Oct/14 Updated: 06/Apr/23 Resolved: 02/Feb/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 2.6.5 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Dissatisfied Former User | Assignee: | David Storch |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Description |
|
Hello! I ran into an issue (
Without the ability to set controllable timeouts on tailing-cursor operations, I am forced to resort to polling behaviour to implement the timeout myself in Python. This is not a good solution for a wide variety of reasons, not least efficiency and added latency. Use case: awaiting messages from distributed RPC calls over a capped collection used as a message bus. A specific example from my codebase is task.result(timeout=None), which returns the result of the RPC call, waiting up to timeout seconds for it (indefinitely if unspecified).
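For illustration, here is a minimal sketch of the polling fallback described above. It is written against a hypothetical `fetch` callable that stands in for a non-blocking query on the capped collection and returns None until the result document exists. `poll_for_result` and `TaskTimeout` are illustrative names, not taken from the reporter's codebase or from any driver. The clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Any, Callable, Optional


class TaskTimeout(Exception):
    """Hypothetical error raised when no result arrives before the deadline."""


def poll_for_result(
    fetch: Callable[[], Optional[Any]],
    timeout: Optional[float] = None,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `fetch` every `interval` seconds until it returns a non-None value.

    `fetch` is a hypothetical stand-in for a query on the capped collection.
    timeout=None waits indefinitely, mirroring task.result(timeout=None).
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        if deadline is None:
            sleep(interval)
            continue
        remaining = deadline - clock()
        if remaining <= 0:
            raise TaskTimeout(f"no result within {timeout} seconds")
        # Never sleep past the caller's deadline.
        sleep(min(interval, remaining))
```

The weakness is visible in the loop: a result that lands just after a poll is not seen until the next one, so every call pays up to one extra `interval` of latency on top of the task's own run time.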
|
| Comments |
| Comment by David Storch [ 02/Feb/16 ] |
|
Again, apologies for our delayed response. I believe that MongoDB version 3.2 adds the behavior that you are looking for. In version 3.2, we added new database commands for performing find and getMore operations (see SERVER-15176 for more detail). As part of this work, the new getMore command accepts a parameter called maxTimeMS, which is legal only for awaitData cursors over capped collections. The getMore command is documented here; in particular, see the documentation for the maxTimeMS parameter. This new feature should allow you to configure how long an awaitData cursor will block waiting for capped insertions. The official MongoDB drivers support this new behavior via an option called maxAwaitTimeMS (see …). I am closing this as a duplicate of … Best, |
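As a rough sketch of what the 3.2 getMore command carries, the first helper below assembles the command document from the fields mentioned in the comment: getMore (the cursor id), collection, and the optional batchSize and maxTimeMS. The function names are hypothetical, and a real driver sends the cursor id as a 64-bit BSON integer and performs the round trip itself. The second helper shows one possible way a task.result(timeout=...) call could map its remaining deadline onto per-getMore wait times, assuming an arbitrary 1-second cap per round trip.

```python
from typing import Any, Dict, Optional


def build_get_more(
    cursor_id: int,
    collection: str,
    max_time_ms: Optional[int] = None,
    batch_size: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a getMore command document (MongoDB 3.2+ command form)."""
    # cursor_id is the 64-bit id returned by the initial find command.
    cmd: Dict[str, Any] = {"getMore": cursor_id, "collection": collection}
    if batch_size is not None:
        cmd["batchSize"] = batch_size
    if max_time_ms is not None:
        # For an awaitData cursor on a capped collection, this bounds how long
        # the server blocks waiting for new inserts before replying.
        cmd["maxTimeMS"] = max_time_ms
    return cmd


def await_time_for(remaining_s: Optional[float], cap_ms: int = 1000) -> int:
    """Milliseconds to wait on the next getMore without overshooting a deadline.

    remaining_s=None means no deadline, so wait in cap-sized slices forever.
    The caller is assumed to stop once its deadline has passed, so the result
    is clamped to at least 1 ms rather than 0.
    """
    if remaining_s is None:
        return cap_ms
    return max(1, min(cap_ms, int(remaining_s * 1000)))
```

With a server-side wait like this, a result inserted mid-wait is returned as soon as it arrives instead of at the next poll. In PyMongo, the driver-level equivalent of maxAwaitTimeMS is, as far as I know, `Cursor.max_await_time_ms()` on a cursor opened with `cursor_type=CursorType.TAILABLE_AWAIT`; check the driver documentation for the version in use.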
| Comment by Dissatisfied Former User [ 19/Jan/16 ] |
|
A year later, any word? This issue has technically been blocking the release of a pure-MongoDB (and, I feel, superior) alternative to the Celery distributed task system for four years now. It hasn't blocked fire-and-forget task support, but anything involving waiting on tasks (producers waiting for results, or generator pipelines) dies a horrible death from starvation because of this. Well, mostly from the aforementioned doubling of latency, but you get the idea. |
| Comment by Ramon Fernandez Marina [ 10/Dec/14 ] |
|
amcgregor, the fixversion went from "debugging with submitter", which means we're working on our end to verify the issue and its impact, to "2.9 Desired", which means we want to include a fix in the next development version. The fixversion will change again to 2.9.X when this ticket is included in the planning of a specific point release. |
| Comment by Dissatisfied Former User [ 10/Dec/14 ] |
|
I'll be sure to write less detail-filled initial reports in the future. :wry: I notice the Fixversion was updated. Is there hope? |
| Comment by Ramon Fernandez Marina [ 13/Nov/14 ] |
|
alice@gothcandy.com, apologies for the late reply. The "debugging with submitter" fixversion means we're working to verify whether this is a bug or not and the potential impact on users. It is often the case that we ask reporters for more information, but not always. Since you're a watcher on the ticket you'll receive any updates as they happen. |
| Comment by Dissatisfied Former User [ 03/Nov/14 ] |
|
A week has passed with the state in “debugging with submitter” and I have had no communication at all. |
| Comment by Dissatisfied Former User [ 28/Oct/14 ] |
|
While I marked this as an improvement, not operating according to the documentation and the expectations it sets may be “bug-worthy”. As it stands, falling back on polling to implement timeouts is not a viable long-term solution. With a half-second retry delay, a task that takes 0.51 seconds to complete cannot return its result until a full second has passed, roughly halving performance and doubling latency. With a badly tuned capped collection (too small for the activity level), that delay could make the difference between receiving the message and missing it, since it may already have been overwritten by the time the next poll runs. |
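The latency penalty in the 0.51-second example can be made concrete with a little arithmetic. Assuming polls fire at t = 0, interval, 2 × interval, and so on, a result that becomes available at time d is first seen at the next poll, ceil(d / interval) × interval. The function name below is hypothetical.

```python
import math


def polled_latency(task_seconds: float, interval: float) -> float:
    """Time at which a fixed-interval poller first observes a result.

    Assumes polls at t = 0, interval, 2 * interval, ... and a result that
    becomes visible at t = task_seconds.
    """
    polls = math.ceil(task_seconds / interval)
    return polls * interval
```

For d = 0.51 s and a 0.5 s interval this gives 1.0 s, about 1.96 times the task's own duration. Shrinking the interval keeps the added delay under one interval, but multiplies the number of queries against the capped collection, which is the trade-off a server-side await timeout avoids.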