[SERVER-46161] Reduce frequency of yielding during query execution Created: 14/Feb/20 Updated: 29/Oct/23 Resolved: 22/Feb/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.4 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | David Storch |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible | ||
| Sprint: | Query 2020-02-24, Query 2020-03-09 | ||
| Participants: | | ||
| Linked BF Score: | 50 | ||
| Description |
|
By default, any auto-yielding query will yield every 128 work cycles or every 10 milliseconds, whichever comes first. Profile data shows that this yielding process is CPU intensive: we propagate saveState() and restoreState() throughout the execution tree, we call into special lock manager functions to release and reacquire locks, and we save/restore any storage cursors. When MMAPv1 was supported, yielding frequently was necessary to allow writers to proceed. Now that all supported storage engines offer document-level concurrency, yielding is only necessary for the following:

- Checking for interrupt regularly.
- Periodically abandoning the storage engine snapshot.
- Allowing operations which require strong locks to make progress.
I'll address these points one by one. First, checking for interrupt regularly is still necessary, but there is no indication that the interrupt check itself is slow. We can simply check for interrupt between every call to PlanExecutor::work(), decoupling interrupt checking from yielding. Abandoning the snapshot is still necessary, but there's no evidence that doing it less frequently is problematic. Similarly, allowing operations which require strong locks to make progress is still necessary, but the storage team has been working to remove as many strong lock acquisitions as possible. Furthermore, operations which take strong locks are typically not on the hot path, so it makes sense to block them for up to 10ms in order to allow query workloads to be faster. |
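Below is a minimal, self-contained sketch of the control flow this ticket proposes; it is not the actual PlanExecutor code, and all names (YieldPolicy, checkForInterrupt(), doYield(), workOneCycle()) are illustrative stand-ins. The point is the decoupling: the cheap interrupt check runs on every work cycle, while the expensive yield path (save/restore state, release and reacquire locks, abandon the snapshot) runs only when the iteration- or time-based policy trips.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <stdexcept>

// Hypothetical stand-in for the yield policy; only the control flow matters here.
struct YieldPolicy {
    // Pre-change defaults described above: 128 work cycles or 10 milliseconds.
    uint64_t maxWorkCycles = 128;
    std::chrono::milliseconds maxPeriod{10};

    uint64_t cyclesSinceYield = 0;
    std::chrono::steady_clock::time_point lastYield = std::chrono::steady_clock::now();

    bool shouldYield() {
        ++cyclesSinceYield;
        return cyclesSinceYield >= maxWorkCycles ||
               std::chrono::steady_clock::now() - lastYield >= maxPeriod;
    }

    void markYielded() {
        cyclesSinceYield = 0;
        lastYield = std::chrono::steady_clock::now();
    }
};

std::atomic<bool> interruptRequested{false};

void checkForInterrupt() {
    // Cheap flag check; in the server this is where a killed or timed-out
    // operation would stop executing.
    if (interruptRequested.load()) throw std::runtime_error("operation interrupted");
}

void doYield() {
    // Stand-in for the expensive part: saveState()/restoreState() through the
    // execution tree, releasing and reacquiring locks, abandoning the snapshot.
}

bool workOneCycle(uint64_t cycle) {
    // Stand-in for PlanExecutor::work(); returns true until "EOF".
    return cycle < 5000;
}

int main() {
    YieldPolicy policy;
    uint64_t cycle = 0;
    bool moreWork = true;
    while (moreWork) {
        checkForInterrupt();         // cheap: every work cycle, decoupled from yielding
        if (policy.shouldYield()) {  // expensive: only every N cycles / T milliseconds
            doYield();
            policy.markYielded();
        }
        moreWork = workOneCycle(cycle++);
    }
    std::cout << "executed " << cycle << " work cycles\n";
}
```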
| Comments |
| Comment by Githook User [ 22/Feb/20 ] |
|
Author: {'name': 'David Storch', 'username': 'dstorch', 'email': 'david.storch@mongodb.com'}
Message: The new behavior is for queries to yield every 10ms or every 1000 work cycles, whichever comes first. |
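For reference, here is a hedged sketch of what the new trip condition looks like under the values stated above (yield after 1000 work cycles or 10 milliseconds, whichever comes first, versus the old 128/10ms). The constant and function names are illustrative, not the server's actual identifiers; the underlying thresholds are exposed as the internalQueryExecYieldIterations and internalQueryExecYieldPeriodMS server parameters.

```cpp
#include <chrono>
#include <cstdint>

// Illustrative constants mirroring the new defaults described in the commit message.
constexpr uint64_t kYieldIterations = 1000;
constexpr std::chrono::milliseconds kYieldPeriod{10};

// A query yields once either threshold is reached, whichever comes first.
bool shouldYield(uint64_t cyclesSinceLastYield,
                 std::chrono::milliseconds timeSinceLastYield) {
    return cyclesSinceLastYield >= kYieldIterations || timeSinceLastYield >= kYieldPeriod;
}
```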
| Comment by Geert Bosch [ 14/Feb/20 ] |
|
I have done experiments with the yield frequency at various points during the 3.4-3.6 timeframe, because I wondered whether this would help for simple queries, such as collection scans with simple predicates, and whether we should have different defaults for MMAPv1 and WiredTiger. I did not see any significant benefit from yielding less often, so I didn't pursue it. Of course this may have changed. How much time does a yield take compared to executing a work cycle? There is a balance between the cost of yielding that you described and the cost of not yielding, especially in update-heavy workloads where reads may block on I/O. So I would advocate a stepwise/gradual change of the query knobs, if supported by concrete benchmarking evidence; for example, you might want to increase the number of work cycles to 256 or up to 1000. I'm not sure it would really help to check for interruption on every work cycle; checking every 10ms seems fine-grained enough. |