[SERVER-46161] Reduce frequency of yielding during query execution Created: 14/Feb/20  Updated: 29/Oct/23  Resolved: 22/Feb/20

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 4.3.4

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Problem/Incident
Related
related to SERVER-78556 Return default of internalInsertMaxBa... Open
Backwards Compatibility: Fully Compatible
Sprint: Query 2020-02-24, Query 2020-03-09
Participants:
Linked BF Score: 50

 Description   

By default, any auto-yielding queries will yield every 128 work cycles or 10 milliseconds, whichever comes first. Profile data shows that this yielding process is CPU intensive: we propagate saveState() and restoreState() throughout the execution tree, we call into special lock manager functions to release and reacquire locks, and we save/restore any storage cursors.
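To make the "N work cycles or T milliseconds, whichever comes first" rule concrete, here is a minimal sketch. It is not the server's actual yield policy code: the class name YieldTracker and its methods are hypothetical, and the time point is passed in explicitly only so the behavior is easy to reason about.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical illustration of the yield trigger; not MongoDB's real class.
class YieldTracker {
public:
    using Clock = std::chrono::steady_clock;

    YieldTracker(int maxIterations, std::chrono::milliseconds maxPeriod)
        : _maxIterations(maxIterations), _maxPeriod(maxPeriod), _lastYield(Clock::now()) {
        assert(maxIterations > 0);
    }

    // Call once per work cycle. Returns true as soon as either the
    // iteration threshold or the time threshold has been reached.
    bool shouldYield(Clock::time_point now = Clock::now()) {
        ++_iterations;
        return _iterations >= _maxIterations || now - _lastYield >= _maxPeriod;
    }

    // Call after a yield completes to start a new interval.
    void reset(Clock::time_point now = Clock::now()) {
        _iterations = 0;
        _lastYield = now;
    }

private:
    const int _maxIterations;
    const std::chrono::milliseconds _maxPeriod;
    int _iterations = 0;
    Clock::time_point _lastYield;
};
```

With the defaults described above this would be constructed as YieldTracker(128, std::chrono::milliseconds(10)). For a fast scan the iteration count tends to trip first, so raising it leaves the 10 ms bound as the effective limit.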

When MMAPv1 was supported, yielding frequently was necessary to allow writers to proceed. Now that the supported storage engines all support document-level concurrency, yielding is only necessary for the following:

  1. It's where our interrupt checks are housed in query execution.
  2. It's where we call abandonSnapshot(), allowing the storage engine to relinquish any resources necessary to hold open the snapshot.
  3. Yielding our intent locks allows operations which require strong database or collection locks to proceed.

I'll address these points one-by-one. First, checking for interrupt regularly is still necessary. But there is no indication that the interrupt check itself is slow. We can simply check for interrupt in between every call to PlanExecutor::work(), decoupling interrupt checking from yielding. Abandoning the snapshot is still necessary, but there's no evidence that doing it less frequently is problematic. Similarly, allowing operations which require strong locks to make progress is still necessary. But the storage team has been working to remove as many strong lock acquisitions as possible. Furthermore, operations which take strong locks are typically not on the hot path, and so it makes sense to block them for up to 10ms in order to allow query workloads to be faster.
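The decoupling proposed here can be sketched as a loop that performs the cheap interrupt check before every unit of work, while the expensive yield happens only every N iterations. Everything in this sketch is an assumption made for illustration: runPlan, ExecStats, and the atomic kill flag are hypothetical stand-ins rather than server APIs, and the time-based yield trigger is omitted for brevity.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical counters so the effect of the yield interval is visible.
struct ExecStats {
    long works = 0;            // stand-in for calls to PlanExecutor::work()
    long interruptChecks = 0;  // cheap checks, one per iteration
    long yields = 0;           // expensive save/unlock/relock/restore cycles
    bool interrupted = false;
};

ExecStats runPlan(long totalWorks, int yieldEveryN, const std::atomic<bool>& killFlag) {
    assert(yieldEveryN > 0);
    ExecStats stats;
    int sinceYield = 0;
    for (long i = 0; i < totalWorks; ++i) {
        // Interrupt check on every iteration, independent of yielding.
        ++stats.interruptChecks;
        if (killFlag.load(std::memory_order_relaxed)) {
            stats.interrupted = true;
            break;
        }

        ++stats.works;  // one unit of query execution would run here

        // Yield only once the iteration threshold is reached.
        if (++sinceYield >= yieldEveryN) {
            ++stats.yields;
            sinceYield = 0;
        }
    }
    return stats;
}
```

Under these assumptions, 10000 work cycles yield 78 times at an interval of 128 and 10 times at an interval of 1000, while the number of interrupt checks stays at 10000 in both cases.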



 Comments   
Comment by Githook User [ 22/Feb/20 ]

Author: David Storch (dstorch) <david.storch@mongodb.com>

Message: SERVER-46161 Increase number of PlanExecutor iterations before yielding to 1000.

The new behavior is for queries to yield every 10ms or 1000
iterations, whichever comes first. This change improves
performance on many of our workloads, since yielding is
expensive and happens more frequently than needed on those
workloads.
Branch: master
https://github.com/mongodb/mongo/commit/460acb836445e42960346bdb95a83553db8df018

Comment by Geert Bosch [ 14/Feb/20 ]

I experimented with the yield frequency at various points during the 3.4-3.6 timeframe, because I wondered whether yielding less often would help simple queries, such as collection scans with simple predicates, and whether we should have different defaults for MMAPv1 and WiredTiger. I did not see any significant benefit from yielding less often, so I didn't pursue this. Of course this may have changed. How much time does a yield take compared to executing a work cycle?

There is a balance between the cost of yielding that you described and the cost of not yielding, especially in update-heavy workloads where reads may block on I/O. I would therefore advocate for a stepwise, gradual change of the query knobs, if supported by concrete benchmarking evidence. For example, you might increase the number of work cycles to 256, or up to 1000.

I'm not sure it would really help to check for interruption at every work. Checking every 10ms seems fine-grained enough.
