[SERVER-64978] Add metrics to track if a plan was recovered from the plan cache Created: 28/Mar/22  Updated: 29/Oct/23  Resolved: 10/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.2.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Anton Korshunov Assignee: Maddie Zechar
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-40863 Add a serverStatus metric to track qu... Backlog
Backwards Compatibility: Fully Compatible
Participants:

 Description   

Currently there is no an easy way to tell if a specific plan has been recovered from the cache or not. The closest match would be to obtain a plan cache key (via explain, slow logs or profiler), find a corresponding plan cache entry with $planCacheStats and see if the cached entry is active or not. If yes, then most likely the plan was recovered from the cache. But this is sort of unreliable way to check it due to a timing window when a plan cache entry has been created and got activated. That is, a plan could be created not from the cache and an inactive cache entry was added to the cache. In the same time a few other queries of a similar shape could be execute, turning the cached plan into active. When the user tries to follow the above procedure to figure out if the first plan was recovered from the cache, they would receive the wrong answer.

Instead, we should consider adding a new flag similar to fromMultiPlanner which could honestly indicate if the plan has been recovered from the cache.

On a similar note, it would be very handy to track plan cache hits and misses via serverStatus metrics.

CC christopher.harris kateryna.kamenieva



 Comments   
Comment by Githook User [ 10/Oct/22 ]

Author:

{'name': 'Maddie Zechar', 'email': 'mez2113@columbia.edu', 'username': 'madelinezec'}

Message: SERVER-64978 Add metrics to track if a plan was recovered from the plan cache
Branch: master
https://github.com/mongodb/mongo/commit/dbd868972c766d7a272e7cb3fcd2f3c792b2dd83

Comment by Maddie Zechar [ 14/Sep/22 ]

Hi kateryna.kamenieva@mongodb.com christopher.harris@mongodb.com, Anton has suggested we also add two new serverStatus metrics counters to accumulate plan cache hits and misses. Is this something either of you would find useful/valuable? If it would, I could perhaps file that as another quick win and do it after this ticket is closed.

Comment by Maddie Zechar [ 12/Sep/22 ]

draft PR for a POC of this ticket: https://github.com/10gen/mongo/pull/7490/files

Comment by David Storch [ 13/Jul/22 ]

christopher.harris@mongodb.com your understanding sounds more or less correct. However, I do think it's not ideal to infer whether or not a plan was recovered from the cache using the fromMultiPlanner flag. This is because of the single-solution case: it may not be evident from the log line whether the query circumvented the multi-planned because it was recovered from the cache, or whether it was not multi-planned because there was only one query solution available.

One important change that happened as part of the SBE plan cache project is that we now cache single-solution plans (see SERVER-64315 and related tickets). Therefore, when fromMultiPlanner is absent and there is just one query solution, it is impossible to tell from the log line whether or not the query used a cached plan. You might not use a cached plan if 1) this is the first instance of the query shape since the plan cache was last cleared, 2) the plan cache entry has since been invalidated or evicted due to DDL operations or LRU eviction, or 3) the plan cache exists but is inactive.

I think this flag would be useful because it would provide incontrovertible proof that a query used the plan cache. This seems better than trying to guess based on the value of the fromMultiPlanner flag. You currently have definitive information when fromMultiPlanner:true appears, but lack the necessary information when it does not appear.

Comment by Chris Harris [ 11/Jul/22 ]

I'm "for" this idea in general.  While I'm pretty sure there is something here, I must admit that I'm not sure I fully understand all of the details.  It may be that I just have a stale understanding of things based on the classic engine which is no longer accurate.

My impression was that fromMultiPlanner is the flag to tell if a plan was recovered from the cache or not. Assuming my definition of "recovered from the cache" is correct (which is basically 'used the entry in the cache and did not go through multiplanning), that's definitely how we've been using it in the field.  There's always been an asterisk here in that single solution plans aren't cached, but that definitionally means that the multiplanner wasn't used there either which is where the pain points usually are. 

So if the slow operation is captured (log, profiler), then the presence or absence of fromMultiPlanner is the indicator of plan cache usage today. The explain command bypasses the plan cache, so apart from the suggested workflow to obtain a plan cache key I don't think it is particularly relevant.  

If there is something more here in terms of logging that would be helpful given either a misunderstanding of my part or associated with changes for SBE (or further upcoming changes), then let's schedule this.  

Regarding the serverStatus aspect of this - I linked SERVER-40863.  I may be biased, but I'm all for getting that one added!  

Comment by David Storch [ 25/May/22 ]

christopher.harris@mongodb.com do you have an update on this? My feeling is that this is worth scheduling, perhaps even as part of the SBE plan cache project.

Comment by Ana Meza [ 05/Apr/22 ]

We would like to get an opinion from Product if we wort doing and  timeline/priority for this, without waiting for Quick Wins timelines

Generated at Thu Feb 08 06:01:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.