[SERVER-64978] Add metrics to track if a plan was recovered from the plan cache Created: 28/Mar/22 Updated: 29/Oct/23 Resolved: 10/Oct/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.2.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Anton Korshunov | Assignee: | Maddie Zechar |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Participants: | |||||||||
| Description |
|
Currently there is no an easy way to tell if a specific plan has been recovered from the cache or not. The closest match would be to obtain a plan cache key (via explain, slow logs or profiler), find a corresponding plan cache entry with $planCacheStats and see if the cached entry is active or not. If yes, then most likely the plan was recovered from the cache. But this is sort of unreliable way to check it due to a timing window when a plan cache entry has been created and got activated. That is, a plan could be created not from the cache and an inactive cache entry was added to the cache. In the same time a few other queries of a similar shape could be execute, turning the cached plan into active. When the user tries to follow the above procedure to figure out if the first plan was recovered from the cache, they would receive the wrong answer. Instead, we should consider adding a new flag similar to fromMultiPlanner which could honestly indicate if the plan has been recovered from the cache. On a similar note, it would be very handy to track plan cache hits and misses via serverStatus metrics. CC christopher.harris kateryna.kamenieva |
| Comments |
| Comment by Githook User [ 10/Oct/22 ] |
|
Author: {'name': 'Maddie Zechar', 'email': 'mez2113@columbia.edu', 'username': 'madelinezec'}Message: |
| Comment by Maddie Zechar [ 14/Sep/22 ] |
|
Hi kateryna.kamenieva@mongodb.com christopher.harris@mongodb.com, Anton has suggested we also add two new serverStatus metrics counters to accumulate plan cache hits and misses. Is this something either of you would find useful/valuable? If it would, I could perhaps file that as another quick win and do it after this ticket is closed. |
| Comment by Maddie Zechar [ 12/Sep/22 ] |
|
draft PR for a POC of this ticket: https://github.com/10gen/mongo/pull/7490/files |
| Comment by David Storch [ 13/Jul/22 ] |
|
christopher.harris@mongodb.com your understanding sounds more or less correct. However, I do think it's not ideal to infer whether or not a plan was recovered from the cache using the fromMultiPlanner flag. This is because of the single-solution case: it may not be evident from the log line whether the query circumvented the multi-planned because it was recovered from the cache, or whether it was not multi-planned because there was only one query solution available. One important change that happened as part of the SBE plan cache project is that we now cache single-solution plans (see I think this flag would be useful because it would provide incontrovertible proof that a query used the plan cache. This seems better than trying to guess based on the value of the fromMultiPlanner flag. You currently have definitive information when fromMultiPlanner:true appears, but lack the necessary information when it does not appear. |
| Comment by Chris Harris [ 11/Jul/22 ] |
|
I'm "for" this idea in general. While I'm pretty sure there is something here, I must admit that I'm not sure I fully understand all of the details. It may be that I just have a stale understanding of things based on the classic engine which is no longer accurate. My impression was that fromMultiPlanner is the flag to tell if a plan was recovered from the cache or not. Assuming my definition of "recovered from the cache" is correct (which is basically 'used the entry in the cache and did not go through multiplanning), that's definitely how we've been using it in the field. There's always been an asterisk here in that single solution plans aren't cached, but that definitionally means that the multiplanner wasn't used there either which is where the pain points usually are. So if the slow operation is captured (log, profiler), then the presence or absence of fromMultiPlanner is the indicator of plan cache usage today. The explain command bypasses the plan cache, so apart from the suggested workflow to obtain a plan cache key I don't think it is particularly relevant. If there is something more here in terms of logging that would be helpful given either a misunderstanding of my part or associated with changes for SBE (or further upcoming changes), then let's schedule this. Regarding the serverStatus aspect of this - I linked SERVER-40863. I may be biased, but I'm all for getting that one added! |
| Comment by David Storch [ 25/May/22 ] |
|
christopher.harris@mongodb.com do you have an update on this? My feeling is that this is worth scheduling, perhaps even as part of the SBE plan cache project. |
| Comment by Ana Meza [ 05/Apr/22 ] |
|
We would like to get an opinion from Product if we wort doing and timeline/priority for this, without waiting for Quick Wins timelines |