[SERVER-32539] Suboptimal cached query plan may not be evicted for multi-key sort Created: 04/Jan/18 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 3.4.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Jonathan Roy | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | query-44-grooming | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Server 3.4.10 on Ubuntu 16.04.3 LTS |
||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Query Execution
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
Original SummarySorting used multi-key in one direction only, explain() provides wrong info Original DescriptionI was looking at the slow query log tonight and decided to try to optimize some of them. One simple query was:
to get the most recent itme with that IP address. Turns out this was taking 4-10 seconds. So I added a multi-index key for ip,_id so it could use that to optimize the search/sort. After much hair pulling, I finally realized the sort _id: 1 direction was instant, but the _id: -1 was the slow one. I tried changing my index to be ip:1,_id:-1 and ip:-1,_id:-1 with no help. Sort '1' was always fast, '-1' always slow. I wrote about this all here as well when at my wit's end: https://www.reddit.com/r/mongodb/comments/7nyx9b/using_indexes_to_reverse_sort_a_query/ What I found is that the optimizer was only using the proper key when sorting with '1' and never when sorting with '-1'. explain() mislead me however. Explain would say this:
Only changing "direction" to "backwards" if I tried the query in the opposite sort direction. However, I turned on profiling and ran the slow sort direction, and I get this:
Which revealed it wasn't using the multi-key index at all. I vaguely remembered the hint command, and making my query more like this:
solved it. Results are now instant. I realize the query optimizer can't make every query perfect and that is why hint() exists, but it seemed odd it'd work for one direction but not the other, and that explain() would lie to me about what was going to happen vs what the profiling actually showed. |
| Comments |
| Comment by David Storch [ 16/Aug/19 ] | |||
|
roy7, I'm returning to this report as part of a sweep of the MongoDB Server Query Team's ticket backlog. As Kelsey mentioned above, the log lines you have provided to us suggest that the bad plan is being generated from the plan cache. It doesn't look like we have enough information in this ticket to determine how that bad plan was cached in the first place, or why it was not evicted. Do you have any further details about how we might be able to reproduce this issue, possibly with an artificial data set? Finally, the 4.2 GA release includes | |||
| Comment by Jonathan Roy [ 04/Jan/18 ] | |||
|
We'd prefer not to share the database itself since it's full of contributor's IP addresses, but if you need any more detailed logs/etc I'm happy to help if you explain the exact things you want me to do. | |||
| Comment by Jonathan Roy [ 04/Jan/18 ] | |||
|
Here's some example log items from /var/log/mongodb/mongod.log before using the hint. I'll sanitize the IP addresses out. I'm checking with the developer if we can securely give you the data file in question so you can reproduce this locally. When you say mongod logs at loglevel 1, did you mean the below or something else? Thanks.
| |||
| Comment by Kelsey Schubert [ 04/Jan/18 ] | |||
|
Hi roy7, Thanks for reporting this issue. It appears that the behavior you are observing is the result of suboptimal plan being cached. The output from Explain bypasses the cache, which is why you're observing different query plans from the profiler and explain. We'll need to dig deeper to understand why the query plan is not being reevaluated/evicted in favor of the query solution you hint at. It would greatly speed our investigation if you were able to to provide a dataset and query that reproduces this issue. Is it possible to to provide the data files from the affected mongod as well as the complete mongod logs at loglevel 1? I would be happy to create a secure upload portal to ensure that your files would only be visible to MongoDB employees investigating this issue. Thank you, |