[SERVER-77901] the replan works can not exceed the number of works that is set to be the fraction of the total collection size Created: 08/Jun/23 Updated: 23/Oct/23 Resolved: 06/Jul/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | y yz | Assignee: | Backlog - Query Optimization |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Optimization
|
| Participants: |
| Description |
|
the replan works can not exceed the number of works that is set to be the fraction of the collection size. in extreme cases, if there are no restrictions, Probably more than the total collection size, this replan may lose effectiveness. it may lead that the query use wrong index. |
| Comments |
| Comment by y yz [ 23/Oct/23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
hi, steve.tarzia@mongodb.com david.storch@mongodb.com I was from a cloud companly, and there were a lot of mongodb clusters on the cloud. After we first received the user's trouble ticket a year ago, our own kernel code added this PR code and added log printing when works beyond the collection count, but many of our Inventory mongodb cluster in the cloud are still using the open source version. Recently, we received another user fault which mongo kernel use the open source version. Through analysis, the reason is the same as this problem. They stored the history trade data, because the index did not replan, which caused the performance failure,The glitch had a big impact. In recent days, I have collected the mongodb instance logs after our PR optimization (in the last year or more, new instances have used the function of our PR), and through analyzing the logs, we found that about dozens of tables in the last year will trigger the problem. Therefore, may be we should evaluate this PR again. it should not be ignored because you have not encountered it. In fact, many problems may be ignored because the impact of the results is not serious, But it's actually real. Even though 7.0 has the new plancache algorithm, history existing versions of mongo should also eliminate this risk.
thanks. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by y yz [ 07/Jul/23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
got it, thanks | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Steve Tarzia [ 06/Jul/23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi 1147952115@qq.com , thanks for the report and for the PR. I am closing this ticket and declining your PR for two reasons:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by y yz [ 03/Jul/23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
sometimes the _decisionWorks may is greater than numUpperLimitWorks, even greater than the collection size. for example: We deleted a lot of data in the process of querying the data. I perfect the code, see detail: https://github.com/mongodb/mongo/pull/1553 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 28/Jun/23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks for the details 1147952115@qq.com! The Query Optimization team will review this in more detail during an upcoming triage meeting. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by y yz [ 26/Jun/23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The following is an example encountered in an online environment, The simulation process is as follows:
get the db.test.getPlanCache().list() as following:
As you can see from the above information, the works = 300000, when use CachedPlanStage::pickBestPlan to determine Whether to replan, the maxWorksBeforeReplan will be 300000 * internalQueryCacheEvictionRatio(default 10), so it may Exceed the collection size. this replan will lose effectiveness.
we use a new sql:
the slow log:
in fact, the sql db.test.find({a:1, b:123456, c:1})'s best index is {b:1} | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by y yz [ 23/Jun/23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
hi, David Storch, Thank you for your reply. I will make up a copy of the test data for you in a few days, thank you. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 22/Jun/23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi 1147952115@qq.com! Could you tell us more about the problem that this PR is fixing? Do you have a repro script or some other example in which this can cause a performance problem in practice? The change would make replanning more likely in some contexts – did you experience a bad plan getting stuck in the plan cache? It would be hard for us to justify a change to our replanning algorithm without a clear motivation for how this improves the performance of the system. I suppose the fact that the calculated replanning threshold can exceed the number of documents in the collection is somewhat nonsensical, but it's not something that has come up before as a problem in user environments to my knowledge. Please let us know if you have any further information about the motivation for this change, thanks! | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eric Sedor [ 09/Jun/23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Apologies 1147952115@qq.com, I must have overlooked. We'll pass this on for consideration. Thank you! | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by y yz [ 09/Jun/23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi, Eric Sedor, As in thanks | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eric Sedor [ 08/Jun/23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Can you please sign our [Contributor's Agreement](https://www.mongodb.com/legal/contributor-agreement) as part of our considering this PR? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by y yz [ 08/Jun/23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
query plancache replan improve, the pushed code address: https://github.com/mongodb/mongo/pull/1553
|