[SERVER-77901] The replan works threshold should not exceed a works value set as a fraction of the total collection size Created: 08/Jun/23  Updated: 23/Oct/23  Resolved: 06/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: y yz Assignee: Backlog - Query Optimization
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

The replan works threshold should not exceed a number of works set to be a fraction of the collection size.

In extreme cases, with no such restriction, the threshold can exceed the total collection size. Replanning then loses effectiveness, and the query may keep using the wrong index.



 Comments   
Comment by y yz [ 23/Oct/23 ]

hi, steve.tarzia@mongodb.com  david.storch@mongodb.com 

I work at a cloud provider, and we run many MongoDB clusters in the cloud.
The test data I gave above reproduces the real situation of a production cluster we encountered (I used simulated data to avoid touching user data).

After we received the user's first trouble ticket about a year ago, we added this PR's change to our own kernel and added log output whenever works exceeds the collection count. However, many of our MongoDB clusters in the cloud still run the open-source version.

Recently we received another fault report from a user running the open-source kernel. Analysis showed the same cause as this problem: they stored historical trade data, and because the plan was never replanned, performance degraded badly and the glitch had a large impact.

In recent days I collected logs from the MongoDB instances running our PR's optimization (new instances over the past year or more have used it). Analysis of those logs shows that dozens of collections triggered this problem in the last year.

Therefore, perhaps we should evaluate this PR again. It should not be dismissed just because you have not encountered the problem; many issues are ignored because their impact seems minor, but this one is real. Even though 7.0 has the new plan cache algorithm, existing historical versions of MongoDB should also eliminate this risk.

 

thanks.

Comment by y yz [ 07/Jul/23 ]

got it, thanks

Comment by Steve Tarzia [ 06/Jul/23 ]

Hi 1147952115@qq.com, thanks for the report and for the PR.

I am closing this ticket and declining your PR for two reasons:

  1. We do not see this as a significant performance issue in realistic scenarios.
  2. The fix you provided is on a plan cache that is being replaced by a different plan cache. Most queries in v7.0 and later will use this plan cache: https://github.com/mongodb/mongo/blob/a04e1c1812a28ebfb9a2684859097ade649a1184/src/mongo/db/query/sbe_plan_cache.h#L224-L229
Comment by y yz [ 03/Jul/23 ]

hi, david.storch@mongodb.com

Sometimes _decisionWorks can be greater than numUpperLimitWorks, and even greater than the collection size; for example, when a lot of data is deleted while the query is running.
So we should force-adjust the cached entry's works and improve the replan reason.

I have improved the code; see the details: https://github.com/mongodb/mongo/pull/1553
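The clamp proposed above can be sketched as follows. This is a minimal illustration only, not the actual server code (which lives in C++ in CachedPlanStage::pickBestPlan): `evictionRatio` stands in for `internalQueryCacheEvictionRatio`, and `collectionFraction` is a hypothetical parameter for the proposed fraction of the collection size.

```javascript
// Sketch of the proposed replan-threshold clamp (illustrative only).
function maxWorksBeforeReplan(decisionWorks, collectionSize, opts) {
  const { evictionRatio = 10, collectionFraction = 1.0 } = opts || {};
  // Current behavior: the threshold scales with the cached plan's works value.
  const uncapped = decisionWorks * evictionRatio;
  // Proposed behavior: never let the threshold exceed a fraction of the
  // collection size, so a plan that scans the whole collection still
  // triggers replanning.
  const cap = Math.ceil(collectionSize * collectionFraction);
  return Math.min(uncapped, cap);
}

// With a cached works of 300000 and ~1,000,001 documents, the uncapped
// threshold would be 3,000,000 -- larger than the whole collection.
console.log(maxWorksBeforeReplan(300000, 1000001, {})); // capped at 1000001
console.log(maxWorksBeforeReplan(100, 1000001, {}));    // small plans unaffected: 1000
```

The key design point is that the cap depends only on the collection size, so a stale `works` value inherited from an old cache entry can never push the threshold beyond what a full collection scan would cost.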

Comment by David Storch [ 28/Jun/23 ]

Thanks for the details 1147952115@qq.com! The Query Optimization team will review this in more detail during an upcoming triage meeting.

Comment by y yz [ 26/Jun/23 ]

hi, david.storch@mongodb.com

 

The following is an example encountered in an online environment. The simulation is as follows:

step1: for (i = 1; i <= 500000; i++) {db.test.insert({a:1,b:1,c:i })}
step2: for (i = 1; i <= 500000; i++) {db.test.insert({a:i,b:i,c:i })}
step3: db.test.find({a:1, b:1, c:300000})
step4: db.test.find({a:1, b:1, c:300001})
step5: db.test.createIndex({a:1})
step6: db.test.createIndex({b:1})

Running db.test.getPlanCache().list() gives the following:

mongodb_5.0_shard1:PRIMARY> db.test.getPlanCache().clear()
mongodb_5.0_shard1:PRIMARY> 
mongodb_5.0_shard1:PRIMARY> db.test.find({a:1, b:1, c:300000})
{ "_id" : ObjectId("649900bb102d460adfbc9241"), "a" : 1, "b" : 1, "c" : 300000 }
mongodb_5.0_shard1:PRIMARY> db.test.find({a:1, b:1, c:300001})
{ "_id" : ObjectId("649900bb102d460adfbc9242"), "a" : 1, "b" : 1, "c" : 300001 }
mongodb_5.0_shard1:PRIMARY> 
mongodb_5.0_shard1:PRIMARY> 
mongodb_5.0_shard1:PRIMARY> db.test.getPlanCache().list()
[
        {
                "queryHash" : "6DC218E7",
                "planCacheKey" : "970136C7",
                "isActive" : true,
                "works" : NumberLong(300000),
                "timeOfCreation" : ISODate("2023-06-26T03:11:24.062Z"),
                "createdFromQuery" : {
                        "query" : {
                                "a" : 1,
                                "b" : 1,
                                "c" : 300001
                        },
                        "sort" : {                        },
                        "projection" : {                        }
                },
                "cachedPlan" : {
                        "stage" : "FETCH",
                        "filter" : {
                                "$and" : [
                                        {
                                                "b" : {
                                                        "$eq" : 1
                                                }
                                        },
                                        {
                                                "c" : {
                                                        "$eq" : 300001
                                                }
                                        }
                                ]
                        },
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "a" : 1
                                },
                                "indexName" : "a_1",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "a" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "a" : [
                                                "[1.0, 1.0]"
                                        ]
                                }
                        }
                },
                "creationExecStats" : [
                        {
                                "nReturned" : 0,
                                "executionTimeMillisEstimate" : 61,
                                "totalKeysExamined" : 300000,
                                "totalDocsExamined" : 300000,
                                "executionStages" : {
                        xxxxxxxxxxxxxxxx
} 

As you can see from the above output, works = 300000. When CachedPlanStage::pickBestPlan determines whether to replan, maxWorksBeforeReplan will be 300000 * internalQueryCacheEvictionRatio (default 10) = 3,000,000, which exceeds the collection size.

In that case the replan mechanism loses effectiveness.
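To make the arithmetic concrete, here is a small sketch using the numbers from the output above (the collection size is assumed to be roughly the 1,000,000 documents inserted in steps 1 and 2):

```javascript
// Values taken from the plan cache entry and the repro steps above.
const cachedWorks = 300000;        // "works" in the cached plan entry
const evictionRatio = 10;          // internalQueryCacheEvictionRatio default
const collectionSize = 1000000;    // ~500k + 500k inserted documents

const maxWorksBeforeReplan = cachedWorks * evictionRatio;
console.log(maxWorksBeforeReplan);                  // 3000000
console.log(maxWorksBeforeReplan > collectionSize); // true: a trial run can
// scan the whole collection without ever reaching the replan threshold,
// so the inefficient { a: 1 } plan is never evicted.
```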

 

We then run a new query:

mongodb_5.0_shard1:PRIMARY> db.test.insert({a:1, b:123456, c:1})
WriteResult({ "nInserted" : 1 })
mongodb_5.0_shard1:PRIMARY> 
mongodb_5.0_shard1:PRIMARY> db.test.find({a:1, b:123456, c:1})
{ "_id" : ObjectId("6499022836cd1cfa57cde103"), "a" : 1, "b" : 123456, "c" : 1 }
mongodb_5.0_shard1:PRIMARY>  

 

The slow query log shows:

{"t":{"$date":"2023-06-26T11:42:53.788+08:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn45","msg":"Slow query","attr":{"type":"command","ns":"test.test","appName":"MongoDB Shell","command":{"find":"test","filter":{"a":1,"b":123456,"c":1},"lsid":{"id":{"$uuid":"a558a71d-d148-415c-9aed-3ba7d461ab0c"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1687749681,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test"},"planSummary":"IXSCAN { a: 1 }","keysExamined":500002,"docsExamined":500002,"cursorExhausted":true,"numYields":500,"nreturned":1,"queryHash":"6DC218E7","planCacheKey":"970136C7","reslen":451,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":501}},"Global":{"acquireCount":{"r":501}},"Mutex":{"acquireCount":{"r":1}}},"storage":{"data":{"bytesRead":4820441,"timeReadingMicros":1747}},"remote":"127.0.0.1:33522","protocol":"op_msg","durationMillis":664}}

 

In fact, the best index for db.test.find({a:1, b:123456, c:1}) is {b:1}, but the cached plan kept using {a:1} and examined 500,002 keys.

Comment by y yz [ 23/Jun/23 ]

Hi David Storch, thank you for your reply.

I will prepare the test data for you in a few days, thanks.

Comment by David Storch [ 22/Jun/23 ]

Hi 1147952115@qq.com! Could you tell us more about the problem that this PR is fixing? Do you have a repro script or some other example in which this can cause a performance problem in practice? The change would make replanning more likely in some contexts – did you experience a bad plan getting stuck in the plan cache? It would be hard for us to justify a change to our replanning algorithm without a clear motivation for how this improves the performance of the system. I suppose the fact that the calculated replanning threshold can exceed the number of documents in the collection is somewhat nonsensical, but it's not something that has come up before as a problem in user environments to my knowledge.

Please let us know if you have any further information about the motivation for this change, thanks!

Comment by Eric Sedor [ 09/Jun/23 ]

Apologies 1147952115@qq.com, I must have overlooked. We'll pass this on for consideration. Thank you!

Comment by y yz [ 09/Jun/23 ]

Hi, Eric Sedor, As in SERVER-71627, I have signed the [Contributor's Agreement].

thanks

Comment by Eric Sedor [ 08/Jun/23 ]

Hi 1147952115@qq.com,

Can you please sign our [Contributor's Agreement](https://www.mongodb.com/legal/contributor-agreement) as part of our considering this PR?

Comment by y yz [ 08/Jun/23 ]

Improvement to query plan cache replanning; the pushed code is here:

     https://github.com/mongodb/mongo/pull/1553

 

Generated at Thu Feb 08 06:36:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.