[SERVER-71542] Investigate changepoints in bestbuy_4_analytics Created: 22/Nov/22  Updated: 29/Oct/23  Resolved: 29/Nov/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.3.0-rc0

Type: Task Priority: Major - P3
Reporter: Steve Tarzia Assignee: Steve Tarzia
Resolution: Fixed Votes: 0
Labels: pm2646-m4
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Sprint: QE 2022-11-28, QE 2022-12-12
Participants:

 Description   

Throughput on several tests in bestbuy_4_analytics improved drastically (up to 50,000x) after a change that landed sometime between Nov 16th and 18th. The most likely causes are SERVER-70617 and SERVER-70972. Note that these are collection-scan tests with no indexes defined at all.

The results have been reproduced several times in Evergreen, as you can see in the trend charts here: https://spruce.mongodb.com/version/637bb59cc9ec4412eb36da6a/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC

Query MQL is listed here: https://github.com/10gen/workloads/blob/master/workloads/bestbuy_analytics.js 



 Comments   
Comment by Steve Tarzia [ 22/Nov/22 ]

I found the issue. A recent DSI change removed the --noIndexRestore flag from test_control.bestbuy_analytics.yml. We have several test control files for bestbuy, and someone copy-pasted the same commands into all of them. That is not appropriate, because some tests need the indexes restored and others do not:
https://github.com/10gen/dsi/commit/7dad66e940d6c98137f6cd0fcf036689b8cdf0a2

I'll put up a simple DSI patch to revert this change.
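
To guard against the flag being dropped again, a small check along these lines could run over the test control file. This is only a sketch: the function name and the sample config text are hypothetical (not the real contents of the DSI file), and it scans plain lines rather than parsing YAML.

```javascript
// Hypothetical guard: report mongorestore commands that lack --noIndexRestore.
// The config is treated as plain text lines; no YAML parsing is attempted.
function findRestoresWithIndexes(configText) {
  return configText
    .split("\n")
    .map((line, i) => ({ line: line.trim(), lineNumber: i + 1 }))
    .filter(({ line }) => !line.startsWith("#")) // skip YAML comments
    .filter(({ line }) => line.includes("mongorestore"))
    .filter(({ line }) => !line.includes("--noIndexRestore"));
}

// Illustrative input only; the archive path is made up.
const sampleConfig = [
  "# Restore the data but do not restore indexes",
  "- mongorestore --gzip --archive=/data/dump.gz",
  "- mongorestore --gzip --archive=/data/dump.gz --noIndexRestore",
].join("\n");

const offenders = findRestoresWithIndexes(sampleConfig);
console.log(offenders.map((o) => o.lineNumber));
```

In the sample, only the second line is reported, which mirrors the situation here: a comment says indexes are not restored, but the command itself is missing the flag.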

Comment by Steve Tarzia [ 22/Nov/22 ]

ian.boros@mongodb.com good point! Now that I look at my test_control.bestbuy_analytics.yml I see a comment claiming that we don't restore indexes, but the actual mongorestore command is missing the desired --noIndexRestore. I'll investigate this and check the local behavior with indexes.

Comment by Ian Boros [ 22/Nov/22 ]

The error theory sounds plausible. One other thought: Is it at all possible an index is now being used "by accident"? I realize the test doesn't create indexes other than the CSI, but do the indexes created as part of the `mongorestore` stick around?
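
One way to check for an accidental index scan is to walk the explain output and look for an IXSCAN stage. The helper below is a sketch: the function names and the mock explain documents are illustrative, the collection name in the comments is hypothetical, and the real explain shape differs between the classic engine and SBE, which is why the walk is fully recursive rather than tied to specific field paths.

```javascript
// Recursively collect every "stage" name found anywhere in an explain document.
function collectStages(node, found = new Set()) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectStages(child, found));
  } else if (node !== null && typeof node === "object") {
    if (typeof node.stage === "string") found.add(node.stage);
    Object.values(node).forEach((child) => collectStages(child, found));
  }
  return found;
}

// True if any stage in the explain output is an index scan.
function usesIndex(explainOutput) {
  return collectStages(explainOutput).has("IXSCAN");
}

// In mongosh, against a hypothetical collection named "products":
//   db.products.getIndexes()                                // indexes left behind by the restore
//   usesIndex(db.products.explain().aggregate([ /* pipeline */ ]))

// Mock explain documents, for illustration only.
const mockCollscan = {
  queryPlanner: { winningPlan: { stage: "GROUP", inputStage: { stage: "COLLSCAN" } } },
};
const mockIxscan = {
  queryPlanner: { winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN" } } },
};

console.log(usesIndex(mockCollscan), usesIndex(mockIxscan));
```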

Comment by Steve Tarzia [ 22/Nov/22 ]

When I run the 1_col_nomatch_scalar query locally (on a scale-1 bestbuy collection), I get a runtime of about 800ms. This involves a full collection scan. There is no conceivable way that the query running in Evergreen should be seeing 1600 operations per second. It previously was getting 0.03 ops per second in Evergreen.
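
Putting those numbers side by side makes the mismatch concrete. The variable names are mine, and the local and Evergreen runs may not use the same data scale, so this is only a rough sanity check.

```javascript
// Figures quoted in this comment.
const localRuntimeMs = 800;        // local run, scale-1 collection
const evergreenOpsBefore = 0.03;   // ops/sec before the changepoint
const evergreenOpsAfter = 1600;    // ops/sec after the changepoint

const localOpsPerSec = 1000 / localRuntimeMs;               // 1.25 ops/sec
const speedup = evergreenOpsAfter / evergreenOpsBefore;     // roughly 53,000x
const secondsPerOpBefore = 1 / evergreenOpsBefore;          // roughly 33 s per query
const msPerOpAfter = 1000 / evergreenOpsAfter;              // 0.625 ms per query
const fasterThanLocal = evergreenOpsAfter / localOpsPerSec; // 1280x the local rate

console.log({
  localOpsPerSec,
  speedup: Math.round(speedup),
  secondsPerOpBefore: Math.round(secondsPerOpBefore),
  msPerOpAfter,
  fasterThanLocal,
});
```

A query that takes 800ms locally finishing in well under a millisecond in Evergreen is not consistent with a full collection scan.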

My assumption at this point is that the query is somehow silently failing. I'll add an assertion to the workload and run it again in Evergreen.
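
A minimal sketch of such an assertion, assuming the workload can inspect the raw aggregate command response. The helper name is hypothetical and this is not the actual change made to the workload. It also assumes no document has type 'asdf', so the 1_col_nomatch_scalar pipeline should return zero documents.

```javascript
// Hypothetical helper: throw if an aggregate command response indicates failure
// or returns an unexpected number of documents in its first batch.
function assertAggregateSucceeded(response, expectedDocs) {
  if (!response || response.ok !== 1) {
    throw new Error("aggregate failed: " + JSON.stringify(response));
  }
  const batch = response.cursor && response.cursor.firstBatch;
  if (!Array.isArray(batch)) {
    throw new Error("aggregate response has no cursor.firstBatch");
  }
  if (batch.length !== expectedDocs) {
    throw new Error("expected " + expectedDocs + " documents, got " + batch.length);
  }
  return batch;
}

// In mongosh, against a hypothetical collection named "products":
//   const res = db.runCommand({
//     aggregate: "products",
//     pipeline: [{$match: {type: "asdf"}}, {$group: {_id: "$type", count: {$sum: 1}}}],
//     cursor: {},
//   });
//   assertAggregateSucceeded(res, 0);

// Mock response, for illustration only.
const mockOk = { ok: 1, cursor: { firstBatch: [], id: 0, ns: "bestbuy.products" } };
console.log(assertAggregateSucceeded(mockOk, 0).length);
```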

Comment by Steve Tarzia [ 22/Nov/22 ]

Here are a couple of sample queries that changed:

1_col_nomatch_scalar in bestbuy_4_analytics showing a 50,000x speedup:

query:
----------------
[{$match: {'type': 'asdf'}},
 {$group: {_id: '$type', count: {$sum: 1}}}]
 
SBE plan on a recent master build:
----------------
[2] mkbson s11 [_id = s9, count = s10] true false 
[2] group [s9] [s10 = sum(1)] 
[2] project [s9 = (s8 ?: null)] 
[2] project [s8 = s4] 
[1] filter {(traverseF(s4, lambda(l1.0) { ((l1.0 == s7) ?: false) }, false) ?: false)} 
[1] scan s5 s6 none none none none [s4 = type] @"73f5f4f9-adff-4eb0-8527-af25de3c547a" true false 
 
SBE plan on an older build:
-----------------
[2] mkbson s10 [_id = s8, count = s9] true false 
[2] group [s8] [s9 = sum(1)] 
[2] project [s8 = (s7 ?: null)] 
[2] project [s7 = getField(s4, "type")] 
[1] filter {(traverseF(getField(s4, "type"), lambda(l1.0) { ((l1.0 == s6) ?: false) }, false) ?: false)} 
[1] scan s4 s5 none none none none [] @"720c5fb4-50a4-4b7d-a654-af461950944c" true false 

match_3_column in bestbuy_4_analytics showing a 100x speedup:

query:
----------------
[{
    $match: {
        $and: [
            {salePrice: {$gt: 10, $lt: 100}},
            {type: {$eq: "Software"}},
            {digital: {$eq: false}}
        ]
    }
 },
 {$group: {_id: null, avgPrice: {$avg: '$salePrice'}}}]
 
SBE plan on a recent master build:
------------------
[2] mkbson s18 [_id = s13, avgPrice = s17] true false 
[2] project [s17 = 
    if (s16 == 0) 
    then null 
    else (doubleDoubleSumFinalize(s15) / s16) 
] 
[2] group [s13] [s15 = aggDoubleDoubleSum(s14), s16 = sum(
    let [
        l5.0 = s14 
    ] 
    in 
        if ((typeMatch(l5.0, 1088) ?: true) || !(isNumber(l5.0))) 
        then 0 
        else 1 
)] 
[2] project [s14 = s4] 
[2] project [s13 = (null ?: null)] 
[1] filter {(traverseF(s4, lambda(l4.0) { ((l4.0 > s12) ?: false) }, false) ?: false)} 
[1] filter {(traverseF(s4, lambda(l3.0) { ((l3.0 < s11) ?: false) }, false) ?: false)} 
[1] filter {(traverseF(s6, lambda(l2.0) { ((l2.0 == s10) ?: false) }, false) ?: false)} 
[1] filter {(traverseF(s5, lambda(l1.0) { ((l1.0 == s9) ?: false) }, false) ?: false)} 
[1] scan s7 s8 none none none none [s4 = salePrice, s5 = digital, s6 = type] @"73f5f4f9-adff-4eb0-8527-af25de3c547a" true false
 
SBE plan on an older build:
---------------------
[2] mkbson s15 [_id = s10, avgPrice = s14] true false 
[2] project [s14 = 
    if (s13 == 0) 
    then null 
    else (doubleDoubleSumFinalize(s12) / s13) 
] 
[2] group [s10] [s12 = aggDoubleDoubleSum(s11), s13 = sum(
    let [
        l5.0 = s11 
    ] 
    in 
        if ((typeMatch(l5.0, 1088) ?: true) || !(isNumber(l5.0))) 
        then 0 
        else 1 
)] 
[2] project [s11 = getField(s4, "salePrice")] 
[2] project [s10 = (null ?: null)] 
[1] filter {(traverseF(getField(s4, "salePrice"), lambda(l4.0) { ((l4.0 > s9) ?: false) }, false) ?: false)} 
[1] filter {(traverseF(getField(s4, "salePrice"), lambda(l3.0) { ((l3.0 < s8) ?: false) }, false) ?: false)} 
[1] filter {(traverseF(getField(s4, "type"), lambda(l2.0) { ((l2.0 == s7) ?: false) }, false) ?: false)} 
[1] filter {(traverseF(getField(s4, "digital"), lambda(l1.0) { ((l1.0 == s6) ?: false) }, false) ?: false)} 
[1] scan s4 s5 none none none none [] @"720c5fb4-50a4-4b7d-a654-af461950944c" true false 

Generated at Thu Feb 08 06:19:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.