[SERVER-84887] Investigate performance of artificial query on SSB dataset Created: 14/Oct/22  Updated: 12/Jan/24  Resolved: 27/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Ian Boros Assignee: Mohammad Dashti (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File performance.py     File ssb-improved-corrected.yaml     File ssb-improved.yaml     File ssb-orig-corrected.yaml     File ssb-orig.yaml     File ssb.yaml    
Issue Links:
Related
Participants:

 Description   

Here is the query:

 

db.order_lineitem.aggregate(
[
    {
        $match: {
            $and: [
                { "order.o_orderdate": { $gte: ISODate("1992-01-01") } },
                { "order.o_orderdate": { $lte: ISODate("1996-01-01") } },
                { l_discount: { $gte: 0.1 } },
                { l_discount: { $lte: 0.3 } }
            ],
            l_quantity: {
                $lte: 25
            }
        },
    },
    {
        $group: {
                _id: 1,
                revenue: {
                    $sum: {
                        $multiply: [
                            "$l_extendedprice",
                            "$l_discount"
                        ]
                    }
                }
            }
        }
    ]
) 



 Comments   
Comment by Mohammad Dashti (Inactive) [ 25/Oct/22 ]

As you can see in the previous comment, csi_cold is around 2x faster than collscan_cold, which is an expected good performance improvement.

Comment by Mohammad Dashti (Inactive) [ 25/Oct/22 ]

In addition, it was uncovered that the range of values for l_dicount is 0.0 to 0.1:

> db.order_lineitem.aggregate(
...    [
...      {
...        $group:
...          {
...            _id: "1",
...            minDisocunt: { $min: "$l_discount" },
...            maxDisocunt: { $max: "$l_discount" },
...            minQuantity: { $min: "$l_quantity" }, [^ssb-improved-corrected.yaml]  [^ssb-orig-corrected.yaml] 
...            maxQuantity: { $max: "$l_quantity" }
...          }
...      }
...    ]
... )
{ "_id" : "1", "minDisocunt" : 0, "maxDisocunt" : 0.1, "minQuantity" : 1, "maxQuantity" : 50 }

So, the right query (to check the range of 0.01 to 0.03 for l_discount) is:

    {
        "$match": {
            "order.o_orderdate": {
                "$gte": ISODate("1992-01-01T00:00:00.000Z"),
                "$lte": ISODate("1996-01-01T00:00:00Z")
            },
            "l_discount": {"$gte": 0.1, "$lte": 0.3},
            "l_quantity": {"$lte": 25}
        }
    },
    {
        "$group": {
            "_id": 1,
            "revenue": {"$sum": {"$multiply": ["$l_extendedprice", "$l_discount"]}}
        }
    }
]

Here are the corrected query definitions: ssb-orig-corrected.yaml and ssb-improved-corrected.yaml

name csi_cold csi_hot collscan_cold collscan_hot
ssb.qWRITING-12145.orig.corrected 5.996114253997803 4.855523586273193 13.617784976959229 4.068851470947266
ssb.qWRITING-12145.improved.corrected 6.080265760421753 4.894550323486328 13.490277767181396 3.9808552265167236
Comment by Mohammad Dashti (Inactive) [ 25/Oct/22 ]

Next, I went ahead and changed the way the given query was written to see if it has any impact. The new query is in ssb-improved.yaml:

[
    {
        "$match": {
            "order.o_orderdate": {
                "$gte": ISODate("1992-01-01T00:00:00.000Z"),
                "$lte": ISODate("1996-01-01T00:00:00Z")
            },
            "l_discount": {"$gte": 0.1, "$lte": 0.3},
            "l_quantity": {"$lte": 25}
        }
    },
    {
        "$group": {
            "_id": 1,
            "revenue": {"$sum": {"$multiply": ["$l_extendedprice", "$l_discount"]}}
        }
    }
]

However, it didn't have any noticeable impact on the performance:

name csi_cold csi_hot collscan_cold collscan_hot
ssb.qWRITING-12145.orig 8.17105507850647 7.048151016235352 14.792832851409912 5.0408830642700195
ssb.qWRITING-12145.improved 8.201274156570435 6.985919237136841 14.691941261291504 5.06222128868103
Comment by Mohammad Dashti (Inactive) [ 24/Oct/22 ]

Here is the actual measurement for the query mentioned in the body description (on master@8f2648c3):

name csi_cold csi_hot collscan_cold collscan_hot
ssb.qWRITING-12145 8.17105507850647 7.048151016235352 14.792832851409912 5.0408830642700195

To re-run the experiment:
1- You can compile the code like this:

activate && ./buildscripts/scons.py --ssl -j50 --variables-files=etc/scons/mongodbtoolchain_v3_clang.vars --cache=nolinked --modules="" ICECC=icecc CCACHE=ccache --dbg=off --ninja generate-ninja --link-model=static && ninja -j500 install-devcore

2- Run the server:

numactl --interleave=all --physcpubind=4 mongod --wiredTigerCacheSizeGB 16 --oplogSize 100000 --syncdelay 3600 --setParameter ttlMonitorEnabled=false --setParameter diagnosticDataCollectionEnabled=false --setParameter maxIndexBuildMemoryUsageMegabytes=16000 --setParameter featureFlagSbeFull=true --setParameter featureFlagColumnstoreIndexes=true

3- Download dump.zip
4- Unzip contents to reveal dump folder
5- Load the data like this:

cd ~/ssb
mongorestore --gzip dump
cd ~/mongo
mongo ssb --eval 'db.order_lineitem.createIndex({"$**": "columnstore"})'

6- place ssb-orig.yaml in the test_scripts directory, which contains the query definition:

[
    {
        "$match": {
            "$and": [
                {"order.o_orderdate": {"$gte": ISODate("1992-01-01T00:00:00.000Z")}},
                {"order.o_orderdate": {"$lte": ISODate("1996-01-01T00:00:00.000Z")}},
                {"l_discount": {"$gte": 0.1}},
                {"l_discount": {"$lte": 0.3}}
            ],
            "l_quantity": {"$lte": 25}
        }
    },
    {
        "$group": {
            "_id": 1,
            "revenue": {"$sum": {"$multiply": ["$l_extendedprice", "$l_discount"]}}
        }
    }
]

7- place performance.py in the test_scripts directory
8- run the experiment:

(killall mongod || true)
cd ~/mongo
activate
cd test_scripts
python3 performance.py /home/ubuntu/mongo/test_scripts/ssb.yaml

Generated at Thu Feb 08 06:56:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.