[SERVER-68677] Skip row store projection in column index plans when possible
Created: 09/Aug/22  Updated: 29/Oct/23  Resolved: 23/Sep/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.2.0-rc0

Type: Task
Priority: Major - P3
Reporter: Ian Boros
Assignee: Steve Tarzia
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-69105 Achieve column index performance acce... Closed
Problem/Incident
Backwards Compatibility: Fully Compatible
Participants:
Linked BF Score: 69

 Description   

In the row store fallback path of column index plans, we apply a projection to the row store document to ensure that it has the same "schema" as the documents that would have been returned by the column index path. This is fairly expensive, and can often be skipped (including any time there is a $group above the column scan).
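To illustrate why the projection is redundant under a $group, here is a toy model in plain JavaScript (not server code). The helpers `projectFields` and `groupSum` are hypothetical, the field names are borrowed from the query in the comments on this ticket, and the document values and the shape of `nation` are invented. It only shows that a grouping stage which reads two fields produces the same result whether or not the extra fields were stripped first.

```javascript
// Toy model only: plain JavaScript, not MongoDB server code.

// Keep only the listed top-level fields, mimicking the projection that gives a
// row store document the same shape as a column index result.
function projectFields(doc, fields) {
    const out = {};
    for (const f of fields) {
        if (f in doc) out[f] = doc[f];
    }
    return out;
}

// Minimal stand-in for $group with one $sum accumulator.
// It reads only keyField and sumField from each document.
function groupSum(docs, keyField, sumField) {
    const totals = new Map();
    for (const doc of docs) {
        const key = JSON.stringify(doc[keyField]);
        totals.set(key, (totals.get(key) || 0) + doc[sumField]);
    }
    return [...totals.entries()].map(([k, v]) => ({_id: JSON.parse(k), balance: v}));
}

// Invented row store documents carrying fields the query never references.
const rowStoreDocs = [
    {_id: 1, nation: {name: "PERU"}, c_acctbal: 10, c_comment: "unused"},
    {_id: 2, nation: {name: "PERU"}, c_acctbal: 5, c_phone: "unused"},
    {_id: 3, nation: {name: "CHINA"}, c_acctbal: 7},
];

const needed = ["nation", "c_acctbal"];
const withProjection = groupSum(
    rowStoreDocs.map(d => projectFields(d, needed)), "nation", "c_acctbal");
const withoutProjection = groupSum(rowStoreDocs, "nation", "c_acctbal");

// Both variants yield the same groups, so the projection did no useful work here.
console.log(JSON.stringify(withProjection) === JSON.stringify(withoutProjection));
```

The same reasoning would not hold if the documents were returned directly to the client, since the extra fields would then be visible.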

This would build on the analysis done in SERVER-66061, where we detect cases in which a projection is not necessary. In the same cases (unless I'm forgetting an edge case), we should be able to skip adding the expression that projects the full document retrieved from the row store. That expression is currently added unconditionally [here](https://github.com/10gen/mongo/blob/21b7dfd8f85224445da4f6098f22ac463ee18f72/src/mongo/db/query/sbe_stage_builder.cpp#L764). We should be able to attach a boolean to the ColumnIndexScanNode that says whether or not a projection is necessary. That same ColumnIndexScanNode should be available and in scope in [this analysis from SERVER-66061](https://github.com/10gen/mongo/blob/21b7dfd8f85224445da4f6098f22ac463ee18f72/src/mongo/db/query/planner_analysis.cpp#L424). There we could set the flag to say "I don't need an exact schema" when there is a subsequent $group stage. ian.boros@mongodb.com suggested calling this "isStrict" or something like that.
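A rough sketch of the flag idea, written as a JavaScript model rather than the server's C++. Everything here is an assumption made for illustration: the plain-object plan tree, the stage labels, the `PASS_THROUGH` set, and the `markColumnScans` helper are hypothetical, and only the name `isStrict` comes from the suggestion above. The sketch walks the tree from the root and records on each column scan node whether something above it needs the exact schema.

```javascript
// Hypothetical plan-tree model; not the server's C++ query solution classes.
// Stage labels and helper names are illustrative only.

// Stages assumed to forward documents unchanged, so they inherit the schema
// requirement of their parent.
const PASS_THROUGH = new Set(["FILTER", "LIMIT", "SKIP"]);

function markColumnScans(node, exactSchemaNeeded = true) {
    if (node.stage === "COLUMN_SCAN") {
        // isStrict === false means the row store fallback could return the
        // document without projecting it.
        node.isStrict = exactSchemaNeeded;
        return node;
    }
    let childNeedsExact;
    if (node.stage === "GROUP") {
        // Assumption: a group stage reads only the fields it names.
        childNeedsExact = false;
    } else if (PASS_THROUGH.has(node.stage)) {
        childNeedsExact = exactSchemaNeeded;
    } else {
        // Conservative default for any stage this sketch does not know about.
        childNeedsExact = true;
    }
    for (const child of node.children || []) {
        markColumnScans(child, childNeedsExact);
    }
    return node;
}

const groupPlan = markColumnScans({
    stage: "GROUP",
    children: [{stage: "FILTER", children: [{stage: "COLUMN_SCAN"}]}],
});
const barePlan = markColumnScans({stage: "COLUMN_SCAN"});

console.log(groupPlan.children[0].children[0].isStrict); // false
console.log(barePlan.isStrict); // true
```

Defaulting to strict at the root and for unknown stages keeps the sketch conservative: the projection is skipped only when a consumer is known to tolerate extra fields.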

We should be able to test this code path by inspecting the SBE plan in the explain output, where we should be able to tell if there is a row store expression or not.
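One way such a test could look, sketched in JavaScript. The helpers `collectStrings` and `explainMentions` are hypothetical, and so is the marker string: the text that identifies a row store expression in real explain output is not stated in this ticket, so the sketch takes it as a parameter and searches every string in the explain object rather than assuming a particular field layout. The sample objects are made up and do not reproduce actual explain output.

```javascript
// Recursively gather every string value in a nested object or array.
function collectStrings(value, out = []) {
    if (typeof value === "string") {
        out.push(value);
    } else if (value !== null && typeof value === "object") {
        for (const v of Object.values(value)) {
            collectStrings(v, out);
        }
    }
    return out;
}

// True if any string anywhere in the explain output contains the marker.
function explainMentions(explainOutput, marker) {
    return collectStrings(explainOutput).some(s => s.includes(marker));
}

// Placeholder marker; the real text of a row store expression is an unknown here.
const MARKER = "ROW_STORE_EXPR_PLACEHOLDER";

// Made-up explain-shaped objects for demonstration only.
const planWithExpr = {plan: {stages: ["scan " + MARKER, "group"]}};
const planWithoutExpr = {plan: {stages: ["scan", "group"]}};

console.log(explainMentions(planWithExpr, MARKER)); // true
console.log(explainMentions(planWithoutExpr, MARKER)); // false
```

A real test would pass the result of an `explain()` call and a marker taken from actual SBE plan output.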



 Comments   
Comment by Githook User [ 23/Sep/22 ]

Author: Steve Tarzia <steve.tarzia@mongodb.com> (username: starzia)

Message: SERVER-68677 Skip row store projection in column scan plans when possible
Branch: master
https://github.com/mongodb/mongo/commit/538c5d15bf6f84fcfab9328fdf9857e120321e00

Comment by Steve Tarzia [ 22/Sep/22 ]

I measured the performance impact of this change in the warm cache case and saw a 14% speedup (143 ms with the change vs. 163 ms without). I used the following query on the TPC-H denormalized 1 GB dataset:

db.customer_csi_mod.explain("executionStats").aggregate([
    {$match: {c_mktsegment: "BUILDING"}},
    {$group: {_id: "$nation", balance: {$sum: "$c_acctbal"}}}
])

Because $nation is an object, this will use the row store fallback every single time.  I don't think there will be a measurable impact in the cold cache scenario.

I was holding off on merging my change until I saw a performance improvement. This looks significant enough to warrant the merge, so I will merge it soon, after addressing the latest minor feedback in the code review.
