[SERVER-69955] Optimize expressions generated for column filter pushdown Created: 24/Sep/22  Updated: 29/Nov/22  Resolved: 29/Nov/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Ian Boros Assignee: Dianna Hohensee (Inactive)
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

As Irina mentioned on SERVER-68743:

Also, the refactor from SERVER-68713 preserved the defensive wrappers, such as makeFillEmptyFalse, which are not needed when generating per-path filters in CSI. It would be nice to avoid injecting them.

There is also a question of whether we want to do this by simply passing a flag to the code which generates the expressions, or if we want to optimize the expressions/bytecode given the constraint that the slot holding the column's value is not Nothing.



 Comments   
Comment by Dianna Hohensee (Inactive) [ 10/Nov/22 ]

Query: db.events.aggregate({$project: {_id: '$_id', a: '$a'}}, {$match: {'a': /^x/}})
Expected result:
[ { "_id" : 0, "a" : "x" } ]
New result:
[ ]
Collection: db.events.find().pretty()
{ "_id" : 0, "a" : "x" }
{ "_id" : 1, "a" : "yx" }

[2] traverse s12 s13 s5 {} {} 
from 
    [2] project [s5 = s5] 
    [1] columnscan s5 none paths["_id", "a"] outputs["_id", "a"] pathFilters["a":  s9, traverseCsiCellValues(s9, lambda(l1.0) { ((l1.0 == s10) || regexMatch(s11, l1.0)) })] rowStoreExpr[] @"88bbb4cc-e5ab-4056-be7e-442e714c562e" @"*_columnstore" 
in 
    [2] project [s13 = makeBsonObj(MakeObjSpec(keep, [], ["_id", "a"]), s5, getField(s5, "_id"), getField(s5, "a"))] 
    [2] limit 1 
    [2] coscan 

-----------------------------------------------------------------------------------------
Query:
db.mine.find({a: /foo/}, {_id: 1})
Expected result:
[ { "_id" : 13 }, { "_id" : 14 } ]
New result:
[\{ "_id" : 14 \}]
Collection:
{ "_id" : 13, "a" : "foo", "c" : "foo" }
{ "_id" : 14, "a" : /foo/, "c" : "foo" }

[2] mkbson s12 s5 [_id] keep [] true false 
[1] columnscan s5 none paths["_id", "a"] outputs["_id"] pathFilters["a":  s9, traverseCsiCellValues(s9, lambda(l1.0) { ((l1.0 == s10) || regexMatch(s11, l1.0)) })] rowStoreExpr[] @"e4b7537f-c18e-4f59-ab4d-e66ab4f7035b" @"*_columnstore" 

"msg":"~~~ByteCode::runInternal fillEmptyImm","attr":{"lhsTag":0,"lhsVal":0}}
"msg":"~~~ByteCode::runInternal fillEmptyImm running on Nothing type"}
"msg":"~~~ByteCode::runInternal fillEmptyImm","attr":{"lhsTag":6,"lhsVal":1}}
"msg":"~~~ByteCode::runInternal fillEmptyImm","attr":{"lhsTag":6,"lhsVal":1}}
"msg":"~~~ByteCode::runInternal fillEmptyImm","attr":{"lhsTag":6,"lhsVal":1}}
"msg":"~~~ByteCode::runInternal fillEmptyImm","attr":{"lhsTag":6,"lhsVal":1}}

-----------------------------------------------------------------------------------------
generateRegexExpr can produce a Nothing in this code part. If the makeFillEmptyFalse is removed, then the above queries miss a document.
The important parts of the plans above that need a fillEmpty are

(l1.0 == s10)

and

(l1.0 == s10)

Comment by Dianna Hohensee (Inactive) [ 09/Nov/22 ]

regexMatch can return a Nothing type. Does that mean that makeFillEmptyFalse() needs to be present? This Instruction::fillEmptyImm code runs in the query.

"conn3","msg":"~~~ByteCode::runInternal fillEmptyImm","attr":{"lhsTag":0,"lhsVal":0}} // Nothing
"conn3","msg":"~~~ByteCode::runInternal fillEmptyImm running on Nothing type"}
"conn3","msg":"~~~ByteCode::runInternal fillEmptyImm","attr":{"lhsTag":6,"lhsVal":1}} // Boolean
"conn3","msg":"~~~ByteCode::runInternal fillEmptyImm","attr":{"lhsTag":0,"lhsVal":0}} // Nothing
"conn3","msg":"~~~ByteCode::runInternal fillEmptyImm running on Nothing type"}
"conn3","msg":"~~~ByteCode::runInternal fillEmptyImm","attr":{"lhsTag":6,"lhsVal":0}} // Boolean

Comment by Dianna Hohensee (Inactive) [ 09/Nov/22 ]

WIP notes:

I ran a patch build on removing makeFillEmptyFalse calls from generatePerColumnPredicate(). A test jstests/aggregation/bugs/match.js is failing. So here are the plans I get running the failing query locally

Here's the contents of the collection

rs0:PRIMARY> db.events.find().pretty()
{ "_id" : 0, "a" : "x" }
{ "_id" : 1, "a" : "yx" }

Here's the document the master branch returns

rs0:PRIMARY> db.events.aggregate({$project: {_id: '$_id', a: '$a'}}, {$match: {'a': /^x/}}).toArray()
[ { "_id" : 0, "a" : "x" } ]

And here's what my patch returns

rs0:PRIMARY> db.events.aggregate({$project: {_id: '$_id', a: '$a'}}, {$match: {'a': /^x/}}).toArray()
[ ]

Here's the query plan on the master branch with a column index.

[2] traverse s12 s13 s5 {} {} 
from 
    [2] project [s5 = s5] 
    [1] columnscan s5 none paths["_id", "a"] outputs["_id", "a"] pathFilters["a":  s9, traverseCsiCellValues(s9, lambda(l1.0) { (fillEmpty((l1.0 == s10), false) || fillEmpty(regexMatch(s11, l1.0), false)) })] rowStoreExpr[] @"88bbb4cc-e5ab-4056-be7e-442e714c562e" @"*_columnstore" 
in 
    [2] project [s13 = makeBsonObj(MakeObjSpec(keep, [], ["_id", "a"]), s5, getField(s5, "_id"), getField(s5, "a"))] 
    [2] limit 1 
    [2] coscan 

Here's the query plan with my patch changes

[2] traverse s12 s13 s5 {} {} 
from 
    [2] project [s5 = s5] 
    [1] columnscan s5 none paths["_id", "a"] outputs["_id", "a"] pathFilters["a":  s9, traverseCsiCellValues(s9, lambda(l1.0) { ((l1.0 == s10) || regexMatch(s11, l1.0)) })] rowStoreExpr[] @"88bbb4cc-e5ab-4056-be7e-442e714c562e" @"*_columnstore" 
in 
    [2] project [s13 = makeBsonObj(MakeObjSpec(keep, [], ["_id", "a"]), s5, getField(s5, "_id"), getField(s5, "a"))] 
    [2] limit 1 
    [2] coscan 

And for yucks here's SBE without a column index

[2] traverse s9 s10 s5 {} {} 
from 
    [2] project [s5 = s5] 
    [1] filter {fillEmpty(traverseF(getField(s5, "a"), lambda(l1.0) { (fillEmpty((l1.0 == s7), false) || fillEmpty(regexMatch(s8, l1.0), false)) }, false), false)} 
    [1] scan s5 s6 none none none none [] @"88bbb4cc-e5ab-4056-be7e-442e714c562e" true false 
in 
    [2] project [s10 = makeBsonObj(MakeObjSpec(keep, [], ["_id", "a"]), s5, getField(s5, "_id"), getField(s5, "a"))] 
    [2] limit 1 
    [2] coscan 

So, for some reason, my patch appears to be correctly removing the fillEmpty expressions, but no longer returns documents.

Generated at Thu Feb 08 06:14:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.