[SERVER-79700] Improve UnpackBucketNode stage builder so that it does not generate duplicated meta slot when bucket-level filter has dependency on the meta field
Created: 04/Aug/23  Updated: 28/Nov/23  Resolved: 29/Sep/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.2.0-rc0

Type: Task
Priority: Major - P3
Reporter: Yoon Soo Kim
Assignee: Irina Yatsenko (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-79697 Add a QuerySolutionNode to lower $_in... Closed
Related
related to SERVER-83621 Consider adding metaField to UnpackTs... Open
Assigned Teams:
Query Integration
Backwards Compatibility: Fully Compatible
Sprint: QI 2023-10-02
Participants:

 Description   

> db.ts.aggregate([{$match: {tag: "a"}}, {$project: {f: 1, time: 1, tag: 1}}], {collation: {locale: "en", strength: 1}});
[2] mkbson s13 [_id = s10, f = s11, time = s12, tag = s6] true false
[2] block_to_row blocks[s7, s8, s9] vals[s10, s11, s12]
[2] ts_bucket_to_cellblock s2 paths[s7  =  _id, s8  =  f, s9  =  time] meta =  s6
[1] filter {traverseF(s1, lambda(l1.0) { ((l1.0 ==[s5] s4) ?: false) }, false)}
[1] scan s2 s3 none none none none none none lowPriority [s1 = meta] @"2238077b-8893-4b6c-bbb4-21b5f72c55de" true false

Note that in the above example, both s1 and s6 hold the 'meta' field.
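The duplication can be spotted mechanically in the printed plan. The following is a minimal sketch, assuming the plan text has the same layout as the explain output above; `findMetaSlots` is a hypothetical helper written for illustration, not part of MongoDB or the shell.

```javascript
// Hypothetical helper (not a MongoDB API): collects every slot that a printed
// SBE plan binds to the bucket's 'meta' field.
function findMetaSlots(planText) {
    const slots = new Set();
    for (const line of planText.split("\n")) {
        // Scan-style binding, e.g. "[s1 = meta]".
        for (const m of line.matchAll(/\b(s\d+)\s*=\s*meta\b/g)) {
            slots.add(m[1]);
        }
        // Unpack-style binding, e.g. "meta =  s6".
        for (const m of line.matchAll(/\bmeta\s*=\s*(s\d+)\b/g)) {
            slots.add(m[1]);
        }
    }
    return [...slots].sort();
}

// The plan text from the first example in this ticket.
const plan = `[2] mkbson s13 [_id = s10, f = s11, time = s12, tag = s6] true false
[2] block_to_row blocks[s7, s8, s9] vals[s10, s11, s12]
[2] ts_bucket_to_cellblock s2 paths[s7  =  _id, s8  =  f, s9  =  time] meta =  s6
[1] filter {traverseF(s1, lambda(l1.0) { ((l1.0 ==[s5] s4) ?: false) }, false)}
[1] scan s2 s3 none none none none none none lowPriority [s1 = meta] @"2238077b-8893-4b6c-bbb4-21b5f72c55de" true false`;

const metaSlots = findMetaSlots(plan);
console.log(metaSlots); // more than one entry means 'meta' was bound twice
```

For the plan above the helper reports two slots, s1 and s6, which is the duplication this ticket describes.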

> db.ts.aggregate([{$match: {tag: {$in: ["a", "c"]}}}, {$project: {f: 1, time: 1, tt: {$concat: ["$tag", "-", "abc"]}, ss: {$concat: ["$tag", "XYZ"]}}}], {collation: {locale: "en", strength: 1}});
[3] mkbson s15 [_id = s12, f = s13, time = s14, ss = s7, tt = s8] true false
[3] block_to_row blocks[s9, s10, s11] vals[s12, s13, s14]
[3] ts_bucket_to_cellblock s6 paths[s9  =  _id, s10  =  f, s11  =  time]
[2] project [s7 = getField(s6, "ss"), s8 = getField(s6, "tt")]
[2] project [s6 = makeBsonObj(MakeObjSpec(drop, [], ["tt", "ss"]), s2,
    if (typeMatch(s1, 1088ll) ?: true)
    then null
    else
        if isString(s1)
        then concat(s1, "-abc")
        else fail(7158201, "$concat supports only strings")
,
    if (typeMatch(s1, 1088ll) ?: true)
    then null
    else
        if isString(s1)
        then concat(s1, "XYZ")
        else fail(7158201, "$concat supports only strings")
)]
[1] filter {traverseF(s1, lambda(l1.0) { isMember(l1.0, s4) }, false)}
[1] scan s2 s3 none none none none none none lowPriority [s1 = meta] @"2238077b-8893-4b6c-bbb4-21b5f72c55de" true false

Also, we should be able to avoid materializing the temporary BSON object for the computed meta fields.



 Comments   
Comment by Githook User [ 28/Sep/23 ]

Author: Irina Yatsenko <irina.yatsenko@mongodb.com> (username: IrinaYatsenko)

Message: SERVER-79700 Create the 'meta' slot once in SBE plans for TS queries
Branch: master
https://github.com/mongodb/mongo/commit/dd441c0b8cabfa1165405910a6e0966233ec6afd

Comment by Irina Yatsenko (Inactive) [ 14/Sep/23 ]

The first issue is specific to TS plans and will likely be quite common. Unfortunately, I believe it falls into the same category as other cases where the legacy stage builder cannot optimize across stages generated from different query solution nodes: as far as I know, there is no way to "peek" into the child tree to see which slots/fields it might produce.

I'll use this ticket to update the comments.

UPD: we cannot make unpacking of the meta slot conditional on what the child tree does, but we can request the meta field from the child and use that slot rather than recreating it during unpacking.
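The idea in the update can be illustrated with a toy model. This is only a sketch under stated assumptions: every name below (`SlotIdGenerator`, `buildBucketScan`, `buildUnpackBucket`) is hypothetical and does not mirror the real C++ stage builder. It shows the parent asking the child for 'meta' up front and reusing the slot the child reports, instead of allocating a second one.

```javascript
// Toy model only. All names are hypothetical and do not reflect the actual
// stage builder implementation.
class SlotIdGenerator {
    constructor() { this.next = 1; }
    generate() { return "s" + this.next++; }
}

// Child builder: binds one slot per requested bucket-level field and reports
// the field-to-slot mapping back to the caller.
function buildBucketScan(slotIds, requestedFields) {
    const outputs = new Map();
    for (const field of requestedFields) {
        outputs.set(field, slotIds.generate());
    }
    return {stage: "scan", outputs};
}

// Parent builder: requests 'meta' from the child when it is needed and reuses
// the child's slot rather than generating a new one while unpacking.
function buildUnpackBucket(slotIds, needsMeta) {
    const requested = needsMeta ? ["bucket", "meta"] : ["bucket"];
    const child = buildBucketScan(slotIds, requested);
    return {
        stage: "unpack",
        bucketSlot: child.outputs.get("bucket"),
        metaSlot: child.outputs.get("meta"), // undefined when not requested
        child,
    };
}

const slotIds = new SlotIdGenerator();
const unpack = buildUnpackBucket(slotIds, true);
console.log(unpack.metaSlot === unpack.child.outputs.get("meta"));
```

In this model the parent never inspects what the child does internally; it only states its requirement and consumes the slot the child hands back, so 'meta' is bound exactly once.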

Comment by Irina Yatsenko (Inactive) [ 13/Sep/23 ]

I'd strongly suggest doing nothing for the second issue (avoiding materialization of the projected computed field), as it's not specific to the TS lowering. The same inefficiency occurs in non-TS pipelines when other stages follow a $project that computes a field:

db.non_ts.explain().aggregate([{$match: {b: "Lucy"}},{$project: {x: {$concat: ["$b", " - cat"]}, a: 1}},{$group: {_id: "$x", c: {$max: "$a"}}}]).queryPlanner.winningPlan.slotBasedPlan.stages
 
// the SBE stages for $group, including the final materialization because this is the last stage in the pipeline
[4] mkbson s18 [_id = s14, c = s17] true false
[4] project [s17 = (s15 ?: null)]
[4] group [s14] [s15 = max(
    let [
        l3.0 = s12
    ]
    in
        if (typeMatch(l3.0, 1088) ?: true)
        then Nothing
        else l3.0
)] spillSlots[s16] mergingExprs[max(
    let [
        l4.0 = s16
    ]
    in
        if (typeMatch(l4.0, 1088) ?: true)
        then Nothing
        else l4.0
)]
[4] project [s14 = (s13 ?: null)]
 
// the SBE stages generated by the $project; ideally, they shouldn't materialize the intermediate object
[3] project [s12 = getField(s11, "a"), s13 = getField(s11, "x")]
[3] project [s11 = makeBsonObj(MakeObjSpec(["_id", "a", "x" = Arg(0)], Closed), s8,
    if (typeMatch(s10, 1088) ?: true)
    then null
    else
        if isString(s10)
        then concat(s10, " - cat")
        else fail(7158201, "$concat supports only strings")
)]
 
// SBE stages for the collection access (non_ts has a "b_1_a_1" index to imitate ts collections, with "a" being the time field and "b" the meta field)
[2] nlj inner [] [s1, s4, s5, s6, s7]  
    left
        [1] cfilter {(exists(s2) && exists(s3))}
        [1] ixseek s2 s3 s6 s1 s4 s5 [] @"7faef4c2-2192-45ee-b66b-4a648250f068" @"b_1_a_1" true
    right
        [2] limit 1
        [2] seek s1 s8 s9 s4 s5 s6 s7 none none [s10 = b] @"7faef4c2-2192-45ee-b66b-4a648250f068" true false

Generated at Thu Feb 08 06:41:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.