[SERVER-83975] Limited support for $lookup on block values Created: 07/Dec/23  Updated: 08/Dec/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Alberto Massari Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-83970 Extend HashLookupStage to lookup from... Backlog
depends on SERVER-83974 Create stage equivalent to block-base... Backlog
Assigned Teams:
Query Execution
Participants:

 Description   

For the case where $lookup can be implemented using a hash lookup, a pipeline like

db.tt.explain().aggregate([{$project: {time: 1, a: 1}},{$group: {_id: "$a"}},{$lookup: {from: "coll", localField: "a", foreignField:"n1", as: "blah"}}])

(where 'tt' is a time-series collection and 'coll' is the external collection) produces a plan of the form

[4] hash_lookup [s27 = addToArray(s12)] 
    outer s19 
        [4] nlj inner [s11] [s11] 
            left 
                [3] mkobj s11 [_id = s10] true false 
                [3] group [s10] [] 
                [3] project [s10 = (s8 ?: null)] 
                [2] block_to_row blocks[s4, s5, s6] row[s7, s8, s9] 
                [2] ts_bucket_to_cellblock s2 pathReqs[s4 = Get(_id)/Id, s5 = Get(a)/Id, s6 = Get(time)/Id] 
                [1] scan s2 s3 none none none none none none lowPriority [] @"711564fb-1a4b-48ee-9ab7-02b1a62761c9" true false 
            right 
                [4] project [s19 = 
                    if isArrayEmpty(s17) 
                    then [null] 
                    else s17 
               ] 
                [4] group [] [s17 = addToSet(s15)] spillSlots[s18] mergingExprs[aggSetUnion(s18)] 
                [4] unwind s15 s16 s14 true 
                [4] project [s14 = getField(s11, "a")] 
                [4] limit 1 
                [4] coscan 
    inner s25 [s12] 
        [4] nlj inner [s12] [s12] 
            left 
                [4] scan s12 s13 none none none none none none [] @"99aad589-95ef-45ee-a798-f1a4fe541b80" true false 
            right 
                [4] group [] [s25 = addToSet(s24)] spillSlots[s26] mergingExprs[aggSetUnion(s26)] 
                [4] nlj inner [] [s20] 
                    left 
                        [4] project [s20 = (getField(s12, "n1") ?: null)] 
                        [4] limit 1 
                        [4] coscan 
                    right 
                        [4] branch {isArray(s20)} [s24] 
                        [s23] [4] union [s23] 
                            branch0 [s21] 
                                [4] unwind s21 s22 s20 true 
                                [4] limit 1 
                                [4] coscan 
                            branch1 [s20] 
                                [4] limit 1 
                                [4] coscan 
                        
                        [s20] [4] limit 1 
                        [4] coscan 

This task would change the stage builder so that:
1. the left branch of the first nlj stage (the one coming from processing query solution node [3]) projects the block values directly instead of going through block_to_row + mkobj
2. the right branch of the same nlj uses the ApplyPipelineStage from SERVER-83974 to run the subpipeline on each item and collect the results into a new block value with the same number of items as the block values generated by the left branch
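
As a rough illustration of point 2, the sketch below models a block value as a plain Python list and applies a per-item function to it, checking that the output stays positionally aligned with the input. The names `Block`, `local_keys` and `apply_pipeline` are hypothetical and are not taken from the server code; `local_keys` only approximates the right-branch subpipeline in the plan above (unwind, addToSet, fall back to [null]), with None standing in for both null and missing values.

```python
from typing import Any, Callable, List

# Hypothetical stand-in for an SBE block value: one entry per row of the block.
Block = List[Any]


def local_keys(value: Any) -> list:
    """Approximate the right-branch subpipeline for a single item.

    Arrays are unwound into their elements, duplicates are dropped
    (like addToSet), and an empty result falls back to [None],
    mirroring the `if isArrayEmpty(...) then [null]` projection.
    """
    if isinstance(value, list):
        elements = value
    elif value is None:
        elements = []
    else:
        elements = [value]
    unique = []
    for element in elements:
        if element not in unique:
            unique.append(element)
    return unique if unique else [None]


def apply_pipeline(block: Block, subpipeline: Callable[[Any], Any]) -> Block:
    """Run `subpipeline` on every item and return a block of equal length."""
    out = [subpipeline(item) for item in block]
    # The output must line up position by position with the input block.
    assert len(out) == len(block)
    return out
```

For instance, `apply_pipeline([1, [2, 2, 3], None, []], local_keys)` yields one key list per input position, so a later stage can pair each result with the row it came from.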

The hash_lookup stage (as modified by SERVER-83970) would then consume the inner branch into a hash table as usual, recognize that the outer branch has produced block values, and probe the hash table with each of their items, generating a block value that holds the probed values in the matching positions.
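
The intended behaviour can be pictured with a simplified model, sketched here under several assumptions: blocks are plain Python lists, every outer item is a list of lookup keys, and the functions `build_hash_table` and `probe_block` are hypothetical names rather than anything in HashLookupStage. The build side is consumed row by row as today, while the probe side walks the outer block and writes the matches for item i into position i of the result.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Sequence, Tuple


def build_hash_table(
    inner_rows: Sequence[Tuple[Sequence[Hashable], Any]]
) -> Dict[Hashable, List[int]]:
    """Map each foreign key to the indices of the inner rows that carry it."""
    table: Dict[Hashable, List[int]] = defaultdict(list)
    for index, (keys, _doc) in enumerate(inner_rows):
        for key in keys:
            table[key].append(index)
    return table


def probe_block(
    table: Dict[Hashable, List[int]],
    inner_rows: Sequence[Tuple[Sequence[Hashable], Any]],
    outer_key_block: Sequence[Sequence[Hashable]],
) -> List[List[Any]]:
    """Probe once per outer item; result[i] holds the matches for item i."""
    result = []
    for keys in outer_key_block:
        matched = set()
        for key in keys:
            # .get avoids inserting empty entries into the defaultdict.
            matched.update(table.get(key, []))
        # Each inner row is emitted once, in build order, even if
        # several of the outer keys matched it.
        result.append([inner_rows[i][1] for i in sorted(matched)])
    return result
```

In this model an outer item with no match produces an empty list in its position, which corresponds to an empty "as" array for that row.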


Generated at Thu Feb 08 06:53:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.