Core Server / SERVER-90335

Remove need for NLJ in $lookup

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 8.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Assigned Teams: Query Execution
    • Backwards Compatibility: Fully Compatible
    • 0

      To convert the lookup key into an array of unique values, the stage builder currently constructs a nested loop join (NLJ) node so that it can reuse addToSet, the aggregation function used by $group:

              [4] nlj inner [s11] [s11] 
                  left 
                      [3] mkobj s11 [_id = s10] true false 
                      [3] group [s10] [] 
                      [3] project [s10 = (s8 ?: null)] 
                      [2] block_to_row blocks[s4, s5, s6] row[s7, s8, s9] 
                      [2] ts_bucket_to_cellblock s2 pathReqs[s4 = Get(_id)/Id, s5 = Get(a)/Id, s6 = Get(time)/Id] 
                    [1] scan s2 s3 none none none none none none lowPriority [] @"711564fb-1a4b-48ee-9ab7-02b1a62761c9" true false 
                  right 
                      [4] project [s19 = 
                          if isArrayEmpty(s17) 
                          then [null] 
                          else s17 
                     ] 
                      [4] group [] [s17 = addToSet(s15)] spillSlots[s18] mergingExprs[aggSetUnion(s18)] 
                      [4] unwind s15 s16 s14 true 
                    [4] project [s14 = getField(s11, "a")] 
                      [4] limit 1 
                      [4] coscan  

       Apart from being inefficient, this construction makes it harder to keep the document in its shredded format.
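The semantics of the current NLJ-based plan can be modeled roughly as follows. This is a hypothetical Python sketch, not SBE code: `MISSING` stands in for SBE's Nothing, each outer row is a plain dict, and the helper names (`unwind`, `add_to_set`, `nlj_lookup_keys`) are made up for illustration. It assumes that `unwind ... true` passes a non-array value through unchanged and that addToSet skips missing values; Python `==` is used as an approximation of BSON comparison.

```python
from typing import Any, Iterator, List

# Hypothetical stand-in for SBE's Nothing (a field that is absent).
MISSING = object()


def unwind(value: Any) -> Iterator[Any]:
    # Assumed behaviour of `unwind s15 s16 s14 true`: an array yields its
    # elements; any other value (including MISSING) is passed through once.
    if isinstance(value, list):
        yield from value
    else:
        yield value


def add_to_set(values: Iterator[Any]) -> List[Any]:
    # Models the addToSet aggregation: keeps one copy of each value and
    # skips MISSING. Python == only approximates BSON comparison
    # (for example, Python treats True == 1).
    acc: List[Any] = []
    for v in values:
        if v is not MISSING and v not in acc:
            acc.append(v)
    return acc


def nlj_lookup_keys(outer_rows: List[dict], field: str = "a") -> List[List[Any]]:
    # For every row of the "left" side, re-run the "right" subtree
    # (limit 1 over coscan yields exactly one row per outer row).
    keys = []
    for row in outer_rows:
        value = row.get(field, MISSING)             # project [s14 = getField(s11, "a")]
        unique = add_to_set(unwind(value))          # unwind + group [] [addToSet]
        keys.append(unique if unique else [None])   # isArrayEmpty -> [null]
    return keys


print(nlj_lookup_keys([{"a": [1, 2, 1]}, {"a": 5}, {"a": []}, {}]))
# prints [[1, 2], [5], [None], [None]]
```

The loop makes the cost visible: a whole inner subtree (project, unwind, group, project) is re-executed for every outer row just to deduplicate one value.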

      The proposal is to add a new internal function (tentatively named removeDuplicates) that removes duplicate values from an array, which yields a simpler plan:

        [4] project [s19 = if isArray(s14)
                           then if isArrayEmpty(s14)
                                then [null] 
                                else removeDuplicates(s14)
                           else newArray(s14 ?: null) ] 
        [4] project [s14 = s10] 
        [3] group [s10] [] 
        [3] project [s10 = (s8 ?: null)] 
        [2] block_to_row blocks[s4, s5, s6] row[s7, s8, s9] 
        [2] ts_bucket_to_cellblock s2 pathReqs[s4 = Get(_id)/Id, s5 = Get(a)/Id, s6 = Get(time)/Id] 
        [1] scan s2 s3 none none none none none none lowPriority [] @"711564fb-1a4b-48ee-9ab7-02b1a62761c9" true false 
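The proposed `[4] project [s19 = ...]` expression can be sketched per value as below. This is a hypothetical Python model, not the actual builtin: `MISSING` stands in for SBE's Nothing, `remove_duplicates` only mimics what the tentatively named removeDuplicates is described to do, and keeping the first occurrence of each value is an assumption made for this sketch.

```python
from typing import Any, List

# Hypothetical stand-in for SBE's Nothing (a field that is absent).
MISSING = object()


def remove_duplicates(arr: List[Any]) -> List[Any]:
    # Hypothetical model of the proposed removeDuplicates builtin: keeps the
    # first occurrence of each value. Using Python == here is an assumption
    # that only approximates BSON comparison.
    out: List[Any] = []
    for v in arr:
        if v not in out:
            out.append(v)
    return out


def lookup_key(value: Any) -> List[Any]:
    # Mirrors the proposed `project [s19 = ...]` expression, one value at a
    # time, with no inner subtree to re-execute.
    if isinstance(value, list):                    # isArray(s14)
        if not value:                              # isArrayEmpty(s14)
            return [None]                          # then [null]
        return remove_duplicates(value)            # else removeDuplicates(s14)
    return [None if value is MISSING else value]   # else newArray(s14 ?: null)


for v in ([1, 2, 1], 5, [], MISSING, None):
    print(lookup_key(v))
```

Because the whole key computation is a single expression over one slot, it fits in an ordinary project stage and no join is needed. For these inputs the results are the ones an addToSet-based computation would also produce (unique values, with [null] for an empty or missing key); note that $addToSet itself does not guarantee any element order.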

            Assignee: Alberto Massari (alberto.massari@mongodb.com)
            Reporter: Alberto Massari (alberto.massari@mongodb.com)
            Votes: 0
            Watchers: 4
