- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Labels: None
- Query Execution
- Fully Compatible
- 0
In order to convert the lookup key into an array of unique values, the stage builder currently constructs an NLJ node so that it can reuse the $group aggregation function addToSet:
[4] nlj inner [s11] [s11]
    left
        [3] mkobj s11 [_id = s10] true false
        [3] group [s10] []
        [3] project [s10 = (s8 ?: null)]
        [2] block_to_row blocks[s4, s5, s6] row[s7, s8, s9]
        [2] ts_bucket_to_cellblock s2 pathReqs[s4 = Get(_id)/Id, s5 = Get(a)/Id, s6 = Get(time)/Id]
        [1] scan s2 s3 none none none none none none lowPriority [] @"711564fb-1a4b-48ee-9ab7-02b1a62761c9" true false
    right
        [4] project [s19 = if isArrayEmpty(s17) then [null] else s17]
        [4] group [] [s17 = addToSet(s15)] spillSlots[s18] mergingExprs[aggSetUnion(s18)]
        [4] unwind s15 s16 s14 true
        [4] project [s14 = getField(s11, "a")]
        [4] limit 1
        [4] coscan
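As a rough illustration of what the right branch of this NLJ computes per document, here is a minimal Python sketch (the function name and the use of plain Python values are assumptions for illustration; it also assumes hashable values, and it preserves first-occurrence order, whereas addToSet's ordering is unspecified):

```python
def lookup_key_values(doc, field="a"):
    """Emulate the NLJ right branch: getField + unwind + addToSet,
    with [null] substituted when nothing was collected."""
    value = doc.get(field)
    if isinstance(value, list):
        elems = value            # unwind explodes array elements
    elif field in doc:
        elems = [value]          # scalars pass through unwind as-is
    else:
        elems = []               # missing field yields nothing
    seen, out = set(), []
    for e in elems:              # addToSet keeps one copy of each value
        if e not in seen:
            seen.add(e)
            out.append(e)
    return out if out else [None]  # isArrayEmpty(...) ? [null] : ...
```

For example, `lookup_key_values({"a": [1, 2, 2, 3]})` deduplicates to `[1, 2, 3]`, while a missing or empty key falls back to `[None]`.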
Apart from being inefficient, this makes it harder to keep the document in its shredded format.
The proposal is to introduce a new internal function (tentatively named removeDuplicates) that removes duplicate values from an array, so that we can obtain a simpler plan:
[4] project [s19 = if isArray(s14)
                   then if isArrayEmpty(s14) then [null] else removeDuplicates(s14)
                   else newArray(s14 ?: null)]
[4] project [s14 = s10]
[3] group [s10] []
[3] project [s10 = (s8 ?: null)]
[2] block_to_row blocks[s4, s5, s6] row[s7, s8, s9]
[2] ts_bucket_to_cellblock s2 pathReqs[s4 = Get(_id)/Id, s5 = Get(a)/Id, s6 = Get(time)/Id]
[1] scan s2 s3 none none none none none none lowPriority [] @"711564fb-1a4b-48ee-9ab7-02b1a62761c9" true false
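The intended semantics of the new projection can be sketched in Python as follows (removeDuplicates is the proposed function; the first-occurrence ordering and the hashable-values assumption are illustrative choices, not part of the proposal):

```python
def remove_duplicates(arr):
    """Proposed removeDuplicates: drop repeated values from an array,
    keeping the first occurrence of each."""
    seen, out = set(), []
    for v in arr:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out

def project_key(value):
    """Emulate the final projection:
    if isArray(v) then (if empty then [null] else removeDuplicates(v))
    else newArray(v ?: null)"""
    if isinstance(value, list):
        return [None] if not value else remove_duplicates(value)
    return [value]  # newArray(v ?: null); None stands in for null/missing here
```

This removes the need for the NLJ/addToSet detour: the deduplication happens in a single project stage directly over the slot holding the key value.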