- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Labels: None
- Query Execution
- Fully Compatible
- 0
In order to convert the lookup key into an array of unique values, the stage builder currently constructs an NLJ node so that it can reuse the $group aggregation function addToSet:
[4] nlj inner [s11] [s11]
    left
        [3] mkobj s11 [_id = s10] true false
        [3] group [s10] []
        [3] project [s10 = (s8 ?: null)]
        [2] block_to_row blocks[s4, s5, s6] row[s7, s8, s9]
        [2] ts_bucket_to_cellblock s2 pathReqs[s4 = Get(_id)/Id, s5 = Get(a)/Id, s6 = Get(time)/Id]
        [1] scan s2 s3 none none none none none none lowPriority [] @"711564fb-1a4b-48ee-9ab7-02b1a62761c9" true false
    right
        [4] project [s19 = if isArrayEmpty(s17) then [null] else s17]
        [4] group [] [s17 = addToSet(s15)] spillSlots[s18] mergingExprs[aggSetUnion(s18)]
        [4] unwind s15 s16 s14 true
        [4] project [s14 = getField(s11, "a")]
        [4] limit 1
        [4] coscan
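As a rough illustration of what the right branch of this NLJ computes per document, here is a minimal Python sketch (the function name and the use of plain Python values are assumptions for illustration; it also assumes hashable values, and it preserves first-occurrence order, whereas addToSet's ordering is unspecified):

```python
def lookup_key_values(doc, field="a"):
    """Emulate the NLJ right branch: getField + unwind + addToSet,
    with [null] substituted when nothing was collected."""
    value = doc.get(field)
    if isinstance(value, list):
        elems = value            # unwind explodes array elements
    elif field in doc:
        elems = [value]          # scalars pass through unwind as-is
    else:
        elems = []               # missing field yields nothing
    seen, out = set(), []
    for e in elems:              # addToSet keeps one copy of each value
        if e not in seen:
            seen.add(e)
            out.append(e)
    return out if out else [None]  # isArrayEmpty(...) ? [null] : ...
```

For example, `lookup_key_values({"a": [1, 2, 2, 3]})` deduplicates to `[1, 2, 3]`, while a missing or empty key falls back to `[None]`.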
Apart from being inefficient, this makes it harder to keep the document in its shredded format.
The proposal is to introduce a new internal function (tentatively named removeDuplicates) that removes duplicate values from an array, so that we can obtain a simpler plan:
[4] project [s19 = if isArray(s14)
                   then if isArrayEmpty(s14) then [null] else removeDuplicates(s14)
                   else newArray(s14 ?: null)]
[4] project [s14 = s10]
[3] group [s10] []
[3] project [s10 = (s8 ?: null)]
[2] block_to_row blocks[s4, s5, s6] row[s7, s8, s9]
[2] ts_bucket_to_cellblock s2 pathReqs[s4 = Get(_id)/Id, s5 = Get(a)/Id, s6 = Get(time)/Id]
[1] scan s2 s3 none none none none none none lowPriority [] @"711564fb-1a4b-48ee-9ab7-02b1a62761c9" true false
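The intended semantics of the new projection can be sketched in Python as follows (removeDuplicates is the proposed function; the first-occurrence ordering and the hashable-values assumption are illustrative choices, not part of the proposal):

```python
def remove_duplicates(arr):
    """Proposed removeDuplicates: drop repeated values from an array,
    keeping the first occurrence of each."""
    seen, out = set(), []
    for v in arr:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out

def project_key(value):
    """Emulate the final projection:
    if isArray(v) then (if empty then [null] else removeDuplicates(v))
    else newArray(v ?: null)"""
    if isinstance(value, list):
        return [None] if not value else remove_duplicates(value)
    return [value]  # newArray(v ?: null); None stands in for null/missing here
```

This removes the need for the NLJ/addToSet detour: the deduplication happens in a single project stage directly over the slot holding the key value.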