[SERVER-40635] Introduce $lookupOne as an optimzable aggregation step Created: 13/Apr/19  Updated: 16/Nov/21  Resolved: 15/Apr/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Roman Lehnert Assignee: Eric Sedor
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-22384 Add $lookUpOne to retrieve single Obj... Open
Participants:

 Description   

Given 2 big collections: vouchers and voucher_transactions (referencing the vouchers via their voucher_id), and let's say, you are operating on the voucher_transactions and want the following:

  1. For every voucher_transaction, lookup the voucher that is references via the voucher_id:
    {"$lookup":{"as":"__joined_vouchers","foreignField":"_id","from":"vouchers","localField":"voucher_id"}}
  2. Then you usually do not want to have the array so you utilize 
    {"$unwind":{"path":"$__joined_vouchers","preserveNullAndEmptyArrays":true,}}
  3. And finally, you want to to sort and limit them.
    {"$sort":{"booked_at":-1,}}
    {"$limit":NumberLong('10')}

There is one thing that prevents the optimizer in my eyes from moving the $limit and $sort before the $lookup: He can not know if the number of docuents after the $lookup and $unwind is the same like before, since $lookup creates and array. 

But actually, there is a special case that happens very often: when the foreign key 'voucher_transactions.voucher_id' is really only pointing to exactly one 'voucher', since the 'voucher._id' is uniq. 

In that case, it would make totally sense (in my eyes) to move the $sort and $limit steps befor the $lookup stages and increse the performance of the aggregation a lot. 

So my suggetion is to introduce sth. like a $lookupOne aggregation function, that works like $lookup but attaches only one looked up document. 

If you have control over the query, you might reorder the pipeline for yourself. But if you use the bi connector (/mongosqld), a JOIN operations is always translated to $lookup and $unwind steps and afterwards, the ORDERs and LIMITs  are applied as $sort and $limit stages. With the BI connector, there is no way to take influence in the order of the aggregation steps, afaik. However, with a 1:n relationship like that, the BI connector would be able to decide to use a potentially existing $lookupOne stage. 



 Comments   
Comment by Eric Sedor [ 15/Apr/19 ]

Thanks roman_lehnert for this. $lookupOne has been requested in SERVER-22384 so we'd like to ask that you watch that ticket for updates on your request.

Generated at Thu Feb 08 04:55:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.