[SERVER-42738] Slow $lookup on $expr match with null field Created: 09/Aug/19  Updated: 06/Dec/22  Resolved: 19/Aug/19

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 4.0.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ben Rotz Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-40362 expressive $lookup "let" with missing... Backlog
Assigned Teams:
Query
Operating System: ALL
Steps To Reproduce:

Since the problem is with $lookup, and it is difficult (impossible?) to see the performance of $lookup in explain, we need to create enough records in a collection to see the problem firsthand:

use test;
db.human.drop();
db.human.createIndex({indexField: 1});
for (var i = 1; i <= 200000; i++) { db.human.insert( { indexField: "name"+i } ); }
 

 

Now, let's create a primary collection, where a few records have a reference:

db.alien.drop();
db.alien.insert({refId: 'name1'});
db.alien.insert({});
db.alien.insert({refId: 'name2'});
db.alien.insert({});
db.alien.insert({});
db.alien.insert({});
db.alien.insert({});
db.alien.insert({});
db.alien.insert({});
db.alien.insert({});
 

 

Now, let's run a $lookup:

db.alien.aggregate([
  {$lookup: {
    from: 'human',
    localField: 'refId',
    foreignField: 'indexField',
    as: 'human',
  }}
])

This lookup appears to run just fine.

By way of information, my understanding is that the lookup happening here is the equivalent of db.human.find({indexField: 'name1'});

 

Now, let's run a $lookup using pipeline:

db.alien.aggregate([
  {$lookup: {
    from: 'human',
    let: {'refId' : '$refId'},
    pipeline: [
      {$match: {
        $expr: {$eq: ['$indexField', '$$refId']}
      }},
    ],
    as: 'human',
  }}
])

This lookup runs slow. If you were to put a $match in the first stage and filter records by {refId: {$exists: true}}, it would run fast. Which leads me to believe the problem is with the records for which refId does not exist.

By way of information, my understanding is that the lookup happening here is the equivalent of db.human.find({$expr: {$eq: ['$indexField', 'name1']}});

 

Interestingly enough, the query

db.human.find({$expr: {$eq: ['$indexField', null]}});

Seems to run just fine, so I'm not sure what's happening in the $lookup to make it run so slowly. Slowly here is obviously relative. I'm on a 2017 high-end macbook pro, and slowly means about 1.5 seconds.

Participants:

 Description   

When doing a $lookup, if the primary collection has null/non-existing values for localField, and is using a pipeline, the $lookup appears to be doing full table scans for the records with null values.

 



 Comments   
Comment by Craig Homa [ 19/Aug/19 ]

Resolving this as a duplicate of SERVER-40362.

Comment by Danny Hatcher (Inactive) [ 16/Aug/19 ]

Thanks for your report. I'll pass this on to the Query team to determine whether this is expected or not.

Generated at Thu Feb 08 05:01:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.