Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-42738

Slow $lookup on $expr match with null field

    • Query
    • ALL
    • Hide

      Since the problem is with $lookup, and it is difficult (impossible?) to see the performance of $lookup in explain, we need to create enough records in a collection to see the problem firsthand:

      use test;
      db.human.drop();
      db.human.createIndex({indexField: 1});
      for (var i = 1; i <= 200000; i++) { db.human.insert( { indexField: "name"+i } ); }
      
      
      

       

      Now, let's create a primary collection, where a few records have a reference:

      db.alien.drop();
      db.alien.insert({refId: 'name1'});
      db.alien.insert({});
      db.alien.insert({refId: 'name2'});
      db.alien.insert({});
      db.alien.insert({});
      db.alien.insert({});
      db.alien.insert({});
      db.alien.insert({});
      db.alien.insert({});
      db.alien.insert({});
      
      
      

       

      Now, let's run a $lookup:

      db.alien.aggregate([
        {$lookup: {
          from: 'human',
          localField: 'refId',
          foreignField: 'indexField',
          as: 'human',
        }}
      ])
      

      This lookup appears to run just fine.

      By way of information, my understanding is that the lookup happening here is the equivalent of db.human.find({indexField: 'name1'});

       

      Now, let's run a $lookup using pipeline:

      db.alien.aggregate([
        {$lookup: {
          from: 'human',
          let: {'refId' : '$refId'},
          pipeline: [
            {$match: {
              $expr: {$eq: ['$indexField', '$$refId']}
            }},
          ],
          as: 'human',
        }}
      ])
      

      This lookup runs slow. If you were to put a $match in the first stage and filter records by {refId: {$exists: true}}, it would run fast. Which leads me to believe the problem is with the records for which refId does not exist.

      By way of information, my understanding is that the lookup happening here is the equivalent of db.human.find({$expr: {$eq: ['$indexField', 'name1']}});

       

      Interestingly enough, the query

      db.human.find({$expr: {$eq: ['$indexField', null]}});
      

      Seems to run just fine, so I'm not sure what's happening in the $lookup to make it run so slowly. Slowly here is obviously relative. I'm on a 2017 high-end macbook pro, and slowly means about 1.5 seconds.

      Show
      Since the problem is with $lookup, and it is difficult (impossible?) to see the performance of $lookup in explain, we need to create enough records in a collection to see the problem firsthand: use test; db.human.drop(); db.human.createIndex({indexField: 1}); for ( var i = 1; i <= 200000; i++) { db.human.insert( { indexField: "name" +i } ); }   Now, let's create a primary collection, where a few records have a reference: db.alien.drop(); db.alien.insert({refId: 'name1' }); db.alien.insert({}); db.alien.insert({refId: 'name2' }); db.alien.insert({}); db.alien.insert({}); db.alien.insert({}); db.alien.insert({}); db.alien.insert({}); db.alien.insert({}); db.alien.insert({});   Now, let's run a $lookup: db.alien.aggregate([ {$lookup: { from: 'human' , localField: 'refId' , foreignField: 'indexField' , as: 'human' , }} ]) This lookup appears to run just fine. By way of information, my understanding is that the lookup happening here is the equivalent of db.human.find({indexField: 'name1'});   Now, let's run a $lookup using pipeline: db.alien.aggregate([ {$lookup: { from: 'human' , let: { 'refId' : '$refId' }, pipeline: [ {$match: { $expr: {$eq: [ '$indexField' , '$$refId' ]} }}, ], as: 'human' , }} ]) This lookup runs slow. If you were to put a $match in the first stage and filter records by {refId: {$exists: true}}, it would run fast. Which leads me to believe the problem is with the records for which refId does not exist. By way of information, my understanding is that the lookup happening here is the equivalent of db.human.find({$expr: {$eq: ['$indexField', 'name1'] }});   Interestingly enough, the query db.human.find({$expr: {$eq: [ '$indexField' , null ]}}); Seems to run just fine, so I'm not sure what's happening in the $lookup to make it run so slowly. Slowly here is obviously relative. I'm on a 2017 high-end macbook pro, and slowly means about 1.5 seconds.

      When doing a $lookup, if the primary collection has null/non-existing values for localField, and is using a pipeline, the $lookup appears to be doing full table scans for the records with null values.

       

            Assignee:
            backlog-server-query Backlog - Query Team (Inactive)
            Reporter:
            ben@ethika.com Ben Rotz
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: