Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-42738

Slow $lookup on $expr match with null field

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 4.0.12
    • Fix Version/s: None
    • Component/s: Index Maintenance
    • Operating System:
      ALL
    • Steps To Reproduce:
      Hide

      Since the problem is with $lookup, and it is difficult (impossible?) to see the performance of $lookup in explain, we need to create enough records in a collection to see the problem firsthand:

      use test;
      db.human.drop();
      db.human.createIndex({indexField: 1});
      for (var i = 1; i <= 200000; i++) { db.human.insert( { indexField: "name"+i } ); }
       
      
      

       

      Now, let's create a primary collection, where a few records have a reference:

      db.alien.drop();
      db.alien.insert({refId: 'name1'});
      db.alien.insert({});
      db.alien.insert({refId: 'name2'});
      db.alien.insert({});
      db.alien.insert({});
      db.alien.insert({});
      db.alien.insert({});
      db.alien.insert({});
      db.alien.insert({});
      db.alien.insert({});
       
      
      

       

      Now, let's run a $lookup:

      db.alien.aggregate([
        {$lookup: {
          from: 'human',
          localField: 'refId',
          foreignField: 'indexField',
          as: 'human',
        }}
      ])
      

      This lookup appears to run just fine.

      By way of information, my understanding is that the lookup happening here is the equivalent of db.human.find({indexField: 'name1'});

       

      Now, let's run a $lookup using pipeline:

      db.alien.aggregate([
        {$lookup: {
          from: 'human',
          let: {'refId' : '$refId'},
          pipeline: [
            {$match: {
              $expr: {$eq: ['$indexField', '$$refId']}
            }},
          ],
          as: 'human',
        }}
      ])
      

      This lookup runs slow. If you were to put a $match in the first stage and filter records by {refId: {$exists: true}}, it would run fast. Which leads me to believe the problem is with the records for which refId does not exist.

      By way of information, my understanding is that the lookup happening here is the equivalent of db.human.find({$expr: {$eq: ['$indexField', 'name1']}});

       

      Interestingly enough, the query

      db.human.find({$expr: {$eq: ['$indexField', null]}});
      

      Seems to run just fine, so I'm not sure what's happening in the $lookup to make it run so slowly. Slowly here is obviously relative. I'm on a 2017 high-end macbook pro, and slowly means about 1.5 seconds.

      Show
      Since the problem is with $lookup, and it is difficult (impossible?) to see the performance of $lookup in explain, we need to create enough records in a collection to see the problem firsthand: use test; db.human.drop(); db.human.createIndex({indexField: 1 }); for (var i = 1 ; i <= 200000 ; i++) { db.human.insert( { indexField: "name" +i } ); }     Now, let's create a primary collection, where a few records have a reference: db.alien.drop(); db.alien.insert({refId: 'name1' }); db.alien.insert({}); db.alien.insert({refId: 'name2' }); db.alien.insert({}); db.alien.insert({}); db.alien.insert({}); db.alien.insert({}); db.alien.insert({}); db.alien.insert({}); db.alien.insert({});     Now, let's run a $lookup: db.alien.aggregate([ {$lookup: { from: 'human' , localField: 'refId' , foreignField: 'indexField' , as: 'human' , }} ]) This lookup appears to run just fine. By way of information, my understanding is that the lookup happening here is the equivalent of db.human.find({indexField: 'name1'});   Now, let's run a $lookup using pipeline: db.alien.aggregate([ {$lookup: { from: 'human' , let: { 'refId' : '$refId' }, pipeline: [ {$match: { $expr: {$eq: [ '$indexField' , '$$refId' ]} }}, ], as: 'human' , }} ]) This lookup runs slow. If you were to put a $match in the first stage and filter records by {refId: {$exists: true}}, it would run fast. Which leads me to believe the problem is with the records for which refId does not exist. By way of information, my understanding is that the lookup happening here is the equivalent of db.human.find({$expr: {$eq: ['$indexField', 'name1'] }});   Interestingly enough, the query db.human.find({$expr: {$eq: [ '$indexField' , null ]}}); Seems to run just fine, so I'm not sure what's happening in the $lookup to make it run so slowly. Slowly here is obviously relative. I'm on a 2017 high-end macbook pro, and slowly means about 1.5 seconds.

      Description

      When doing a $lookup, if the primary collection has null/non-existing values for localField, and is using a pipeline, the $lookup appears to be doing full table scans for the records with null values.

       

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              backlog-server-query Backlog - Query Team (Inactive)
              Reporter:
              ben@ethika.com Ben Rotz
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: