Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-44789

30 times speed difference between `find()` and `aggregate($match)` against `_id`

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Works as Designed
    • Affects Version/s: 4.2.0
    • Fix Version/s: None
    • Component/s: Performance
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:
      Hide

      This test inserts 10K documents with just the _id field and searches for the first 5 of them using find and aggregate/match. The test shows 30 times difference between these two searches.

      let stime = new Date().getTime();
       
      let testdocs = 1000;
      let findcount = 10000;
      let ids = [];
       
      for(let k=0; k < testdocs; k++) {
         let id = ObjectId();
       
         // collect first 5 IDs for look-up
         if(ids.length < 5)
            ids.push(id);
       
         db.test_collection.insertOne({_id: id});
      }
       
      print("Created " + db.test_collection.count() + " docs in " + ((new Date().getTime() - stime)/1000).toFixed(3) + " seconds");
       
      stime = new Date().getTime();
       
      for(let k=0; k < findcount; k++) {
         for(let i in ids)
            db.test_collection.find({_id: ids[i]});
      }
       
      print("Query: " + ((new Date().getTime() - stime)/1000).toFixed(3) + " seconds")
       
      stime = new Date().getTime();
       
      for(let k=0; k < findcount; k++) {
         for(let i in ids)
            db.test_collection.aggregate([{$match: {_id: ids[i]}}]);
      }
       
      print("Aggregate: " + ((new Date().getTime() - stime)/1000).toFixed(3) + " seconds")
       
      db.test_collection.drop();
      

      Show
      This test inserts 10K documents with just the _id field and searches for the first 5 of them using find and aggregate/match . The test shows 30 times difference between these two searches. let stime = new Date().getTime();   let testdocs = 1000; let findcount = 10000; let ids = [];   for (let k=0; k < testdocs; k++) { let id = ObjectId();   // collect first 5 IDs for look-up if (ids.length < 5) ids.push(id);   db.test_collection.insertOne({_id: id}); }   print( "Created " + db.test_collection.count() + " docs in " + (( new Date().getTime() - stime)/1000).toFixed(3) + " seconds" );   stime = new Date().getTime();   for (let k=0; k < findcount; k++) { for (let i in ids) db.test_collection.find({_id: ids[i]}); }   print( "Query: " + (( new Date().getTime() - stime)/1000).toFixed(3) + " seconds" )   stime = new Date().getTime();   for (let k=0; k < findcount; k++) { for (let i in ids) db.test_collection.aggregate([{$match: {_id: ids[i]}}]); }   print( "Aggregate: " + (( new Date().getTime() - stime)/1000).toFixed(3) + " seconds" )   db.test_collection.drop();

      Description

      I am seeing a huge difference between db.col.find({_id: someid}) and db.col.aggregate([{$match: {_id: someid}}]) running against the same _id values. In the simplest case demonstrated with the test script below, the documents contain nothing but the _id field. In the actual application, I use $lookup stage after $match.

      Since it's the _id field, there's no table scan and other usual things to look at. I checked profiling and logs and see nothing of interest - each of the slow queries is shorter than 1 ms, so nothing catches it, but with real data, this difference is significant and cannot be unnoticed to the point when it's cheaper to just abandon the aggregate framework, which is undesirable for obvious benefits.

      All tests below are against a MongoDB server 4.2.0.

      The test below inserts 10K documents with just _id fields and searches for first 5 of them 10K times using find and aggregate/match. The search times are:

      Query      :  0.544 seconds
      Aggregate: 15.551 seconds
      

      I ran this on Windows and Linux with the same outcome. The hardware has plenty of RAM and an SSD disk.

        Attachments

          Activity

            People

            Assignee:
            backlog-server-triage Backlog - Triage Team
            Reporter:
            cis74633@bell.net Andre M
            Participants:
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: