[SERVER-44789] 30 times speed difference between `find()` and `aggregate($match)` against `_id` Created: 22/Nov/19  Updated: 27/Oct/23  Resolved: 25/Nov/19

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: 4.2.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Andre M Assignee: Backlog - Triage Team
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongo-10k-objs.js    
Assigned Teams:
Server Triage
Operating System: ALL
Steps To Reproduce:

This test inserts 10K documents with just the _id field and searches for the first 5 of them using find and aggregate/match. The test shows 30 times difference between these two searches.

let stime = new Date().getTime();
 
let testdocs = 1000;
let findcount = 10000;
let ids = [];
 
for(let k=0; k < testdocs; k++) {
   let id = ObjectId();
 
   // collect first 5 IDs for look-up
   if(ids.length < 5)
      ids.push(id);
 
   db.test_collection.insertOne({_id: id});
}
 
print("Created " + db.test_collection.count() + " docs in " + ((new Date().getTime() - stime)/1000).toFixed(3) + " seconds");
 
stime = new Date().getTime();
 
for(let k=0; k < findcount; k++) {
   for(let i in ids)
      db.test_collection.find({_id: ids[i]});
}
 
print("Query: " + ((new Date().getTime() - stime)/1000).toFixed(3) + " seconds")
 
stime = new Date().getTime();
 
for(let k=0; k < findcount; k++) {
   for(let i in ids)
      db.test_collection.aggregate([{$match: {_id: ids[i]}}]);
}
 
print("Aggregate: " + ((new Date().getTime() - stime)/1000).toFixed(3) + " seconds")
 
db.test_collection.drop();

Participants:

 Description   

I am seeing a huge difference between db.col.find({_id: someid}) and db.col.aggregate([{$match: {_id: someid}}]) running against the same _id values. In the simplest case demonstrated with the test script below, the documents contain nothing but the _id field. In the actual application, I use $lookup stage after $match.

Since it's the _id field, there's no table scan and other usual things to look at. I checked profiling and logs and see nothing of interest - each of the slow queries is shorter than 1 ms, so nothing catches it, but with real data, this difference is significant and cannot be unnoticed to the point when it's cheaper to just abandon the aggregate framework, which is undesirable for obvious benefits.

All tests below are against a MongoDB server 4.2.0.

The test below inserts 10K documents with just _id fields and searches for first 5 of them 10K times using find and aggregate/match. The search times are:

Query      :  0.544 seconds
Aggregate: 15.551 seconds

I ran this on Windows and Linux with the same outcome. The hardware has plenty of RAM and an SSD disk.



 Comments   
Comment by Daniel Gottlieb (Inactive) [ 25/Nov/19 ]

No worries cis74633@bell.net. I've made the same mistake before regarding performance testing find. FWIW, I too was able to reproduce the performance difference between find and aggregate. The fact there is a difference for simple queries where data is in memory makes sense to me (where querying overhead doesn't get dominated by I/O), though I suspect there's probably some room for improvement as well.

Comment by Andre M [ 25/Nov/19 ]

Just to close this off, I ran the tests I initially intended and a single match stage is about 14% slower than a single find call, but when joining documents, match/lookup is about 44% faster than two find calls for linked documents. The latter is expected, since the join is performed on the server, but the former is a bit unfortunate, as performance-conscious applications need to maintain two different sets of client methods - one for find-based queries and one for joins using the aggregation framework. It would simplify applications if a single match stage would run just as fast as a find call with equivalent criteria.

I'm attaching the script for those numbers I quoted. Thanks again for looking into this so quickly and my apologies for the false alarm.

Test script: mongo-10k-objs.js

Comment by Andre M [ 23/Nov/19 ]

Thank you for a quick response, Daniel.

I didn't want the test to be affected by possibly-different ways to get result sets for these different query mechanisms, but didn't think that the query itself would be delayed until the query result is evaluated. Adding next, as you suggested, or even hasNext will do that and query results are similar now:

Query    : 13.601 seconds
Aggregate: 15.418 seconds 

My bad. Please, go ahead and close this bug.

 

 

Comment by Daniel Gottlieb (Inactive) [ 23/Nov/19 ]

Hi cis74633@bell.net,

Thanks for filing a ticket with a clear reproducer! Can you try making a modification and seeing if there's still a discrepancy? I'm curious if anything changes if you add a .next() to your find statement:

db.test_collection.find({_id: ids[i]}).next();

I suspect the find(...) function is just returning a "query object" that doesn't send a request over the wire. This is so the query can be augmented with a sort or other query options.

Comment by Andre M [ 22/Nov/19 ]

I just tried this with v4.2.1 and the result is the same as described above.

My apologies for messing up formatting in the description. I cannot edit the description, so if you can remove the formatting in the first paragraph, so it doesn't look like a link, it would be great.

I also misspoke about the number of documents - I started with 10K, but then reduced to 1K, but still typed 10K above. Just for clarity, the test inserts 1K documents and searches for 5 of them 10K times.

Generated at Thu Feb 08 05:06:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.