[SERVER-27076] Query Planner uses inefficient plan for regular expression + sort with aggregations Created: 16/Nov/16  Updated: 17/Nov/16  Resolved: 17/Nov/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jacob Jewell Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-7568 Aggregation framework favors non-bloc... Closed
Operating System: ALL
Steps To Reproduce:

var bulk = db.items.initializeUnorderedBulkOp();
 
for (var i = 0; i < 10000000; i++) {
  bulk.insert({ name: i.toString(), created: new Date() });
}
 
bulk.execute();
 
db.items.createIndex({ name: 1, created: 1 }, { background: true });
db.items.createIndex({ created: 1 }, { background: true });
 
// This command runs instantly
db.things
    .find({ name: /^999999/ })
    .sort({ created: 1 });
 
// This command takes several seconds to complete
db.things.aggregate([
    { $match: { name: /^999999/ } },
    { $sort: { created: 1 } }
]);

Participants:

 Description   

If a collection has a compound index on a string field + a sort field and another index on just the sort field then an aggregation using a regular expression match will only use the index with just the sort field. This does not happen with the equivalent find command. Adding an index on sort field + string field doesn't improve performance. Changing the location of the sort in the aggregation also has no effect.

The reason for both indexes is that some queries need to be able to do a find based just on the sort key.



 Comments   
Comment by Charlie Swanson [ 17/Nov/16 ]

Hi jakesjews,

Unfortunately that is the only known workaround at this time. Note the problem here is that if your regex happened to match 90% of the collection, then a sort using the {created: 1} index would be the better choice. Only when the result set is small enough does it make sense to favor an in-memory (blocking) sort. Unfortunately, our aggregation optimizer is not sophisticated enough to determine the point where one becomes faster than the other, so it instead always chooses non-blocking, index-provided sorts when they are available.

Please vote for SERVER-7568 and continue to watch that issue for updates. Thanks!

Comment by Jacob Jewell [ 16/Nov/16 ]

Sorry about the duplicate issue you can close this one. Is the only way to work around this to add a $project? We have a pattern throughout our web app where users have an optional search textbox for serial numbers above reports. We don't want to hurt performance when the text search is empty so we need the ability to sort collections with or without a text search. Adding a conditional $project to all our queries based on the presence of a wildcard would be a pretty difficult task.

Generated at Thu Feb 08 04:14:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.