-
Type:
Bug
-
Resolution: Unresolved
-
Priority:
Major - P3
-
None
-
Affects Version/s: 2.4.5
-
Component/s: Querying
-
Query Execution
-
ALL
-
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Observed very long running queries involving $in matching a large array of regexps:
{ "foo" : { "$in" : [ /^1000000000/, /^1000000001/, /^1000000002/, /^1000000003/, /^1000000004/, /^1000000005/, /^1000000006/, ... /^1000009998/, /^1000009999/ ] } }
Attached script reproduces the issue. Time to run the query is quadratic or worse in the number of regexps; in following table n is number of regexps. Note that this run time is independent of the number of documents in the collection; note for example that in the reproducing script there is only a single document in the collection, and it does not match any of the regexps.
n | time |
5000 | 1.6 s |
10000 | 6.6s |
20000 | 34.0s |
Simple gdb-based sampling profiler shows a single thread spending most of its time for the duration of the query in code that appears to be related to turning the regexps (all of which start with ^) into a set of ranges. Could this algorithm be quadratic in the number of regexps?
Here's the portion of the call tree showing where the thread is spending its time; first column is average # of threads in that node of the call tree.
1.00 mongo::PortMessageServer::handleIncomingMsg
1.00 mongo::MyMessageHandler::process
1.00 mongo::assembleResponse
1.00 mongo::runQuery
1.00 mongo::queryWithQueryOptimizer
1.00 mongo::NamespaceDetailsTransient::getCursor
1.00 mongo::CursorGenerator::generate
0.99 mongo::CursorGenerator::setMultiPlanScanner
0.99 mongo::MultiPlanScanner::make
0.99 mongo::MultiPlanScanner::init
0.99 mongo::FieldRangeSet::FieldRangeSet
0.99 mongo::FieldRangeSet::init
0.99 mongo::FieldRangeSet::handleMatchField
0.99 mongo::FieldRangeSet::handleOp
0.99 mongo::FieldRangeSet::intersectMatchField
0.99 mongo::FieldRange::FieldRange
0.96 mongo::FieldRange::operator|=
Note that writes are blocked while this is happening.
Also please note that this has nothing to do with indexes: all the time is being spent preparing the query for execution, as per the above profile, and removing the index (which was in fact used) in the attached reproducing script does not change the behavior (as expected, since in the reproducing script no documents are in fact being matched.)
Similar query using a large number of strings in a $in match did not have the same issue (see query_string in script).