[SERVER-12287] Query containing $regex operator: performance regression Created: 08/Jan/14  Updated: 11/Jul/16  Resolved: 08/Jan/14

Status: Closed
Project: Core Server
Component/s: Performance, Querying
Affects Version/s: 2.5.4
Fix Version/s: 2.5.5

Type: Task Priority: Major - P3
Reporter: Davide Italiano Assignee: Davide Italiano
Resolution: Done Votes: 0
Labels: query
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux ip-10-0-0-15 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux


Attachments: PNG File Screenshot 2014-01-08 14.26.25.png    
Issue Links:
Related
related to SERVER-12952 Regex query performance regression in... Closed
Participants:

 Description   

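Some queries using the $regex operator (e.g. "starts with" queries) are slower in 2.6 than in 2.4. This is noticeable in the regex test of the mongo-perf suite.
$explain shows that the number of objects scanned across all query plans in 2.6 is roughly double that in 2.4 (nscannedObjectsAllPlans is 210 vs. 111), while the winning plan itself scans the same 111 objects in both versions.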

> for (var i=0; i < 1000;++i) { db.goo.insert({"_id":i.toString()}); }
Insert WriteResult({ "ok" : 1, "n" : 1 })
> db.goo.find( { "_id": { $regex: "^1" }}).explain()

$explain output for 2.4.8

{
        "cursor" : "BtreeCursor _id_ multi",
        "isMultiKey" : false,
        "n" : 111,
        "nscannedObjects" : 111,
        "nscanned" : 112,
        "nscannedObjectsAllPlans" : 111,
        "nscannedAllPlans" : 112,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "_id" : [
                        [
                                "1",
                                "2"
                        ],
                        [
                                /^1/,
                                /^1/
                        ]
                ]
        },
        "server" : "ip-10-0-0-15:27017"
}

$explain output for trunk (hash 5432e4836aec87fe9b53efe19d3bfff90ef0f6ef)

{
        "cursor" : "BtreeCursor _id_",
        "isMultiKey" : false,
        "n" : 111,
        "nscannedObjects" : 111,
        "nscanned" : 112,
        "nscannedObjectsAllPlans" : 210,
        "nscannedAllPlans" : 211,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 1,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "_id" : [
                        [
                                "1",
                                "2"
                        ],
                        [
                                /^1/,
                                /^1/
                        ]
                ]
        },
        "server" : "ip-10-0-0-15:27017"
}
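Both explain outputs use the same bounds on the _id_ index, so the difference is confined to the AllPlans counters. The n and nscanned values of the winning plan can be reproduced with a toy model of the range scan. This is a minimal sketch, not MongoDB's planner code: it assumes string keys ordered by plain code-unit comparison, and prefixRange and rangeScan are hypothetical helpers. The anchored prefix /^1/ becomes the half-open range from "1" up to (but not including) "2", matching the ["1", "2"] entry in indexBounds. 111 of the keys "0" through "999" fall in that range, and the scan reads one more key ("2") before it knows to stop, which plausibly explains nscanned being 112.

```javascript
// Toy model of an index range scan for an anchored prefix regex such as /^1/.
// Not MongoDB's planner code; assumes string keys in plain code-unit order.

// Hypothetical helper: the half-open range [prefix, upper) holding every string
// that starts with `prefix`. Assumes a non-empty prefix whose last character is
// not the largest UTF-16 code unit.
function prefixRange(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  return [prefix, prefix.slice(0, -1) + String.fromCharCode(last + 1)];
}

// Hypothetical helper: walk sorted keys from `lower`, counting every key read,
// including the first key at or past `upper`, which ends the scan.
function rangeScan(sortedKeys, [lower, upper], regex) {
  let examined = 0;
  const matches = [];
  const start = sortedKeys.findIndex((k) => k >= lower); // a B-tree would seek here
  if (start === -1) return { matches, examined };
  for (let i = start; i < sortedKeys.length; ++i) {
    examined++;
    if (sortedKeys[i] >= upper) break;
    if (regex.test(sortedKeys[i])) matches.push(sortedKeys[i]); // residual regex filter
  }
  return { matches, examined };
}

// The _id values from the reproduction ("0" .. "999"), sorted as strings.
const keys = Array.from({ length: 1000 }, (_, i) => i.toString()).sort();

const bounds = prefixRange("1");
const { matches, examined } = rangeScan(keys, bounds, /^1/);
console.log(bounds);                   // [ '1', '2' ]
console.log(matches.length, examined); // 111 112
```

The same helper gives the bound for longer prefixes too: prefixRange("media") yields the range from "media" up to "medib".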



 Comments   
Comment by Scott Hernandez (Inactive) [ 06/Feb/14 ]

Please file a new issue. Include some sample data (or your dumped data), your indexes, and a full explain (pass true to explain()).

Comment by rgpublic [ 06/Feb/14 ]

Um, is this really fixed in 2.5.5? I recently tried to upgrade from 2.4.9 to 2.5.5 and I'm experiencing catastrophic regex query times with 2.5.5, so I had to downgrade again. My query is simply:

{
"folder":/^media/,
"type":"file"
}

2.4.9 gives me this query plan:

"cursor": "BtreeCursor type_folder multi",
"nscanned": NumberInt(2583),

2.5.5 gives this query plan:

"cursor": "Complex Plan",
"nscanned": NumberInt(8470527),

This plan then runs for literally ages.
I wonder why 2.5.5 isn't even considering my "type_folder" index... Is this the same bug, or should I file a new one?
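For comparison, here is a toy model of why a compound index could keep this query cheap. It assumes, since the comment does not show the index definition, that "type_folder" has the key pattern {type: 1, folder: 1}, and it uses invented data rather than the reporter's. With an equality on type followed by the prefix /^media/ on folder, the scan can be confined to keys from ["file", "media"] up to ["file", "medib"], so it reads only the matching keys plus the one key that ends the range, however many other documents exist.

```javascript
// Hypothetical compound index on (type, folder): each entry is [type, folder],
// sorted by type first and folder second. Not MongoDB's index code.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; ++i) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Invented toy documents (900 in total), not the reporter's data.
const docs = [];
for (const type of ["dir", "file", "link"]) {
  for (const top of ["docs", "media", "tmp"]) {
    for (let i = 0; i < 100; ++i) docs.push({ type, folder: `${top}/${i}` });
  }
}
const index = docs.map((d) => [d.type, d.folder]).sort(compareKeys);

// Bounds for {type: "file", folder: /^media/}: equality on the first field,
// prefix range ["media", "medib") on the second.
const lower = ["file", "media"];
const upper = ["file", "medib"];

let scanned = 0;
let n = 0;
for (const key of index) {
  if (compareKeys(key, lower) < 0) continue; // a B-tree would seek past these
  scanned++;
  if (compareKeys(key, upper) >= 0) break;   // first key past the bounds ends the scan
  if (/^media/.test(key[1])) n++;            // residual regex filter
}
console.log(n, scanned, index.length); // 100 101 900
```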

Comment by Daniel Pasette (Inactive) [ 08/Jan/14 ]

Fixed by enabling the query plan cache. See SERVER-10564.

Comment by Davide Italiano [ 08/Jan/14 ]

The problem has gone away now that the query plan cache is in. Apparently the time spent generating all the candidate query plans outweighed the benefits, as you guessed. As of today, 2.5.5 is up to about 14% faster than the baseline (2.4.8); see the attachment.
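A heavily simplified sketch of the idea behind the fix, written as a hypothetical model rather than MongoDB's implementation: the first query of a given shape runs every candidate plan and sums what they all scanned (the analogue of the AllPlans counters), then caches the winner under the query shape, so later queries with the same shape run only the cached plan. queryShape, PlanCache and the two toy plans are invented names. Unlike a real multi-planner, which typically races candidates only for a short trial period, this sketch runs each candidate to completion.

```javascript
// Hypothetical model of a query plan cache; not MongoDB's implementation.

// Reduce a query to its "shape": field names and operator kinds, ignoring the
// concrete values, so {_id: /^1/} and {_id: /^2/} share one cache entry.
function queryShape(query) {
  return Object.keys(query)
    .sort()
    .map((field) => `${field}:${query[field] instanceof RegExp ? "$regex" : "$eq"}`)
    .join(",");
}

class PlanCache {
  constructor() {
    this.entries = new Map(); // shape -> winning plan
  }

  run(query, candidatePlans) {
    const shape = queryShape(query);
    const cached = this.entries.get(shape);
    if (cached) {
      // Cache hit: only the cached plan does any work.
      const result = cached.execute(query);
      return { plan: cached.name, ...result, scannedAllPlans: result.scanned, fromCache: true };
    }
    // Cache miss: run every candidate, sum their work, keep the cheapest.
    let best = null;
    let scannedAllPlans = 0;
    for (const plan of candidatePlans) {
      const result = plan.execute(query);
      scannedAllPlans += result.scanned;
      if (best === null || result.scanned < best.result.scanned) best = { plan, result };
    }
    this.entries.set(shape, best.plan);
    return { plan: best.plan.name, ...best.result, scannedAllPlans, fromCache: false };
  }
}

// Toy data from the reproduction: string _id values "0" .. "999".
const ids = Array.from({ length: 1000 }, (_, i) => i.toString());
const sortedIds = [...ids].sort();

// Toy candidate plans for {_id: <anchored prefix regex>}.
const collScan = {
  name: "COLLSCAN",
  execute(q) {
    const n = ids.filter((id) => q._id.test(id)).length;
    return { n, scanned: ids.length }; // reads every document
  },
};
const indexScan = {
  name: "IXSCAN _id_",
  execute(q) {
    const prefix = q._id.source.slice(1); // "^1" -> "1" (assumes a plain anchored literal)
    let n = 0;
    let scanned = 0;
    for (const id of sortedIds) {
      if (id < prefix) continue; // a B-tree would seek past these keys
      scanned++;
      if (!id.startsWith(prefix)) break; // first key outside the prefix ends the scan
      n++;
    }
    return { n, scanned };
  },
};

const cache = new PlanCache();
const first = cache.run({ _id: /^1/ }, [indexScan, collScan]);
const second = cache.run({ _id: /^2/ }, [indexScan, collScan]);
console.log(first.plan, first.n, first.scannedAllPlans, first.fromCache);    // IXSCAN _id_ 111 1112 false
console.log(second.plan, second.n, second.scannedAllPlans, second.fromCache); // IXSCAN _id_ 111 112 true
```

The second call uses a different prefix but the same shape, so it skips the collection-scan candidate entirely and its total work equals the winning plan's own count.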

Comment by Daniel Pasette (Inactive) [ 08/Jan/14 ]

Can you try this same test against HEAD?
The query plan cache is now in, so subsequent runs of that query should look the same as in 2.4.8.

commit 0d818f66f1a2429ce65b19afd9b5e7b64076732b
Author: Benety Goh <benety@mongodb.com>
Date: Thu Jan 2 16:59:36 2014 -0500

SERVER-5470 extended plan cache to hold all solutons from planning process

Generated at Thu Feb 08 03:28:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.