[SERVER-17054] Index not being utilized, on occasion. Created: 26/Jan/15  Updated: 30/Jun/15  Resolved: 22/May/15

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Performance, Querying
Affects Version/s: 2.6.5
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Kamil Keski Assignee: J Rassi
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File dataset.json.bz2    
Issue Links:
Duplicate
duplicates SERVER-15225 CachedPlanStage should execute for tr... Closed
Related
related to SERVER-17362 ClientCursor::staticYield can't unloc... Closed
is related to SERVER-18507 A query will take very long time in a... Closed
Operating System: Linux
Steps To Reproduce:

Sporadic issue that happens during normal and low load. We have the following query against a user collection.

{ "$query" : { "a_token" : "*******************"} , "$orderby" : { "_id" : 1}}

The orderby is generated by Mongoid; it isn't needed, but it shouldn't affect the outcome.

Here are the indexes on this collection.

[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "database.user"
        },
        {
                "v" : 1,
                "key" : {
                        "a_token" : 1
                },
                "name" : "a_token_1",
                "ns" : "database.user",
                "background" : true
        }
]

Most of the time the query returns in under 1ms. On occasion it runs up to 2000+ms; running the query manually during these times shows the same behavior. If I remove the orderby, the query returns as expected.

What is going on here? The DB is not being stressed, and it seems the query is being run out of order (perhaps returning all _ids, ordering them, then selecting on a_token). The sporadic nature is the most worrying part; we are having a very hard time tracking down the cause.
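
For reference, the equivalent shell form of the query (collection name taken from the "ns" field in the index listing above; the token value stays masked as in the report):

// Same shape as the Mongoid-generated query: equality on a_token, sort on _id
db.user.find({ "a_token" : "*******************" }).sort({ "_id" : 1 })
// Without the orderby, which always returns quickly
db.user.find({ "a_token" : "*******************" })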

Participants:

 Description   

Sporadic performance issue affecting a simple query.

{ "$query" : { "a_token" : "*******************"} , "$orderby" : { "_id" : 1}}

No index misses are being reported. Most of the time the query returns in under 1ms. On occasion it runs up to 2000+ms; running the query manually during these times shows the same behavior. If I remove the orderby, the query returns as expected. It seems to be running out of order.



 Comments   
Comment by Ramon Fernandez Marina [ 03/Jun/15 ]

ole, you may be running into a different issue. Can you please open a separate SERVER ticket with the information above?

The first thing we will ask for is explain(true) output for your queries, as well as mongod logs at logLevel 1, so gathering this information up front will help us diagnose your issue.

Thanks,
Ramón.

Comment by Ole Langbehn [ 03/Jun/15 ]

Please excuse me commenting on a closed ticket. I'd like to ask a question:

We're seeing this behaviour, but on a complex query like this:

{
  "$and" : [
    {
      "$or" : [
        {
          "a" : "RhKCH4QbCczpiV-E7ybJ8Q",
          "created" : {
            "$gte" : ISODate("2013-08-01T00:00:00Z"),
            "$lte" : ISODate("2015-06-03T20:18:52.112Z")
          }
        },
        {
          "b" : "DWfdYSTJL3KGnU4i4bkvCQ",
          "created" : {
            "$gte" : ISODate("2013-08-01T00:00:00Z"),
            "$lte" : ISODate("2015-06-03T20:18:52.112Z")
          }
        }
      ]
    },
    {
      "$or" : [
        {
          "t" : false
        },
        {
          "v" : "test"
        }
      ]
    }
  ]
}

We have compound indexes on {a:1,created:1} and {b:1,created:1}, as well as an index on {created:1}. Sometimes the index on created is chosen for the subqueries on (a, created) and (b, created), and those queries perform poorly.

As far as we can see, we can't use hinting, since hinting only allows providing a single index.

SERVER-1649 could help, or SERVER-17625 (we don't want to use index filters if they don't persist).
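
For context, this is roughly what an index filter would look like for this query shape (a sketch only: <collection> stands in for our real collection name, the concrete values just fix the shape, and, as noted, filters set this way are cleared on a mongod restart; whether they play well with the $or subplanning here is something we'd still have to test):

// Restrict planning for this query shape to the two compound indexes
db.runCommand({
    planCacheSetFilter: "<collection>",
    query: {
        "$and" : [
            { "$or" : [
                { "a" : "RhKCH4QbCczpiV-E7ybJ8Q",
                  "created" : { "$gte" : ISODate("2013-08-01T00:00:00Z"),
                                "$lte" : ISODate("2015-06-03T20:18:52.112Z") } },
                { "b" : "DWfdYSTJL3KGnU4i4bkvCQ",
                  "created" : { "$gte" : ISODate("2013-08-01T00:00:00Z"),
                                "$lte" : ISODate("2015-06-03T20:18:52.112Z") } }
            ] },
            { "$or" : [ { "t" : false }, { "v" : "test" } ] }
        ]
    },
    indexes : [ { "a" : 1, "created" : 1 }, { "b" : 1, "created" : 1 } ]
})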

Do you see any existing workaround for our case?

Comment by Ramon Fernandez Marina [ 22/May/15 ]

enigmnaphoenix, after examining the information you sent us we've been able to understand the root cause of the behavior you're seeing and even reproduce it on our end. Here's what's happening:

  1. A findAndModify({query: {a_token: ***}, sort: {_id: 1}, ...}) is issued, and the plan ranker considers both the "a_token_1" and "_id" indexes to satisfy the query part of this command.
  2. For this particular query, the "_id" index happens to be faster (this is not always the case, and depends on factors like data distribution, etc.).
  3. The findAndModify() command is executed, and a query plan for this query shape is cached. The cached plan says to use the "_id" index for queries of this shape.
  4. Another findAndModify command with the same query shape is issued.
  5. As per the query plan cache, the "_id" index is used again, but this time it turns out to be sub-optimal: for example, if the value of "a_token" does not exist in your collection, the query has to examine the full "_id" index, never knowing that the "a_token_1" index would be a lot faster. (A sketch for inspecting and clearing this cache entry follows the list.)
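
If you hit another slow window, a rough sketch for confirming and evicting the offending cache entry from the shell (collection name taken from the index listing in the description; we have not run this against your data):

// Show which plans are cached for this query shape (equality on a_token, sort on _id)
db.user.getPlanCache().getPlansByQuery({ "a_token" : "*******************" }, {}, { "_id" : 1 })
// Drop just that entry so the next query of this shape is re-planned from scratch
db.user.getPlanCache().clearPlansByQuery({ "a_token" : "*******************" }, {}, { "_id" : 1 })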

Cached query plans get re-evaluated over time, so this situation is only temporary, but it can recur. Some users will never run into it, while others, like yourself, may be impacted.

We've improved this area of plan ranking and the plan cache as part of SERVER-15225, but it's not yet clear whether SERVER-15225 can be safely backported to older versions. Until this is determined, you may want to consider a workaround: hinting on the "a_token_1" index to ensure that an optimal index is always used. Note that even when the "_id" index is faster for some of these queries, it is not faster by much, and hinting on "a_token_1" should give you the performance you need.
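
For the find() form of the query from the description, the hint looks like this in the shell (a sketch; for findAndModify you'd need the equivalent option in your driver, if it exposes one):

// Hint by index name; hinting by key pattern ({ a_token: 1 }) works as well
db.user.find({ "a_token" : "*******************" }).sort({ "_id" : 1 }).hint("a_token_1")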

I'm going to resolve this ticket as a duplicate of SERVER-15225, but feel free to post your observations here when using the proposed workaround, or any other thoughts you may have about this issue.

Regards,
Ramón.

Comment by Ramon Fernandez Marina [ 21/May/15 ]

enigmnaphoenix, thanks for uploading the additional information we requested, and apologies for the radio silence. I still haven't been able to reproduce this issue locally, but we're still investigating what could be causing it. We'll post updates in this ticket as we have them.

Comment by Michael Schmatz [ 21/Feb/15 ]

We are also seeing this behavior.

An example event is

2015-02-21T02:34:25.874+0000 I QUERY [conn154] query coco.level.sessions query: { creator: "54e7ec96842b93d60683e87d", level: { original: "544a98f62d002f0000fe331a", majorVersion: 0 } } planSummary: IXSCAN { level: 1.0 } ntoskip:0 nscanned:648252 nscannedObjects:648252 keyUpdates:0 writeConflicts:0 numYields:13216 nreturned:1 reslen:1678 locks:{} 132315ms

I have complete index coverage over all of those fields (individual indexes, not compound). If I execute the query myself, it returns instantaneously. We're seeing this issue a couple of times a day.
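
A quick check during one of these slow windows would be to compare a fresh explain against the log line above (a sketch built from the query shape in the log; nscanned:648252 for nreturned:1 suggests the wrong index was picked):

// "level.sessions" is the dotted collection name from the log line
var sessions = db.getSiblingDB("coco").getCollection("level.sessions")
sessions.find({ creator: "54e7ec96842b93d60683e87d",
                level: { original: "544a98f62d002f0000fe331a", majorVersion: 0 } }).explain(true)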

Comment by Kamil Keski [ 02/Feb/15 ]

Checking in, Ramon. We keep having sporadic incidents of this behavior, with another period today spanning 10 minutes. Should I continue to capture more logs from these events and submit them, or do you feel the data gathered last week was sufficient?

Thank you,
K

Comment by Kamil Keski [ 27/Jan/15 ]

Much appreciated, Ramon. Info has been uploaded; please let me know if there is anything else we can provide to help track this down.

Comment by Ramon Fernandez Marina [ 27/Jan/15 ]

enigmnaphoenix, you can now upload logs securely and privately as follows:

scp -P 722 -r <compressed_logs> SERVER-17054@www.mongodb.com:

Any data uploaded here will only be accessible to MongoDB staff working to track down this issue. When prompted for a password, just hit enter.

Comment by Kamil Keski [ 27/Jan/15 ]

We have captured logs during an event. Is there a more secure way I can get them to you? I will sanitize them to an extent, but I still do not feel comfortable posting them here for a large audience.

Thank you,
K

Comment by Ramon Fernandez Marina [ 26/Jan/15 ]

Hi enigmnaphoenix, we'll need some more information to diagnose this behavior. Can you please run the commands below in the shell and post the output to this ticket? Please include the logs produced by running these commands as well:

// Save current logLevel, bump logLevel to 2
var oldLogLevel = db.adminCommand({getParameter:1, logLevel:1}).logLevel
db.adminCommand( { setParameter: 1, logLevel: 2 } ) 
 
db.col.getPlanCache().listQueryShapes()
db.col.find({"a_token" : "*******************"}).explain(true)
db.col.getPlanCache().getPlansByQuery({"a_token" : "*******************"})
 
db.col.find({"a_token" : "*******************"}, {}).sort({ _id : 1}).explain(true)
db.col.getPlanCache().getPlansByQuery({"a_token" : "*******************"}, {}, { _id : 1})
 
// Reset logLevel
db.adminCommand( { setParameter: 1, logLevel: oldLogLevel } ) 
