[SERVER-15802] Query optimizer should always use equality predicate over unique index when possible Created: 24/Oct/14  Updated: 25/Jun/15  Resolved: 16/Jan/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.5
Fix Version/s: 2.6.8, 3.0.0-rc6

Type: Bug Priority: Major - P3
Reporter: Migalin Danila Assignee: David Storch
Resolution: Done Votes: 8
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-16350 Wrong index choice when request conta... Closed
Related
related to SERVER-1599 support ability to pass hint to update Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Participants:

 Description   

After upgrading sharded cluster (shard key {uid:1}) from 2.4.9 to 2.6.5 I've began to notice periodic and sudden significant performance drops caused by huge read queue and lock on different shards. Logs shows slow queries like this:

<skpd> query mpfs.hidden_data query: { _id: "5120d6d1c7624f319c35fdff9e596890", uid: "230984277" } planSummary: IXSCAN { uid: 1.0 } ntoskip:0 nscanned:28184 nscannedObjects:28184 keyUpdates:0 numYields:220 locks(micros) r:797191 nreturned:1 reslen:374 7134ms

This collection has indexes {uid: 1} and {_id: 1}, and in normal situation, queries like that uses "_id" index that is the fastest one.

This problem seems to be very similar to SERVER-13675 but it's marked as fixed in 2.6.2.



 Comments   
Comment by Githook User [ 06/Feb/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-15802 if an equality over a single-field unique index can be used to index an AND, ignore all other indices

(cherry picked from commit c3da2fc6642f143111d104c871f420d523f949b5)

Conflicts:
src/mongo/db/query/get_executor.cpp
src/mongo/db/query/index_entry.h
src/mongo/db/query/query_planner_test.cpp
src/mongo/s/chunk.cpp
Branch: v2.6
https://github.com/mongodb/mongo/commit/d148fd1b0cbaf6d19b91658598d0b7d4c2eb0a97

Comment by Githook User [ 16/Jan/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-15802 if an equality over a single-field unique index can be used to index an AND, ignore all other indices
Branch: master
https://github.com/mongodb/mongo/commit/c3da2fc6642f143111d104c871f420d523f949b5

Comment by David Storch [ 13/Jan/15 ]

Hi miga,

We appreciate that there are index selection problems in MongoDB's query optimizer that affect non-unique indices. The assessment of our development team about this case was that

  1. the scenario reported involving a unique index was severe enough to require a fix, and
  2. the special case of a unique index has a safe and efficient solution.

If the fix for the unique index case is not sufficient, feel free to raise a new ticket describing a separate scenario.

Best,
Dave

Comment by Migalin Danila [ 13/Jan/15 ]

One small remark: original ticket was named "Wrong index selection", and non-unique indexes are affected by this issue too.

Comment by Migalin Danila [ 12/Nov/14 ]

Thank you, Ramon, that's it.
Bad thing is that updates to collections are affected by subj too, but there is no hints for write operations (SERVER-1599).
In case of updates, slow-log looks like this:

2014-11-12T22:48:16.206+0300 [conn8625] update mpfs.user_data query: { _id: "d3e90e0b09f14b98ada9831763f982aa", uid: "197337546" } update: { ...... } nscanned:22066 nscannedObjects:0 nMatched:1 nModified:1 keyUpdates:1 numYields:172 locks(micros) w:215817 20844ms

Comment by Ramon Fernandez Marina [ 12/Nov/14 ]

miga, the behavior you're observing arises in the following scenario:

  • indices on {_id:1} and {uid:1}
  • the following query is run:

    db.user_data.find({ _id: "7277e1122d67768b4a0a1292b1a244ee", uid: "208450197" })

  • while there's a document with {_id: "7277e1122d67768b4a0a1292b1a244ee"}, there are no documents with {uid:208450197}, so the {uid:1} plan is cached because the query planner observes it's faster than the {_id:1} plan (the former returns nscanned:0, the latter nscanned:1)
  • subsequent queries with the {_id:1, uid:1} shape use the the cached {uid:1} plan instead of the better {_id:1} plan; in your case this query showcases the issue:

    db.user_data.find({ _id: "5120d6d1c7624f319c35fdff9e596890", uid: "230984277" })

The way to address this in the server is not immediately clear, as a fix for this particular scenario may have undersired side-effects in other cases, so we're investigating a general solution.

Until this issue is addressed, the workaround is to hint() on the desired index. Feel free to watch this ticket for progress in this matter.

Comment by Andrey Godin [ 11/Nov/14 ]

Hi Ramon!
What is the state of this ticket. Do I need some kind of debug?

Comment by Ofer Cohen [ 11/Nov/14 ]

See also this thread in the forum:
https://groups.google.com/forum/#!topic/mongodb-user/XaX7CAkJUG4

Comment by Migalin Danila [ 30/Oct/14 ]

That's great.

Anyway, I've already collected information you asked, you can check it here (logs are pretty big) https://yadi.sk/d/gXGjgvglcPxhB

Comment by Ramon Fernandez Marina [ 30/Oct/14 ]

Hi miga, I think I'm able to reproduce the same behavior you're observing, so no need to provide additional information for now. Please watch this ticket for status updates.

Comment by Ramon Fernandez Marina [ 29/Oct/14 ]

Sorry, my mistake – please try again

Comment by Migalin Danila [ 29/Oct/14 ]

Ramon, ugh, sorry, but your link leads to this ticket and I don't see any of your comments before this one:

Hi Migalin Danila; please see my previous post with detailed instructions on what additional information to collect. The purpose of using explain() is not to force the right index to be used (that would be hint()), but to gather more information on why the query is using the wrong index altogether.

Mb your comment is under lock?

Comment by Ramon Fernandez Marina [ 29/Oct/14 ]

miga, here's the link with the instructions on what additional information to provide.

I agree having to use hints is not a good general solution, but we need this addition information to make progress investigating this ticket.

Thanks,
Ramón.

Comment by Migalin Danila [ 29/Oct/14 ]

Ramon, sorry, can you post a link to instructions?
I've used explain() to check hypothesis that problem was caused by wrong cached index (similar to SERVER-13675), docs say:

query optimizer deletes the query plan and re-evaluates after any of the following events:
...
You run a query with :method:`~cursor.explain().

I've just deployed hotfix that uses hints for this kind of queries, but it doesn't looks like a good general solution.

Comment by Ramon Fernandez Marina [ 29/Oct/14 ]

Hi miga; please see my previous post with detailed instructions on what additional information to collect. The purpose of using explain() is not to force the right index to be used (that would be hint()), but to gather more information on why the query is using the wrong index altogether.

Comment by Migalin Danila [ 29/Oct/14 ]

"Status:Waiting For User Input Waiting For User Input"
What additional information can I provide?

I can confirm that explain() doesn't fix the problem, mongodb continues to use wrong index; the only way is to stepDown() master - mb just because most of queries moves to another mongodb instance that is not affected by subj.

Comment by Ramon Fernandez Marina [ 29/Oct/14 ]

Hi miga,

we'll need more information to diagnose this problem. Could you please follow the steps below and upload the resulting information? Note that your logs show two different collections, user_data and hidden_data – I've chosen one of them as the problem seems to appear in both, but please adjust if needed.

  • Increase the log level by running:

    db.adminCommand( { setParameter: 1, logLevel: 2 } )

  • List of indexes in your collection:

    db.user_data.getIndexes()

  • Display query shapes from the plan cache:

    db.user_data.getPlanCache().listQueryShapes()

  • Run getPlansByQuery on the offending query:

    db.user_data.getPlanCache().getPlansByQuery({ _id: "7277e1122d67768b4a0a1292b1a244ee", uid: "208450197" })

  • Run the query with explain(true):

    db.user_data.find({ _id: "7277e1122d67768b4a0a1292b1a244ee", uid: "208450197" }).explain(true)

  • Re-set the log level to the previous value (probably 0 if you didn't change this before) to prevent logs from taking too much space:

    db.adminCommand( { setParameter: 1, logLevel: 0 } )

You can put each piece of information in a file, zip all files along with the mongod.log log files from your cluster and post it to this ticket. If the resulting file is too large (hundreds of megabytes) please let me know before and I'll send you different upload instructions.

Thanks,
Ramón.

Comment by Migalin Danila [ 27/Oct/14 ]

Well, looks like something is completely wrong - while observing subj on some shard, I've got this in slow-logs:

2014-10-27T16:28:35.062+0300 [conn1849900] query mpfs.user_data query: { _id: "7277e1122d67768b4a0a1292b1a244ee", uid: "208450197" } planSummary: IXSCAN { uid: 1.0, data.mt: 1.0 } ntoskip:0 nscanned:206915 nscannedObjects:206915
keyUpdates:0 numYields:1596 locks(micros) r:97662182 nreturned:0 reslen:20 249690ms

For some reason,

{uid:1, data.mt:1}

index was choosen to serve {_id: xxxx, uid: yyyy} queries.

I've also tried to run db.coll.find(problem-query).explain() to re-evaluate query-plan as described in docs, but that didn't change anything.

Generated at Thu Feb 08 03:39:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.