Type: Improvement
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Querying, Sharding
Labels: None
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

I'm going to try and describe an IXSCAN performance optimization.

Imagine a sharded collection that uses 2TB of storage on disk. All queries to this collection do index range scans on the shard key. These range scans very often span many chunks on disk (10, 20, or more). The queries also apply a regex or other filters to parts of the document that are not in the shard key. In other words, once a document is found to have met the shard key bounds, we always have to inspect its contents to know whether it should be returned.
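To make the query shape concrete, here is a minimal in-memory sketch. The field names `sk` (shard key) and `body` (a non-key field) and the helper names are made up for illustration; nothing here talks to a real server. It only shows that every document inside the shard key bounds still has to be read before the regex can accept or reject it.

```python
import re

# Hypothetical field names: "sk" is the shard key, "body" is a field outside the key.
def make_query(lo, hi, pattern):
    """Range on the shard key plus a regex on a field that is not in the key."""
    return {"sk": {"$gte": lo, "$lt": hi}, "body": {"$regex": pattern}}

def run_in_memory(docs, query):
    """Evaluate the query over a list of dicts, counting documents inspected."""
    bounds = query["sk"]
    regex = re.compile(query["body"]["$regex"])
    examined = 0
    results = []
    for doc in docs:
        if not (bounds["$gte"] <= doc["sk"] < bounds["$lt"]):
            continue  # rejected by the shard key bounds alone
        examined += 1  # inside the bounds: the document body must be read
        if regex.search(doc["body"]):
            results.append(doc)
    return examined, results

docs = [{"sk": i, "body": "error" if i % 4 == 0 else "ok"} for i in range(100)]
examined, results = run_in_memory(docs, make_query(10, 90, "^err"))
print(examined, len(results))
```

In this toy data set, 80 documents fall inside the bounds and all 80 are inspected, while only 20 are returned.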

In scenarios like this one, MongoDB's query optimizer will have each replica set execute an IXSCAN operation to find and filter the documents. For performance reasons, I believe that in such scenarios MongoDB should always do a full collection scan, as the chunk shard key bounds effectively make the IXSCAN unnecessary. We already know that every document, or a large portion of the documents, in the chunk will have to be scanned. In cases like this, a COLLSCAN operation is far more efficient.
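The trade-off can be sketched with a toy cost model. Every constant below is an illustrative assumption, not a measured MongoDB cost and not how the server's planner actually scores plans. The point is only that an index scan pays for both the index key and a document fetch on every in-bounds document, so it loses once the bounds cover enough of the collection.

```python
# All constants are illustrative assumptions, not measured MongoDB costs.
SEQ_DOC_COST = 1.0       # reading one document during a sequential collection scan
INDEX_KEY_COST = 0.2     # examining one index key
RANDOM_FETCH_COST = 4.0  # fetching one document by record id after an index hit

def collscan_cost(total_docs):
    """A collection scan reads every document once, in storage order."""
    return total_docs * SEQ_DOC_COST

def ixscan_cost(total_docs, fraction_in_bounds):
    """An index scan reads one key and fetches one document per in-bounds entry."""
    in_bounds = total_docs * fraction_in_bounds
    return in_bounds * (INDEX_KEY_COST + RANDOM_FETCH_COST)

def cheaper_plan(total_docs, fraction_in_bounds):
    """Name the plan with the lower modeled cost."""
    if collscan_cost(total_docs) < ixscan_cost(total_docs, fraction_in_bounds):
        return "COLLSCAN"
    return "IXSCAN"

print(cheaper_plan(1_000_000, 1.0))
print(cheaper_plan(1_000_000, 0.05))
```

With these particular constants the break-even point is at roughly 24% of the documents. Where it really lies depends on storage, cache state, and document size.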

I've seen this behavior on shard key range scans that cover only a small percentage of the documents in a collection. I've also seen the optimizer pick this plan when the query bounds target every document/chunk in a sharded collection. In both cases, a full collection scan is the best option.

Ideally what I think should happen is:
1. Mongos figures out which chunks hold data for the query's range bounds on the shard key, as it currently does
2. Mongos sends the query down to mongod
3. Mongod's optimizer recognizes that a collection scan is more efficient and does that instead of an index scan
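The steps above can be sketched as a small simulation. The `Chunk` type, the function names, and the sample chunk layout are all hypothetical, and using key-space width as a stand-in for document count is a simplifying assumption. Real chunk routing and plan selection are more involved.

```python
from typing import NamedTuple

class Chunk(NamedTuple):
    lo: int     # inclusive lower bound of the chunk's shard key range
    hi: int     # exclusive upper bound
    shard: str  # shard that owns the chunk

def target_chunks(chunks, q_lo, q_hi):
    """Step 1: keep chunks whose [lo, hi) range overlaps the query range [q_lo, q_hi)."""
    return [c for c in chunks if c.lo < q_hi and q_lo < c.hi]

def shards_to_contact(chunks, q_lo, q_hi):
    """Step 2: the query is sent only to shards that own a targeted chunk."""
    return sorted({c.shard for c in target_chunks(chunks, q_lo, q_hi)})

def covered_fraction(chunks, shard, q_lo, q_hi):
    """Input to step 3: share of the shard's key space covered by the query range."""
    owned = [c for c in chunks if c.shard == shard]
    total = sum(c.hi - c.lo for c in owned)
    covered = sum(
        min(c.hi, q_hi) - max(c.lo, q_lo)
        for c in owned
        if c.lo < q_hi and q_lo < c.hi
    )
    return covered / total

chunks = [
    Chunk(0, 100, "shardA"),
    Chunk(100, 200, "shardB"),
    Chunk(200, 300, "shardA"),
    Chunk(300, 400, "shardC"),
]
print(shards_to_contact(chunks, 50, 250))
print(covered_fraction(chunks, "shardA", 50, 250))
print(covered_fraction(chunks, "shardB", 50, 250))
```

Here the query range [50, 250) reaches two of the three shards. It covers half of the first shard's key space and all of the second's, which is the kind of signal step 3 would need in order to prefer a collection scan.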

If step 3 can't happen, maybe add a special query hint: not a full collection scan hint, but one that says to do a full scan of whatever data chunks are left after filtering out all the unnecessary chunks using the bounds provided on the shard key.
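For comparison, the two hints that exist today can be tried through the `explain` command. The sketch below only builds the command documents. The collection name `events`, the field names, and the helper `explain_with_hint` are assumptions for illustration, and the proposed chunk-scan hint does not exist, so it is not shown. The `{"$natural": 1}` hint forces a collection scan over everything on each shard the query reaches, which is exactly what the proposed hint would narrow down.

```python
def explain_with_hint(collection, query_filter, hint):
    """Build an explain command document for a find with the given hint."""
    return {
        "explain": {"find": collection, "filter": query_filter, "hint": hint},
        "verbosity": "executionStats",
    }

# Hypothetical names: collection "events", shard key "sk", non-key field "body".
query_filter = {"sk": {"$gte": 10, "$lt": 90}, "body": {"$regex": "^err"}}

force_collscan = explain_with_hint("events", query_filter, {"$natural": 1})
force_ixscan = explain_with_hint("events", query_filter, {"sk": 1})

print(force_collscan["explain"]["hint"])
print(force_ixscan["explain"]["hint"])
```

With a driver such as PyMongo, each document could presumably be passed to `db.command(...)` and the reported `totalDocsExamined` and `executionTimeMillis` compared between the two plans.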

If you need more info or don't understand what I'm trying to describe, I'm happy to go into even more detail.

duplicates

SERVER-13065 Consider a collection scan even if indexed plans are available

Backlog

Assignee: Kyle Suarez (Inactive)
Reporter: Matthew Kruse
Participants: Kelsey Schubert, Kyle Suarez, Matthew Kruse
Votes: 0
Watchers: 11

Created: Feb 28 2018 09:13:08 PM UTC
Updated: Apr 23 2018 09:59:56 PM UTC
Resolved: Mar 26 2018 01:53:41 PM UTC
