I'm going to try to describe an IXSCAN performance optimization.
Imagine a sharded collection that uses 2TB of storage on disk. All queries against this collection do index range scans on the shard key, and these range scans very often span many chunks (10, 20, or more). The queries also include regex or other filters on fields that are not part of the shard key. In other words, once a document is found to fall within the shard key bounds, we always have to inspect its contents to decide whether it should be returned.
In scenarios like this, MongoDB's query optimizer has each shard's replica set execute an IXSCAN to find the documents, followed by a fetch and filter on each one. For performance reasons, I believe that in scenarios like this MongoDB should always do a full collection scan, because the chunk bounds on the shard key effectively make the IXSCAN unnecessary: we already know that every document, or a large portion of the documents, in the targeted chunks will have to be fetched and inspected anyway. In cases like this, a COLLSCAN, which avoids reading index keys and then fetching each matching document individually, is far more efficient.
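To make the efficiency argument concrete, here is a toy cost model, a sketch only. The per-operation costs (`key_cost`, `fetch_cost`, `scan_cost`) are made-up numbers, not measurements of MongoDB or WiredTiger, and the cost of evaluating the regex/residual filter is ignored. Under these assumptions, IXSCAN + FETCH pays for an index key read plus a document fetch per in-range document, while COLLSCAN pays one read per document on the shard, so COLLSCAN wins once the in-range fraction passes a break-even point.

```python
def ixscan_fetch_cost(n_in_range, key_cost=1.0, fetch_cost=10.0):
    """IXSCAN + FETCH: read one index key, then fetch one document, per in-range key.
    Costs are hypothetical units chosen for illustration only."""
    return n_in_range * (key_cost + fetch_cost)


def collscan_cost(n_total, scan_cost=1.0):
    """COLLSCAN: read every document on the shard once (hypothetical unit cost)."""
    return n_total * scan_cost


def breakeven_fraction(key_cost=1.0, fetch_cost=10.0, scan_cost=1.0):
    """Fraction of the shard's documents in range above which COLLSCAN is cheaper."""
    return scan_cost / (key_cost + fetch_cost)


n_total = 1_000_000
for frac in (0.05, 0.5):
    n = int(frac * n_total)
    # Residual filter cost (e.g. the regex) is deliberately ignored in this toy model.
    winner = "IXSCAN" if ixscan_fetch_cost(n) < collscan_cost(n_total) else "COLLSCAN"
    print(f"{frac:.0%} in range -> {winner}")
print(f"break-even at {breakeven_fraction():.1%}")
```

With these invented costs the break-even is about 9% of the shard's documents; the scenario above, where a large portion of the documents in the targeted chunks must be inspected, sits well past it.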
I've seen the optimizer choose an IXSCAN for range scans on the shard key that target only a small percentage of the documents in a collection, and also when the query bounds target every document/chunk in the sharded collection. In both cases, I believe a full collection scan is the best option.
Ideally what I think should happen is:
1. Mongos determines which chunks contain data within the query's range bounds on the shard key, as it does today.
2. Mongos sends the query down to the mongod on each targeted shard.
3. Mongod's optimizer recognizes that a collection scan is more efficient and does that instead of an index scan
If step 3 isn't possible, perhaps there could be a special query hint. Not a full collection scan hint, but a hint that says: do a full scan of the data in whatever chunks remain after all the unnecessary chunks have been filtered out using the bounds on the shard key.
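The three proposed steps can be sketched as follows, under heavy assumptions. `Chunk`, `target_chunks`, `choose_plan`, and `route` are hypothetical names, per-chunk document counts are assumed to be known, and the 30% `threshold` rule is invented for illustration. MongoDB's actual planner does not use a rule like this; it ranks candidate plans by running them for a trial period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical chunk metadata: half-open shard key range [min_key, max_key)
    # owned by `shard`, with an (assumed known) document count.
    min_key: int
    max_key: int
    shard: str
    doc_count: int


def target_chunks(chunks, lo, hi):
    """Step 1 (mongos): keep chunks whose range overlaps the query range [lo, hi)."""
    return [c for c in chunks if c.min_key < hi and lo < c.max_key]


def choose_plan(targeted, shard_total_docs, threshold=0.3):
    """Step 3 (mongod): hypothetical rule, choose COLLSCAN when the targeted
    chunks hold at least `threshold` of the shard's documents."""
    in_range = sum(c.doc_count for c in targeted)
    return "COLLSCAN" if in_range >= threshold * shard_total_docs else "IXSCAN"


def route(chunks, lo, hi):
    """Steps 1-2: target chunks and group them by shard; each shard then applies step 3."""
    totals = {}
    for c in chunks:
        totals[c.shard] = totals.get(c.shard, 0) + c.doc_count
    by_shard = {}
    for c in target_chunks(chunks, lo, hi):
        by_shard.setdefault(c.shard, []).append(c)
    return {shard: choose_plan(cs, totals[shard]) for shard, cs in sorted(by_shard.items())}


# Usage: shardA's targeted chunks hold all of its data; shardB's hold 1%.
chunks = [
    Chunk(0, 100, "shardA", 1_000),
    Chunk(100, 200, "shardA", 1_000),
    Chunk(200, 300, "shardB", 100),
    Chunk(300, 400, "shardB", 9_900),
]
plans = route(chunks, 0, 250)
print(plans)  # {'shardA': 'COLLSCAN', 'shardB': 'IXSCAN'}
```

The per-shard decision is the point of the sketch: each shard judges how much of its own data the targeted chunks cover, rather than the whole collection.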
If you need more info or don't understand what I'm trying to describe, I'm happy to go into even more detail.
Duplicates: SERVER-13065 Consider a collection scan even if indexed plans are available (Backlog)