[SERVER-23406] index scan is slower than full collection scan in some scenarios Created: 29/Mar/16  Updated: 15/Aug/23

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Asya Kamsky Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: asya, bi-performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screenshot 2016-04-12 15.51.00.png     PNG File Screenshot 2016-04-12 16.35.16.png     PNG File Screenshot 2016-04-12 17.53.37.png     PNG File Screenshot 2016-04-12 17.53.59.png     PNG File flame-graph-screenshot.png     File ixscan-flamegraph.svg     File time-ixscan-vs-collscan.js    
Issue Links:
Depends
is depended on by SERVER-69257 COUNT_SCAN plan not selected when unf... Blocked
Duplicate
is duplicated by SERVER-44057 Find on non-indexed and sort on index... Closed
Problem/Incident
causes SERVER-59341 Query planner choses collscan over co... Backlog
Related
is related to SERVER-29227 $nin operation choose IXSCAN but not ... Closed
is related to SERVER-20066 Query planner should consider index s... Closed
Assigned Teams:
Query Execution
Participants:
Case:
Linked BF Score: 0

 Description   

Tested with WiredTiger, which seems to exhibit this more than MMAP.

A covered index query that has to return all the documents from a collection is slower than a full collection scan, even in cases where the index is quite small and the collection is very large.

Looking at where we spend time, it appears that for each "document" we need to "synthesize" from the index, we allocate and free some memory. We apparently have an optimization that avoids this in collection scans, but not in full index scans. This ticket tracks the same optimization for index scans.



 Comments   
Comment by Kyle Suarez [ 16/May/17 ]

Here it is: time-ixscan-vs-collscan.js

It doesn't populate any data – I did that separately as I wanted to run this script many times. The data I used for the profiling was pretty simple:

populate-ixscan-vs-collscan.js

Random.setRandomSeed();
let bulk = db.getCollection("ixscanVsCollscan").initializeUnorderedBulkOp();
for (let i = 0; i < 3000000; i++) {
    bulk.insert({_id: i, data: Random.rand()});
}
assert.writeOK(bulk.execute());

If you prefer copying and pasting over downloading files:

time-ixscan-vs-collscan.js

"use strict";
 
const trials = 20;
const query = {};
const proj = {_id: 0, data: 1};
const collscan = {$natural: 1};
const ixscan = {data: 1};
 
let coll = db.getCollection("ixscanVsCollscan");
 
print("Creating indexes...");
coll.createIndex({data: 1});
 
print("Testing ixscan...");
let ixscanTrials = [];
for (let i = 0; i < trials; i++) {
    let data = coll.explain("executionStats").find(query, proj).hint(ixscan).next();
    ixscanTrials.push(data.executionStats.executionTimeMillis);
}
ixscanTrials.sort((a, b) => a - b);  // numeric sort; the default comparator sorts lexicographically
const ixscanAverage = ixscanTrials.reduce((acc, val) => acc + val) / trials;
const ixscanMedian = ixscanTrials[trials / 2];
print("Average ixscan execution time (milliseconds): " + ixscanAverage);
print("Median ixscan execution time (milliseconds): " + ixscanMedian);
 
print("Testing collscan...");
let collscanTrials = [];
for (let i = 0; i < trials; i++) {
    let data = coll.explain("executionStats").find(query, proj).hint(collscan).next();
    collscanTrials.push(data.executionStats.executionTimeMillis);
}
collscanTrials.sort((a, b) => a - b);  // numeric sort; the default comparator sorts lexicographically
const collscanMedian = collscanTrials[trials / 2];
const collscanAverage = collscanTrials.reduce((acc, val) => acc + val) / trials;
print("Average collscan execution time (milliseconds): " + collscanAverage);
print("Median collscan execution time (milliseconds): " + collscanMedian);

You might want to discard the first trial of each for timing purposes, to let the data move into cache.
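To make that concrete, here is a minimal plain-JavaScript sketch of how the trials could be summarized with the warm-up run discarded. `summarizeTrials` is a hypothetical helper, not part of the attached script; note the explicit numeric comparator, since `Array.prototype.sort()` with no comparator sorts numbers lexicographically.

```javascript
// Illustrative helper: summarize timing trials, dropping the first
// (warm-up) run and sorting numerically before taking the median.
function summarizeTrials(timesMillis) {
    const times = timesMillis.slice(1);           // discard warm-up trial
    times.sort((a, b) => a - b);                  // numeric, not lexicographic
    const mean = times.reduce((acc, t) => acc + t, 0) / times.length;
    const median = times[Math.floor(times.length / 2)];
    return {mean, median};
}
```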

Comment by Geert Bosch [ 16/May/17 ]

kyle.suarez, could you attach your time-ixscan-vs-collscan.js? The main issue is that we should do matching directly on KeyString values, rather than converting KeyString to BSON and then doing woCompare on that. All the extra copying/allocating memory is what really matters here: no amount of optimization of toBSON (which is already quite well optimized) will help here.
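The idea of matching directly on KeyString values can be illustrated with a toy example (this is not MongoDB's actual KeyString format): if the stored encoding is order-preserving, two keys can be compared byte-for-byte in their encoded form, with no decode and no per-key allocation.

```javascript
// Toy order-preserving encoding: fixed-width big-endian hex, so that
// lexicographic order of the encoded strings matches numeric order.
function encodeUint32(n) {
    return n.toString(16).padStart(8, "0");
}

// Compare directly on the encoded form -- no decoding required.
function compareEncoded(a, b) {
    return a < b ? -1 : a > b ? 1 : 0;
}
```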

Comment by Kyle Suarez [ 16/May/17 ]

I've done a fresh analysis on a recent version of 3.5.x as a part of my work on SERVER-20066. I went for a workload that makes the WiredTiger collection scan shine in comparison to a covered index scan: 3 million small documents (two float fields), all in memory, with an empty query and a projection.

Some timing analysis shows that the ixscan execution time is roughly 1.7x that of the collection scan. Each invocation of the script reports an average over 20 trials:

$ mongo code/time-ixscan-vs-collscan.js
Testing ixscan...
Average ixscan execution time (milliseconds): 5118.95
Median ixscan execution time (milliseconds): 5141
Testing collscan...
Average collscan execution time (milliseconds): 2864.75
Median collscan execution time (milliseconds): 2902
 
$ mongo time-ixscan-vs-collscan.js
Testing ixscan...
Average ixscan execution time (milliseconds): 4907.85
Median ixscan execution time (milliseconds): 4758
Testing collscan...
Average collscan execution time (milliseconds): 2735.45
Median collscan execution time (milliseconds): 2727
 
$ mongo time-ixscan-vs-collscan.js
Testing ixscan...
Average ixscan execution time (milliseconds): 5386.5
Median ixscan execution time (milliseconds): 5422
Testing collscan...
Average collscan execution time (milliseconds): 2937.5
Median collscan execution time (milliseconds): 2966

Here is a sample of the output of perf report --stdio -g graph,1,caller:

# Total Lost Samples: 0
#
# Samples: 325K of event 'cycles'
# Event count (approx.): 221950040783
#
# Children      Self  Command          Shared Object       Symbol                                                                                         
# ........  ........  ...............  ..................  .................................................................................................................................................................................................................................................................................................................................................................
#
    99.82%     0.00%  conn3            mongod              [.] std::_Function_handler<void (std::shared_ptr<mongo::transport::Session> const&), mongo::ServiceEntryPointImpl::startSession(std::shared_ptr<mongo::transport::Session>)::{lambda(std::shared_ptr<mongo::transport::Session> const&)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<mongo::transport::Session> const&)                                                                                                                                                                                                  
                      |
                      ---std::_Function_handler<void (std::shared_ptr<mongo::transport::Session> const&), mongo::ServiceEntryPointImpl::startSession(std::shared_ptr<mongo::transport::Session>)::{lambda(std::shared_ptr<mongo::transport::Session> const&)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<mongo::transport::Session> const&)
                         |          
                         |--99.82%-- mongo::ServiceEntryPointImpl::_sessionLoop(std::shared_ptr<mongo::transport::Session> const&)
                         |          |          
                         |          |--99.82%-- mongo::ServiceEntryPointMongod::handleRequest(mongo::OperationContext*, mongo::Message const&, mongo::HostAndPort const&)
                         |          |          mongo::assembleResponse(mongo::OperationContext*, mongo::Message const&, mongo::HostAndPort const&)
                         |          |          |          
                         |          |          |--99.82%-- mongo::runCommands(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)
                         |          |          |          mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)
                         |          |          |          mongo::execCommandDatabase(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)
                         |          |          |          mongo::Command::run(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)
                         |          |          |          mongo::(anonymous namespace)::CmdExplain::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&)
                         |          |          |          mongo::FindCmd::explain(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::ExplainOptions::Verbosity, mongo::BSONObjBuilder*) const
                         |          |          |          |          
                         |          |          |          |--99.81%-- mongo::Explain::explainStages(mongo::PlanExecutor*, mongo::Collection const*, mongo::ExplainOptions::Verbosity, mongo::BSONObjBuilder*)
                         |          |          |          |          |          
                         |          |          |          |          |--99.79%-- mongo::PlanExecutor::executePlan()
                         |          |          |          |          |          |          
                         |          |          |          |          |          |--99.70%-- mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::RecordId*)
                         |          |          |          |          |          |          |          
                         |          |          |          |          |          |          |--81.07%-- mongo::PlanExecutor::getNextImpl(mongo::Snapshotted<mongo::BSONObj>*, mongo::RecordId*)
                         |          |          |          |          |          |          |          |          
                         |          |          |          |          |          |          |          |--72.70%-- mongo::PlanStage::work(unsigned long*)
                         |          |          |          |          |          |          |          |          |          
                         |          |          |          |          |          |          |          |          |--71.95%-- mongo::ProjectionStage::doWork(unsigned long*)
                         |          |          |          |          |          |          |          |          |          |          
                         |          |          |          |          |          |          |          |          |          |--43.54%-- mongo::PlanStage::work(unsigned long*)
                         |          |          |          |          |          |          |          |          |          |          |          
                         |          |          |          |          |          |          |          |          |          |          |--42.40%-- mongo::IndexScan::doWork(unsigned long*)
                         |          |          |          |          |          |          |          |          |          |          |          |          
                         |          |          |          |          |          |          |          |          |          |          |          |--27.70%-- mongo::(anonymous namespace)::WiredTigerIndexCursorBase::next(mongo::SortedDataInterface::Cursor::RequestedInfo)
                         |          |          |          |          |          |          |          |          |          |          |          |          |          
                         |          |          |          |          |          |          |          |          |          |          |          |          |--10.33%-- mongo::KeyString::toBson(char const*, unsigned long, mongo::Ordering, mongo::KeyString::TypeBits const&)
                         |          |          |          |          |          |          |          |          |          |          |          |          |          |          
                         |          |          |          |          |          |          |          |          |          |          |          |          |          |--5.70%-- mongo::mongoMalloc(unsigned long)

I visualized the data with a handy flame graph. Here is a screenshot of the graph zoomed into the interesting part:
flame-graph-screenshot.png

I think there's a significant amount of work being done in KeyString::toBson() that could be optimized; I can't see much other fat to trim. I have spoken about this casually with geert.bosch, and we briefly talked about allowing WorkingSetMembers to do direct comparisons/computation on key strings, rather than unconditionally converting them into BSON. However, this would require a storage engine API change, and it's not clear how much it would win us.

(Sorry if any attachments look weird. I can re-upload if they're not rendering well.)

Comment by Mathias Stearn [ 12/Apr/16 ]

asya Would you mind reposting the profile as generated by perf report -g graph,1,caller? The top-down report is generally more useful than the bottom-up one.

Comment by Asya Kamsky [ 12/Apr/16 ]

Here's profiling for index only scan (file ending 15.51.00) and for full collection scan (ending 16.35.16).

Index-only scan took 111 seconds, full collection scan took 87 seconds.

Since there doesn't seem to be a smoking gun (no single thing dominates the time taken), maybe this is a case where we have to consider some other possible optimization for this sort of group. The query was grouping on an indexed field and getting a count for each value; emulating the same via a distinct command followed by a count for each returned value takes 24 seconds.
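The distinct-plus-count emulation described above can be sketched over an in-memory array (against a live deployment it would be db.coll.distinct on the grouped field followed by a count() per returned value; `groupViaDistinct` is a hypothetical helper for illustration only):

```javascript
// Emulate {$group: {_id: "$field", count: {$sum: 1}}} as
// "distinct on the field, then one count per distinct value".
function groupViaDistinct(docs, field) {
    // "distinct" on the grouped field...
    const values = [...new Set(docs.map(d => d[field]))];
    // ...followed by a count for each returned value.
    return values.map(v => ({
        _id: v,
        count: docs.filter(d => d[field] === v).length,
    }));
}
```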

Comment by Asya Kamsky [ 29/Mar/16 ]

Note that I can make the covered index scan and the collection scan closer in performance by making each document significantly bigger, but the covered index scan is never faster than the collection scan unless it significantly filters down the total number of documents.

Comment by Asya Kamsky [ 29/Mar/16 ]

Here's code that generates the data and runs the queries. Increase the loop counter to generate more data, which widens the performance gap; especially once the collection is much bigger than RAM, the collection scan is still faster even though the index is much smaller.

var batch=[]; 
db.repro.drop();
for (i=1;i<1000001; i++) { 
  var doc={devtype:(i%10), speed: (i%10*1000), othertype:0, size:20};
  batch.push(doc); 
  if (i%1000==0) { db.repro.insert(batch); batch=[]; }
}
db.repro.ensureIndex({devtype:1,speed:1});
/* I run the aggregations like this: index ones first since index should be in RAM having just been built.*/
o1=new Date(); db.repro.aggregate({$match:{devtype:{$gte:0,$lt:20}}}, {$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
o1=new Date(); db.repro.aggregate({$match:{devtype:{$gte:0,$lt:20}}}, {$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
o1=new Date(); db.repro.aggregate({$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
o1=new Date(); db.repro.aggregate({$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
o1=new Date(); db.repro.aggregate({$match:{othertype:{$gte:0,$lt:20}}},{$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
o1=new Date(); db.repro.aggregate({$match:{othertype:{$gte:0,$lt:20}}},{$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);

I'm counting only the second of the two runs for each:

covered index: 1.09
collection scan: 0.881
unindexed non-selective filter collection scan: 0.861

In fact, I can only get the covered index scan to be faster when querying for about a third less data than the unindexed, unfiltered equivalent.

o1=new Date(); db.repro.aggregate({$match:{othertype:0}}, {$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
0.965
o1=new Date(); db.repro.aggregate({$match:{devtype:{$gte:0,$lt:7}}}, {$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
0.843

Comment by Mathias Stearn [ 29/Mar/16 ]

It isn't possible to use the same technique we used for collection scans for index scans. We were able to avoid copying out during collection scans because the documents have the same format in Mongo and in the WT cache. For indexes, we convert the keys from BSON to KeyString when writing and back to BSON on reads, so we can't just use the view into WT cache.
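The distinction Mathias describes can be sketched with a toy contrast (not MongoDB internals): a collection scan can hand back a view into bytes that are already in the right format, while an index scan must decode the stored key into a fresh object, paying an allocation per entry.

```javascript
// Pretend on-disk/in-cache bytes for one entry.
const storage = new TextEncoder().encode(JSON.stringify({_id: 1, data: 0.5}));

function collscanNext() {
    // Zero-copy: a view over the same underlying buffer, no new payload.
    return storage.subarray(0);
}

function ixscanNext() {
    // Decode + allocate: a fresh object is built for every entry returned.
    return JSON.parse(new TextDecoder().decode(storage));
}
```

Here `collscanNext` returns a view sharing the original buffer, while every `ixscanNext` call allocates a new object, which is the per-key cost this ticket is about.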

Comment by Asya Kamsky [ 29/Mar/16 ]

Let me look at how to generate the data. I have scripts that reproduce this, but the data was generated manually; would a generating script be more useful, or a dump of the collection itself?

Comment by Geert Bosch [ 29/Mar/16 ]

Could you attach a JS test that reproduces this issue?

Generated at Thu Feb 08 04:03:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.