|
Here it is: time-ixscan-vs-collscan.js
It doesn't populate any data – I did that separately as I wanted to run this script many times. The data I used for the profiling was pretty simple:
|
populate-ixscan-vs-collscan.js
|
Random.setRandomSeed();

let bulk = db.getCollection("ixscanVsCollscan").initializeUnorderedBulkOp();
for (let i = 0; i < 3000000; i++) {
    bulk.insert({_id: i, data: Random.rand()});
}
assert.writeOK(bulk.execute());
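
A quick sanity check of the populated data (a minimal sketch; it assumes the shell is connected to the same database the populate script ran against) looks like this:

// Expect 3000000 documents, each with a sequential integer _id and a random float in "data".
print("count: " + db.getCollection("ixscanVsCollscan").count());
printjson(db.getCollection("ixscanVsCollscan").findOne());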
|
If you prefer copying and pasting over downloading files:
|
time-ixscan-vs-collscan.js
|
"use strict";
|
|
const trials = 20;
|
const query = {};
|
const proj = {_id: 0, data: 1};
|
const collscan = {$natural: 1};
|
const ixscan = {data: 1};
|
|
let coll = db.getCollection("ixscanVsCollscan");
|
|
print("Creating indexes...");
|
coll.createIndex({data: 1});
|
|
print("Testing ixscan...");
|
let ixscanTrials = [];
|
for (let i = 0; i < trials; i++) {
|
let data = coll.explain("executionStats").find(query, proj).hint(ixscan).next();
|
ixscanTrials.push(data.executionStats.executionTimeMillis);
|
}
|
ixscanTrials.sort();
|
const ixscanAverage = ixscanTrials.reduce((acc, val) => acc + val) / trials;
|
const ixscanMedian = ixscanTrials[trials / 2];
|
print("Average ixscan execution time (milliseconds): " + ixscanAverage);
|
print("Median ixscan execution time (milliseconds): " + ixscanMedian);
|
|
print("Testing collscan...");
|
let collscanTrials = [];
|
for (let i = 0; i < trials; i++) {
|
let data = coll.explain("executionStats").find(query, proj).hint(collscan).next();
|
collscanTrials.push(data.executionStats.executionTimeMillis);
|
}
|
collscanTrials.sort();
|
const collscanMedian = collscanTrials[trials / 2];
|
const collscanAverage = collscanTrials.reduce((acc, val) => acc + val) / trials;
|
print("Average collscan execution time (milliseconds): " + collscanAverage);
|
print("Median collscan execution time (milliseconds): " + collscanMedian);
|
You might want to discard the first trial of each run for timing purposes, to let the data move into the cache first.
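
For example, a minimal way to do that (a hypothetical tweak to the script above, not something the reported numbers include) is to slice off the first measurement before computing the statistics:

// Hypothetical: treat trial 0 as cache warm-up and exclude it from the stats.
const warmedIxscanTrials = ixscanTrials.slice(1);
const warmedIxscanAverage =
    warmedIxscanTrials.reduce((acc, val) => acc + val) / warmedIxscanTrials.length;
print("Average ixscan execution time without warm-up (ms): " + warmedIxscanAverage);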
|
|
kyle.suarez, could you attach your time-ixscan-vs-collscan.js? The main issue is that we should do matching directly on KeyString values, rather than converting KeyString to BSON and then doing woCompare on that. All the extra copying and memory allocation is what really matters here: no amount of optimizing toBSON (which is already quite well optimized) will help.
|
|
I've done a fresh analysis on a recent version of 3.5.x as part of my work on SERVER-20066. I went for a workload that makes the WiredTiger collection scan shine in comparison to a covered index scan: 3 million small documents (two float fields), all in memory, with an empty query and a projection.
Some timing analysis shows that the ixscan execution time is roughly 1.7x that of the collection scan. Each invocation of the script reports the average and median of 20 trials:
$ mongo code/time-ixscan-vs-collscan.js
Testing ixscan...
Average ixscan execution time (milliseconds): 5118.95
Median ixscan execution time (milliseconds): 5141
Testing collscan...
Average collscan execution time (milliseconds): 2864.75
Median collscan execution time (milliseconds): 2902

$ mongo time-ixscan-vs-collscan.js
Testing ixscan...
Average ixscan execution time (milliseconds): 4907.85
Median ixscan execution time (milliseconds): 4758
Testing collscan...
Average collscan execution time (milliseconds): 2735.45
Median collscan execution time (milliseconds): 2727

$ mongo time-ixscan-vs-collscan.js
Testing ixscan...
Average ixscan execution time (milliseconds): 5386.5
Median ixscan execution time (milliseconds): 5422
Testing collscan...
Average collscan execution time (milliseconds): 2937.5
Median collscan execution time (milliseconds): 2966

Here is a sample of the output of perf report --stdio -g graph,1,caller:
# Total Lost Samples: 0
|
#
|
# Samples: 325K of event 'cycles'
|
# Event count (approx.): 221950040783
|
#
|
# Children Self Command Shared Object Symbol
|
# ........ ........ ............... .................. .................................................................................................................................................................................................................................................................................................................................................................
|
#
|
99.82% 0.00% conn3 mongod [.] std::_Function_handler<void (std::shared_ptr<mongo::transport::Session> const&), mongo::ServiceEntryPointImpl::startSession(std::shared_ptr<mongo::transport::Session>)::{lambda(std::shared_ptr<mongo::transport::Session> const&)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<mongo::transport::Session> const&)
|
|
|
---std::_Function_handler<void (std::shared_ptr<mongo::transport::Session> const&), mongo::ServiceEntryPointImpl::startSession(std::shared_ptr<mongo::transport::Session>)::{lambda(std::shared_ptr<mongo::transport::Session> const&)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<mongo::transport::Session> const&)
|
|
|
|--99.82%-- mongo::ServiceEntryPointImpl::_sessionLoop(std::shared_ptr<mongo::transport::Session> const&)
|
| |
|
| |--99.82%-- mongo::ServiceEntryPointMongod::handleRequest(mongo::OperationContext*, mongo::Message const&, mongo::HostAndPort const&)
|
| | mongo::assembleResponse(mongo::OperationContext*, mongo::Message const&, mongo::HostAndPort const&)
|
| | |
|
| | |--99.82%-- mongo::runCommands(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)
|
| | | mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)
|
| | | mongo::execCommandDatabase(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)
|
| | | mongo::Command::run(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)
|
| | | mongo::(anonymous namespace)::CmdExplain::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&)
|
| | | mongo::FindCmd::explain(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::ExplainOptions::Verbosity, mongo::BSONObjBuilder*) const
|
| | | |
|
| | | |--99.81%-- mongo::Explain::explainStages(mongo::PlanExecutor*, mongo::Collection const*, mongo::ExplainOptions::Verbosity, mongo::BSONObjBuilder*)
|
| | | | |
|
| | | | |--99.79%-- mongo::PlanExecutor::executePlan()
|
| | | | | |
|
| | | | | |--99.70%-- mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::RecordId*)
|
| | | | | | |
|
| | | | | | |--81.07%-- mongo::PlanExecutor::getNextImpl(mongo::Snapshotted<mongo::BSONObj>*, mongo::RecordId*)
|
| | | | | | | |
|
| | | | | | | |--72.70%-- mongo::PlanStage::work(unsigned long*)
|
| | | | | | | | |
|
| | | | | | | | |--71.95%-- mongo::ProjectionStage::doWork(unsigned long*)
|
| | | | | | | | | |
|
| | | | | | | | | |--43.54%-- mongo::PlanStage::work(unsigned long*)
|
| | | | | | | | | | |
|
| | | | | | | | | | |--42.40%-- mongo::IndexScan::doWork(unsigned long*)
|
| | | | | | | | | | | |
|
| | | | | | | | | | | |--27.70%-- mongo::(anonymous namespace)::WiredTigerIndexCursorBase::next(mongo::SortedDataInterface::Cursor::RequestedInfo)
|
| | | | | | | | | | | | |
|
| | | | | | | | | | | | |--10.33%-- mongo::KeyString::toBson(char const*, unsigned long, mongo::Ordering, mongo::KeyString::TypeBits const&)
|
| | | | | | | | | | | | | |
|
| | | | | | | | | | | | | |--5.70%-- mongo::mongoMalloc(unsigned long)
|
I visualized the data with a handy flame graph. Here is a screenshot of the graph zoomed into the interesting part:
flame-graph-screenshot.png
I think there's a significant amount of work being done in KeyString::toBson() that could be optimized. I can't see much other fat to trim. I have spoken about this casually with geert.bosch, and we briefly talked about allowing WorkingSetMembers to do direct comparisons/computation on key strings, rather than unconditionally converting them to BSON. However, this would require a storage engine API change, and it's not clear how much it would win us.
(Sorry if any attachments look weird. I can re-upload if they're not rendering well.)
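
As a side note for anyone reproducing this: a quick shell-side sketch (hypothetical, not part of the timing script) to confirm that the hinted plan really is a covered index scan, i.e. a PROJECTION over an IXSCAN with no FETCH stage, is:

// Sketch: check the winning plan for the hinted, projected query used in the benchmark.
let winningPlan = db.getCollection("ixscanVsCollscan")
                    .find({}, {_id: 0, data: 1})
                    .hint({data: 1})
                    .explain("queryPlanner")
                    .queryPlanner.winningPlan;
printjson(winningPlan); // Expect PROJECTION over IXSCAN, with no FETCH stage.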
|
|
asya, would you mind reposting the profile as generated by perf report -g graph,1,caller? The top-down report is generally more useful than the bottom-up one.
|
|
Here's profiling for the index-only scan (file ending 15.51.00) and for the full collection scan (ending 16.35.16).
The index-only scan took 111 seconds; the full collection scan took 87 seconds.
Since there doesn't seem to be a smoking gun, no single thing dominating the time taken, maybe this is a case where we have to consider some other optimization for this sort of $group. (The query groups on an indexed field and gets a count for each value; emulating the same thing via a distinct command followed by a count for each returned value takes 24 seconds, as sketched below.)
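
For reference, a rough sketch of that distinct-plus-count emulation (assuming the repro collection and the indexed devtype field from the script further down this thread) is:

// Rough emulation of the $group count: distinct values first, then one count per value.
let counts = {};
db.repro.distinct("devtype").forEach(function(value) {
    counts[value] = db.repro.count({devtype: value});
});
printjson(counts);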
|
|
Note that I can bring the covered index scan and the collection scan closer in execution time by making each document significantly bigger, but the covered index scan is never faster than the collection scan unless it significantly filters down the total number of documents.
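
One hypothetical way to make each document significantly bigger is a variant of the generator script below that adds roughly 1 KB of filler per document (the reproBig collection name is just an illustration):

// Hypothetical variant of the generator below: pad each document so the collection
// grows much larger than the {devtype: 1, speed: 1} index.
var filler = new Array(1024).join("x");
var batch = [];
db.reproBig.drop();
for (var i = 1; i < 1000001; i++) {
    batch.push({devtype: (i % 10), speed: (i % 10 * 1000), othertype: 0, size: 20, padding: filler});
    if (i % 1000 == 0) { db.reproBig.insert(batch); batch = []; }
}
db.reproBig.ensureIndex({devtype: 1, speed: 1});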
|
|
Here's code that generates the data and runs the queries. Alter the counter in the loop to increase the data size; that makes the performance gap bigger (especially once the collection is much larger than RAM, the collection scan is still faster even though the index is much smaller).
var batch=[];
db.repro.drop();
for (var i=1; i<1000001; i++) {
    var doc={devtype:(i%10), speed: (i%10*1000), othertype:0, size:20};
    batch.push(doc);
    if (i%1000==0) { db.repro.insert(batch); batch=[]; }
}
db.repro.ensureIndex({devtype:1,speed:1});

/* I run the aggregations like this: index ones first, since the index should be in RAM having just been built. */

o1=new Date(); db.repro.aggregate({$match:{devtype:{$gte:0,$lt:20}}}, {$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
o1=new Date(); db.repro.aggregate({$match:{devtype:{$gte:0,$lt:20}}}, {$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
o1=new Date(); db.repro.aggregate({$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
o1=new Date(); db.repro.aggregate({$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
o1=new Date(); db.repro.aggregate({$match:{othertype:{$gte:0,$lt:20}}},{$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
o1=new Date(); db.repro.aggregate({$match:{othertype:{$gte:0,$lt:20}}},{$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
|
I'm counting only the second of two runs for each:
covered index: 1.09
collection scan: 0.881
unindexed non-selective filter collection scan: 0.861
In fact, I can only get the covered index plan to be faster if I'm querying for a third less data than the unindexed, unfiltered equivalent.
o1=new Date(); db.repro.aggregate({$match:{othertype:0}}, {$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
0.965

o1=new Date(); db.repro.aggregate({$match:{devtype:{$gte:0,$lt:7}}}, {$group:{_id:"$devtype",speed:{$sum:"$speed"},count:{$sum:1}}}).toArray() ; o2=new Date(); print((o2-o1)/1000);
0.843
|
|
|
It isn't possible to use the same technique for index scans that we used for collection scans. We were able to avoid copying out during collection scans because documents have the same format in MongoDB and in the WiredTiger cache. For indexes, we convert keys from BSON to KeyString on writes and back to BSON on reads, so we can't just use the view into the WiredTiger cache.
|
|
Let me look at how to generate the data. I have scripts that reproduce the issue, but the data was generated manually. Would a generating script be more useful, or a dump of the collection itself?
|
|
Could you attach a JS test that reproduces this issue?