Details
Improvement
Resolution: Fixed
Major - P3
None
Fully Compatible
Sharding EMEA 2022-10-31, Sharding EMEA 2022-11-14
3
Description
Motivation: performance improvement.
Description: two alternative designs:
- Add a new input parameter, 'dataDistribution', to $collStats that retrieves only the data needed by $shardedDataDistribution and other consumers (count, avgObjSize and numOrphanDocuments) and skips everything else.
- Analyze the entire pipeline and use the existing dependency analysis to determine which paths it references; if the rest of the $collStats output cannot affect the results, do not compute it. For example, when a request using $collStats or $allCollectionStats also includes a $project, only the projected information needs to be gathered. Concretely: check whether the $collStats stage is immediately followed by a $project and, from that projection, determine which output $collStats should produce.
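A minimal sketch of the second design in plain JavaScript, for illustration only. The function name `requiredCollStatsPaths` is hypothetical and is not MongoDB server code. It inspects only the stage directly after `$collStats`, treats only inclusion projections as analyzable, and returns `null` (meaning "produce the full output") in every other case. The `storageStats.*` paths in the usage example are assumed field names. The example shows that such a projection would leave exactly the three statistics that the first design would hard-code.

```javascript
// Hypothetical sketch of design 2: infer which $collStats outputs a pipeline
// needs by inspecting the stage that immediately follows $collStats.

/**
 * Returns a sorted array of dotted paths kept by an inclusion $project placed
 * directly after $collStats, or null when the needed paths cannot be
 * determined and $collStats must produce its full output.
 */
function requiredCollStatsPaths(pipeline) {
  const idx = pipeline.findIndex((stage) => "$collStats" in stage);
  if (idx === -1 || idx + 1 >= pipeline.length) return null;

  const next = pipeline[idx + 1];
  if (!("$project" in next)) return null; // no projection: keep everything

  const paths = new Set();
  for (const [path, spec] of Object.entries(next.$project)) {
    if (path === "_id") continue; // _id handling is irrelevant to the stats
    if (spec === 1 || spec === true) {
      paths.add(path);
    } else {
      // Exclusions and computed fields would need full dependency analysis;
      // this sketch falls back to producing all statistics.
      return null;
    }
  }
  return paths.size > 0 ? [...paths].sort() : null;
}

// Usage: a projection similar to what $shardedDataDistribution needs
// (field paths assumed for illustration).
const example = [
  { $collStats: { storageStats: {} } },
  {
    $project: {
      _id: 0,
      "storageStats.count": 1,
      "storageStats.avgObjSize": 1,
      "storageStats.numOrphanDocs": 1,
    },
  },
];
console.log(requiredCollStatsPaths(example).join(", "));
// → storageStats.avgObjSize, storageStats.count, storageStats.numOrphanDocs
```

Falling back to the full output whenever the following stage is not a plain inclusion projection keeps the sketch conservative: it can only skip work, never drop a field that a later stage reads.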
Current performance of $shardedDataDistribution without the $collStats optimization (number of sharded collections, elapsed time as hh:mm:ss):
[js_test:all_collection_stats] performance test $shardedDataDistribution: 1000 00:00:05
[js_test:all_collection_stats] performance test $shardedDataDistribution: 2000 00:00:10
[js_test:all_collection_stats] performance test $shardedDataDistribution: 3000 00:00:16
[js_test:all_collection_stats] performance test $shardedDataDistribution: 4000 00:00:23
[js_test:all_collection_stats] performance test $shardedDataDistribution: 5000 00:00:33
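To read these timings as a cost per collection, the short script below divides each elapsed time by its collection count. The log has one-second resolution, so the results are approximate. They rise from about 5 ms per collection at 1000 collections to about 6.6 ms at 5000, so the total time grows slightly faster than linearly.

```javascript
// Timings copied from the test log above: [collections, "hh:mm:ss"].
const runs = [
  [1000, "00:00:05"],
  [2000, "00:00:10"],
  [3000, "00:00:16"],
  [4000, "00:00:23"],
  [5000, "00:00:33"],
];

// Convert an "hh:mm:ss" string into seconds.
function toSeconds(hms) {
  const [h, m, s] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// Average cost per collection, in milliseconds, for each run.
const perCollectionMs = runs.map(([n, hms]) => (toSeconds(hms) * 1000) / n);
console.log(perCollectionMs.map((ms) => ms.toFixed(2)).join(", "));
// → 5.00, 5.00, 5.33, 5.75, 6.60
```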
Performance test used:
(function() {
'use strict';

const numberOfCollections = 5000;
// Number of new sharded collections created before each timed measurement.
const batchSize = 1000;

// Configure the initial sharded cluster.
const st = new ShardingTest({shards: 3});
const mongos = st.s;
const dbName = "test";

let total = 0;
while (total < numberOfCollections) {
    // Create and shard another batch of collections; shardCollection creates
    // each collection implicitly.
    for (let i = 0; i < batchSize; i++) {
        const coll = "coll" + total;
        assert(st.adminCommand({shardCollection: dbName + "." + coll, key: {skey: 1}}));
        total++;
    }

    // Time a full $shardedDataDistribution scan over all sharded collections.
    const start = new Date();
    const returned =
        mongos.getDB("admin").aggregate([{$shardedDataDistribution: {}}]).itcount();
    const end = new Date();

    // Format the elapsed milliseconds as hh:mm:ss (valid for runs under 24 hours).
    const time = new Date(end - start).toISOString().slice(11, 19);
    print(`performance test $shardedDataDistribution: ${total} ${time}`);

    // One result more than the test collections; the extra entry is presumably
    // config.system.sessions, which is also sharded.
    assert.eq(returned, total + 1);
}

st.stop();
})();
Issue Links
- depends on: SERVER-70859 Optimize collStats to not retrieve all data (Closed)
- is depended on by: SERVER-70859 Optimize collStats to not retrieve all data (Closed)