Type: Improvement
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: Sharding
Fully Compatible
Sharding EMEA 2022-10-31, Sharding EMEA 2022-11-14
3
Motivation: Performance improvement.
Description: Two possible designs:
- Add a new input parameter, 'dataDistribution', to $collStats. It would retrieve only the data needed by $shardedDataDistribution and other consumers (count, avgObjSize and numOrphanDocuments) and skip every statistic that is not required.
- Analyze the entire pipeline with the existing dependency analysis to determine which paths it references. If no other field can affect the result, $collStats can skip computing it. For example, when a request using $collStats or $allCollectionStats also includes a $project, only the projected information needs to be gathered. In practice, this means checking whether the $collStats stage is followed by a $project and, from that, deciding which output $collStats should produce.
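To make the two designs concrete, here is a minimal JavaScript sketch. It is hypothetical and not the server implementation. `shardRowFromStats` assumes each $shardedDataDistribution row can be derived from count, avgObjSize and the orphan count alone (owned documents = count minus orphans, sizes scaled by avgObjSize), which is what the first design relies on. `requiredCollStatsFields` is a deliberately conservative version of the second design that only handles a plain inclusion $project directly after $collStats. The `storageStats.*` paths, the output field names and both function names are assumptions for illustration.

```javascript
// Hypothetical sketch; not MongoDB server code.

// Design 1 (assumed derivation): build one $shardedDataDistribution-style row
// for a shard from only count, avgObjSize and the orphan count.
function shardRowFromStats(shardName, stats) {
  const owned = stats.count - stats.numOrphanDocs;
  return {
    shardName,
    numOrphanedDocs: stats.numOrphanDocs,
    numOwnedDocuments: owned,
    ownedSizeBytes: owned * stats.avgObjSize,
    orphanedSizeBytes: stats.numOrphanDocs * stats.avgObjSize,
  };
}

// Design 2: if $collStats is immediately followed by an inclusion $project,
// return the sorted paths that projection keeps. Return null whenever the
// analysis cannot prove the rest of the output is unused, which means
// "compute all statistics as today".
function requiredCollStatsFields(pipeline) {
  const idx = pipeline.findIndex((stage) => "$collStats" in stage);
  if (idx === -1 || idx + 1 >= pipeline.length) return null;

  const next = pipeline[idx + 1];
  if (!("$project" in next)) return null;

  const paths = [];
  for (const [path, spec] of Object.entries(next.$project)) {
    if (path === "_id") continue; // _id does not change which statistics are needed
    // Exclusions keep every other field; computed expressions may read any path.
    if (spec !== 1 && spec !== true) return null;
    paths.push(path);
  }
  return paths.length > 0 ? paths.sort() : null;
}

// Example: a pipeline that only needs the three fields used by design 1.
const example = [
  {$collStats: {storageStats: {}}},
  {$project: {"storageStats.count": 1, "storageStats.avgObjSize": 1, "storageStats.numOrphanDocs": 1}},
];
console.log(requiredCollStatsFields(example));
```

Returning null for anything other than a simple inclusion projection keeps the optimization safe: any pipeline the analysis does not understand falls back to computing the full statistics.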
Current performance without the $collStats optimization (sharded collections; elapsed time shown as HH:MM:SS):
[js_test:all_collection_stats] performance test $shardedDataDistribution: 1000 00:00:05
[js_test:all_collection_stats] performance test $shardedDataDistribution: 2000 00:00:10
[js_test:all_collection_stats] performance test $shardedDataDistribution: 3000 00:00:16
[js_test:all_collection_stats] performance test $shardedDataDistribution: 4000 00:00:23
[js_test:all_collection_stats] performance test $shardedDataDistribution: 5000 00:00:33
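Because each round reruns $shardedDataDistribution over every collection created so far, dividing the elapsed time by the collection count gives a rough per-collection cost. The short JavaScript sketch below does that arithmetic on the logged values; they have whole-second resolution, so the results are approximate, and the helper names are illustrative only.

```javascript
// Elapsed times from the log, as [number of collections, "HH:MM:SS"].
const measurements = [
  [1000, "00:00:05"],
  [2000, "00:00:10"],
  [3000, "00:00:16"],
  [4000, "00:00:23"],
  [5000, "00:00:33"],
];

// Convert an "HH:MM:SS" string to whole seconds.
function toSeconds(hms) {
  const [h, m, s] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// For each round: total seconds, average ms per collection, and the extra
// seconds compared with the previous round (1000 more collections each time).
function summarize(rows) {
  return rows.map(([collections, hms], i) => {
    const seconds = toSeconds(hms);
    const previous = i === 0 ? 0 : toSeconds(rows[i - 1][1]);
    return {
      collections,
      seconds,
      msPerCollection: (seconds * 1000) / collections,
      extraSeconds: seconds - previous,
    };
  });
}

for (const r of summarize(measurements)) {
  console.log(`${r.collections} collections: ${r.seconds} s, ` +
      `${r.msPerCollection.toFixed(2)} ms/collection, +${r.extraSeconds} s`);
}
```

On these figures the cost per collection rises from about 5 ms at 1000 collections to about 6.6 ms at 5000, and each additional batch of 1000 collections adds more time than the previous one (5, 5, 6, 7, 10 seconds), so the total grows somewhat faster than linearly.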
Performance test used:
(function() {
'use strict';

// Total number of sharded collections to create; timing is measured after every batch.
const numberOfCollections = 5000;

// Configure a sharded cluster with three shards.
const st = new ShardingTest({shards: 3});
const mongos = st.s;
const dbName = "test";
const db = mongos.getDB(dbName);

const iterator = 1000;  // collections added per batch
let total = 0;
while (total < numberOfCollections) {
    // Shard another batch of collections (shardCollection creates each one implicitly).
    for (let i = 0; i < iterator; i++) {
        const coll = "coll" + total;
        // assert.commandWorked(db.createCollection(coll));
        assert(st.adminCommand({shardcollection: dbName + "." + coll, key: {skey: 1}}));
        total++;
    }

    // Time a full $shardedDataDistribution over every sharded collection created so far.
    let it = 0;
    const start = new Date();
    const cursor = mongos.getDB("admin").aggregate([{$shardedDataDistribution: {}}]);
    while (cursor.hasNext()) {
        cursor.next();
        it++;
    }
    const end = new Date();

    // Format the elapsed milliseconds as HH:MM:SS (valid for runs shorter than 24 hours).
    const time = new Date(end - start).toISOString().slice(11, 19);
    print(`performance test $shardedDataDistribution: ` + total + ` ` + time);

    // One result per test collection plus one more, expected to be config.system.sessions,
    // which is also sharded.
    assert.eq(it, total + 1);
}
st.stop();
})();
- depends on: SERVER-70859 Optimize collStats to not retrieve all data (Closed)
- is depended on by: SERVER-70859 Optimize collStats to not retrieve all data (Closed)