Core Server / SERVER-68855

Optimize $collStats for $shardedDataDistribution.

    • Fully Compatible
    • Sharding EMEA 2022-10-31, Sharding EMEA 2022-11-14
    • 3

      Motivation: Performance Improvement.

      Description: Two different designs:

      1. Add a new input parameter 'dataDistribution' to $collStats that retrieves only the data needed to run $shardedDataDistribution and similar uses (count, avgObjSize and numOrphanDocuments). No other data is retrieved.
      2. Analyze the entire pipeline and determine (via existing analysis) that it only references a handful of paths. If omitting everything else does not affect the results, $collStats can skip it. For example, when a request with $collStats or $allCollectionStats also includes a $project, we can restrict the output to the information we need: inspect whether the $collStats stage is followed by a $project, and from there determine what output $collStats should produce.
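
As a rough illustration of the second design, the sketch below inspects a pipeline and reports which paths a $project placed directly after $collStats (or $allCollectionStats) actually needs. It is an assumption-laden model in plain JavaScript, not the server's implementation: `requiredCollStatsFields` is a hypothetical helper, and the field paths in the sample pipeline are only examples.

```javascript
// Illustrative sketch only. requiredCollStatsFields is a hypothetical helper,
// not a MongoDB API; the real analysis would live inside the server.
function requiredCollStatsFields(pipeline) {
    const idx = pipeline.findIndex(
        (stage) => "$collStats" in stage || "$allCollectionStats" in stage);
    const next = idx === -1 ? undefined : pipeline[idx + 1];
    // Without an immediately following $project, assume every field is needed.
    if (!next || !("$project" in next)) {
        return null;
    }
    const fields = new Set();
    for (const [path, value] of Object.entries(next.$project)) {
        if (path === "_id") {
            continue;  // _id handling does not affect which stats are needed
        }
        if (value === 1 || value === true) {
            // Plain inclusion, e.g. {"storageStats.count": 1}.
            fields.add(path);
        } else if (typeof value === "string" && value.startsWith("$") &&
                   !value.startsWith("$$")) {
            // Simple field reference, e.g. {count: "$storageStats.count"}.
            fields.add(value.slice(1));
        } else {
            // Exclusions or complex expressions: give up and request everything.
            return null;
        }
    }
    return [...fields].sort();
}

// Example pipeline (field paths chosen for illustration).
const samplePipeline = [
    {$collStats: {storageStats: {}}},
    {$project: {_id: 0, "storageStats.count": 1, avgObjSize: "$storageStats.avgObjSize"}},
];
console.log(requiredCollStatsFields(samplePipeline));
```

A `null` result means the stage cannot be narrowed and should produce its full output, which keeps the optimization safe when the projection is not a simple inclusion.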

       

      Current performance without optimizing $collStats (number of sharded collections, elapsed time as hh:mm:ss):

      [js_test:all_collection_stats] performance test $shardedDataDistribution: 1000 00:00:05
      [js_test:all_collection_stats] performance test $shardedDataDistribution: 2000 00:00:10
      [js_test:all_collection_stats] performance test $shardedDataDistribution: 3000 00:00:16
      [js_test:all_collection_stats] performance test $shardedDataDistribution: 4000 00:00:23
      [js_test:all_collection_stats] performance test $shardedDataDistribution: 5000 00:00:33
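
To make the trend in these timings easier to read, the short sketch below divides each elapsed time by the number of collections. The numbers are copied from the log lines above; the variable names are illustrative. The average cost per collection rises from 5.0 ms at 1000 collections to 6.6 ms at 5000, so the total time grows faster than linearly.

```javascript
// Timings copied from the log lines above: [collections, elapsed seconds].
const runs = [[1000, 5], [2000, 10], [3000, 16], [4000, 23], [5000, 33]];

// Average milliseconds spent per collection in each run.
const msPerCollection = runs.map(
    ([collections, seconds]) => (seconds * 1000) / collections);

runs.forEach(([collections], i) => {
    console.log(`${collections} collections: ${msPerCollection[i].toFixed(2)} ms per collection`);
});
```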
      

       

      Performance test used:

      (function() {
      'use strict';
      
      const numberOfCollections = 5000;
      
      // Configure initial sharding cluster
      const st = new ShardingTest({shards: 3});
      const mongos = st.s;
      const dbName = "test";
      const db = mongos.getDB(dbName);
      
      const batchSize = 1000;  // collections added before each timed run
      let total = 0;
      while (total < numberOfCollections) {
          // Shard another batch of (empty) collections
          for (let i = 0; i < batchSize; i++) {
              const coll = "coll" + total;
              // assert.commandWorked(db.createCollection(coll));
              assert(st.adminCommand({shardcollection: dbName + "." + coll, key: {skey: 1}}));
              total++;
          }
      
          // Time a full iteration of the $shardedDataDistribution cursor
          let numResults = 0;
          const start = new Date();
          const cursor = mongos.getDB("admin").aggregate([{$shardedDataDistribution: {}}]);
          while (cursor.hasNext()) {
              cursor.next();
              numResults++;
          }
          const end = new Date();
      
          // Format the elapsed time as hh:mm:ss
          const time = new Date(end - start).toISOString().slice(11, 19);
          print(`performance test $shardedDataDistribution: ` + total + ` ` + time);
      
          // One result per collection created here, plus one extra sharded
          // collection (presumably config.system.sessions)
          assert.eq(numResults, total + 1);
      }
      
      st.stop();
      })();
      

            Assignee:
            pol.pinol@mongodb.com Pol Pinol
            Reporter:
            pol.castuera@mongodb.com Pol Castuera (Inactive)