  1. Core Server
  2. SERVER-103278

Fix explain output for sharded distinct

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor - P4
    • Affects Version/s: None
    • Query Optimization
    • ALL

      Excerpt from jstests/core/query/distinct/distinct_index1.js:

      function getHash(num) {
         return Math.floor(Math.sqrt(num * 123123)) % 10;
      }
      
      function getDistinctExplainWithExecutionStats(field, query) {
         const explain = coll.explain("executionStats").distinct(field, query || {});
         assert(explain.hasOwnProperty("executionStats"), explain);
         return explain;
      }
      
      const bulk = coll.initializeUnorderedBulkOp();
      for (let i = 0; i < 1000; i++) {
         bulk.insert({a: getHash(i * 5), b: getHash(i)});
      }
      
      assert.commandWorked(bulk.execute());
      let explain = getDistinctExplainWithExecutionStats("a");
      // There are only 10 values. We use the fast distinct hack and only examine each value once.
      // ERROR: On a sharded collection this returns 20, because explain sums the distinct
      // values reported by both targeted shards instead of counting the merged distinct set.
      assert.eq(10, explain.executionStats.nReturned);
      

      When a DISTINCT_SCAN is used for a distinct query on a sharded collection, the explain output from mongos does not reflect an aggregate "distinct" over the results returned by each targeted shard. Instead, it sums the number of distinct values returned by each shard, so the overall executionStats.nReturned is incorrect whenever the same value exists on more than one shard.

      Shard0: 10 distinct values on "a"
      Shard1: 10 distinct values on "a"

      This returns 20 instead of 10:

      const explain = coll.explain("executionStats").distinct("a");
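
      The difference between the two counts can be sketched in plain JavaScript. This is only an illustration of the arithmetic, not the mongos implementation: shardResults, summedCount, and mergedDistinctCount are hypothetical names, and the per-shard values are assumed to be the same ten values on each shard, as in the example above.

```javascript
// Hypothetical per-shard results: each shard reports its own distinct values for "a".
const shardResults = [
    {shard: "Shard0", values: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]},
    {shard: "Shard1", values: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]},
];

// Adds the per-shard counts together, which matches the nReturned reported today.
function summedCount(results) {
    return results.reduce((total, r) => total + r.values.length, 0);
}

// Counts the union of the per-shard values, which is what the test expects.
function mergedDistinctCount(results) {
    const merged = new Set();
    for (const r of results) {
        for (const v of r.values) {
            merged.add(v);
        }
    }
    return merged.size;
}

console.log(summedCount(shardResults));         // sum of per-shard counts
console.log(mergedDistinctCount(shardResults)); // size of the merged distinct set
```

      The sketch uses a Set, which is sufficient for primitive values like the integers in this test; it is not meant to model how the server compares BSON values.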
      

            Assignee:
            Unassigned
            Reporter:
            Lynne Wang (lynne.wang@mongodb.com)
            Votes:
            0
            Watchers:
            3

              Created:
              Updated: