HASH_JOIN_EMBEDDING stage is costed as if almost free

XMLWordPrintableJSON

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Optimization
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      The cost of the HASH_JOIN_EMBEDDING is almost zero after the cost of the inputs is accounted for. Consider the following chain of hash joins that unfairly (both in terms of execution counters and wallclock execution time)  won against a fully-indexed INLJ plan:

       

      
      {
        stage: 'HASH_JOIN_EMBEDDING',
        costEstimate: 100.45231213378906,
        cardinalityEstimate: 960,
        inputStages: [
          {
            costEstimate: 0.0176922607421875,
          },
          {
            stage: 'HASH_JOIN_EMBEDDING',
            costEstimate: 100.24011987304688,
            cardinalityEstimate: 960,
            inputStages: [
              {
                stage: 'HASH_JOIN_EMBEDDING',
                costEstimate: 98.9783231201172,
                cardinalityEstimate: 960,
                inputStages: [
                  {
                    stage: 'COLLSCAN',
                    costEstimate: 17.611469482421874,
                  },
                  {
                    stage: 'COLLSCAN',
                    costEstimate: 73.24685363769532,
                    cardinalityEstimate: 80000,
                  }
                ]
              },
              {
                stage: 'COLLSCAN',
                costEstimate: 0.9697967529296876,
                cardinalityEstimate: 1000,
              }
            ]
          }
        ]
      } 

      For the inner-most HJ, we have

      • its own cost, excluding the cost of the inputs is: 98.97 - (17.61 + 73.24) = 8.12
      • cost per lookup = 8.12 / 80000 =  0.0001015

      For the middle HJ, we have:

      • own cost: 100.24 - (98.97 + 0.96) = 0.31
      • cost per lookup: 0.31 /  960 = 0.00032

      For the outermost HJ, we have:

      • own cost: 100.45 - (100.24 + 0.017) = 0.193
      • cost per lookup: 0.193 / 960 = 0.000201042

      To reproduce:

      const pipeline = 
      [
       {"$lookup":{"from":"partsupp","localField":"p_partkey","foreignField":"ps_partkey","as":"partsupp"}},
       {"$unwind":"$partsupp"},
       {"$lookup":{"from":"supplier","localField":"partsupp.ps_suppkey","foreignField":"s_suppkey","as":"supplier"}},
       {"$unwind":"$supplier"},
       {"$lookup":{"from":"nation","localField":"supplier.s_nationkey","foreignField":"n_nationkey","as":"nation_s"}},
       {"$unwind":"$nation_s"},
       {"$match":{"p_name":/^dra/}}];
       
       
      db.adminCommand({setParameter: 1, internalEnableJoinOptimization: true});db.adminCommand({setParameter: 1, internalJoinReorderMode: "random"});
      db.adminCommand({setParameter: 1, internalRandomJoinOrderSeed: 0});
      console.log(db.part.aggregate(pipeline).explain('executionStats').executionStats.totalKeysExamined + db.part.aggregate(pipeline).explain('executionStats').executionStats.totalDocsExamined);
      console.log(db.part.aggregate(pipeline).explain('executionStats').executionStats.executionTimeMillis);db.adminCommand({setParameter: 1, internalJoinReorderMode: "bottomUp"});
      db.part.aggregate(pipeline).explain('executionStats').executionStats.totalKeysExamined + db.part.aggregate(pipeline).explain('executionStats').executionStats.totalDocsExamined;
      db.part.aggregate(pipeline).explain('executionStats').executionStats.executionTimeMillis; 

            Assignee:
            Unassigned
            Reporter:
            Philip Stoev
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: