Join optimization: join enumeration and plan selection should not use non-deterministic data structures

XMLWordPrintableJSON

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical - P2
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Optimization
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      If there are two plans with an identical cost, one or the other may win randomly. I believe this is because of the use of maps and other datastructures that are randomized. Instead, for reproducibility, only ordered datastructures should be used where the ordering is preserved across restarts and compiles.

      This is important not only to obtain stable golden tests but in production as well – if the plans flip randomly on customers that would be a bad user experience.

      The particular example that I have is because the costs of multiple plans are zero due to an IXSCAN that is estimated to return zero rows. But even if this particular instance is fixed, the general issue remains.

       

      To reproduce:

       

      const pipeline = [
       {"$lookup":{"from":"partsupp","localField":"p_partkey","foreignField":"ps_partkey","pipeline":[
       {"$match":{"$or":[{"ps_supplycost":{"$lt":12.1}}]}},
       {"$match":{"$nor":[{"ps_availqty":{"$gt":4038}}]}}],"as":"partsupp"}},
       {"$unwind":"$partsupp"},
       {"$lookup":{"from":"lineitem","localField":"partsupp.ps_partkey","foreignField":"l_partkey","pipeline":[
       {"$match":{"$or":[{"l_partkey":17636}, {"l_orderkey":265507}, {"l_partkey":{$eq: 11711}}]}},
       {"$match":{"$nor":[{"l_commitdate":{"$lt": new Date("1993-03-17T00:00:00.000Z")}}]}}],"as":"lineitem"}},
       {"$unwind":"$lineitem"},
       {"$match":{"$and":[{"p_brand":{"$in":["Brand#15","Brand#25","Brand#53","Brand#44"]}},{"p_type":/^LARGE/}]}},{"$limit":1000}];
       
       
       
       db.part.aggregate(pipeline).explain().stages[0]["$cursor"].queryPlanner.winningPlan.queryPlan.inputStages[0].inputStages[0].inputStage;
       

      Across restarts, this will use a different index to satisfy the first input of the `OR` .

            Assignee:
            Alya Berciu
            Reporter:
            Philip Stoev
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: