Shuffle TPC-H dataset used in tests


    • Type: Task
    • Resolution: Won't Do
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization

      Sequential scan sampling CE (cardinality estimation) is used in test suites because it is deterministic. However, it is highly sensitive to partially sorted data (see SERVER-123925), which can skew the estimates.

      Randomly shuffle the TPC-H datasets before running tests, then update the test suites to use the shuffled datasets. This combines the benefits of both sampling approaches: the randomness of random sampling CE and the predictability of sequential scan sampling CE.
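
      A minimal sketch of the shuffling step, assuming each dataset is available as a text file with one row per line. The function names, the file paths, and the default seed are hypothetical and are not taken from the test suites. A fixed seed keeps the shuffled order reproducible, so a sequential scan over the shuffled data stays deterministic.

```python
import random


def shuffle_rows(rows, seed=42):
    """Return a new list holding the same rows in a seeded pseudo-random order.

    The seed value is an assumption made for illustration; any fixed value
    gives a reproducible order. The input sequence is not modified.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled


def shuffle_file(src_path, dst_path, seed=42):
    """Write a shuffled copy of a one-row-per-line file (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src:
        rows = src.read().splitlines()
    with open(dst_path, "w", encoding="utf-8") as dst:
        for row in shuffle_rows(rows, seed):
            dst.write(row + "\n")
```

      Shuffling once and storing the output, rather than reshuffling on every test run, would keep the input identical across runs; that choice is an assumption of this sketch.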

      As a result, TPC-H-based tests pass consistently with shuffled input data and show no regression from the partial-sort sensitivity issue.

            Assignee: Unassigned
            Reporter: Alexander Ignatyev
            Votes: 0
            Watchers: 4

              Created:
              Updated:
              Resolved: