Bring the wtperf YCSB logic closer to MongoDB’s YCSB HVW implementation


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage Engines - Foundations
    • 659.031
    • None
    • None

      Since WT-17484 was completed, YCSB benchmarks run in our regular CI performance testing. However, the current wtperf configurations for YCSB still differ from the MongoDB ones:

      • MongoDB uses a scrambled Zipfian distribution, while wtperf uses a scrambled Pareto distribution. We should double-check whether this actually makes a big difference in predicting performance changes.
      • MongoDB uses the compressibility=3 parameter, which fills values with 33% random characters and 66% "a" characters, allowing the compressed value size to be predictably around 3x smaller than the original one.
        • We should not forget to increase the number of entries if we introduce compressibility, so that the total amount of disk space consumed remains similar.
      • MongoDB also has YCSB load and YCSB stepdown benchmarks, although I am not sure whether they should also be ported to WT.
      • In some configurations, MongoDB's average YCSB read latency was around 500 times higher than WT's, which strongly suggests that we are missing an important factor behind the difference.
        • However, we have not yet had time to measure performance properly, so it is worth revisiting this and remeasuring performance for both the wtperf and MongoDB versions.
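
      To make the compressibility point above concrete, here is a minimal Python sketch of how such values could be generated and how the entry count could be scaled. It is only an illustration: the names make_value and scaled_entries are hypothetical, the layout (a random prefix followed by a run of "a") is an assumption rather than the confirmed MongoDB YCSB layout, and zlib merely stands in for whichever block compressor the benchmark actually uses. Scaling the entry count linearly with the compressibility factor is likewise an assumption.

```python
import random
import string
import zlib


def make_value(size: int, compressibility: int, rng: random.Random) -> bytes:
    """Build a value of `size` bytes: the first size // compressibility bytes
    are random ASCII letters, the rest is filled with 'a' (assumed layout)."""
    if compressibility < 1:
        raise ValueError("compressibility must be >= 1")
    random_len = size // compressibility
    random_part = "".join(rng.choice(string.ascii_letters) for _ in range(random_len))
    return (random_part + "a" * (size - random_len)).encode("ascii")


def scaled_entries(base_entries: int, compressibility: int) -> int:
    """Scale the entry count so the on-disk footprint stays roughly the same
    once values compress by about `compressibility` times (assumed linear)."""
    return base_entries * compressibility


rng = random.Random(42)  # fixed seed so the sketch is reproducible
value = make_value(1000, 3, rng)

# zlib is a stand-in compressor; the real ratio depends on the configured one.
ratio = len(value) / len(zlib.compress(value))
print(f"compression ratio: {ratio:.2f}")
print("entries:", scaled_entries(1_000_000, 3))
```

      Measuring the ratio with the compressor actually configured for the table would show whether the "around 3x" assumption holds before the entry count is changed.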

      We should either bring the existing YCSB variants closer to the MongoDB implementation where possible, or introduce new variants whose logic matches the current MongoDB YCSB behavior as closely as possible. Those variants should be marked as HVW-matching.

      Update: 

      This ticket has several linked tickets. All of them are different attempts to make wtperf YCSB configurations closer to their MongoDB counterparts. The remaining work is still described in the ticket description.

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Ivan Kochin
