Find root cause of short-request perf regression in MongoDB 3.6


    • Type: Task
    • Resolution: Won't Do
    • Priority: Minor - P4
    • Affects Version/s: 3.6.0
    • Component/s: None
    • Service Arch
    • 3

      MongoDB 3.6 appears to take a few microseconds longer than 3.4 to serve short requests. This shows up when the single-threaded, standalone microbenchmark throughputs are converted to latencies and compared between 3.6 and 3.4. Viewed this way, the results are consistent between WiredTiger (WT) and MMAPv1 (mmap), and across many tests. The average regression is:

      InsertOne (OP_MSG command): 9.2us (13%)
      UpdateOne (OP_MSG command): 14.6us (15%)
      FindOne (legacy OP_QUERY non-command): 7.6us (11%)
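
      The throughput-to-latency conversion described above can be sketched as follows. This is a minimal illustration, assuming a single serial client so that mean latency is simply the reciprocal of throughput, and assuming the percentages are taken relative to the 3.4 latency. The function names and the throughput values are hypothetical and are not measurements from this ticket.

```python
from typing import Tuple


def latency_us(ops_per_sec: float) -> float:
    """Mean per-operation latency in microseconds for one serial client."""
    if ops_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return 1_000_000.0 / ops_per_sec


def regression(base_ops_per_sec: float, new_ops_per_sec: float) -> Tuple[float, float]:
    """Return (added latency in us, added latency as a fraction of the baseline latency)."""
    base = latency_us(base_ops_per_sec)
    new = latency_us(new_ops_per_sec)
    return new - base, (new - base) / base


# Hypothetical throughputs for illustration only (baseline build vs. newer build).
delta_us, fraction = regression(base_ops_per_sec=12_500, new_ops_per_sec=10_000)
print(f"+{delta_us:.1f}us ({fraction:.0%})")
```

      Expressing the difference in microseconds rather than ops/sec makes it easier to see whether tests with very different baseline throughputs share a similar fixed per-request cost.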

      The insert regression has also been reproduced using fire-and-forget legacy OP_INSERT messages, which show a ~20% drop in throughput.
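
      To compare a throughput drop with the latency percentages, note that the two are not the same number. A minimal sketch, assuming a single serial stream of requests where per-operation time is the reciprocal of throughput (the function name is hypothetical):

```python
def time_per_op_increase(throughput_drop: float) -> float:
    """Fractional increase in time per operation for a fractional throughput drop.

    If throughput falls to (1 - d) of its old value, time per operation
    grows by a factor of 1 / (1 - d).
    """
    if not 0.0 <= throughput_drop < 1.0:
        raise ValueError("throughput_drop must be in [0, 1)")
    return 1.0 / (1.0 - throughput_drop) - 1.0


# A 20% throughput drop corresponds to roughly 25% more time per operation.
print(f"{time_per_op_increase(0.20):.0%}")
```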

      Some time has gone into investigating insert in particular. Unfortunately, apart from a small increase in command parsing time (which has an easy fix that improved the top-level numbers only slightly), the profiles look very similar between 3.4 and 3.6. When reading the profiles, bear in mind that Linux perf changed where it attributes the cost of the networking send and recv calls. On my machine, once I did the math to add those costs back into the 3.4 numbers, the results were identical.

      The task here is to find the root cause of this performance regression, and identify possible mitigations.

            Assignee:
            [DO NOT USE] Backlog - Service Architecture
            Reporter:
            Andy Schwerin
            Votes:
            2
            Watchers:
            16
