[SERVER-32383] Find root cause of short-request perf regression in MongoDB 3.6 Created: 18/Dec/17  Updated: 06/Dec/22  Resolved: 21/Dec/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.0
Fix Version/s: None

Type: Task
Priority: Minor - P4
Reporter: Andy Schwerin
Assignee: Backlog - Service Architecture
Resolution: Won't Do
Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Service Arch
Participants:
Case:

 Description   

MongoDB 3.6 appears to take a few microseconds longer than 3.4 to serve short requests. This shows up when the single-threaded, standalone microbenchmark throughputs are converted to latencies and compared between 3.6 and 3.4. Viewed this way, the results are consistent between WT and mmap, and across many tests. The average regression is:

InsertOne (OP_MSG command): 9.2us (13%)
UpdateOne (OP_MSG command): 14.6us (15%)
FindOne (legacy OP_QUERY non-command): 7.6us (11%)
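
For reference, the throughput-to-latency conversion can be sketched as follows. This is a minimal illustration that assumes a single-threaded client with one request in flight at a time, so mean latency is simply the reciprocal of throughput. The function names and the throughput figures in the usage note are hypothetical and are not taken from the benchmark runs.

```python
def latency_us(ops_per_sec: float) -> float:
    """Mean per-request latency in microseconds.

    Assumes a single-threaded client with exactly one request
    outstanding, so latency is the reciprocal of throughput.
    """
    if ops_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return 1_000_000.0 / ops_per_sec


def regression(old_ops_per_sec: float, new_ops_per_sec: float) -> tuple[float, float]:
    """Return (added latency in microseconds, added latency in percent)."""
    old = latency_us(old_ops_per_sec)
    new = latency_us(new_ops_per_sec)
    delta = new - old
    return delta, 100.0 * delta / old
```

With made-up inputs, `regression(20000, 16000)` compares 50 us against 62.5 us per request and returns `(12.5, 25.0)`. Note that a 20% drop in throughput corresponds to a 25% increase in latency, so the two views of the same data give different percentages.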

The insert regression has also been reproduced using fire-and-forget legacy OP_INSERT messages, where it shows up as a ~20% drop in throughput.

Some time has gone into investigating insert in particular. Unfortunately, other than a small increase in command parsing time (which has an easy fix that improved the top-level numbers only slightly), the profiles look VERY similar between 3.4 and 3.6. When looking at profiles, bear in mind that Linux perf moved where it assigns the costs of the networking send and recv calls. On my machine, when I did the math to add them back into 3.4, the results were identical.

The task here is to find the root cause of this performance regression, and identify possible mitigations.

