[SERVER-39800] Investigate the new oplog format impact in linkbench performance tests Created: 25/Feb/19  Updated: 23/Apr/19  Resolved: 23/Apr/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Gregory McKeon (Inactive) Assignee: William Schultz (Inactive)
Resolution: Done Votes: 0
Labels: bigtxns_perf
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-39802 Write each applyOps in its own WUOW f... Closed
Related
Sprint: Repl 2019-03-25, Repl 2019-04-08, Repl 2019-04-22, Repl 2019-05-06
Participants:

 Description   

For the performance testing, we need to:

  • Get the statistics to measure the correct ops/sec, as Matthew Saltz did before.
  • Change LinkBench to use the new oplog format by setting a server parameter (see the sketch after this list).
  • Get the numbers from the existing 1-node replset and 3-node replset tests in a patch build.
  • Update one workload to include 100,000 small updates in a transaction and test its performance with the old and new formats.
  • If the performance drops, run LinkBench locally with gperftools enabled to get profiling data that pinpoints the slowdown.
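As a rough illustration of the second and fourth items above, here is a minimal pymongo sketch run against a local replica set. The server parameter name `useMultipleOplogEntryFormatForTransactions`, the `perftest.updates` namespace, and the `rs0` replica set name are assumptions for illustration only, not values confirmed by this ticket.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")

# Hypothetical server parameter name for the new transaction oplog format;
# the real name must be checked against the server build under test.
client.admin.command({"setParameter": 1,
                      "useMultipleOplogEntryFormatForTransactions": True})

coll = client.perftest.updates

# Seed documents, then run 100,000 small updates inside a single transaction
# so the old and new oplog formats can be compared on a large transaction.
coll.insert_many([{"_id": i, "n": 0} for i in range(100_000)])

with client.start_session() as session:
    with session.start_transaction():
        for i in range(100_000):
            coll.update_one({"_id": i}, {"$inc": {"n": 1}}, session=session)
```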


 Comments   
Comment by Siyuan Zhou [ 23/Apr/19 ]

william.schultz, I updated the ticket to reflect the real work we've done here. We won't switch to the new format before large transactions are supported by default.

Comment by William Schultz (Inactive) [ 20/Mar/19 ]

siyuan.zhou To answer your above questions:

1. Do we expect any write conflicts during the link load phase? They are upserts, which seems to imply updates are possible, meaning conflicts.

No, as I understand it, there should not be any write conflicts during this phase. The link loading phase does occur in parallel, across many threads, but each thread is given a subset of the node id space, and these subsets should be disjoint. Each thread will generate links for the node ids it received, and those links (edges) should go from the given node id to some other node. In other words, only a single thread will ever generate outgoing links for a particular node, and the (from, to) node ids of an edge are embedded in the document's _id in the linktable. So I do not think there should be any potential for write conflicts. You can see the data schema for the link table here.
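To make the disjoint-id argument concrete, here is a hypothetical sketch (in Python/pymongo rather than LinkBench's Java) of one loader thread upserting links whose _id embeds the (from, to) pair. The collection name, _id encoding, and field names are illustrative assumptions, not the actual LinkBench schema.

```python
from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")
linktable = client.linkdb.linktable

def load_links(from_ids, links_per_node=10):
    # from_ids is this thread's private slice of the node id space; no other
    # thread generates outgoing links for these ids, so no two threads ever
    # upsert the same _id.
    ops = []
    for from_id in from_ids:
        for to_id in range(from_id + 1, from_id + 1 + links_per_node):
            _id = f"{from_id}:{to_id}"            # (from, to) embedded in _id
            ops.append(UpdateOne({"_id": _id},
                                 {"$set": {"id1": from_id, "id2": to_id}},
                                 upsert=True))
    linktable.bulk_write(ops, ordered=False)

# Thread 0 gets ids 0..999, thread 1 gets 1000..1999, etc.; the ranges are disjoint.
load_links(range(0, 1000))
```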

2. If conflicts are possible, does the latency include any retries? If the latency of one transaction increases a little, retries will exaggerate the total finish time.

If there were retries on conflicts, yes, the retries should be included in the latency calculation, since the retries happen automatically, inside the link store implementation.
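As a sketch of why retries would end up in the latency numbers, assume a structure like the following (the actual link store implementation differs in detail, and commit errors are not handled here): the timer wraps the whole retry loop, so any time spent retrying transient write conflicts is counted.

```python
import time
from pymongo.errors import OperationFailure

def timed_transaction(client, body):
    start = time.monotonic()
    while True:
        try:
            with client.start_session() as session:
                with session.start_transaction():
                    body(session)
            break
        except OperationFailure as exc:
            # Retry only transient transaction errors (e.g. write conflicts).
            if not exc.has_error_label("TransientTransactionError"):
                raise
    return time.monotonic() - start   # includes time spent on retries

# Example (hypothetical collection and update):
# timed_transaction(client, lambda s: client.linkdb.linktable.update_one(
#     {"_id": "1:2"}, {"$inc": {"version": 1}}, session=s))
```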

3. I'm wondering why "the overall link loading throughput numbers" include node loading, and whether there's a simple way to split them, since latency and throughput are related, but perhaps not inversely proportional.

Yeah, it's a bit odd, but it's just an artifact of the way linkbench is currently written. You can see here that it starts up all loader threads (e.g. 20 link loaders + 1 node loader) at the same time and waits for them all to complete. It measures the length of that process and reports it as the total load phase duration. To accurately measure just the time taken for link loading, we would likely have to force node loading to happen separately from (before?) the link loading phase. Technically, I'm not sure the load phase is traditionally meant to be used as a real benchmark.
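For illustration, a simplified Python rendering of the timing described above (LinkBench itself is Java, and the real loader setup is more involved): all loaders are started together and a single wall-clock duration is reported, so the figure mixes node loading and link loading.

```python
import threading
import time

def run_load_phase(link_loaders, node_loader):
    # Start every loader (links + nodes) at the same time, wait for all of
    # them, and report one duration for the whole load phase.
    threads = [threading.Thread(target=fn) for fn in link_loaders + [node_loader]]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start   # total load phase duration (links + nodes)
```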

Comment by William Schultz (Inactive) [ 20/Mar/19 ]

To double check my initial results, I ran 3 additional patch builds with the new oplog entry format against 1-node replica sets. The original patch build I referenced above had some extra log messages I had added that were printed on certain transaction events. I ran new patch builds without those changes just to make sure the results weren't tainted by the extra logging. The patch builds are here:

The regressions in the load phase still appear.

To expand on the above analysis, it does seem like throughput is affected even for the operation types that only do 1-2 operations per transaction. If we look at the first patch build, for the ADD_LINK metric, we can see a small regression. Confusingly, these request phase statistics are reported correctly as throughput (ops/sec), so a decrease indicates things got slower. The ADD_LINK throughput number on that patch is reported as 253 ops/sec.

One decent way to get a sense of the overall comparison between the patch build run and previous commits is to select many other prior commits and click "Compare" on the trend graph. This will show the performance change, as a percentage, of the currently selected commit (the one you are hovering over) against a bunch of other commits. If we compare the patch build change against numerous prior commits, the regression appears to be, on average, around 7%. I could post all the numbers here to be more precise, but I feel it is easier to take advantage of the existing performance visualization tools we have on the Evergreen page. In summary, though, it does appear that there might be a bit of a regression even for small transactions.

Comment by Siyuan Zhou [ 12/Mar/19 ]

william.schultz, the goal of this ticket is to compare the performance of both formats. I think both small transactions and large (but under 16MB) transactions are needed to support our decision. LinkBench represents small transactions well.

Good point on secondary application perf test! I care more about the end-to-end performance though, since I suspect the overhead of the new format isn't critical in the code path of w: majority writes.

Comment by William Schultz (Inactive) [ 11/Mar/19 ]

siyuan.zhou Is an explicit goal of this ticket to enable the new transaction oplog format in Linkbench or is it to simply compare the performance of the new format against the old "applyOps" format? It seems that one hypothesis we would like to test is that secondary oplog application performance of the new transaction oplog format is worse than with the old format. It seems most reasonable, then, to test transactions that have ~16MB of data, but have a large number of operations, since we expect this is the case where the worst performance hits would occur. (e.g. as you said, a 16MB transaction with 100,000 operations). I am wondering if a more targeted performance test would be better to answer that question, since Linkbench seems more about trying to model a real world workload scenario. Perhaps modifying this secondary application performance test to run transactions would be useful for our purposes here. Measuring w:majority write throughput should also be a decent proxy for secondary application performance, but it feels better to measure it directly if possible. What are your thoughts?
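For reference, a minimal pymongo sketch of driving w:majority writes and computing throughput, as a rough proxy for secondary application speed: each acknowledged write must replicate to a majority, so slower oplog application on secondaries shows up as lower primary throughput. The namespace, replica set name, and document shape are assumptions for illustration.

```python
import time
from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
coll = client.perftest.docs.with_options(write_concern=WriteConcern(w="majority"))

# Issue n small w:majority inserts and report throughput.
n = 10_000
start = time.monotonic()
for i in range(n):
    coll.insert_one({"_id": i, "payload": "x" * 100})
print(f"w:majority throughput: {n / (time.monotonic() - start):.1f} ops/sec")
```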
