[SERVER-39800] Investigate the new oplog format impact in linkbench performance tests Created: 25/Feb/19 Updated: 23/Apr/19 Resolved: 23/Apr/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Gregory McKeon (Inactive) | Assignee: | William Schultz (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | bigtxns_perf | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Sprint: | Repl 2019-03-25, Repl 2019-04-08, Repl 2019-04-22, Repl 2019-05-06 | ||
| Participants: | | ||
| Description |
|
For the performance testing, we need:
|
| Comments |
| Comment by Siyuan Zhou [ 23/Apr/19 ] |
|
william.schultz, I updated the ticket to reflect the real work we've done here. We won't switch to the new format before we support large transactions by default. |
| Comment by William Schultz (Inactive) [ 20/Mar/19 ] |
|
siyuan.zhou To answer your above questions:
No, as I understand it, there should not be any write conflicts during this phase. The link loading phase does occur in parallel, across many threads, but each thread is given a subset of the node id space, and these subsets should be disjoint. Each thread will generate links for the node ids it received, and those links (edges) should go from the given node id to some other node. In other words, only a single thread will ever generate outgoing links for a particular node, and the (from, to) node ids of an edge are embedded in the document's _id in the linktable. So I do not think there should be any potential for write conflicts. You can see the data schema for the link table here.
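As a rough illustration of why the threads can't collide (the field names and the exact _id format below are assumptions for the sake of the example, not necessarily the real linkbench MongoDB schema):

```java
// Rough illustration only: each loader thread owns a disjoint range of "from"
// node ids, and every link _id embeds its endpoints, so no two threads can
// ever insert a document with the same _id. Field names are hypothetical.
import org.bson.Document;

class LinkLoadSketch {
    // A link (edge) keyed by its endpoints, e.g. _id = "<from>:<linktype>:<to>".
    static Document linkDoc(long fromId, long linkType, long toId) {
        return new Document("_id", fromId + ":" + linkType + ":" + toId)
                .append("id1", fromId)
                .append("link_type", linkType)
                .append("id2", toId);
    }

    // Thread-local work: only this thread ever generates outgoing links for
    // node ids in [lo, hi), so every _id it writes is unique to this thread.
    static void loadRange(long lo, long hi) {
        for (long fromId = lo; fromId < hi; fromId++) {
            // ... generate linkDoc(fromId, someType, someToId) and insert it ...
        }
    }
}
```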
If there were retries on conflicts then yes, the retries should be included in the latency calculation, since they happen automatically inside the link store implementation.
Yeah, it's a bit odd, but it's just an artifact of the way linkbench is currently written. You can see here that it starts up all loader threads (e.g. 20 link loaders + 1 node loader) at the same time and waits for them all to complete. It measures the length of that whole process and reports it as the total load phase duration. To accurately measure just the time taken for link loading, we would likely have to force node loading to happen separately, before the link loading phase even begins. Technically, I'm not sure whether the load phase is traditionally meant to be used as a real benchmark. |
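As a sketch of what forcing node loading to finish first could look like (assuming the loaders can be treated as plain Runnable tasks, which is a simplification of the actual linkbench loader classes):

```java
// Sketch only: time node loading and link loading as two separate phases,
// instead of one combined load-phase timer. Assumes loaders are plain
// Runnables; the real linkbench loader classes are structured differently.
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class LoadPhaseTimingSketch {
    static long timePhaseMillis(List<Runnable> tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
        long start = System.nanoTime();
        tasks.forEach(pool::submit);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    static void run(Runnable nodeLoader, List<Runnable> linkLoaders) throws InterruptedException {
        long nodeMs = timePhaseMillis(List.of(nodeLoader)); // finish node loading first
        long linkMs = timePhaseMillis(linkLoaders);         // then time link loading alone
        System.out.println("node load: " + nodeMs + " ms, link load: " + linkMs + " ms");
    }
}
```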
| Comment by William Schultz (Inactive) [ 20/Mar/19 ] |
|
To double-check my initial results, I ran 3 additional patch builds with the new oplog entry format against 1-node replica sets. The original patch build I referenced above had some extra log messages I added that were printed on certain transaction events, so I ran the new patch builds without those changes to make sure the results weren't tainted by the extra logging. The patch builds are here:
The regressions in the load phase still appear. To expand on the above analysis, throughput does seem to be affected even for the operation types that only do 1-2 operations per transaction. If we look at the first patch build, for the ADD_LINK metric, we can see a small regression. Confusingly, these request phase statistics are reported correctly as throughput (ops/sec), so a decrease indicates things got slower. The ADD_LINK throughput on that patch is reported as 253 ops/sec.
One decent way to get a sense of the overall comparison between the patch build run and previous commits is to select many prior commits and click "Compare" on the trend graph. This shows the percentage performance change of the currently selected commit (the one you are hovering over) against a set of other commits. If we compare the patch build against numerous prior commits, the regression appears to be, on average, around 7%. I could post all the numbers here to be more precise, but it is easier to take advantage of the existing performance visualization tools we have on the Evergreen page. In summary, though, there does appear to be a bit of a regression even for small transactions. |
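For reference, the "Compare" figure is just an average of per-baseline relative changes. A minimal sketch of that arithmetic (the baseline throughputs below are made up for illustration; only the 253 ops/sec ADD_LINK figure comes from the patch build):

```java
// Minimal sketch of the percentage comparison described above. The baseline
// throughput numbers are invented; only the 253 ops/sec patch figure is real.
class RegressionComparisonSketch {
    public static void main(String[] args) {
        double patchOps = 253.0;                              // patch build ADD_LINK throughput
        double[] baselineOps = {272.0, 270.0, 275.0, 268.0};  // hypothetical prior commits
        double sum = 0;
        for (double base : baselineOps) {
            sum += (patchOps - base) / base;                  // negative => slower than baseline
        }
        System.out.printf("mean change vs. baselines: %.1f%%%n", 100 * sum / baselineOps.length);
    }
}
```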
| Comment by Siyuan Zhou [ 12/Mar/19 ] |
|
william.schultz, the goal of this ticket is to compare the performance of both formats. I think we need both small transactions and large (but under 16MB) transactions to support our decision. LinkBench represents small transactions well. Good point on the secondary application perf test! I care more about end-to-end performance, though, since I suspect the overhead of the new format isn't critical in the code path of w: majority writes. |
| Comment by William Schultz (Inactive) [ 11/Mar/19 ] |
|
siyuan.zhou Is the explicit goal of this ticket to enable the new transaction oplog format in Linkbench, or simply to compare the performance of the new format against the old "applyOps" format? It seems that one hypothesis we would like to test is that secondary oplog application performance with the new transaction oplog format is worse than with the old format. It seems most reasonable, then, to test transactions that have ~16MB of data but a large number of operations, since we expect that is where the worst performance hits would occur (e.g., as you said, a 16MB transaction with 100,000 operations). I am wondering if a more targeted performance test would be better suited to answer that question, since Linkbench is more about modeling a real-world workload. Perhaps modifying this secondary application performance test to run transactions would be useful for our purposes here. Measuring w:majority write throughput should also be a decent proxy for secondary application performance, but it feels better to measure it directly if possible. What are your thoughts? |
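As a rough outline of what such a targeted test could look like (a sketch only, using the MongoDB Java driver; the connection string, database/collection names, and operation counts are placeholders, not an agreed-upon design):

```java
// Sketch only, not a finished perf test: drive a single multi-document
// transaction with many small inserts and a w:majority commit, and time the
// end-to-end commit as a rough proxy for secondary application cost.
import com.mongodb.ReadConcern;
import com.mongodb.TransactionOptions;
import com.mongodb.WriteConcern;
import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

class LargeTxnSketch {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017/?replicaSet=rs0")) {
            MongoCollection<Document> coll = client.getDatabase("perf").getCollection("txn_test");
            TransactionOptions opts = TransactionOptions.builder()
                    .readConcern(ReadConcern.SNAPSHOT)
                    .writeConcern(WriteConcern.MAJORITY)   // commit waits for majority replication
                    .build();
            try (ClientSession session = client.startSession()) {
                long start = System.nanoTime();
                session.startTransaction(opts);
                for (int i = 0; i < 100_000; i++) {        // many tiny ops, total well under 16MB
                    coll.insertOne(session, new Document("_id", i).append("v", i));
                }
                session.commitTransaction();
                System.out.println("txn commit latency ms: " + (System.nanoTime() - start) / 1_000_000);
            }
        }
    }
}
```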