[SERVER-50749] Re-loading is slow with py-tpcc | Created: 03/Sep/20 | Updated: 06/Dec/22 | Resolved: 12/Nov/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Mark Callaghan (Inactive) | Assignee: | Backlog - Storage Engines Team |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | KP44, dmd-escapes, perf-escapes |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Storage Engines |
| Operating System: | ALL |
| Backport Requested: | v5.1 |
| Sprint: | Storage - Ra 2021-09-20, Storage - Ra 2021-10-04 |
| Participants: | |
| Description |
|
While loading data for py-tpcc, flow control engages, the insert rate drops, and a few inserts take 200 to 300 seconds. This report came from Percona; they previously gave us the repro. Part of this is a duplicate of another ticket. Summarizing what I see in my repro attempts below:
I have ftdc and mongod error logs for most of the results listed below. I can provide them if requested. There are many, so I prefer to do that on demand. |
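A minimal sketch of watching the flow control metrics from a client while the load runs, assuming pymongo and a mongod on localhost (the poll interval and the printed fields are illustrative, not part of the original repro):

```python
# Sketch: poll the flowControl section of serverStatus while the py-tpcc load runs.
# Assumes pymongo is installed and mongod is listening on localhost:27017.
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

while True:
    status = client.admin.command("serverStatus")
    fc = status.get("flowControl", {})
    # Field names can vary slightly by server version; .get() keeps this tolerant.
    print(
        f"enabled={fc.get('enabled')} "
        f"isLagged={fc.get('isLagged')} "
        f"targetRateLimit={fc.get('targetRateLimit')} "
        f"timeAcquiringMicros={fc.get('timeAcquiringMicros')}"
    )
    time.sleep(5)
```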
| Comments |
| Comment by Sulabh Mahajan [ 12/Nov/21 ] |
|
I confirmed the problem goes away if we do not use the unique index. In fact, current master performs better on the tpcc load and reload than 4.2.17; please see the linked results. Given there is nothing new for this server ticket, I am going to mark it as a duplicate of the related ticket. |
| Comment by Sulabh Mahajan [ 12/Oct/21 ] |
|
Adding a comment describing the current state of the issue, copied over from the related ticket. Here is a summary of the current state of things:
|
| Comment by Mark Callaghan (Inactive) [ 17/Sep/20 ] |
|
Repeating a comment from WT-6664: Percona, the reporter of this problem, now has a blog post. As reported above, the problem doesn't happen for them, or is much less likely, with a smaller WiredTiger cache. |
| Comment by Keith Smith [ 11/Sep/20 ] |
|
Based on mark.callaghan's comment that he's using a system with 60GB of RAM, I did a few more tests on a similar system using different sized WT caches. It looks like this problem gets worse with larger cache sizes; see the linked results. |
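For anyone repeating this comparison, the WiredTiger cache size can be set at startup or adjusted at runtime. This is a minimal sketch assuming pymongo and a local mongod; the 8 GB value is an arbitrary example, not the configuration used in these tests:

```python
# Sketch: adjust the WiredTiger cache size on a running mongod to compare behavior
# across cache sizes. At startup the equivalent is `mongod --wiredTigerCacheSizeGB 8`.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

# wiredTigerEngineRuntimeConfig passes a configuration string straight to WiredTiger.
client.admin.command(
    {"setParameter": 1, "wiredTigerEngineRuntimeConfig": "cache_size=8G"}
)
```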
| Comment by Geert Bosch [ 10/Sep/20 ] |
|
Note that all of this worked before. The case with the cache stuck was not stuck due to flow control, but due to an issue with the history store. All time was spent in the history store, see the backtraces. That thread was already past the flow-control state (it had acquired the global lock) and should have run until the next interruption point, but never did. That's a WT issue.
Now, I can see how in this case flow control compounded a problem, and we can fix that, but it doesn't fix the underlying WT issue. Similarly, inserting one doc at a time wastes enough CPU in mongod that the load on the storage engine will drop, also avoiding the cache-stuck issue. That doesn't seem a good solution in the long run. Why does mongod behave so much worse with 4.4 than with 4.2? |
| Comment by Keith Smith [ 10/Sep/20 ] |
|
bruce.lucas: Your assumption is correct. "flowControl isLagged" was zero throughout my run with a small batch size. Likewise, the WT cache was much happier. It never entered aggressive eviction mode, never used application threads for eviction, and rarely had to do in-memory restoration on evicted pages (1-2/sec).
|
| Comment by Bruce Lucas (Inactive) [ 10/Sep/20 ] |
|
keith.smith, the batch size is expected to affect both cache pinning and the operation of flow control if it engages. I imagine the beneficial effect in your test is due to the former, and that there is no longer any lag so flow control no longer engages, but it may be worth checking that assumption in the ftdc by confirming that the flow control isLagged metric remains 0. |
| Comment by Mark Callaghan (Inactive) [ 10/Sep/20 ] |
|
bruce.lucas - I use a c3.8xlarge host set up by DSI (16 cores, 60GB of RAM, 20k EBS IOPS). The benchmark client shares the host and uses 10 connections (and 8 threads or processes). |
| Comment by Mark Callaghan (Inactive) [ 10/Sep/20 ] |
|
Added attachments. These include ftdc, the mongod error log and test output. The directory structure is like the output of "ls flow0.mo440/v1" and "ls flow1.mo440/v2/r.w100.l10.pypyyes".
|
| Comment by Keith Smith [ 10/Sep/20 ] |
|
I reran my test with internalInsertMaxBatchSize=2. The maximum latency during the test dropped from 33.5 to 1.4 seconds, so I'll go out on a limb and assert that this parameter affects the performance seen here. I can experiment with other values, but I worry that the results would be dependent on workload and system. We'd find a value that works on a few tests here, but some customer would discover a workload where it blows up. Is there a principled way we can reason about the tradeoffs here? Does this have to be a constant, or would it make sense to dynamically adjust the batch size based on the state of the system? mark.callaghan: I'm already using pypy. |
| Comment by Bruce Lucas (Inactive) [ 10/Sep/20 ] |
|
Thanks Mark. That does look quite a bit different from the data Keith posted (although it may well be simply a more extreme manifestation of the same underlying problems).
But given the hang that appears to create unrecoverable cache pressure, I'm not sure the details of flow control behavior or the latencies aside from the hang in this particular test are too interesting, as the hang is by far the worse problem and can be expected to cause the other problems. In addition to the python difference you noted, your data shows a 4x larger server running mongod than Keith's data. |
| Comment by Mark Callaghan (Inactive) [ 10/Sep/20 ] |
|
bruce.lucas - Added the missing ftdc and error log files I mentioned here.
keith.smith - please use pypy rather than python to run py-tpcc, as it has much lower CPU overhead and so puts more pressure on MongoDB.
Next up is ftdc with and without flow control enabled. |
| Comment by Bruce Lucas (Inactive) [ 10/Sep/20 ] |
|
This example seems representative:
In addition to helping in the way that Eric describes, decreasing the batch size will increase the rate of lock acquisition, so it should make flow control more effective if it does engage; TBD whether this is a good thing or not. Another question: if flow control is not doing a lot, why is it having such a large impact on maximum latency? I seem to recall some concern about whether the o/s mechanism underlying flow control tickets is fair, or may even be intentionally unfair to favor throughput over latency. If it is unfair, maybe a thread could repeatedly get starved out trying to obtain a ticket? |
| Comment by Keith Smith [ 10/Sep/20 ] |
|
milkie: It should be easy enough to re-run. I'll try first with internalInsertMaxBatchSize set very low (1?) so we can test the hypothesis that this is causing the stalls. If it is, then we can experiment with a better size.
geert.bosch: Flow control and the history store also seem implicated in my test; I was just focused on the WT cache when I selected the stats to look at. See below:
|
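As a rough illustration of how such counters can be pulled outside of FTDC, the sketch below reads a few cache and history-store stats from serverStatus; the exact stat names vary across server versions, so the fields listed here are assumptions and the snippet simply prints whatever it finds:

```python
# Sketch: print a few WiredTiger cache / history-store counters from serverStatus.
# Stat names differ between versions; unknown names simply print as None.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
wt = client.admin.command("serverStatus")["wiredTiger"]
cache = wt.get("cache", {})

for name in (
    "bytes currently in the cache",
    "maximum bytes configured",
    "tracked dirty bytes in the cache",
    "history store table insert calls",        # assumed name, may differ by version
    "pages evicted by application threads",    # assumed name, may differ by version
):
    print(f"{name}: {cache.get(name)}")
```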
| Comment by Eric Milkie [ 10/Sep/20 ] |
|
Alex, for your number 1 above, the current batch size for inserts is 500, and it is settable at runtime via the "internalInsertMaxBatchSize" server parameter, so we can experiment with different values to see their effect. |
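For reference, a runtime change like the one Keith tried can be made with setParameter. This is a minimal sketch assuming pymongo and a local mongod; the value 2 is only an example:

```python
# Sketch: lower the insert batch size at runtime to see its effect on stalls.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
client.admin.command({"setParameter": 1, "internalInsertMaxBatchSize": 2})

# The same parameter can also be set at startup:
#   mongod --setParameter internalInsertMaxBatchSize=2
```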
| Comment by Alexander Gorrod [ 10/Sep/20 ] |
|
I believe keith.smith reproduced this. We walked through the FTDC in detail today. The issue that introduces latencies with this test is that it batches operations into large groups and executes multiple of those batches in parallel. That results in MongoDB creating multiple large transactions running against WiredTiger at the same time. Sometimes one of those transactions falls behind, which leads to the oldest timestamp and transaction ID being pinned. Since there is a lot of content pinned by active transactions (content WiredTiger still can't evict even with durable history), WiredTiger ends up choosing inefficient pages to evict when trying to find enough space in cache to complete the lagging transaction.
Our team did work to alleviate the long latencies seen in this workload coming into the 4.4.0 release, but that tuning either didn't catch enough cases or was undone by another change (which is possible - the heuristics we added to handle the case were tricky to get right). The reason we split off a separate WiredTiger ticket is that the server could change its behavior to alleviate the symptom here as well. I can see two avenues that would be worth exploring: 1) It could implement a smarter version of insertBatchAndHandleErrors. At the moment that function tries to do all operations in a batch within a single WiredTiger transaction, and if that fails it falls back to doing each operation in a separate transaction - but WiredTiger tries very hard not to fail a transaction due to the amount of resources it holds. Reducing the batch size in that code would help ensure transactions don't get big enough to cause the sort of pressure this workload puts on the storage engine. |
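To make the batch-then-fallback pattern described above concrete, here is an illustrative client-side sketch in Python; it is not the server's C++ implementation of insertBatchAndHandleErrors, and the names, batch cap, and connection string are assumptions:

```python
# Illustrative sketch only - not the server's C++ implementation. It mimics the
# batch-then-fallback insert pattern described above from the client side.
from pymongo import MongoClient
from pymongo.errors import PyMongoError

MAX_BATCH = 500  # client-side analogue of internalInsertMaxBatchSize


def insert_batch_with_fallback(coll, docs):
    """Insert docs in capped batches; on a batch failure, retry one at a time."""
    for start in range(0, len(docs), MAX_BATCH):
        chunk = docs[start:start + MAX_BATCH]
        try:
            coll.insert_many(chunk, ordered=True)
        except PyMongoError:
            # Simplified fallback: retry documents individually, ignoring any
            # that already made it in during the failed batched attempt.
            for doc in chunk:
                try:
                    coll.insert_one(doc)
                except PyMongoError:
                    pass


if __name__ == "__main__":
    coll = MongoClient("mongodb://localhost:27017").test.items
    insert_batch_with_fallback(coll, [{"i": i} for i in range(2000)])
```

Capping the batch size, as sketched here, is the lever the comment suggests: smaller batches mean smaller WiredTiger transactions and less content pinned at once.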
| Comment by Geert Bosch [ 10/Sep/20 ] |
|
Both flow control activating (supposedly due to a lagging majority-committed point) and, in the hanging case, the stalling threads having backtraces in the history store seem to point to a scenario where specific operations that need access to the history store stop making progress, resulting in a negative feedback loop. It doesn't look like your experiment reproduced that scenario. |
| Comment by Keith Smith [ 09/Sep/20 ] |
|
I've been reproducing this with partial success using mark.callaghan's repro recipe. He performs two runs of py-tpcc and observes that the second takes much longer. I do not see this, but like him I see a number of high latency insert operations. It looks like WT cache eviction is a (the?) culprit here. Looking at the FTDC data from my test, I added a pin at the time of every request that took longer than 16s. That was 15 requests in about 25 minutes of run time. These requests align perfectly with spikes in various (mostly cache related) WT stats:
FWIW, I've been doing this on a VM with 16 GB of RAM and four cores. I've opened a separate WiredTiger ticket for the storage engine side of this. |
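One way to pull out the slow requests without FTDC tooling is to scan the structured mongod log for slow-query entries; in the sketch below the log path and the 16s threshold are assumptions for illustration:

```python
# Sketch: list operations slower than 16s from a 4.4-style JSON mongod log.
# The log path and threshold are assumptions; adjust them for the actual setup.
import json

THRESHOLD_MS = 16_000
LOG_PATH = "/var/log/mongodb/mongod.log"  # hypothetical location

with open(LOG_PATH) as log:
    for line in log:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON lines
        attr = entry.get("attr", {})
        duration = attr.get("durationMillis", 0)
        if entry.get("msg") == "Slow query" and duration >= THRESHOLD_MS:
            print(entry.get("t", {}).get("$date"), duration, attr.get("ns"))
```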
| Comment by Bruce Lucas (Inactive) [ 08/Sep/20 ] |
|
mark.callaghan, I'm not seeing these two files on the ticket:
|
| Comment by Mark Callaghan (Inactive) [ 04/Sep/20 ] |
|
Results similar to the previous comment, but with flow control disabled. Stalls were not as bad, but there were still many inserts that took more than 5 seconds with 4.4.0, and the worst case was ~75 seconds. No inserts took >= 5 seconds with 4.2.9. Legend:
flow control disabled
Response time histogram for slow queries from the test with 4.4.0 and flow control disabled:
|
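For completeness, flow control can be toggled at runtime with the enableFlowControl server parameter; this is a minimal sketch of the comparison setup, assuming pymongo and a local mongod:

```python
# Sketch: disable flow control at runtime before repeating the load, then
# re-enable it afterwards to compare the two configurations.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
client.admin.command({"setParameter": 1, "enableFlowControl": False})
# ... run the py-tpcc load here ...
client.admin.command({"setParameter": 1, "enableFlowControl": True})
```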
| Comment by Mark Callaghan (Inactive) [ 04/Sep/20 ] |
|
I repeated tests for 100, 150, 200, 250, 300, 350 and 400 warehouses. For all of these the database was in memory - no reads from storage. Tests were done for 4.4.0 and 4.2.9, with and without flow control enabled. First up are the results with flow control enabled. Legend:
flow control enabled, v1. The X indicates that I was unable to count slow queries.
flow control enabled, v2. The inf indicates that an insert statement and mongod got stuck as described above; I ended up having to do kill -9 to stop mongod.
flow control enabled, v3. There were no inserts that took >= 5 seconds with 4.2.9, but many with 4.4.0.
Response time histogram for slow queries from the test with 4.4.0 and flow control enabled:
|
| Comment by Mark Callaghan (Inactive) [ 03/Sep/20 ] |
|
I have yet to reproduce this with 4.2.9. It is intermittent with 4.4.0. |