[SERVER-26804] Pause in inserts for YCSB Created: 27/Oct/16 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Performance, Storage, WiredTiger |
| Affects Version/s: | 3.4.0-rc0, 3.4.0-rc1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Daly | Assignee: | Backlog - Storage Engines Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | 3.7BackgroundTask | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: | | ||
| Issue Links: | | ||
| Assigned Teams: | Storage Engines | ||
| Operating System: | ALL | ||||
| Steps To Reproduce: | YCSB as run in the longevity regression suite, against a 3-shard cluster in AWS using c3.2xlarge instances. |
| Sprint: | Storage 2016-11-21, Storage 2016-12-12 | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
|
Running YCSB against a sharded cluster in our regression framework, we see a 10+ pause in inserts that correlates with eviction running in aggressive mode. The pause appears to be on the primary of the second shard of a three-shard cluster. The test was run on c3.2xlarge instances, using local SSD for data and journal; the journal is on a separate device from the data. Here's a plot of some key stats during the pause: |
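For anyone trying to catch the same window outside FTDC, here is a minimal polling sketch using pymongo. The WiredTiger statistic names ("eviction currently operating in aggressive mode", "transaction checkpoint currently running") and the connection URI are assumptions based on typical serverStatus output from this era, not something recorded in this ticket.

```python
# Hypothetical monitor: poll serverStatus on the suspect shard primary and
# flag seconds where eviction is aggressive and/or a checkpoint is running.
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI

while True:
    ss = client.admin.command("serverStatus")
    wt = ss["wiredTiger"]
    aggressive = wt["cache"].get("eviction currently operating in aggressive mode", 0)
    ckpt = wt["transaction"].get("transaction checkpoint currently running", 0)
    inserts = ss["opcounters"]["insert"]
    print(f"{time.strftime('%H:%M:%S')} aggressive_eviction={aggressive} "
          f"checkpoint_running={ckpt} total_inserts={inserts}")
    time.sleep(1)
```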
| Comments |
| Comment by David Daly [ 26/Jan/17 ] | |
|
I finally got something running here. For this test I dropped the YCSB collection between runs, but the server stayed up. YCSB uses the same document IDs each time it runs, so if you don't drop the collection you get duplicate key errors. There is an option to change the start point of the documents, so we can work around this if needed. Looking at the results, there are a few interesting things:
Does this data help confirm or deny your suspicions? It does seem to get much more stable over time, at least for the nodes that don't fall over. | |
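For reference, the drop-the-collection-between-runs step described above can be scripted with a few lines of pymongo. The ycsb.usertable namespace is the mongodb binding's default and is an assumption here, as is the suggestion that YCSB's insertstart property is the "change the start point" option mentioned.

```python
# Hypothetical reset between YCSB load phases without restarting mongod:
# dropping the collection avoids duplicate _id errors on the next load.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # mongos or primary (placeholder)
client["ycsb"]["usertable"].drop()

# Alternative (untested here): leave the data in place and shift the key range
# for the next load, e.g. by giving YCSB a larger insertstart value.
```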
| Comment by David Hows [ 12/Dec/16 ] | |
|
Hi David Daly, sorry for the delay.
It would make sense to try running the load phase twice. I understand about the physical hardware. Let me know how that goes. | |
| Comment by David Daly [ 01/Dec/16 ] | |
|
david.hows I can definitely do the pre-heating of the collections. What makes sense to do here? One simple experiment I could do is to run the load phase twice, either dropping the collection in between or loading to a new collection the second time. Would that make sense? Testing this with physical hardware requires a fair amount of extra work to get a real cluster up and make sure it falls over again. If we want to try just a standalone with an oplog, I can run that locally on my machine. I don't know if it will reproduce there, as it's a different set of hardware than we're using in AWS. If you think there's something to learn from that experiment, I would be glad to do it. Thanks. | |
| Comment by David Hows [ 01/Dec/16 ] | |
|
To Henrik's comments: the issue here is not only about pre-heating the disk itself, although that could be a factor; it is about pre-heating the WiredTiger cache following a restart of the MongoDB instance. It is not unusual for an instance to take some time ramping up before it can keep up with eviction, and there are other factors around the first use of a collection, since we have to do things like initial writes. One thing worth testing here would be to heat up the instance and collection first, with an initial warm-up pass that builds the collection ahead of the workload. That should hopefully work around the worst case, where a large early checkpoint saturates the disk.
To David Daly's comments: I understand where you are coming from. Are you able to look at doing some testing with physical hardware, or at pre-heating the collections as suggested above? Hopefully with some of those changes we can minimize the stalls. | |
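One way to approximate the warm-up idea described above, once the collection already exists, is to scan it before the measured run so WiredTiger pulls table and index pages into cache. This is only a sketch of that idea; the ycsb.usertable namespace and the URI are assumptions, and it does not replace actually pre-building the collection with a throwaway load pass.

```python
# Hypothetical cache warm-up: full collection scan plus an _id index scan.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
coll = client["ycsb"]["usertable"]

scanned = sum(1 for _ in coll.find({}))                     # collection scan
indexed = sum(1 for _ in coll.find({}).hint([("_id", 1)]))  # _id index scan
print(f"warmed cache with {scanned} docs (collscan) and {indexed} docs (_id scan)")
```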
| Comment by Susan LoVerso [ 02/Nov/16 ] | |
|
I looked at the FTDC from the run you showed. The statistics I'm viewing show a lot of I/O going on during the stalls. The number of active write system calls in progress goes up and stays at 31 for the duration of the stall. There is also 1 fsync and a checkpoint running for the entire time. The system metrics (second line of the plot) show the machine spending its time in iowait. Eviction is also in aggressive mode for the exact duration of the stall. | |
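The host-side signals described above (writes in flight, iowait) come from FTDC's system metrics; a rough way to watch the same thing live on the node is to read /proc directly (Linux only). The device name below is a placeholder, and the /proc field layout assumed here is the standard kernel documentation one.

```python
# Hypothetical host-side check: in-flight I/Os for one device plus CPU iowait.
import time

DEVICE = "xvdb"  # placeholder: the local SSD holding the data files

def inflight_ios(device):
    # /proc/diskstats: the 12th field (index 11 after splitting) is
    # "I/Os currently in progress" for that device.
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            if parts[2] == device:
                return int(parts[11])
    return None

def iowait_jiffies():
    # /proc/stat aggregate "cpu" line: user nice system idle iowait ...
    with open("/proc/stat") as f:
        return int(f.readline().split()[5])

prev = iowait_jiffies()
while True:
    time.sleep(1)
    cur = iowait_jiffies()
    print(f"inflight={inflight_ios(DEVICE)} iowait_jiffies_delta={cur - prev}")
    prev = cur
```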
| Comment by David Daly [ 28/Oct/16 ] | |
|
Hi sue.loverso, we're running YCSB from here: https://github.com/mongodb-labs/YCSB/tree/evergreen on the evergreen branch.
And we're running it with this command line:
Change the mongodb.url to the appropriate target for you. We were running the tests with a separate EC2 node (c3.2xlarge using local SSD) for the client and for each node in the cluster. I tried simplifying to a 3-node replica set and a standalone. The issue repeated on the 3-node replica set. The standalone shows drops in throughput, but none going to zero; I'm not sure if that is random variation or because there's no oplog to run with. I kicked off a 1-node replica set run as well to see what happens there. I think it makes sense to start with a 1-node replica set and see if you can reproduce with that locally. If it doesn't reproduce, we can start pulling apart what's different between your local environment and our test environment. | |
| Comment by David Daly [ 27/Oct/16 ] | |
|
Attaching raw timeseries.html file for the primary of the second shard. |