[SERVER-17157] Seeing pauses in YCSB performance workload with WiredTiger Created: 03/Feb/15  Updated: 11/Sep/15  Resolved: 11/Sep/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.0-rc7
Fix Version/s: 3.0.6

Type: Bug Priority: Critical - P2
Reporter: Brian Towles Assignee: Michael Cahill (Inactive)
Resolution: Done Votes: 1
Labels: 28qa, wttt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 79492d9-pauses.png     HTML File 79492d9_raw.html     Text File PauseStackTrace.txt     PNG File SERVER_17157-DISK-ACTIVITY.png     Text File TCMalloc-YCSB.txt     Text File YCSB-zlib.txt     Text File YCSB_pause_logs_sample.txt     HTML File YCSB_workload     PNG File checkpoint_correlation.png     File drop_out.js     PNG File from_js_replication.png     HTML File from_js_replication_raw.html     File get_pauses_with_updates.js     File load_pause_data.js     Zip Archive pause_issue_cache_full_logs.zip     HTML File pause_issue_cache_full_raw.html     Zip Archive pause_issue_cache_not_full_logs.zip     HTML File pause_issue_cache_not_full_raw.html     PNG File pause_issue_full_cache_2015-02-02 19-13-00.png     PNG File pause_issue_not_full_cache_2015-02-02 19-13-00.png     Text File pstack.txt     PNG File server-17157-rc9-long.png     PNG File server-17157-rc9-zoom.png     Text File server17157-ycsbparams.txt     PNG File server_15944_run.png     PNG File server_17157_journal_on_xvdc.png     Text File server_17157_journal_on_xvdc_pstack.txt     HTML File server_17157_journal_on_xvdc_raw.html     Text File wt-nochange.txt     Text File wt-writethrough.txt     PNG File ycsb5050-prerc9-norepl.png    
Issue Links:
Related
related to SERVER-17194 Low Throughput for YCSB 50-50 workloa... Closed
is related to SERVER-17907 B-tree eviction blocks access to coll... Closed
is related to SERVER-16575 intermittent slow inserts with WiredT... Closed
is related to SERVER-16790 Lengthy pauses associated with checkp... Closed
is related to SERVER-16938 60-second stall between checkpoints u... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

mongod is run with the following command line and no additional config file:

numactl --physcpubind=0-15 --interleave=all mongod --dbpath <path to data> --logpath <path to log> --storageEngine=wiredTiger --logappend --quiet --fork

This is using thumbtack-ycsb with the following config as the YCSB workload file:

recordcount=10000000
operationcount=2147483647
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
requestdistribution=zipfian
threadcount=16
maxexecutiontime=259200
exportmeasurementsinterval=30000
insertretrycount=10
ignoreinserterrors=true
readretrycount=1
updateretrycount=1
timeseries.granularity=100
reconnectionthroughput=10
reconnectiontime=1000

It is run with a YCSB load

<path to ycsb install>/bin/ycsb load mongodb -s -P <path to workload file>

immediately followed by the YCSB run phase

<path to ycsb install>/bin/ycsb run mongodb -s -P <path to workload file>

Hardware-wise, this is an Amazon Linux c3.4xlarge EC2 instance with 16 cores bound to the mongod process.

The operationcount is high because this particular run is targeted at a longevity test, but the pauses start within the first 200 seconds and continue throughout the length of the run.

Participants:

 Description   

During performance tests on a dedicated Amazon Linux EC2 instance (c3.4xlarge) running a standalone mongod, I am seeing extended pauses during a mixed 50/50 update and read run. The pauses seem to correspond to high write activity in the general server metrics, but with no corresponding write activity on disk or in the cache.

The benchmark runs the mixed 50/50 workload of updates and reads. The read threads return in a reasonable time, giving low latency for reads, but the update threads appear to be blocked: when the updates start up again, their latency shows as very high.

 388 sec: 10043995 operations; 10020.5 current ops/sec; [UPDATE AverageLatency(us)=2054.55] [READ AverageLatency(us)=1162.3]
 390 sec: 10134561 operations; 45260.37 current ops/sec; [UPDATE AverageLatency(us)=492.66] [READ AverageLatency(us)=203.15]
 392 sec: 10255505 operations; 60441.78 current ops/sec; [UPDATE AverageLatency(us)=333.89] [READ AverageLatency(us)=183.77]
 394 sec: 10329907 operations; 37201 current ops/sec; [UPDATE AverageLatency(us)=546.05] [READ AverageLatency(us)=211.6]
 396 sec: 10329907 operations; 0 current ops/sec;
 398 sec: 10329907 operations; 0 current ops/sec;
 400 sec: 10329907 operations; 0 current ops/sec;
 402 sec: 10329907 operations; 0 current ops/sec;
 404 sec: 10329907 operations; 0 current ops/sec;
 406 sec: 10329907 operations; 0 current ops/sec;
 408 sec: 10329907 operations; 0 current ops/sec;
 410 sec: 10329907 operations; 0 current ops/sec;
 412 sec: 10329907 operations; 0 current ops/sec;
 414 sec: 10329907 operations; 0 current ops/sec;
 416 sec: 10329907 operations; 0 current ops/sec;
 418 sec: 10329907 operations; 0 current ops/sec;
 420 sec: 10329907 operations; 0 current ops/sec;
 422 sec: 10358050 operations; 14071.5 current ops/sec; [UPDATE AverageLatency(us)=32062.47] [READ AverageLatency(us)=178.26]
 424 sec: 10409153 operations; 25538.73 current ops/sec; [UPDATE AverageLatency(us)=1044.98] [READ AverageLatency(us)=193.84]
 426 sec: 10520720 operations; 55755.62 current ops/sec; [UPDATE AverageLatency(us)=375.93] [READ AverageLatency(us)=186.22]
 428 sec: 10645124 operations; 62202 current ops/sec; [UPDATE AverageLatency(us)=319.98] [READ AverageLatency(us)=183.11]
 430 sec: 10759784 operations; 57301.35 current ops/sec; [UPDATE AverageLatency(us)=364] [READ AverageLatency(us)=182.66]

Looking at the metrics shows a large plateau in general write activity, but there does not seem to be any noticeable uptick in write activity in WiredTiger, on disk (xvdb is the data-only disk), or in the cache. I have tested with both a full and a not-full cache, with the same result.

This appears similar to SERVER-16269 and SERVER-16662, and is the same as what I was seeing in CAP-1822.

This was actually tested with the b70d96e build (post rc7), but has been showing up in multiple RCs.



 Comments   
Comment by Michael Cahill (Inactive) [ 11/Sep/15 ]

Agreed, let's resolve this one and open new tickets if we see more pauses during workloads.

Comment by Alexander Gorrod [ 11/Sep/15 ]

I reran the test as defined in the description of this issue with the 3.0.6 release on a c3.4xlarge. I don't see any pauses during the 20 minutes of running the test (the initial report stated that pauses started within the first 3 minutes); the lowest throughput observed is 1348 ops/sec.

Can we call this one done? If there are other configurations that trigger throughput pauses they deserve their own tickets.

Comment by Eitan Klein [ 28/Apr/15 ]

I followed up on asya's concern related to zlib with this YCSB workload.

I ran the test with my private write_through + TCMalloc binary, with zlib compression enabled.

I noticed the pauses and took a dump file when I captured 0 throughput; TCMalloc appears to be the primary suspect – see the attached thread dumps.

CC: mark.benvenuto

Comment by Eitan Klein [ 26/Apr/15 ]

WT-No Change - baseline YCSB output
WT-WriteThrough - no flush, with the write-through flag enabled

Comment by Eitan Klein [ 26/Apr/15 ]

On Windows (AWS environment with the repro David mentioned above), I took a dump file and noticed all threads waiting on the FlushFileBuffers call (see attached).

I decided to remove the FlushFileBuffers call in the experiment described in the table below (checkpoints will still happen, but the idea is to achieve durability without FlushFileBuffers).
Note: I initially observed slow performance caused by the known heap contention, so I performed the experiment on top of the TCMalloc fix.

Experiment | Results | Impression
No change (baseline w/ TCMalloc) | I've observed the stalls | Fast
No FlushFileBuffers, with FILE_FLAG_WRITE_THROUGH | I didn't observe pauses; the system slowed down during checkpoints but stayed significantly above 0 | A bit slower than the original
FlushFileBuffers with FILE_FLAG_WRITE_THROUGH | N/A | Seems relatively too slow
No FlushFileBuffers, no FILE_FLAG_NO_BUFFERING | I didn't manage to operate this on AWS |

FYI, the MSDN article on this topic:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx#caching_behavior

A write-through request via FILE_FLAG_WRITE_THROUGH also causes NTFS to flush any metadata changes, such as a time stamp update or a rename operation, that result from processing the request. For this reason, the FILE_FLAG_WRITE_THROUGH flag is often used with the FILE_FLAG_NO_BUFFERING flag as a replacement for calling the FlushFileBuffers function after each write, which can cause unnecessary performance penalties. Using these flags together avoids those penalties. For general information about the caching of files and metadata, see File Caching
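To make the two handle configurations compared in the experiment above concrete, here is a minimal standalone C sketch (not WiredTiger or mongod code; the file names are made up for illustration). It contrasts the baseline approach of a buffered write followed by FlushFileBuffers with a handle opened using FILE_FLAG_WRITE_THROUGH; FILE_FLAG_NO_BUFFERING is left commented out because it additionally requires sector-aligned buffers and transfer sizes.

/* Hedged sketch: illustrates the durability approaches compared above, not the
 * actual storage-engine code. Error handling is kept minimal for brevity. */
#include <windows.h>
#include <stdio.h>

/* Baseline: buffered write, durability obtained by an explicit flush
 * (the FlushFileBuffers call all threads were seen waiting on). */
static BOOL write_then_flush(const char *path, const void *buf, DWORD len)
{
    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, NULL, OPEN_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL, NULL);
    DWORD written = 0;
    BOOL ok;
    if (h == INVALID_HANDLE_VALUE)
        return FALSE;
    ok = WriteFile(h, buf, len, &written, NULL) && FlushFileBuffers(h);
    CloseHandle(h);
    return ok;
}

/* Experiment: write-through handle; the write is durable when WriteFile
 * returns, so no separate FlushFileBuffers call is needed. */
static BOOL write_through(const char *path, const void *buf, DWORD len)
{
    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, NULL, OPEN_ALWAYS,
                           FILE_FLAG_WRITE_THROUGH /* | FILE_FLAG_NO_BUFFERING */,
                           NULL);
    DWORD written = 0;
    BOOL ok;
    if (h == INVALID_HANDLE_VALUE)
        return FALSE;
    ok = WriteFile(h, buf, len, &written, NULL);
    CloseHandle(h);
    return ok;
}

int main(void)
{
    static const char rec[] = "example record\n";
    printf("flush path:         %d\n", write_then_flush("flush_test.dat", rec, sizeof(rec) - 1));
    printf("write-through path: %d\n", write_through("writethrough_test.dat", rec, sizeof(rec) - 1));
    return 0;
}

The write_through variant corresponds to the "No FlushFileBuffers, with FILE_FLAG_WRITE_THROUGH" row of the table above.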

Comment by David Daly [ 22/Apr/15 ]

The linux AMI I used had 3.14. We were also able to reproduce against a windows box.

Comment by Asya Kamsky [ 22/Apr/15 ]

What's the kernel version on the repros? I ask because I discovered that 3.13 behaves very poorly during these slow-downs (which seem to be entirely checkpoint-related in my repros), but 3.16.0 is much better behaved: it slows down in the same places but does not completely stall and become unresponsive.

Comment by David Daly [ 22/Apr/15 ]

Copied repro steps from comment to steps to reproduce above.

Comment by Michael Cahill (Inactive) [ 22/Mar/15 ]

brian.towles, are you able to gather statistics during a run with 3.0.1 and upload those logs?

Comment by Brian Towles [ 20/Mar/15 ]

alexander.gorrod The last runs I did with 3.0.1 were still seeing pauses of up to 25 to 30 seconds in the update sections.

Comment by Alexander Gorrod [ 20/Mar/15 ]

brian.towles@mongodb.com I just re-ran the original workload using an AWS c3.4xlarge machine and MongoDB 3.0.1. I'm not seeing the performance drop outs you report. Could you retest with the latest release?

Comment by Asya Kamsky [ 22/Feb/15 ]

rc9 seems to have improved this use case.

Comment by Brian Towles [ 17/Feb/15 ]

darren.wood@10gen.com Yes, I am only running the default config with the storage engine changed to WiredTiger. The command line used is as above:

numactl --physcpubind=0-15 --interleave=all mongod --dbpath <path to data> --logpath <path to log> --storageEngine=wiredTiger --logappend --quiet --fork

With my YCSB run (which is straight thumbtack YCSB) it's showing up around 242 seconds. Running get_pauses_with_updates.js (after the data has been loaded using load_pause_data.js), I was able to replicate it within 120 to 150 seconds.

Comment by Daniel Pasette (Inactive) [ 06/Feb/15 ]

Would be interested in re-running this workload with the fix for SERVER-15944 checked in and merged back to v3.0

Comment by Daniel Pasette (Inactive) [ 05/Feb/15 ]

I'm also able to reproduce pauses and very large throughput dropoffs with a simple JavaScript test when running with an oplog present. The setup is simple:

The journal is symlinked to a separate spindle; create an oplog of 5 GB:

mongod --dbpath /home/dan/wt-data/ --storageEngine wiredTiger --master --oplogSize 5000

Run the attached script, which is set to use 10 worker threads and run for 1000 seconds (both can easily be configured in the script):

mongo drop_outs.js

I'm using mongostat and the timeseries tools to monitor throughput. WT stats (via the timeseries tools) show an absurdly high number of pages walked for eviction and of page reconciliation calls recorded on the oplog collection.
