[SERVER-16269] WiredTiger blocks queries and updates for many seconds. Created: 21/Nov/14  Updated: 03/Dec/21  Resolved: 29/Dec/14

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.8.0-rc0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: John Page Assignee: Michael Cahill (Inactive)
Resolution: Duplicate Votes: 2
Labels: wiredtiger
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File WiredTiger.csv.max.png     HTML File gdbprof.html     PNG File incident1-blocker.png     PNG File incident1-waiters-1.png     PNG File incident1-waiters-2.png     HTML File incident1.html     PNG File incident1.png     HTML File incident2.html     HTML File isse.html     PNG File stats.WT.nocomp.csv.max.png     PNG File try13.png     PNG File try14.png     PNG File try15.png    
Issue Links:
Duplicate
is duplicated by SERVER-16479 Pauses due to large page eviction dur... Closed
is duplicated by SERVER-16492 thread gets stuck doing update for a ... Closed
is duplicated by SERVER-16566 long pause during bulk insert due to ... Closed
Related
related to SERVER-16572 Add counter for calls to sched_yield ... Closed
related to SERVER-16602 Review WiredTiger default settings fo... Closed
related to SERVER-16643 extend existing THP warning to THP de... Closed
is related to SERVER-16619 Very long pauses using 2.8.0-rc3 when... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

Comparing MMAPv1 to WiredTiger on an 8-core server with 32GB RAM and local SSD (AWS instance).

Throwing a mix of insert/update/query at the server, WiredTiger viewed in mongostat shows really inconsistent performance - also, many operations in the log are hovering around the 100ms mark, with inserts taking >1s in some cases. I don't see this inconsistency with mmapv1.

The mmapv1 engine output from mongostat looks like this:

insert query update delete getmore command flushes mapped vsize    res faults     locked db idx miss % qr|qw ar|aw netIn netOut conn     time
  3076  8761   5582     *0       0     2|0       0   6.0g 12.5g 415.0m      0         :0.0%          0 37|85 42|86   10m     9m  129 15:33:16
  2789 10304   5979     *0       0     1|0       0   6.0g 12.5g 418.0m      0         :0.0%          0 42|82 44|83   11m    11m  129 15:33:17
  3668  8509   4778     *0       0     1|0       0   6.0g 12.5g 412.0m      0         :0.0%          0 10|65 39|67   10m     9m  129 15:33:18
  4151  8757   5172     *0       0     1|0       0   6.0g 12.5g 472.0m      0         :0.0%          0 29|94 32|95   11m     9m  129 15:33:19
  3019  7988   6096     *0       0     1|0       0   6.0g 12.5g 477.0m      0         :0.0%          0 44|78 48|79   11m     9m  129 15:33:20
  3661  7152   5647     *0       0     2|0       0   6.0g 12.5g 478.0m      0         :0.0%          0 53|75 56|75   11m     8m  129 15:33:21
  2378  7132   6574     *0       0     1|0       0   6.0g 12.5g 479.0m      0         :0.0%          0 32|91 35|92   11m     8m  129 15:33:22
  2437  8355   6379     *0       0     1|0       0   6.0g 12.5g 484.0m      0         :0.0%          0 41|83 43|84   11m     9m  129 15:33:23
  2334  8889   5756     *0       0     1|0       0   6.0g 12.5g 483.0m      0         :0.0%          0 52|72 54|73   10m     9m  129 15:33:24

WiredTiger looks like this:

insert query update delete getmore command flushes mapped vsize res faults     locked db idx miss % qr|qw ar|aw netIn netOut conn     time
  2348  8786   5922     *0       0     2|0     n/a    n/a   n/a n/a    n/a           n/a        n/a   n/a   n/a   10m     9m  129 15:35:48
  9918 30670  21542     *0       0     1|0     n/a    n/a   n/a n/a    n/a           n/a        n/a   n/a   n/a   39m    34m  129 15:35:49
 11902 33152  20733     *0       0     1|0     n/a    n/a   n/a n/a    n/a           n/a        n/a   n/a   n/a   41m    37m  129 15:35:50
  3801  8540   5223     *0       0     1|0     n/a    n/a   n/a n/a    n/a           n/a        n/a   n/a   n/a   11m     9m  129 15:35:51
    *0    *0     *0     *0       0     1|0     n/a    n/a   n/a n/a    n/a           n/a        n/a   n/a   n/a   79b    13k  129 15:35:52
  2934  7013   5251     *0       0     2|0     n/a    n/a   n/a n/a    n/a           n/a        n/a   n/a   n/a   10m     8m  129 15:35:53
   548  1183    896     *0       0     1|0     n/a    n/a   n/a n/a    n/a           n/a        n/a   n/a   n/a    1m     1m  129 15:35:54
    52    91     77     *0       0     1|0     n/a    n/a   n/a n/a    n/a           n/a        n/a   n/a   n/a  159k   118k  129 15:35:55
  5622 14617  12105     *0       0     1|0     n/a    n/a   n/a n/a    n/a           n/a        n/a   n/a   n/a   22m    16m  129 15:35:56
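
A minimal sketch (not part of the original report) of how these stall seconds can be picked out of mongostat output; it assumes the plain-text column layout shown above (insert/query/update in the first three columns, "*0" for idle counters) and an arbitrary ops/sec threshold:

import sys

STALL_THRESHOLD = 1000  # total insert+query+update ops/sec; arbitrary cutoff

def parse_ops(token):
    # mongostat prints "*0" for counters that saw no traffic in that second
    return int(token.lstrip("*"))

for line in sys.stdin:
    fields = line.split()
    # skip the header row and anything that isn't a data row
    if not fields or not fields[0].lstrip("*").isdigit():
        continue
    total = sum(parse_ops(t) for t in fields[:3])  # insert, query, update
    if total < STALL_THRESHOLD:
        print("possible stall at %s: %d ops/sec" % (fields[-1], total))

Run as, e.g., mongostat | python3 flag_stalls.py (the script name is just a placeholder).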



 Comments   
Comment by Michael Cahill (Inactive) [ 29/Dec/14 ]

This has been split into SERVER-16651, SERVER-16662 and SERVER-16667.

Comment by Daniel Pasette (Inactive) [ 23/Dec/14 ]

We've seen that disabling Transparent Huge Pages makes the "pauses" go away in some workloads. In order to do so, the user also had to change /sys/kernel/mm/transparent_hugepage/defrag to never.
See also: SERVER-16643.
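
For reference, a minimal sketch (not from this ticket) of checking the current THP settings on Linux; it assumes the standard sysfs layout, where the active choice is the bracketed token:

from pathlib import Path

THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")

for name in ("enabled", "defrag"):
    try:
        value = (THP_DIR / name).read_text().strip()  # e.g. "always madvise [never]"
    except FileNotFoundError:
        print("%s: not present (kernel built without THP?)" % name)
        continue
    # the currently active setting is shown in square brackets
    active = value[value.index("[") + 1:value.index("]")] if "[" in value else value
    print("%s: %s" % (name, active))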

Keeping ticket open for further analysis.

Comment by Keith Bostic (Inactive) [ 23/Dec/14 ]

asya, at this point, are you running with the sched_yield calls backing off to sleep, or with the sched_yield calls just spinning?

Comment by Keith Bostic (Inactive) [ 22/Dec/14 ]

bruce.lucas@10gen.com: all of our tests were on AWS Linux machines.

Comment by Asya Kamsky [ 21/Dec/14 ]

robert.j.moore@allanbank.com thank you for the offer - I can also reproduce this with YCSB (and suspect this is also the root cause of a related ticket I filed). Fewer threads do in fact show less of this problem, but at the cost of lower overall throughput. I'm testing a couple of internal configurations and settings to determine which path may be most likely to alleviate this.

Comment by Robert Moore [ 21/Dec/14 ]

Asya -

I am seeing these same pauses using YCSB (SERVER-16619) and I have seen them with anywhere from 1 connection up to 50 connections.

Here is a log with 1 connection:

2014-12-20T18:33:29.414-0500 I WRITE    [conn6093] update ycsb.usertable query: { _id: "user7906682381250086252" } update: { $set: { field9: BinData(0, 2229303E3C30332C3D392B3D273A212B303C3738213D3A35383D323935292025263B252A3E35363C3D203F3D342E29202331253E3926343537223A3033312A35312334383529...) } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 fastmod:1 keyUpdates:0  1310ms
2014-12-20T18:33:29.414-0500 I QUERY    [conn6093] command ycsb.$cmd command: update { update: "usertable", ordered: false, writeConcern: { w: 1 }, updates: [ { q: { _id: "user7906682381250086252" }, u: { $set: { field9: BinData(0, 2229303E3C30332C3D392B3D273A212B303C3738213D3A35383D323935292025263B252A3E35363C3D203F3D342E29202331253E3926343537223A3033312A35312334383529...) } } } ] } keyUpdates:0  reslen:55 1310ms
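
A minimal sketch (not Rob's tooling) for pulling slow operations like the two lines above out of a mongod log; it matches the trailing "<N>ms" duration on slow-operation lines, with an arbitrary 1s cutoff:

import re
import sys

THRESHOLD_MS = 1000  # arbitrary cutoff; the pauses discussed here exceed 1s
DURATION = re.compile(r" (\d+)ms\s*$")

for line in sys.stdin:
    m = DURATION.search(line)
    if m and int(m.group(1)) >= THRESHOLD_MS:
        print(line.rstrip())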

The length of the pause does seem to be related to the number of connections with more connections causing longer pauses but I need to collect more data to be sure.

I was originally seeing this problem with our application stress test but have now got YCSB exhibiting the pauses regularly, so I can also test any log levels or candidates you may have.

Rob.

Comment by Asya Kamsky [ 21/Dec/14 ]

John, how many cores do you have relative to 128 threads? If you lower the number of threads to be closer to the number of cores (say, no more than double the number of cores), how does that affect the worst latency?

Comment by John Page [ 19/Dec/14 ]

If you mean my workload, the WT cursor is hidden from the mongo API as far as I can see; anyone is welcome to review that code.

Comment by John Page [ 17/Dec/14 ]

Let me know as soon as you have a candidate fix and I can test for you.

Comment by John Page [ 16/Dec/14 ]

The code for this is available - it's very simple - 128 processes on another box doing 16% insert, 34% update, 50% read - all on primary key. No TTL collections at all.

Records are 30 fields, approx. 1K each.
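
A rough, single-process sketch of what that mix might look like with PyMongo (database/collection names, record shape and key distribution are assumptions, not the actual test code):

import random
import string

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client.test.workload

FIELDS = 30                                     # "records are 30 fields, approx 1K"
VALUE = "".join(random.choices(string.ascii_letters, k=30))

def make_doc(key):
    doc = {"_id": key}
    doc.update({"field%d" % i: VALUE for i in range(FIELDS)})
    return doc

next_id = coll.estimated_document_count()

for _ in range(100000):
    roll = random.random()
    if roll < 0.16:                             # 16% inserts on the primary key
        coll.insert_one(make_doc(next_id))
        next_id += 1
    elif roll < 0.50:                           # 34% updates by _id
        coll.update_one({"_id": random.randrange(max(next_id, 1))},
                        {"$set": {"field0": VALUE}})
    else:                                       # 50% reads by _id
        coll.find_one({"_id": random.randrange(max(next_id, 1))})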

Comment by John Page [ 16/Dec/14 ]

I've tried nojournal to disable the write-ahead log.
I've tried checkpoint=(wait=2).
I've tried slowms=2000 (in case it's that logging that's blocking).

Still seeing a 15-second pause for update and query at 180 seconds in, and a 50-second pause at 360 seconds in.
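
For context, a hypothetical sketch (not from this ticket) of launching mongod with those settings; the option names are what I believe the 2.8/3.0-era mongod accepted, so treat them as assumptions rather than verified flags:

import subprocess

cmd = [
    "mongod",
    "--storageEngine", "wiredTiger",
    "--dbpath", "/data/db",
    "--nojournal",                      # disable journaling (the WT write-ahead log)
    "--slowms", "2000",                 # only log operations slower than 2s
    # ask WT to checkpoint every 2 seconds instead of the default interval
    "--wiredTigerEngineConfigString", "checkpoint=(wait=2)",
]
subprocess.run(cmd, check=True)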

Comment by John Page [ 16/Dec/14 ]

Uploaded as requested.

Comment by John Page [ 16/Dec/14 ]

Gzipping now - there is 1GB of it.

Comment by John Page [ 16/Dec/14 ]

Tried nojournal to turn off the WAL in WT - it's not that.

Comment by John Page [ 16/Dec/14 ]

Ah - except for the logs: can you give me a time range from the files you have? I think I will have the log.

Comment by John Page [ 16/Dec/14 ]

I have many more graphs and stats I can share if you want them bruce.lucas@10gen.com

Comment by John Page [ 16/Dec/14 ]

Logs uploaded via ssh.

Comment by John Page [ 16/Dec/14 ]

30 minute run, 128 threads at client end, seeing Queries and Updates blocking for over a minute.

Comment by John Page [ 09/Dec/14 ]

Also - this is with default parameters, 2.8.0-rc2.

Comment by John Page [ 05/Dec/14 ]

I'm doing some much more extensive testing and rechecking/ruling out any effects at my end - will report back once I have some more solid info.
