[SERVER-16938] 60-second stall between checkpoints under WiredTiger Created: 20/Jan/15  Updated: 14/Apr/16  Resolved: 22/Jun/15

Status: Closed
Project: Core Server
Component/s: Storage, WiredTiger
Affects Version/s: 2.8.0-rc5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Michael Cahill (Inactive)
Resolution: Cannot Reproduce Votes: 1
Labels: wttt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 60s-stall-between-checkpoints.png     HTML File throttle-rc9-01-gdb.html     HTML File throttle-rc9-01.html     PNG File throttle-rc9-01.png    
Issue Links:
Related
related to SERVER-17157 Seeing pauses in YCSB performance wor... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

During one 10-minute run of a heavy mixed workload, I observed a 60-second stall, apparently coinciding with the time between the end of one checkpoint and the start of the next.

  • From A to B throughput drops to near 0.
  • mongod log shows a handful of ops completing throughout this period, with increasing latencies suggesting that they have been waiting since A.
  • the "page acquire time sleeping" statistic suggests most threads (about 93 out of 100) are waiting for access to pages
  • throughout this period 40 pages per second are being evicted because they exceeded the in-memory maximum
  • yet cache statistics show nothing leaving the cache and no change in cache sizes during this period
  • at the end of the period about 2500 failed evictions are reported within 1 second. This is about the same as the number of pages evicted during that period, i.e. 60 seconds * 40 pages / second = 2400. Is that a coincidence, or are the failed evictions reported at the end of the period the same evictions that were reported throughout the period?
  • the 60-second stall appears to coincide with the time between the end of one checkpoint and the start of the next.
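
The comparison between the failed-eviction burst and the evictions reported during the stall can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, and the inputs are the approximate figures quoted above (60 seconds, 40 pages per second, about 2500 failed evictions), not values read from mongod or WiredTiger.

```python
def expected_forced_evictions(stall_seconds, pages_per_second):
    """Pages reported as evicted over the stall, assuming a constant rate."""
    return stall_seconds * pages_per_second


def relative_gap(observed, expected):
    """Fractional difference between the observed and expected counts."""
    return abs(observed - expected) / expected


# Approximate figures from the description above (assumed, not measured here).
expected = expected_forced_evictions(stall_seconds=60, pages_per_second=40)
gap = relative_gap(observed=2500, expected=expected)

print(f"expected evictions during stall: {expected}")
print(f"relative gap to failed evictions: {gap:.1%}")
```

With these rough inputs the two counts differ by only a few percent, which is consistent with, but does not prove, the hypothesis that the failed evictions reported at the end are the same evictions counted during the stall.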


 Comments   
Comment by Bruce Lucas (Inactive) [ 22/Jun/15 ]

Have not seen this issue recently; presumably it was addressed by one of the performance changes made since this ticket was opened.

Comment by Daniel Pasette (Inactive) [ 05/Feb/15 ]

We're playing whack-a-mole with perf issues at this point. I'm moving this out of rc8 while we do a comprehensive analysis of outstanding issues.

Generated at Thu Feb 08 03:42:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.