[SERVER-16790] Lengthy pauses associated with checkpoints under WiredTiger Created: 09/Jan/15  Updated: 07/Dec/16  Resolved: 26/Jun/15

Status: Closed
Project: Core Server
Component/s: Storage, WiredTiger
Affects Version/s: None
Fix Version/s: 3.1.5

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: David Hows
Resolution: Done Votes: 1
Labels: wttt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 100-3GB.png     PNG File 8s-stall-end-of-checkpoint.png    
Issue Links:
Related
related to SERVER-17157 Seeing pauses in YCSB performance wor... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

This test, with a heavy write load and a 3 GB cache, shows 4-5 seconds of zero throughput at the end of a checkpoint.

  • At the end of the pause numerous operations report >4s latency (from "mongod max logged query" graph).
  • System CPU utilization and context switch rate are not high during the pause, so this is not due to SERVER-16662 (as that had been fixed in this version of the code).
  • This test has a large number of threads (50) relative to CPU cores (6), but that is probably not very relevant, since we are not seeing high system CPU utilization or context switch rate. Will try running with fewer threads to verify.
  • Cache size was 3 GB and was full of dirty data. Suspect a larger cache size may exacerbate the problem; will try.
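
The stall described above (several consecutive seconds of zero throughput) can be located mechanically in a per-second throughput series. The following is a minimal sketch only: the function name `find_stalls`, its parameters, and the sample numbers are hypothetical illustrations, not the tooling or data used to produce the attached graphs.

```python
from typing import List, Tuple


def find_stalls(ops_per_sec: List[int],
                min_len: int = 2,
                threshold: int = 0) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of consecutive samples
    at or below `threshold` that lasts at least `min_len` samples."""
    stalls = []
    run_start = None
    for i, ops in enumerate(ops_per_sec):
        if ops <= threshold:
            # Entering or continuing a stalled run.
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The run just ended; keep it if it was long enough.
            if i - run_start >= min_len:
                stalls.append((run_start, i - run_start))
            run_start = None
    # Handle a run that extends to the end of the series.
    if run_start is not None and len(ops_per_sec) - run_start >= min_len:
        stalls.append((run_start, len(ops_per_sec) - run_start))
    return stalls


# Made-up samples (operations per second), one value per second.
samples = [5200, 5100, 4900, 0, 0, 0, 0, 0, 5300, 5000]
print(find_stalls(samples))  # [(3, 5)]
```

Comparing the reported start times against checkpoint start and end times would show whether the stalls line up with the end of a checkpoint, as observed in this test.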


 Comments   
Comment by Michael Cahill (Inactive) [ 26/Jun/15 ]

Based on testing, this should be fixed in 3.1.5 by the next merge of WiredTiger.

Comment by Michael Cahill (Inactive) [ 13/Jan/15 ]

Fixes in WiredTiger for RC5 should help alleviate pauses in this workload significantly. Let's keep this open to confirm.
