[SERVER-18677] Throughput drop during transaction pinned phase of checkpoints under WiredTiger (larger data set) Created: 27/May/15  Updated: 14/Apr/16  Resolved: 16/Jul/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.1.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: David Hows
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File try-33.png     PNG File try-49a.png     PNG File try-49b.png     PNG File try-52.png     PNG File try-64-gdb.png    
Issue Links:
Related
related to SERVER-18875 Oplog performance on WT degrades over... Closed
related to SERVER-18829 Cache usage exceeds configured maximu... Closed
related to WT-1907 Speed up transaction-refresh Closed
is related to SERVER-18315 Throughput drop during transaction pi... Closed
is related to SERVER-18674 Very low throughput during portion of... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   
  • 132 GB memory, 32 processors, slowish SSDs, 20 GB WT cache
  • YCSB 10 fields/doc, 50/50 workload (zipfian distribution), 20 threads
  • data set size varies - see tests below
    • 10M docs, data set ~12 GB (but cache usage can be double that)
    • 20M docs, data set ~23 GB (but cache usage can be double that)
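The setup above could be reproduced with something like the following sketch. The exact YCSB flags, operation count, paths, and connection URL are assumptions, not taken from the ticket; only the cache size, record counts, field count, thread count, and distribution come from the description.

```shell
# Start mongod with a 20 GB WiredTiger cache (per the ticket's setup).
mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB 20 \
    --dbpath /data/db --fork --logpath mongod.log

# Load 20M documents with 10 fields each, then run YCSB workload A
# (50/50 read/update, zipfian request distribution) with 20 client threads.
./bin/ycsb load mongodb -s -P workloads/workloada \
    -p recordcount=20000000 -p fieldcount=10 \
    -p mongodb.url=mongodb://localhost:27017/ycsb -threads 20

./bin/ycsb run mongodb -s -P workloads/workloada \
    -p recordcount=20000000 -p operationcount=100000000 \
    -p fieldcount=10 -p requestdistribution=zipfian \
    -p mongodb.url=mongodb://localhost:27017/ycsb -threads 20
```

For the 10M-document variant, lower `recordcount` to 10000000 in both commands.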

In SERVER-18315 an issue was reported in 3.0.2 during the "transaction pinned" phase of checkpoints (B-C) with a 10M document data set:

This was fixed in 3.1.3:

However, if the data set is increased to 20M documents, a similar problem still appears in 3.1.3. Based on the shapes of the curves this may be a somewhat different issue: in SERVER-18315 the throughput dropped in proportion to the rise in "range of transactions pinned", but that does not seem to be the case here.

(Note: C-D is a different issue - see SERVER-18674).
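The "range of transactions pinned" metric referenced above is exposed through serverStatus. A minimal mongo-shell sampling loop (a diagnostic sketch only; it assumes a live mongod running WiredTiger, and is not something the ticket itself ran) might look like:

```javascript
// Sample the pinned-transaction range alongside checkpoint activity once per second.
while (true) {
    var wt = db.serverStatus().wiredTiger;
    print(new Date().toISOString(),
          "pinned range:",
          wt.transaction["transaction range of IDs currently pinned"],
          "checkpoint running:",
          wt.transaction["transaction checkpoint currently running"]);
    sleep(1000);
}
```

Correlating a rise in the pinned range with the checkpoint-running flag is one way to confirm whether a throughput drop falls in the B-C phase described here.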



 Comments   
Comment by David Hows [ 16/Jul/15 ]

Marking this as Done: based on my testing, I was unable to reproduce the issue.

There have been a number of changes to related parts of MongoDB and WT, including the linked SERVER-18875, SERVER-18829 and WT-1907. I suspect that some combination of those changes resolved the underlying issue.

Comment by David Hows [ 15/Jul/15 ]

Ran this on an r3.4xlarge using Bruce's setup and MongoDB master as of 15/Jul/2015 (commit hash 3f301ac62e).

The instance had a generic (slow) SSD storage formatted using XFS.

With the 10M workload there was no drop in throughput associated with transaction IDs pinned by a checkpoint. There was only one small drop in query throughput, and it coincided with an increase in writes (update ops).

With the 20M workload there was no instance where a transaction ID was pinned due to a checkpoint, and there were no sustained drops in throughput.

Comment by Michael Cahill (Inactive) [ 06/Jul/15 ]

david.hows can you please retest against MongoDB master?
