[SERVER-16366] Gradually degrading performance (wiredtiger) Created: 01/Dec/14  Updated: 18/Dec/14  Resolved: 17/Dec/14

Status: Closed
Project: Core Server
Component/s: Performance, Storage
Affects Version/s: 2.8.0-rc1, 2.8.0-rc2
Fix Version/s: 2.8.0-rc3

Type: Bug Priority: Major - P3
Reporter: Cailin Nelson Assignee: Bruce Lucas (Inactive)
Resolution: Done Votes: 0
Labels: wiredtiger
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2014-12-01 at 1.41.08 PM.png     PNG File Screen Shot 2014-12-01 at 1.45.38 PM.png     PNG File checkpoint.png     PNG File dev_1.png     HTML File gdbmon.html     HTML File longer_transition.html     PNG File longer_transition.png     PNG File mmsdev-2014-12-10.png     PNG File mmsdev_stable.png     PNG File mmsqa.png     PNG File mmsqa2.png     HTML File transition.html     PNG File transition.png     PNG File waiting-1.png     PNG File waiting-2.png     PNG File working-1.png     PNG File working-2.png    
Issue Links:
Depends
depends on SERVER-16247 Oplog declines in performance over ti... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

After roughly 24 hours, I am finding that performance gradually degrades.

  • Primary: 2.8.0-rc1, WiredTiger
  • Secondaries: 2.8.0-rc1, MMAPv1

Please see the attached MMS screenshot. The displayed range starts shortly after a full system restart. Note that queues increase, as does replication lag on the secondaries. I have also attached the opcounters, from which you can at least crudely see that the application load is consistent.

I do not see this type of behavior when the primary is MMAPv1.



 Comments   
Comment by Cailin Nelson [ 05/Dec/14 ]

Still seeing a performance issue; however, this one was not gradual. Note that the opcounters drop off not because of a change in app behavior, but because the app just can't get its writes through fast enough.

Logs: https://dropbox.10gen.com/cailin/2014-12-05-17-05/mms-qa-2014-12-15.log
Disclosure: I do have THP (transparent huge pages) enabled.

Here's what it looks like after a restart of the primary:
