[SERVER-19522] Capped collection insert rate declines over time under WiredTiger Created: 22/Jul/15 Updated: 24/Aug/15 Resolved: 17/Aug/15
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | 3.0.6, 3.1.7 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Michael Cahill (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | capped-delete.html |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: | |
| Participants: | |
| Description |
The capped collection insert rate begins to decline steadily once the collection is full and deletions start. Restarting mongod (without disturbing the collection) restores inserts to the original rate, but the rate then declines again to a fraction of its initial value:
The effect seems to be worse with larger documents and with larger collections:
Stack trace excerpts showing involvement of WT delete (complete stack traces attached as capped-delete.html):
| Comments |
| Comment by Ramon Fernandez Marina [ 17/Aug/15 ] |

The capped collection insert rate decline over time has been fixed, so I'm resolving this issue. I've created a separate ticket to track the remaining capped-delete performance work.
| Comment by Michael Cahill (Inactive) [ 12/Aug/15 ] |

bruce.lucas@10gen.com, I am inclined to resolve the original issue reported here as fixed in 3.0.6 and 3.1.7. That is, ignoring the drop-off when the capped collection becomes full, performance over time is now stable. I'm happy to investigate the impact of the capped deletes further (though I don't have any good ideas at the moment), but I think that should have a separate ticket. Does that seem reasonable to you?
| Comment by Githook User [ 04/Aug/15 ] |

Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>
Message: Merge pull request #2091 from wiredtiger/intl-page-empty
| Comment by Bruce Lucas (Inactive) [ 30/Jul/15 ] |

My test run was standalone; everything else is as described above (12-core / 24-cpu machine, 64 GB memory, 24 threads of the workload). Any differences in throughput from my tests are presumably due to some difference in configuration or test parameters. In particular, my test was entirely CPU-bound: the 20 GB capped collection fits in the default 32 GB cache. You can also see this from the stack traces I attached: when deletions begin, the 24 threads spend much of their time in pthread_mutex_timedlock called from cappedDeleteAsNeeded, waiting for a single thread to do the deletions, and that single thread is clearly CPU-bound, spending almost all its time in __wt_tree_walk, much of that in __wt_evict, with almost no I/O among those stack traces. So it seems that eliminating the evictions would provide an improvement.

I also wonder if it might be possible to improve things by allowing more than one thread to do the deletions, for example n threads each deleting every nth record of a range, to better match the n threads doing insertion. TBD, I guess, how much contention would limit the speedup.
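At the application level, a hypothetical sketch of that interleaved-deletion idea might look like the following (pymongo, with made-up database/collection names, integer _id values, and range bounds; this is only an illustration of the suggestion, not how mongod's cappedDeleteAsNeeded is implemented):

```python
# Hypothetical sketch: n threads each delete every nth record of a range.
# Assumes documents use monotonically increasing integer _id values; all
# names and bounds here are made up for the example.
import threading
from pymongo import MongoClient

def delete_stripe(coll, lo, hi, stripe, n_threads):
    # Thread `stripe` removes _id values lo+stripe, lo+stripe+n, lo+stripe+2n, ...
    ids = list(range(lo + stripe, hi, n_threads))
    if ids:
        coll.delete_many({"_id": {"$in": ids}})

def parallel_trim(coll, lo, hi, n_threads=4):
    # Delete the range [lo, hi) using n_threads interleaved deleters.
    threads = [threading.Thread(target=delete_stripe,
                                args=(coll, lo, hi, i, n_threads))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    coll = MongoClient()["test"]["capped_sim"]
    parallel_trim(coll, lo=0, hi=100_000, n_threads=4)
```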
| Comment by Martin Bligh [ 30/Jul/15 ] |

Only reason I can see to do it in WT is if you can think of some way to make lookups faster than having to do N lookups when we partition into N pieces.
| Comment by Michael Cahill (Inactive) [ 30/Jul/15 ] |

I don't see any way to avoid I/O if capped collections are larger than cache (as in the 20 GB case here) because we need to read the tail in order to unindex records. Yes, partitioning would fix this, but it would have to be done at a higher level than the current storage engine API, I think. FWIW, I believe that TokuMX has this kind of partitioning for exactly this reason.
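For illustration only, one way such higher-level partitioning could look is to keep N rolling collections and drop the oldest instead of deleting individual documents. This is a hedged sketch in pymongo with made-up names and limits, not TokuMX's actual implementation and not an existing MongoDB feature:

```python
# Hypothetical sketch of higher-level partitioning: N rolling collections,
# with the oldest dropped wholesale when the newest fills up. Names
# ("events_0".."events_{N-1}") and size limits are assumptions.
from pymongo import MongoClient

class RollingPartitions:
    def __init__(self, db, base="events", n_parts=4, docs_per_part=250_000):
        self.db = db
        self.base = base
        self.n_parts = n_parts
        self.docs_per_part = docs_per_part
        self.current = 0   # partition currently receiving inserts
        self.count = 0     # documents inserted into the current partition

    def _name(self, i):
        return f"{self.base}_{i}"

    def insert(self, doc):
        if self.count >= self.docs_per_part:
            # Rotate: advance to the next partition and drop whatever it held.
            # Dropping a whole collection avoids per-document deletes entirely.
            self.current = (self.current + 1) % self.n_parts
            self.db[self._name(self.current)].drop()
            self.count = 0
        self.db[self._name(self.current)].insert_one(doc)
        self.count += 1

    def find(self, query):
        # Reads must consult all N partitions (the "N lookups" cost noted above).
        for i in range(self.n_parts):
            yield from self.db[self._name(i)].find(query)

if __name__ == "__main__":
    db = MongoClient()["test"]
    parts = RollingPartitions(db)
    parts.insert({"msg": "hello"})
```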
| Comment by Martin Bligh [ 30/Jul/15 ] |

I'm guessing that when we start capping the collection we're doing a bunch of read I/O, and our working set is < WT cache size by the time we've done indexing etc.?
| Comment by Michael Cahill (Inactive) [ 30/Jul/15 ] |

martin.bligh, Alex was running without oplog: I suspect the differences were down to the relative CPU and I/O speeds on his test machine vs Bruce's. I haven't had a chance to look into this more, but I think this workload would go faster if the scan avoided evicting dirty pages, since we're just about to mark those pages deleted. I'll see what I can do, but at this point it will probably be early next week before I can really figure out what's possible.
| Comment by Martin Bligh [ 30/Jul/15 ] |

Are you guys running with or w/o oplog?
| Comment by Alexander Gorrod [ 30/Jul/15 ] |

bruce.lucas@10gen.com, I've been running this workload and don't see as significant a drop-off as you do once capping starts. The drop-off I see is around 30%. Having said that, my throughput rate appears lower than yours from the start (around 17k ops/sec). The script I'm using to test the performance is based on the snippets you included above.
From looking at the profiling and the implementation of capped collection deletes, that's the sort of performance difference I would expect to see. I agree that there is work we could do to improve the performance of capped collections, but it feels to me as though that belongs in a different ticket.
| Comment by Daniel Pasette (Inactive) [ 29/Jul/15 ] |

Author: Alex Gorrod (agorrod) <alexg@wiredtiger.com>
Message: We were leaving the WT_PAGE_SPLIT_LOCKED flag set.
| Comment by Githook User [ 28/Jul/15 ] |

Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>
Message: Merge pull request #2091 from wiredtiger/intl-page-empty
| Comment by Githook User [ 28/Jul/15 ] |

Author: Keith Bostic (keithbostic) <keith@wiredtiger.com>
Message:
| Comment by Githook User [ 28/Jul/15 ] |

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>
Message:
| Comment by Alexander Gorrod [ 28/Jul/15 ] |

bruce.lucas@10gen.com, I have reproduced the problem on 3.0. After some debugging it appears that WiredTiger ends up with a lot of internal pages at the start of the tree that reference only deleted pages. Walking those pages during the capped delete is slow, which causes the significant slowdown you see. We are working on a fix.
| Comment by Bruce Lucas (Inactive) [ 24/Jul/15 ] |

Verified after applying the patch for
| Comment by Michael Cahill (Inactive) [ 23/Jul/15 ] |

bruce.lucas@mongodb.com, there is an attempt to clean up internal pages as well as leaf pages, but it is possible that code isn't working well for this workload. What the stacks suggest is that the cursor walk that figures out the next truncation point is spending a lot of time doing (pointless) eviction. That causes other threads to be throttled so that the oplog size is kept under control. All of that code is eliminated by martin.bligh's work on "oplogStones" in
| Comment by Bruce Lucas (Inactive) [ 22/Jul/15 ] |

The stack traces above and the numbers showing the declining rate were obtained with a 20 GB capped collection, 4 KB documents, and 24 threads, on a machine with 12 cores / 24 cpus and 64 GB of memory. Code:
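A minimal sketch of a comparable workload, assuming pymongo and the parameters above; the collection name, document padding, and per-second reporting loop are assumptions, not the original script:

```python
# Sketch of the insert workload: 20 GB capped collection, ~4 KB documents,
# 24 insert threads, printing the insert rate once per second so the decline
# over time is visible. Names and padding details are assumptions.
import threading
import time
from pymongo import MongoClient

CAP_BYTES = 20 * 1024**3      # 20 GB capped collection
DOC_PADDING = "x" * 4000      # ~4 KB documents
N_THREADS = 24

def inserter(coll, stop, counter, lock):
    while not stop.is_set():
        coll.insert_one({"padding": DOC_PADDING})
        with lock:
            counter[0] += 1

def main():
    db = MongoClient()["test"]
    db.drop_collection("capped")
    coll = db.create_collection("capped", capped=True, size=CAP_BYTES)

    stop = threading.Event()
    counter = [0]
    lock = threading.Lock()
    threads = [threading.Thread(target=inserter, args=(coll, stop, counter, lock))
               for _ in range(N_THREADS)]
    for t in threads:
        t.start()

    try:
        prev = 0
        while True:
            time.sleep(1)
            with lock:
                cur = counter[0]
            print(f"{cur - prev} inserts/sec")
            prev = cur
    finally:
        stop.set()
        for t in threads:
            t.join()

if __name__ == "__main__":
    main()
```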