[SERVER-17299] Extraneous memory allocations in snappy hurt WiredTiger performance on Windows Created: 16/Feb/15  Updated: 19/Sep/15  Resolved: 24/Feb/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.0-rc8
Fix Version/s: 3.0.0-rc10, 3.1.0

Type: Bug Priority: Critical - P2
Reporter: Nick Judson Assignee: Mark Benvenuto
Resolution: Done Votes: 0
Labels: wiredtiger
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File MongoPerfTest.7z     PNG File mongo1.png     PNG File mongo2.png     PNG File mongo3.png     PNG File mongo4.png     PNG File mongo5.png     PNG File mongo6.png     PNG File mongo7.png     PNG File mongo8.png    
Issue Links:
Duplicate
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Steps To Reproduce:

I have 50 threads bulk inserting batches of 100 documents (each under 1 KB) as fast as Mongo will take them.

Using the MMap engine doesn't show this behavior. Right now, after running for ~8 minutes, my CPU is sitting at 23% and the insert rate is at 11k/s.

I don't know if this is expected or not, but it seems like quite a dramatic drop off.
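
For reference only: the actual repro is the attached MongoPerfTest.7z, written against the C# driver 2.0.0-beta2. A rough, hypothetical equivalent of the workload shape (50 writer threads, each issuing bulk inserts of 100 sub-1 KB documents), sketched here with the C++ driver (mongocxx) purely for illustration; collection names, document contents, and batch counts are assumptions, not taken from the attachment:

    #include <bsoncxx/builder/basic/document.hpp>
    #include <bsoncxx/builder/basic/kvp.hpp>
    #include <mongocxx/client.hpp>
    #include <mongocxx/instance.hpp>
    #include <mongocxx/uri.hpp>
    #include <string>
    #include <thread>
    #include <vector>

    int main() {
        mongocxx::instance instance{};           // driver init, once per process
        constexpr int kThreads = 50;             // 50 concurrent writers
        constexpr int kBatchSize = 100;          // 100 documents per bulk insert
        constexpr int kBatchesPerThread = 20000; // long enough to watch the rate decay

        std::vector<std::thread> workers;
        for (int t = 0; t < kThreads; ++t) {
            workers.emplace_back([t] {
                using bsoncxx::builder::basic::kvp;
                using bsoncxx::builder::basic::make_document;

                mongocxx::client client{mongocxx::uri{"mongodb://localhost:27017"}};
                auto coll = client["perftest"]["docs"];
                const std::string payload(900, 'x');  // keeps each document under ~1 KB

                for (int b = 0; b < kBatchesPerThread; ++b) {
                    std::vector<bsoncxx::document::value> batch;
                    batch.reserve(kBatchSize);
                    for (int i = 0; i < kBatchSize; ++i)
                        batch.push_back(make_document(kvp("thread", t), kvp("payload", payload)));
                    coll.insert_many(batch);          // one bulk insert per iteration
                }
            });
        }
        for (auto& w : workers) w.join();
    }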

Participants:

 Description   

WiredTiger starts at an insertion rate of 40k/sec, and slows down to 15k/sec within 2 minutes.

System memory is not exhausted.

Specifically, CPU usage appears to drop from 100% quite quickly, and insert times (for the individual bulk inserts) grow.

mongostat doesn't appear to show any specific resource limit, nor do I see anything obvious.

Using the C# driver 2.0.0-beta2.

Please see https://groups.google.com/forum/#!topic/mongodb-user/DMa24b2TqKM



 Comments   
Comment by Githook User [ 24/Feb/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>

Message: SERVER-17299 Eliminate memory allocation from snappy_compress.

(cherry picked from commit c614b21cc563dbb400f7258124fc5d2ed58b4dce)
Branch: v3.0
https://github.com/mongodb/mongo/commit/96315ab194016300a663e61c8ba797d0416c23aa

Comment by Michael Cahill (Inactive) [ 24/Feb/15 ]

Fixed by changes to snappy to eliminate unnecessary memory allocation.

Comment by Githook User [ 24/Feb/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>

Message: SERVER-17299 Eliminate memory allocation from snappy_compress.
Branch: master
https://github.com/mongodb/mongo/commit/c614b21cc563dbb400f7258124fc5d2ed58b4dce

Comment by Nick Judson [ 20/Feb/15 ]

Great. Looking forward to kicking the tires on snappy to see how it compares. zlib had very impressive compression.

Comment by Daniel Pasette (Inactive) [ 20/Feb/15 ]

Hi nick@innsenroute.com, as you can see from mark.benvenuto's comment above, we've tracked this down to excess allocations in the snappy code. We're working on a patch there. In the meantime, thanks for your report and repro efforts. Very helpful. Please watch this ticket for updates soon.

Comment by Nick Judson [ 19/Feb/15 ]

Setting the block compressor to snappy and the journal compressor to zlib still shows the slowdown, although not quite as dramatic as with both set to snappy.
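
(For reference: on 3.0 the block and journal compressors are selected independently at mongod startup, so a mixed configuration like the one described above would be started along these lines. The dbpath and exact invocation are assumptions, not taken from this ticket.)

    mongod --storageEngine wiredTiger --dbpath C:\data\db ^
           --wiredTigerCollectionBlockCompressor snappy ^
           --wiredTigerJournalCompressor zlib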

Comment by Nick Judson [ 19/Feb/15 ]

Switching from zlib to snappy causes the rapid slowdown I'm seeing in my product. With no compression I also don't see the reduced CPU usage, but the performance is quite 'choppy' even with very fast I/O.

Comment by Nick Judson [ 19/Feb/15 ]

My work box shows good, steady performance with zlib as well. As an aside, I notice that when compression is turned off for both block and journal, the memory usage grows past half of system RAM.

Comment by Nick Judson [ 19/Feb/15 ]

I tested with zlib instead of snappy on my home computer (3.1), and it showed steady performance through the night. I'll re-run that test, and also run with compression off, on my work dev box.

Comment by Mark Benvenuto [ 18/Feb/15 ]

I have not been able to repro the exact same "dive" in performance, but I have seen a significant difference between none and snappy compression for the log (see --wiredTigerJournalCompressor) with the attached repro.

sue.loverso, while I gathered statistics, it was simply easier for me to attach the VS 2013 profiler, and examine where the bottleneck was. It turns out it was not in WT directly.

Function Name | Module Name | Inclusive Samples | Exclusive Samples | Inclusive Samples % | Exclusive Samples %
mongo::WiredTigerRecoveryUnit::_commit | mongod.exe | 70,800 | 171 | 65.27 | 0.16
mongo::WiredTigerRecoveryUnit::_txnClose | mongod.exe | 70,179 | 103 | 64.69 | 0.09
__session_commit_transaction | mongod.exe | 69,854 | 76 | 64.39 | 0.07
__wt_txn_commit | mongod.exe | 69,771 | 61 | 64.32 | 0.06
__wt_log_write | mongod.exe | 69,574 | 275 | 64.14 | 0.25
wt_snappy_compress | mongod.exe | 61,390 | 77 | 56.59 | 0.07
snappy_compress | mongod.exe | 61,313 | 36 | 56.52 | 0.03
snappy::RawCompress | mongod.exe | 61,277 | 32 | 56.49 | 0.03
snappy::Compress | mongod.exe | 61,237 | 131 | 56.45 | 0.12
-> free | mongod.exe | 28,827 | 68 | 26.57 | 0.06
-> operator new | mongod.exe | 28,479 | 40 | 26.25 | 0.04

From Compress in snappy.cc, we see that a scratch buffer of max_output bytes (scratch_output) is allocated and freed on every compress call:

    // Compress input_fragment and append to dest
    const int max_output = MaxCompressedLength(num_to_read);
 
    // Need a scratch buffer for the output, in case the byte sink doesn't
    // have room for us directly.
    if (scratch_output == NULL) {
      scratch_output = new char[max_output];
    } else {
      // Since we encode kBlockSize regions followed by a region
      // which is <= kBlockSize in length, a previously allocated
      // scratch_output[] region is big enough for this iteration.
    }

This leaves a few possible ways to improve this:

  1. Call snappy_compress less often by having fewer threads compress the journal
  2. Change snappy to take scratch buffers from WT instead of allocating them itself (a rough caller-side sketch follows below).

The max size of max_output is MaxCompressedLength(num_to_read), where num_to_read = min(N, kBlockSize), N is the size of the input, and kBlockSize is 65536. We just need to pass two buffers: one for scratch (which I did not show, and which has a max of 65536 bytes), and one for scratch_output, whose max size is given by MaxCompressedLength above.
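
For illustration, here is a minimal sketch of what option 2 looks like from the caller's side: the compressor state owns an output buffer sized once with MaxCompressedLength() (roughly 76 KB for a 64 KB block, from snappy's 32 + N + N/6 bound) and reuses it on every call. This is not the fix that actually landed (that change was made inside snappy itself); CompressorScratch and compress_block are hypothetical names used only for this sketch.

    #include <cstddef>
    #include <vector>
    #include <snappy.h>    // snappy::MaxCompressedLength, snappy::RawCompress

    // Hypothetical per-session compressor state: the output buffer is grown
    // once and then reused, so steady-state calls do no heap allocation here.
    struct CompressorScratch {
        std::vector<char> output;
    };

    // Compress one journal block into the caller-owned scratch buffer.
    // Returns the number of compressed bytes written into scratch.output.
    std::size_t compress_block(CompressorScratch& scratch, const char* src, std::size_t src_len) {
        const std::size_t max_out = snappy::MaxCompressedLength(src_len);
        if (scratch.output.size() < max_out)
            scratch.output.resize(max_out);    // grows at most once for a fixed block size

        std::size_t out_len = 0;
        // RawCompress writes straight into the buffer we supply; the point of the
        // fix is that snappy no longer needs its own scratch_output on this path.
        snappy::RawCompress(src, src_len, scratch.output.data(), &out_len);
        return out_len;
    }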

sue.loverso, what are your suggestions on how to improve this?

Comment by Daniel Pasette (Inactive) [ 18/Feb/15 ]

Thanks for attaching this Nick. We're taking a look.

Comment by Nick Judson [ 18/Feb/15 ]

mongo8.png: The same application under MMap shows steady-state performance.

Comment by Nick Judson [ 18/Feb/15 ]

mongo4.png - mongo7.png: Time sequence showing slow-down / reduced cpu usage in my product. This is write-once and read-once.

Comment by Nick Judson [ 17/Feb/15 ]

mongo3.png: After 4 minutes, rc8.

Comment by Nick Judson [ 17/Feb/15 ]

mongo2.png: Initial speeds with rc8.

Comment by Nick Judson [ 17/Feb/15 ]

mongo1.png: The latest daily build (3.1, 2/17/15) shows a sudden drop in speed and CPU usage.

Comment by Nick Judson [ 17/Feb/15 ]

Sample app showing slowdown.

Comment by Daniel Pasette (Inactive) [ 16/Feb/15 ]

I think, in the long run, it will probably be better to have a reproducer in hand.
If it's too much trouble, let me know.

Thanks for the report.


Comment by Nick Judson [ 16/Feb/15 ]

Ah - I see you want me to attach it to this ticket. It's part of a larger project; however, I'll attempt to split it out and get something over to you.

Alternately you could take over my dev machine.

Comment by Nick Judson [ 16/Feb/15 ]

Sure. How do you want to look at this?

Comment by Daniel Pasette (Inactive) [ 16/Feb/15 ]

Hi Nick, Is it possible to attach your workload generator?

Comment by Nick Judson [ 16/Feb/15 ]

https://mms.mongodb.com/host/detail/54299fbce4b0b14ef1439821/fed5100f74ed5ea839bb8118054296c0

Comment by Nick Judson [ 16/Feb/15 ]

Windows 8.1, 64bit, 16GB RAM, All SSD, i7-5760X.

Comment by Nick Judson [ 16/Feb/15 ]

Let me know if you want me to move all the information from the forum into jira.

Generated at Thu Feb 08 03:43:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.