[SERVER-16902] wt cache: maximum page size at eviction (MB) exceeds memory_page_max by order of magnitude Created: 16/Jan/15  Updated: 02/Dec/15  Resolved: 02/Oct/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: 3.0.7, 3.1.7

Type: Bug Priority: Major - P3
Reporter: Mark Callaghan Assignee: David Hows
Resolution: Done Votes: 0
Labels: wiredtiger, wttt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File jmongoiibench$MyQuery.class     File jmongoiibench$MyReporter.class     File jmongoiibench$MyWriter.class     File jmongoiibench.class     Java Source File jmongoiibench.java     Java Source File jmongoiibench.java     Java Source File jmongoiibench.java     Java Source File jmongoiibench.java     Java Source File jmongoiibench.java     File mongo.conf     File run.simple.bash     File run.simple.bash     File run.simple.bash     File run.simple.bash     File run.simple.bash.L1.Q10     File run.simple.bash.L10.Q1    
Issue Links:
Depends
is depended on by SERVER-21442 WiredTiger changes for MongoDB 3.0.8 Closed
Related
related to SERVER-16977 Memory increase trend when running ha... Closed
related to SERVER-20728 mongod hang during 3-node replSet per... Closed
is related to WT-1869 cache: maximum page size at eviction ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Participants:

 Description   

With rc5, the WT b-tree and the zlib block compressor, there is either a memory leak or WT uses much more memory than it did with rc4.

My 144G test server hits OOM after inserting 37M docs; at that point vsz and rss for mongod are 175590976 and 142382556 (175B, 142B). With rc4, vsz and rss were 100881768 and 92328172 (100B, 92B) after 400M inserts.

The test is run as described in SERVER-16886.

Looking at /proc/smaps, there is one huge allocation:

7f0fcd400000-7f297c000000 rw-p 00000000 00:00 0 
Size:           107720704 kB
Rss:            90840720 kB
Pss:            90840720 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:  90840720 kB
Referenced:     90840080 kB
Anonymous:      90840720 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac ?? 

And one other large allocation:

7f2990000000-7f3999800000 rw-p 00000000 00:00 0 
Size:           67264512 kB
Rss:              207524 kB
Pss:              207524 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:    207524 kB
Referenced:       190240 kB
Anonymous:        207524 kB
AnonHugePages:         0 kB
Swap:             106344 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac ?? 
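
For reference, here is a minimal sketch of how mappings like the two above can be pulled out of /proc/<pid>/smaps. It assumes a single local mongod process and permission to read its smaps file; it uses nothing beyond standard /proc files and awk.

# Sketch: list the largest mappings of a running mongod, biggest first.
# Assumes `pgrep -x mongod` returns exactly one PID.
PID=$(pgrep -x mongod)

# Each mapping starts with an address-range line, followed by per-mapping
# counters such as Size: and Rss: (both reported in kB).
awk '
  /^[0-9a-f]+-[0-9a-f]+ / { range = $1 }             # address range of current mapping
  /^Size:/                { size  = $2 }             # mapping size in kB
  /^Rss:/                 { print size, $2, range }  # size_kB rss_kB range
' /proc/"$PID"/smaps | sort -rn | head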

From the log: I did not set the cache size, but it defaulted to 70G.

2015-01-16T09:06:10.560-0800 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=70G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),



 Comments   
Comment by Githook User [ 02/Oct/15 ]

Author: Michael Cahill (michaelcahill, michael.cahill@mongodb.com)

Message: Complete SERVER-16902 backport: in-memory splits during checkpoints.

This change was partially backported already via changes to
__wt_page_can_split, make it work for real.
Branch: mongodb-3.0
https://github.com/wiredtiger/wiredtiger/commit/5c32a0bee76ec0fd09722b80b40825a272cb46e9

Comment by David Hows [ 04/Sep/15 ]

I've re-run the test I did to confirm if the issue here has been resolved.

As it stands, there are times when the "maximum page size at eviction" can exceed the 10MB limit, but never by the orders of magnitude seen before.

I'm marking this one as resolved.

Comment by Michael Templeman [ 05/Jun/15 ]

I have one replica set server that hits a fatal OOM crash every time I make it the primary for the shard. Watching the server with mongostat this time, I noticed that resident memory grew to 29.4G even though I had set the WiredTiger cache size to 22 GB (--wiredTigerCacheSizeGB 22). The server is an AWS i2.xlarge instance running Amazon Linux (ami-1ecae776). The backtrace in the log was:

2015-06-05T18:24:17.343+0000 I NETWORK  [initandlisten] connection accepted from 10.233.151.237:54382 #22573 (229 connections now open)
2015-06-05T18:24:17.344+0000 F -        [conn22195] out of memory.
 
 0xf6d609 0xf6d139 0xef1212 0xbba8ec 0xace3ee 0xad379d 0x82a47d 0xf2073b 0x7f263f1b3df3 0x7f263dc671ad
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B6D609"},{"b":"400000","o":"B6D139"},{"b":"400000","o":"AF1212"},{"b":"400000","o":"7BA8EC"},{"b":"400000","o":"6CE3EE"},{"b":"400000","o":"6D379D"},{"b":"400000","o":"42A47D"},{"b":"400000","o":"B2073B"},{"b":"7F263F1AC000","o":"7DF3"},{"b":"7F263DB71000","o":"F61AD"}],"processInfo":{ "mongodbVersion" : "3.0.3", "gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105", "uname" : { "sysname" : "Linux", "release" : "3.14.35-28.38.amzn1.x86_64", "version" : "#1 SMP Wed Mar 11 22:50:37 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "D5BA38CEF0AE14E7FFF3FA98A110C841B25EAA21" }, { "b" : "7FFE7B1DE000", "elfType" : 3, "buildId" : "8DD7A7F645E37D599573A937DBAA3E931F55DE40" }, { "b" : "7F263F1AC000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "D48D3E6672A77B603B402F661BABF75E90AD570B" }, { "b" : "7F263EF3F000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "F711D67FF0C1FE2222FB003A30AB74DA26A5EF41" }, { "b" : "7F263EB5A000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "777069F5EECC26CD66C5C8390FA2BF4E444979D1" }, { "b" : "7F263E952000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "E81013CBFA409053D58A65A0653271AB665A4619" }, { "b" : "7F263E74E000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "62A8842157C62F95C3069CBF779AFCC26577A99A" }, { "b" : "7F263E44A000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "DD6383EEAC49E9BAA9E3D1080AE932F42CF8A385" }, { "b" : "7F263E148000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "5F97F8F8E5024E29717CF35998681F84D4A22D45" }, { "b" : "7F263DF32000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "C52958E393BDF8E8D090F36DE0F4E620D8736FBF" }, { "b" : "7F263DB71000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "77E48C0DBAE0843560558F1E51A5FC61602ABC5F" }, { "b" : "7F263F3C8000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F90843B9087FE91955FEB0355EB0858EF9E97B2" }, { "b" : "7F263D92E000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "72C1DB5E2447A90D1BF34065BCC031B7263FFBAC" }, { "b" : "7F263D649000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "2B8787E8E0C317CF820E5D830D923BC744E497F4" }, { "b" : "7F263D446000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "622F315EB5CB2F791E9B64020692EBA98195D06D" }, { "b" : "7F263D21B000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "B10FBFEC246C4EAD1719D16090D0BE54904BBFC9" }, { "b" : "7F263D005000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D1" }, { "b" : "7F263CDFA000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "7292C0673D7C116E3389D3FFA67087A6B9287A71" }, { "b" : "7F263CBF7000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7F263C9DD000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "17295EFA7A45ADE184C2A7A3294460BDC8B43277" }, { "b" : "7F263C7BC000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf6d609]
 mongod(_ZN5mongo29reportOutOfMemoryErrorAndExitEv+0x49) [0xf6d139]
 mongod(_ZN5mongo11mongoMallocEm+0x22) [0xef1212]
 mongod(_ZN5mongo7getMoreEPNS_16OperationContextEPKcixRNS_5CurOpEiRbPb+0x24C) [0xbba8ec]
 mongod(_ZN5mongo15receivedGetMoreEPNS_16OperationContextERNS_10DbResponseERNS_7MessageERNS_5CurOpE+0x35E) [0xace3ee]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x13CD) [0xad379d]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDD) [0x82a47d]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xf2073b]
 libpthread.so.0(+0x7DF3) [0x7f263f1b3df3]
 libc.so.6(clone+0x6D) [0x7f263dc671ad]
-----  END BACKTRACE  -----

Comment by Mark Callaghan [ 24/Mar/15 ]

This is a problem in 3.0.1 and 3.1.0

Comment by Mark Callaghan [ 24/Mar/15 ]

My new test pattern is:
1) load 400M docs using 10 loader threads and 1 query thread
2) rate limit 1 writer thread to 1000 docs/second and use 10 or 30 query threads

vsz is stable after #1. But after running 2 rounds of #2 with ~8 hours per round, vsz is up to 130G. Looks like it will OOM

command lines are:
for #1 -> bash run.simple.bash.L10.Q1 1000000
for #2 -> bash run.simple.bash.L1.Q10 480

Comment by Mark Callaghan [ 11/Feb/15 ]

And I got an OOM, with vsz and rss growing fast at the end after having been stable. This is from "ps" sampled at 60 second intervals. The columns are vsz and rss:
135614640 103303280
135614640 103310344
135614640 103313240
135614640 111993784
135614640 133409724
143773872 140719864

The max page size did not change:
"maximum page size at eviction" : 625302856

Comment by Mark Callaghan [ 11/Feb/15 ]

Still using rc8, 70G block cache, zlib and oplog is disabled:

  • vsz is 135504048 after 400M docs inserted
  • vsz is 135585968 after 677M docs inserted

"maximum page size at eviction" is 625302856 after 400M docs inserted and has stopped growing.

I will run this for a few more days to see if vsz continues to grow.

Running a similar config, except with the oplog enabled, has vsz at 93462712 after 400M docs. This is odd. I will let that run for another 400M docs. It also has a much smaller max page size:
"maximum page size at eviction" : 246616304

Comment by Mark Callaghan [ 10/Feb/15 ]

Using a similar test with rc8 and a 70G block cache: after 300M docs inserted, vsz is almost 130G and I have at least one page that is almost 600M. I will let the test run to see if I get an OOM.

"maximum page size at eviction" : 625302856,

Comment by Mark Benvenuto [ 30/Jan/15 ]

With RC7, I cannot see any OOM issues.

For testing, I used:

Machine | Duration | Inserts | Max Docs Size (MB)
EC2, c3.8xlarge, 32 core, 60 GB RAM | 3.45 hr | 158,600,000 | 21.3
Softlayer, 24 core, 128 GB RAM | 13:57 hr | 327,600,000 | 2446.3

It was on the larger machine that I saw a page size problem, but no OOM. michael.cahill, I will reopen the issue and change the title to reflect that this issue is tracking the large page size at eviction.

Comment by Michael Cahill (Inactive) [ 29/Jan/15 ]

Hi mdcallag, we fixed one bug that was allowing pages to grow unexpectedly large in WiredTiger. I wasn't confident that the fix was sufficient, so we didn't call it out in RC7, but I'm glad it has helped.

Comment by Mark Callaghan [ 29/Jan/15 ]

rc7 looks better. After 15 minutes and 10M docs inserted vsz is 1773576 (1.8B) and "maximum page size at eviction" : 50150781,

Is this improvement expected?

Comment by Mark Callaghan [ 28/Jan/15 ]

I was able to get an OOM on a 144G server with --wiredTigerCacheSizeGB 70 using the test described 2 comments back.

Comment by Mark Callaghan [ 27/Jan/15 ]

Better test case: use --wiredTigerCacheSizeGB 1. After ~15 minutes, vsz is ~5G (5233240) and the max page size is almost 1.5G. I think large pages are the problem.
"maximum page size at eviction" : 1486021050,

Comment by Mark Callaghan [ 27/Jan/15 ]

Files to reproduce the problem on rc6 with 10 loader threads and 0 query threads. The command line for mongod is:
numactl --interleave=all bin/mongod --config mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 70

Comment by Mark Callaghan [ 27/Jan/15 ]

After 32,000 seconds and 400M inserts:

  • vsz and rss are ~103G and still growing with --wiredTigerCacheSizeGB 70
  • worst case stall is ~110 seconds
  • there were 4M writes and 26330 took >= 2 seconds. Of the stalls, 11850 are >= 5 seconds, 4039 are >= 10 seconds, 377 are >= 20 seconds
  • this is 32B, which is too big even if only 1 page is that big --> "maximum page size at eviction" : 32690460080,
  • db.serverStatus() includes (a sampling sketch follows this list):
    "bytes currently in the cache" : 60123488841,
    "maximum bytes configured" : 75161927680,
Comment by Mark Callaghan [ 26/Jan/15 ]

From a test with 10 loader threads and 0 query threads using rc6, after ~380M docs inserted. The value below is ~32B. Even if this is just the root page of the tree, that is too big.

"maximum page size at eviction" : 32690460080,

Comment by Mark Callaghan [ 26/Jan/15 ]

With rc6, running iibench-mongodb with 10 loader threads, no query threads, and "--storageEngine wiredTiger --wiredTigerCacheSizeGB 70": after 341M docs inserted, vsz and rss are up to 94G and growing, and 94/70 is about 1.34. I think this is better than before, but that overhead is still much higher than I expect.

root 8902 662 64.1 96114260 95182260 ? Sl 07:11 2254:02 bin/mongod.rc6.jem.roxsnappy70g --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 70

Comment by Mark Callaghan [ 24/Jan/15 ]

I am running 3.0.0rc6 using the uploaded repro case. After a few minutes I see this. If that is accurate, why is there a 65MB page in my WT b-tree? That seems like a serious performance problem.

"maximum page size at eviction" : 68375657,

Comment by Mark Benvenuto [ 24/Jan/15 ]

Yes, the fix will be in rc6.

Fixed with (https://github.com/mongodb/mongo/commit/876f5f475bf9242c098e6de641aa7ee0982803c2)

Comment by Mark Callaghan [ 23/Jan/15 ]

Thanks for the details and taking the time on the reproduction. Is the fix likely to be in rc6?

Comment by Mark Benvenuto [ 23/Jan/15 ]

"Maximum page size at eviction" is the largest page size seen during the lifetime of the server. It will not give you the largest page evicted in any particular eviction cycle. This means the counter is not particularly useful to track how large the typical sie of a page is evicted over time since it never resets.

As WT is a highly concurrent store, every cursor maintains a hazard pointer to each page it uses, and since mongod runs WT in snapshot isolation mode, transactions also pin pages in the cache to ensure transactionally safe reads of the data. Your read workload forces WT to keep pages in memory both by pinning the transaction id and by grabbing handles on certain pages. As I understand it, this is likely triggering a bug in the bt_split code where WT ignores such a page as a split candidate.

michael.cahill alexander.gorrod keith.bostic have a better understanding of the issue.

Also, using your latest posted repro, I can no longer reproduce the issue against master.

I ran for 4:08 hr, achieved 81,400,000 inserts, and RSS & VSZ were stable at 6.7 GB & 7.08 GB respectively. This was against master with git hash 616bc35c6f7ca8b66b531e4086863697e01a0181. Final DB was ~35 GB.

The machine was an EC2 c3.4xlarge with 16 cores, 30 GB of memory, and SSD storage.

iibench status
Thread [Thread-2,5,main] 81,400,000 inserts : 14,905 seconds : cum ips=5,461.15 : int ips=1,515.20 : cum avg qry=0.78 : int avg qry=0.99 : cum avg qps=1,277.63 : int avg qps=1,007.18 : exceptions=0

Comment by Mark Callaghan [ 23/Jan/15 ]

Restarted a test for rc5, and after ~10 minutes I see a large and growing value for that stat. So leaf_page_max and internal_page_max aren't respected because of the bug? But I am confused, because the growth rate is faster when I use iibench-mongodb with 1 insert and 1 query thread than when using just the insert thread.

"maximum page size at eviction" : 72350154,

Comment by Mark Benvenuto [ 23/Jan/15 ]

I was able to get a repro with RC5 while collecting additional metrics from db.serverStatus(). It appears that most of the "leak" is around splitting large pages. I see "wt cache: maximum page size at eviction" grow to 4096 MB (i.e., 4GB), which is correlated with the growth in RSS & VSZ. I am investigating whether this repros with master, given the additional work that has been done post-RC5 in the page split code.

Comment by Mark Callaghan [ 23/Jan/15 ]

Had files in too many places and uploaded the wrong ones. These should reproduce the problem.

Comment by Mark Callaghan [ 23/Jan/15 ]

run.simple.bash changed to have no sleep between queries

Comment by Mark Callaghan [ 23/Jan/15 ]

For the files I attached, I changed the script to export MS_BETWEEN_QUERIES=0 so the QPS rate is higher, and on my Ubuntu 14.04 VM the problem arrives fast: 2.1M inserts in 206 seconds, and vsz is up to 12.7B (12765688).

Thread[main,5,main] *** creating secondary index on price + customerid
Thread[main,5,main] *** creating secondary index on cashregisterid + price + customerid
Thread[main,5,main] *** creating secondary index on price + dateandtime + customerid
Thread[main,5,main]  creating 4,194,304 bytes of random character data...
Thread[main,5,main]  creating 4,194,304 bytes of compressible character data...
Thread[Thread-3,5,main]Writer thread 0 : started to load collection purchases_index
Thread[Thread-2,5,main]100,000 inserts : 5 seconds : cum ips=18,830.93 : int ips=18,830.93 : cum avg qry=0.00 : int avg qry=0.00 : cum avg qps=0.00 : int avg qps=0.00 : exceptions=0
Thread[Thread-4,5,main]Query thread 0 : ready to query collection purchases_index
Thread[Thread-2,5,main]200,000 inserts : 11 seconds : cum ips=17,874.61 : int ips=17,013.18 : cum avg qry=0.61 : int avg qry=0.61 : cum avg qps=1,636.28 : int avg qps=1,450.14 : exceptions=0
Thread[Thread-2,5,main]300,000 inserts : 17 seconds : cum ips=16,962.79 : int ips=15,389.69 : cum avg qry=0.60 : int avg qry=0.60 : cum avg qps=1,652.92 : int avg qps=1,666.31 : exceptions=0
Thread[Thread-2,5,main]400,000 inserts : 24 seconds : cum ips=16,605.31 : int ips=15,615.16 : cum avg qry=0.60 : int avg qry=0.59 : cum avg qps=1,662.70 : int avg qps=1,680.65 : exceptions=0
Thread[Thread-2,5,main]500,000 inserts : 30 seconds : cum ips=16,383.16 : int ips=15,546.21 : cum avg qry=0.60 : int avg qry=0.61 : cum avg qps=1,654.43 : int avg qps=1,630.98 : exceptions=0
Thread[Thread-2,5,main]600,000 inserts : 37 seconds : cum ips=16,213.01 : int ips=15,413.79 : cum avg qry=0.61 : int avg qry=0.63 : cum avg qps=1,639.44 : int avg qps=1,582.83 : exceptions=0
Thread[Thread-2,5,main]700,000 inserts : 43 seconds : cum ips=16,024.84 : int ips=14,969.00 : cum avg qry=0.61 : int avg qry=0.64 : cum avg qps=1,625.05 : int avg qps=1,557.39 : exceptions=0
Thread[Thread-2,5,main]800,000 inserts : 52 seconds : cum ips=15,156.24 : int ips=10,990.57 : cum avg qry=0.65 : int avg qry=0.90 : cum avg qps=1,524.93 : int avg qps=1,110.68 : exceptions=0
Thread[Thread-2,5,main]900,000 inserts : 61 seconds : cum ips=14,722.24 : int ips=11,964.89 : cum avg qry=0.67 : int avg qry=0.83 : cum avg qps=1,476.33 : int avg qps=1,202.62 : exceptions=0
Thread[Thread-2,5,main]1,000,000 inserts : 70 seconds : cum ips=14,238.66 : int ips=10,998.25 : cum avg qry=0.70 : int avg qry=0.88 : cum avg qps=1,426.99 : int avg qps=1,128.86 : exceptions=0
Thread[Thread-2,5,main]1,100,000 inserts : 80 seconds : cum ips=13,755.11 : int ips=10,261.16 : cum avg qry=0.72 : int avg qry=0.96 : cum avg qps=1,376.36 : int avg qps=1,041.74 : exceptions=0
Thread[Thread-2,5,main]1,200,000 inserts : 90 seconds : cum ips=13,304.52 : int ips=9,784.18 : cum avg qry=0.75 : int avg qry=0.98 : cum avg qps=1,333.38 : int avg qps=1,022.75 : exceptions=0
Thread[Thread-2,5,main]1,300,000 inserts : 100 seconds : cum ips=12,901.89 : int ips=9,447.67 : cum avg qry=0.77 : int avg qry=1.04 : cum avg qps=1,291.77 : int avg qps=958.55 : exceptions=0
Thread[Thread-2,5,main]1,400,000 inserts : 112 seconds : cum ips=12,481.79 : int ips=8,777.46 : cum avg qry=0.80 : int avg qry=1.11 : cum avg qps=1,249.74 : int avg qps=901.20 : exceptions=0
Thread[Thread-2,5,main]1,500,000 inserts : 124 seconds : cum ips=12,040.51 : int ips=8,056.65 : cum avg qry=0.83 : int avg qry=1.18 : cum avg qps=1,207.51 : int avg qps=846.62 : exceptions=0
Thread[Thread-2,5,main]1,600,000 inserts : 136 seconds : cum ips=11,709.83 : int ips=8,282.46 : cum avg qry=0.85 : int avg qry=1.13 : cum avg qps=1,177.11 : int avg qps=877.24 : exceptions=0
Thread[Thread-2,5,main]1,700,000 inserts : 149 seconds : cum ips=11,389.02 : int ips=7,918.78 : cum avg qry=0.87 : int avg qry=1.20 : cum avg qps=1,146.59 : int avg qps=830.98 : exceptions=0
Thread[Thread-2,5,main]1,800,000 inserts : 162 seconds : cum ips=11,063.16 : int ips=7,441.93 : cum avg qry=0.89 : int avg qry=1.30 : cum avg qps=1,114.08 : int avg qps=767.35 : exceptions=0
Thread[Thread-2,5,main]1,900,000 inserts : 176 seconds : cum ips=10,767.13 : int ips=7,283.18 : cum avg qry=0.92 : int avg qry=1.29 : cum avg qps=1,086.25 : int avg qps=770.78 : exceptions=0
Thread[Thread-2,5,main]2,000,000 inserts : 191 seconds : cum ips=10,464.74 : int ips=6,817.00 : cum avg qry=0.94 : int avg qry=1.36 : cum avg qps=1,058.35 : int avg qps=733.22 : exceptions=0
Thread[Thread-2,5,main]2,100,000 inserts : 206 seconds : cum ips=10,169.64 : int ips=6,491.03 : cum avg qry=0.97 : int avg qry=1.49 : cum avg qps=1,028.60 : int avg qps=669.38 : exceptions=0

while :; do ps aux | grep mongod | grep -v grep; sleep 10; done
mdcallag  19406  0.6  1.1 232940 47704 ?        Sl   17:55   0:00 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406  0.5  1.1 232940 47704 ?        Sl   17:55   0:00 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406  7.4 11.9 1234420 482968 ?      Sl   17:55   0:02 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 28.8 29.4 2284024 1188596 ?     Sl   17:55   0:11 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 42.9 45.3 4380152 1828508 ?     Sl   17:55   0:21 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 52.5 61.2 4380152 2472064 ?     Sl   17:55   0:31 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 59.7 76.6 6476280 3093408 ?     Sl   17:55   0:41 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 64.2 78.6 6476280 3171240 ?     Sl   17:55   0:51 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 66.7 78.0 8573432 3149468 ?     Sl   17:55   1:00 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 68.2 77.6 8573432 3131532 ?     Sl   17:55   1:08 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 69.3 77.1 8573432 3110248 ?     Sl   17:55   1:16 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 70.0 77.2 10669560 3117092 ?    Sl   17:55   1:24 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 70.7 76.6 10669560 3091768 ?    Sl   17:55   1:32 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 71.3 79.3 10669560 3200628 ?    Sl   17:55   1:39 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 71.4 79.9 10669560 3227004 ?    Sl   17:55   1:47 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 71.4 78.4 10669560 3164648 ?    Sl   17:55   1:54 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 71.1 74.8 12765688 3021380 ?    Sl   17:55   2:00 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 70.8 70.2 12765688 2832796 ?    Sl   17:55   2:07 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 70.6 63.4 12765688 2560728 ?    Sl   17:55   2:14 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 70.4 69.5 12765688 2806968 ?    Sl   17:55   2:20 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 70.1 71.8 12765688 2900236 ?    Sl   17:55   2:27 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 69.7 69.2 12765688 2792748 ?    Sl   17:55   2:33 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
mdcallag  19406 69.4 66.1 12765688 2668852 ?    Sl   17:55   2:39 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1

Comment by Mark Callaghan [ 23/Jan/15 ]

jmongoiibench.java has local edits, not yet shared on GitHub.
run.simple.bash matches my reproduction except that it has a smaller value for MAX_ROWS, because I am running this on my laptop VM to confirm I can reproduce it on modern Ubuntu. With a 1G block cache, mongod is up to 1.7B for vsz and still growing:
bin/mongod --config /b/mongo280rc5/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 1

Comment by Mark Callaghan [ 23/Jan/15 ]

I used "--wiredTigerCacheSizeGB 4" for 4G block cache.

Looks like you were running the query thread with a 1ms sleep between queries, which is similar to what I used, although my edits to that file are a bit different. Can you confirm that you get between 100 and 1000 queries per second while the test is running? That should be in the test output.

I suspect that block cache misses are the key to reproducing the problem, so the database needs to be larger than the WT block cache. The growth rate for vsz and rss is a problem even when the query thread is not used, but to get the rapid growth I need the query thread. I suggest (a sketch follows at the end of this comment):

  • Change run.simple.bash to use "export LENGTH_CHAR_FIELDS=2000" and "export PERCENT_COMPRESSIBLE=50"
  • Reduce the block cache from 15G to 4G
  • Finally, snappy seemed to have more problems than zlib, so use the snappy block compressor

With this setup I get an OOM after ~57M inserts and ~13,000 seconds.
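
A minimal sketch of that suggested setup follows. The two exports are quoted above; the mongod block-compressor option and the exact way the attached run.simple.bash is invoked are assumptions, so adjust to match the attached scripts.

# 1) Start mongod with a 4G cache and the snappy block compressor
#    (--wiredTigerCollectionBlockCompressor is assumed here; verify it against
#    your mongod build and mongo.conf before relying on it).
numactl --interleave=all bin/mongod --config mongo.conf \
    --storageEngine wiredTiger \
    --wiredTigerCacheSizeGB 4 \
    --wiredTigerCollectionBlockCompressor snappy &

# 2) Run the attached driver with larger, partly compressible documents.
export LENGTH_CHAR_FIELDS=2000
export PERCENT_COMPRESSIBLE=50
bash run.simple.bash        # pass the same arguments used in the attached copies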

Comment by Mark Benvenuto [ 23/Jan/15 ]

Dear Mark,

I have tried to repro your issue on a machine with 32GB of physical RAM and a 15GB WT cache (automatically calculated by mongod), and after 1:40 hours with 67,000,000 documents I see RSS & VSZ stable at about 16.6 GB on RC5. I used the default snappy collection compression, with the journal on.

Here are the changes to iibench I used to try to repro your issue:
https://github.com/markbenvenuto/iibench-mongodb/commits/SERVER-16902-repro

When you earlier said

4G block caches were used for all tests

Did you set --wiredTigerCacheSizeGB 4 or did you mean something else?

Thanks

mark

Comment by Mark Callaghan [ 21/Jan/15 ]

Updated with some results for rc4. The problem appears to be new in rc5.

Comment by Mark Callaghan [ 21/Jan/15 ]

I will start tests for rc4. I am not certain this was not a problem there.

Comment by Mark Callaghan [ 20/Jan/15 ]

Regardless of allocator, there are periods of crazy growth in vsz and rss with the WT b-tree. This is from "ps" running at 60 second intervals. Here is one example with jemalloc:

root      5782  366  5.7 8984056 8549556 ?     Sl   19:18 172:32 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  369  8.1 12416504 12061548 ?   Sl   19:18 177:36 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  372 10.2 15635960 15276540 ?   Sl   19:18 182:39 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  375 12.3 18658808 18301452 ?   Sl   19:18 187:43 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  377 14.2 21501432 21145260 ?   Sl   19:18 192:46 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  380 16.0 24212984 23855748 ?   Sl   19:18 197:50 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  382 17.7 26764792 26410476 ?   Sl   19:18 202:53 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  384 19.4 29193720 28839012 ?   Sl   19:18 207:56 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  386 21.0 31528440 31172772 ?   Sl   19:18 212:59 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  388 22.5 33785336 33431292 ?   Sl   19:18 218:02 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  390 24.0 35993080 35635956 ?   Sl   19:18 223:06 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  392 25.4 38090232 37734756 ?   Sl   19:18 228:09 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  394 26.7 40113656 39758052 ?   Sl   19:18 233:12 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  396 28.0 42055160 41696340 ?   Sl   19:18 238:15 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  398 29.3 43963896 43606380 ?   Sl   19:18 243:18 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  400 30.6 45807096 45450948 ?   Sl   19:18 248:21 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  401 31.8 47597048 47242980 ?   Sl   19:18 253:24 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  403 33.0 49354232 48999900 ?   Sl   19:18 258:26 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  404 34.1 51070456 50717220 ?   Sl   19:18 263:29 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  406 35.3 52753912 52397316 ?   Sl   19:18 268:32 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  407 36.4 54404600 54049428 ?   Sl   19:18 273:34 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  409 37.5 56026616 55673028 ?   Sl   19:18 278:37 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  410 38.6 57636344 57281316 ?   Sl   19:18 283:39 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  412 39.6 59233784 58879308 ?   Sl   19:18 288:42 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      5782  413 40.7 60806648 60450108 ?   Sl   19:18 293:45 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4

And an example with glibc malloc:

root      3738  481 37.0 56105624 54944764 ?   Sl   Jan18 2449:34 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  482 37.6 57088664 55926528 ?   Sl   Jan18 2455:59 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  482 41.6 62986904 61854092 ?   Sl   Jan18 2462:26 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  482 45.4 68623000 67498608 ?   Sl   Jan18 2468:51 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  483 49.1 74128024 72958080 ?   Sl   Jan18 2475:15 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  483 52.6 79305368 78121420 ?   Sl   Jan18 2481:39 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  483 55.9 84220568 83049600 ?   Sl   Jan18 2488:02 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  484 59.1 88939160 87764048 ?   Sl   Jan18 2494:24 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  484 62.1 93461144 92257644 ?   Sl   Jan18 2500:45 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  484 65.0 97786520 96600920 ?   Sl   Jan18 2507:06 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  484 67.9 101980824 100784364 ? Sl   Jan18 2513:26 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  485 70.5 105912984 104761248 ? Sl   Jan18 2519:45 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  485 73.1 109714072 108579396 ? Sl   Jan18 2526:03 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  485 75.6 113449624 112284800 ? Sl   Jan18 2532:22 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  486 78.0 117054104 115862276 ? Sl   Jan18 2538:39 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  486 80.3 120461976 119292492 ? Sl   Jan18 2544:56 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  486 82.6 123804312 122591652 ? Sl   Jan18 2551:12 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  486 84.7 127015576 125831320 ? Sl   Jan18 2557:28 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  487 86.9 130161304 128961388 ? Sl   Jan18 2563:44 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  487 88.9 133175960 131997404 ? Sl   Jan18 2569:59 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  487 90.9 136125080 134905752 ? Sl   Jan18 2576:14 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  487 92.1 138812056 136750932 ? Sl   Jan18 2582:27 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  488 93.4 141433496 138678284 ? Sl   Jan18 2588:41 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
root      3738  488 93.7 141891588 139181416 ? Sl   Jan18 2594:13 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4

Comment by Mark Callaghan [ 20/Jan/15 ]

I ran iibench-mongodb using 1 insert thread and 1 query thread. The insert thread adds 100M docs to an empty collection with 10 docs per insert call. The query thread sleeps 1 ms between each query; it runs to increase stress on the block cache and to get more fragmentation from the frequent use of malloc. I previously ran tests without the query thread, but it looks like fragmentation from reading data into the block cache is the problem, and the query thread makes that visible faster.
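
A minimal sketch of that driver setup follows. MAX_ROWS and MS_BETWEEN_QUERIES appear elsewhere in this ticket; the thread-count and batch-size variable names below are illustrative placeholders, not confirmed iibench-mongodb settings, so map them onto whatever the attached run.simple.bash actually exports.

# Sketch only: placeholders are marked; see the attached run.simple.bash.
export MAX_ROWS=100000000          # 100M docs total
export MS_BETWEEN_QUERIES=1        # query thread sleeps 1 ms between queries
export WRITER_THREADS=1            # placeholder name: 1 insert thread
export QUERY_THREADS=1             # placeholder name: 1 query thread
export DOCS_PER_INSERT=10          # placeholder name: 10 docs per insert call
bash run.simple.bash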

All tests use rc5 or rc4 and I compared:

  • tcmalloc, glibc malloc and jemalloc
  • WT-btree with zlib (zlib/snappy below), WT-btree with snappy (snappy/snappy below), RocksDB with zlib, RocksDB with snappy

Other notes:

  • Journal was enabled for WT with snappy compression. WAL was enabled for RocksDB without compression. Disabling the journal or using the journal without compression doesn't fix this problem.
  • 4G block caches were used for all tests

These results show either "OOM" or 2 values: vsz and rss at test end. OOM means out of memory, and in that case I show how far the insert thread progressed. The test server has 144G of RAM.

6.2, 4.0 - rc4, WT btree, none/no-journal, jemalloc
6.1, 4.0 - rc4, WT btree, snappy/no-journal, jemalloc
7.6, 4.5 - rc4, WT btree, zlib/no-journal, jemalloc

OOM - rc5, WT btree, snappy/no-journal, jemalloc, 25% done
OOM - rc5, WT btree, none/snappy, jemalloc, 23% done
OOM - rc5, WT btree, none/no-journal, jemalloc, 63% done
91.4, 5.2 - rc5, WT btree, zlib/no-journal, jemalloc

89.7, 81.1 - rc5, WT lsm, zlib/snappy, jemalloc

OOM - rc5, WT btree, snappy/snappy, glibc malloc, 55% done
OOM - rc5, WT btree, snappy/snappy, jemalloc, 57% done
OOM - rc5, WT btree, snappy/snappy, tcmalloc, 72% done

OOM - rc5, WT btree, zlib/snappy, glibc malloc, 89% done
55.8, 5.2 - rc5, WT btree, zlib/snappy, jemalloc
102.1, 10.1 - rc5, WT btree, zlib/snappy, tcmalloc

14.7, 12.9 - rc5, RocksDB, snappy, glibc malloc —> inserted 400M via 10 threads, not 100M via 1 thread
7.0, 6.3 - rc5, RocksDB, snappy, jemalloc
6.6, 6.2 - rc5, RocksDB, snappy, tcmalloc

7.8, 6.5 - rc5, RocksDB, zlib, glibc malloc
7.2, 6.5 - rc5, RocksDB, zlib, jemalloc
6.8, 6.4 - rc5, RocksDB, zlib, tcmalloc

Comment by Mark Callaghan [ 18/Jan/15 ]

New theory, with tests in progress: this is just really, really bad fragmentation from malloc. If there is a leak, its impact is much smaller than that of the malloc fragmentation.

Comment by Mark Callaghan [ 16/Jan/15 ]

If you run iibench before fixing the write-stall problems, then set socketTimeout to 600,000 in src/jmongoiibench.java.

With the WT cache set to 2G, vsz grows to 5.4G

Comment by Michael Cahill (Inactive) [ 16/Jan/15 ]

Hi mdcallag, there are some changes in the WiredTiger pipeline. Once they're stable, I'll run iibench to confirm that they fix this issue.

Comment by Mark Callaghan [ 16/Jan/15 ]

--wiredTigerCacheSizeGB 32 has the same problem as --wiredTigerCacheSizeGB 16. Now trying again with 16G but reducing client threads from 10 to 1.

Comment by Mark Callaghan [ 16/Jan/15 ]

Tried with --wiredTigerCacheSizeGB 16, but the test ran too slowly and there were many more exceptions from https://jira.mongodb.org/browse/SERVER-16717. Retrying with a 32G cache.
