|
Author: Michael Cahill <michael.cahill@mongodb.com>
Message: Complete SERVER-16902 backport: in-memory splits during checkpoints.
This change was partially backported already via changes to
__wt_page_can_split, make it work for real.
Branch: mongodb-3.0
https://github.com/wiredtiger/wiredtiger/commit/5c32a0bee76ec0fd09722b80b40825a272cb46e9
|
|
I've re-run my earlier test to confirm whether the issue here has been resolved.
As it stands, the "maximum page size at eviction" can still occasionally exceed the 10MB limit, but never by the orders of magnitude seen before.
I'm marking this one as resolved.
|
|
I have one replica set server that hits a fatal out-of-memory crash every time I make it the primary for the shard. Watching the server with mongostat this time, I noticed that resident memory grew to 29.4G even though I had set the wiredTigerCacheSizeGB to 22 GB (--wiredTigerCacheSizeGB 22). The server is an AWS i2.xlarge instance running Amazon Linux (ami-1ecae776). The backtrace in the log was:
2015-06-05T18:24:17.343+0000 I NETWORK [initandlisten] connection accepted from 10.233.151.237:54382 #22573 (229 connections now open)
|
2015-06-05T18:24:17.344+0000 F - [conn22195] out of memory.
|
|
0xf6d609 0xf6d139 0xef1212 0xbba8ec 0xace3ee 0xad379d 0x82a47d 0xf2073b 0x7f263f1b3df3 0x7f263dc671ad
|
----- BEGIN BACKTRACE -----
|
{"backtrace":[{"b":"400000","o":"B6D609"},{"b":"400000","o":"B6D139"},{"b":"400000","o":"AF1212"},{"b":"400000","o":"7BA8EC"},{"b":"400000","o":"6CE3EE"},{"b":"400000","o":"6D379D"},{"b":"400000","o":"42A47D"},{"b":"400000","o":"B2073B"},{"b":"7F263F1AC000","o":"7DF3"},{"b":"7F263DB71000","o":"F61AD"}],"processInfo":{ "mongodbVersion" : "3.0.3", "gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105", "uname" : { "sysname" : "Linux", "release" : "3.14.35-28.38.amzn1.x86_64", "version" : "#1 SMP Wed Mar 11 22:50:37 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "D5BA38CEF0AE14E7FFF3FA98A110C841B25EAA21" }, { "b" : "7FFE7B1DE000", "elfType" : 3, "buildId" : "8DD7A7F645E37D599573A937DBAA3E931F55DE40" }, { "b" : "7F263F1AC000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "D48D3E6672A77B603B402F661BABF75E90AD570B" }, { "b" : "7F263EF3F000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "F711D67FF0C1FE2222FB003A30AB74DA26A5EF41" }, { "b" : "7F263EB5A000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "777069F5EECC26CD66C5C8390FA2BF4E444979D1" }, { "b" : "7F263E952000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "E81013CBFA409053D58A65A0653271AB665A4619" }, { "b" : "7F263E74E000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "62A8842157C62F95C3069CBF779AFCC26577A99A" }, { "b" : "7F263E44A000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "DD6383EEAC49E9BAA9E3D1080AE932F42CF8A385" }, { "b" : "7F263E148000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "5F97F8F8E5024E29717CF35998681F84D4A22D45" }, { "b" : "7F263DF32000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "C52958E393BDF8E8D090F36DE0F4E620D8736FBF" }, { "b" : "7F263DB71000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "77E48C0DBAE0843560558F1E51A5FC61602ABC5F" }, { "b" : "7F263F3C8000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F90843B9087FE91955FEB0355EB0858EF9E97B2" }, { "b" : "7F263D92E000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "72C1DB5E2447A90D1BF34065BCC031B7263FFBAC" }, { "b" : "7F263D649000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "2B8787E8E0C317CF820E5D830D923BC744E497F4" }, { "b" : "7F263D446000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "622F315EB5CB2F791E9B64020692EBA98195D06D" }, { "b" : "7F263D21B000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "B10FBFEC246C4EAD1719D16090D0BE54904BBFC9" }, { "b" : "7F263D005000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D1" }, { "b" : "7F263CDFA000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "7292C0673D7C116E3389D3FFA67087A6B9287A71" }, { "b" : "7F263CBF7000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7F263C9DD000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "17295EFA7A45ADE184C2A7A3294460BDC8B43277" }, { "b" : "7F263C7BC000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" } ] }}
|
mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf6d609]
|
mongod(_ZN5mongo29reportOutOfMemoryErrorAndExitEv+0x49) [0xf6d139]
|
mongod(_ZN5mongo11mongoMallocEm+0x22) [0xef1212]
|
mongod(_ZN5mongo7getMoreEPNS_16OperationContextEPKcixRNS_5CurOpEiRbPb+0x24C) [0xbba8ec]
|
mongod(_ZN5mongo15receivedGetMoreEPNS_16OperationContextERNS_10DbResponseERNS_7MessageERNS_5CurOpE+0x35E) [0xace3ee]
|
mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x13CD) [0xad379d]
|
mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDD) [0x82a47d]
|
mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xf2073b]
|
libpthread.so.0(+0x7DF3) [0x7f263f1b3df3]
|
libc.so.6(clone+0x6D) [0x7f263dc671ad]
|
----- END BACKTRACE -----
|
|
|
|
This is a problem in 3.0.1 and 3.1.0
|
|
My new test pattern is:
1) load 400M docs using 10 loader threads and 1 query thread
2) rate limit 1 writer thread to 1000 docs/second and use 10 or 30 query threads
vsz is stable after #1, but after running 2 rounds of #2 at ~8 hours per round, vsz is up to 130G. It looks like it will OOM.
The command lines are:
for #1 -> bash run.simple.bash.L10.Q1 1000000
for #2 -> bash run.simple.bash.L1.Q10 480
|
|
And I got an OOM, with vsz and rss growing fast at the end after having been stable. This is from "ps" sampled at 60 second intervals. The columns are vsz and rss:
135614640 103303280
135614640 103310344
135614640 103313240
135614640 111993784
135614640 133409724
143773872 140719864
The max page size did not change:
"maximum page size at eviction" : 625302856
|
|
Still using rc8, 70G block cache, zlib and oplog is disabled:
- vsz is 135504048 after 400M docs inserted
- vsz is 135585968 after 677M docs inserted
"maximum page size at eviction" is 625302856 after 400M docs inserted and has stopped growing.
I will run this for a few more days to see if vsz continues to grow.
Running a similar config, except with the oplog enabled, has vsz at 93462712 after 400M docs. This is odd. I will let that run for another 400M docs. It also has a much smaller max page size:
"maximum page size at eviction" : 246616304
|
|
Using similar test, rc8, 70G block cache – after 300M docs inserted vsz is almost 130G and I have at least one page that is almost 600M. Will let the test run to see if I get an OOM.
"maximum page size at eviction" : 625302856,
|
|
With RC7, I cannot see any OOM issues.
For testing, I used
| Machine | Duration | Inserts | Max Docs Size MB |
| EC2, c3.8xlarge, 32 core, 60 GB RAM | 3.45 hr | 158,600,000 | 21.3 |
| Softlayer, 24 core, 128 GB RAM | 13:57 hr | 327,600,000 | 2446.3 |
It was on the larger machine that I saw a page size problem, but no OOM. michael.cahill I will reopen the issue and change the title to reflect that this issue is tracking the large page size problem.
|
|
Hi mdcallag, we fixed one bug that was allowing pages to grow unexpectedly large in WiredTiger. I wasn't confident that the fix was sufficient, so we didn't call it out in RC7, but I'm glad it has helped.
|
|
rc7 looks better. After 15 minutes and 10M docs inserted vsz is 1773576 (1.8B) and "maximum page size at eviction" : 50150781,
Is this improvement expected?
|
|
I was able to get an OOM on a 144G server with --wiredTigerCacheSizeGB 70 using the test described 2 comments back.
|
|
Better test case: use --wiredTigerCacheSizeGB 1. After ~15 minutes vsz is ~5G (5233240) and the max page size is almost 1.5G. I think large pages are the problem.
"maximum page size at eviction" : 1486021050,
|
|
Files to reproduce the problem on rc6 with 10 loader threads and 0 query threads. The command line for mongod is:
numactl --interleave=all bin/mongod --config mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 70
|
|
After 32,000 seconds and 400M inserts:
- vsz and rss are ~103G and still growing with --wiredTigerCacheSizeGB 70
- worst case stall is ~110 seconds
- there were 4M writes and 26330 took >= 2 seconds. Of the stalls, 11850 are >= 5 seconds, 4039 are >= 10 seconds, 377 are >= 20 seconds
- this is ~32B, which is too big even if only 1 page is that big --> "maximum page size at eviction" : 32690460080,
- db.serverStatus() includes:
"bytes currently in the cache" : 60123488841,
"maximum bytes configured" : 75161927680,
|
|
From a test with 10 loader threads and 0 query threads using rc6, after ~380M docs inserted, the value below is ~32B. Even if this is just the root page for the tree, that is too big.
"maximum page size at eviction" : 32690460080,
|
|
With rc6, running iibench-mongodb with 10 loader threads, no query threads, and "--storageEngine wiredTiger --wiredTigerCacheSizeGB 70": after 341M docs inserted I see vsz and rss up to 94G and growing, and 94/70 is about 1.34. I think this is better than before, but that overhead is still much higher than I expect.
root 8902 662 64.1 96114260 95182260 ? Sl 07:11 2254:02 bin/mongod.rc6.jem.roxsnappy70g --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 70
|
|
I am running 3.0.0rc6 using the uploaded repro case. After a few minutes I see this. If that is accurate, why is there a 65MB page in my WT b-tree? That seems like a serious performance problem.
"maximum page size at eviction" : 68375657,
|
|
Yes, the fix will be in rc6.
Fixed with https://github.com/mongodb/mongo/commit/876f5f475bf9242c098e6de641aa7ee0982803c2
|
|
Thanks for the details and taking the time on the reproduction. Is the fix likely to be in rc6?
|
|
"Maximum page size at eviction" is the largest page size seen during the lifetime of the server. It will not give you the largest page evicted in any particular eviction cycle. This means the counter is not particularly useful to track how large the typical sie of a page is evicted over time since it never resets.
As WT is a highly concurrent store, every cursor maintains a hazard pointer to each page it uses, and since mongod runs WT in snapshot isolation mode, transactions also pin pages in the cache to ensure transactionally safe reads of the data. Your read workload forces WT to keep pages in memory by both pinning the transaction id, and by grabbing handles on certain pages. As I understand it, this is likely triggering a bug in bt_split code where WT is ignoring this page as a split candidate.
michael.cahill alexander.gorrod keith.bostic have a better understanding of the issue.
Also, using your latest posted repro, I can no longer reproduce the issue against master.
I ran for 4:08 hr, achieved 81,400,000 inserts, and RSS & VSZ were stable at 6.7 GB & 7.08 GB respectively. This was against master with git hash 616bc35c6f7ca8b66b531e4086863697e01a0181. Final DB was ~35 GB.
The machine was EC2 c3.4xlarge with 16 core, 30 gigs of memory, and SSD storage.
iibench status
Thread [Thread-2,5,main] 81,400,000 inserts : 14,905 seconds : cum ips=5,461.15 : int ips=1,515.20 : cum avg qry=0.78 : int avg qry=0.99 : cum avg qps=1,277.63 : int avg qps=1,007.18 : exceptions=0
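Because the counter is a lifetime high-water mark, the only way to correlate it with memory growth is to sample it periodically alongside RSS; a minimal sketch of such a sampler (assuming a local mongod and a 60-second interval) is:
# log "maximum page size at eviction" together with rss (KB) every 60 seconds
while :; do
    max=$(mongo --quiet --eval 'print(db.serverStatus().wiredTiger.cache["maximum page size at eviction"])')
    rss=$(ps -o rss= -p "$(pgrep -x mongod)")
    echo "$(date +%s) max_page_at_eviction=${max} rss_kb=${rss}"
    sleep 60
done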
|
|
I restarted a test for rc5 and after ~10 minutes I see a large and growing value for that statistic. So leaf_page_max and internal_page_max aren't respected because of the bug? But I am confused, because the growth rate is faster when I use iibench-mongodb with 1 insert and 1 query thread than when I use just the insert thread.
"maximum page size at eviction" : 72350154,
|
|
I was able to get a repro with RC5 while collecting additional metrics from db.serverStatus(). It appears that most of the "leak" is around splitting large pages. I see the wt cache "maximum page size at eviction" grow to 4096 MB (i.e. 4GB), which is correlated with the growth in RSS & VSZ. I am investigating whether this repros with master, given the additional work that has been done post-RC5 in the page split code.
|
|
Had files in too many places and uploaded the wrong ones. These should reproduce the problem.
|
|
run.simple.bash changed to have no sleep between queries
|
|
For the files I attached, I changed to export MS_BETWEEN_QUERIES=0 so the QPS rate is higher, and on my Ubuntu 14.04 VM the problem arrives fast – 2.1M inserts, 206 seconds, and vsz is up to 12.7B (12765688).
Thread[main,5,main] *** creating secondary index on price + customerid
|
Thread[main,5,main] *** creating secondary index on cashregisterid + price + customerid
|
Thread[main,5,main] *** creating secondary index on price + dateandtime + customerid
|
Thread[main,5,main] creating 4,194,304 bytes of random character data...
|
Thread[main,5,main] creating 4,194,304 bytes of compressible character data...
|
Thread[Thread-3,5,main]Writer thread 0 : started to load collection purchases_index
|
Thread[Thread-2,5,main]100,000 inserts : 5 seconds : cum ips=18,830.93 : int ips=18,830.93 : cum avg qry=0.00 : int avg qry=0.00 : cum avg qps=0.00 : int avg qps=0.00 : exceptions=0
|
Thread[Thread-4,5,main]Query thread 0 : ready to query collection purchases_index
|
Thread[Thread-2,5,main]200,000 inserts : 11 seconds : cum ips=17,874.61 : int ips=17,013.18 : cum avg qry=0.61 : int avg qry=0.61 : cum avg qps=1,636.28 : int avg qps=1,450.14 : exceptions=0
|
Thread[Thread-2,5,main]300,000 inserts : 17 seconds : cum ips=16,962.79 : int ips=15,389.69 : cum avg qry=0.60 : int avg qry=0.60 : cum avg qps=1,652.92 : int avg qps=1,666.31 : exceptions=0
|
Thread[Thread-2,5,main]400,000 inserts : 24 seconds : cum ips=16,605.31 : int ips=15,615.16 : cum avg qry=0.60 : int avg qry=0.59 : cum avg qps=1,662.70 : int avg qps=1,680.65 : exceptions=0
|
Thread[Thread-2,5,main]500,000 inserts : 30 seconds : cum ips=16,383.16 : int ips=15,546.21 : cum avg qry=0.60 : int avg qry=0.61 : cum avg qps=1,654.43 : int avg qps=1,630.98 : exceptions=0
|
Thread[Thread-2,5,main]600,000 inserts : 37 seconds : cum ips=16,213.01 : int ips=15,413.79 : cum avg qry=0.61 : int avg qry=0.63 : cum avg qps=1,639.44 : int avg qps=1,582.83 : exceptions=0
|
Thread[Thread-2,5,main]700,000 inserts : 43 seconds : cum ips=16,024.84 : int ips=14,969.00 : cum avg qry=0.61 : int avg qry=0.64 : cum avg qps=1,625.05 : int avg qps=1,557.39 : exceptions=0
|
Thread[Thread-2,5,main]800,000 inserts : 52 seconds : cum ips=15,156.24 : int ips=10,990.57 : cum avg qry=0.65 : int avg qry=0.90 : cum avg qps=1,524.93 : int avg qps=1,110.68 : exceptions=0
|
Thread[Thread-2,5,main]900,000 inserts : 61 seconds : cum ips=14,722.24 : int ips=11,964.89 : cum avg qry=0.67 : int avg qry=0.83 : cum avg qps=1,476.33 : int avg qps=1,202.62 : exceptions=0
|
Thread[Thread-2,5,main]1,000,000 inserts : 70 seconds : cum ips=14,238.66 : int ips=10,998.25 : cum avg qry=0.70 : int avg qry=0.88 : cum avg qps=1,426.99 : int avg qps=1,128.86 : exceptions=0
|
Thread[Thread-2,5,main]1,100,000 inserts : 80 seconds : cum ips=13,755.11 : int ips=10,261.16 : cum avg qry=0.72 : int avg qry=0.96 : cum avg qps=1,376.36 : int avg qps=1,041.74 : exceptions=0
|
Thread[Thread-2,5,main]1,200,000 inserts : 90 seconds : cum ips=13,304.52 : int ips=9,784.18 : cum avg qry=0.75 : int avg qry=0.98 : cum avg qps=1,333.38 : int avg qps=1,022.75 : exceptions=0
|
Thread[Thread-2,5,main]1,300,000 inserts : 100 seconds : cum ips=12,901.89 : int ips=9,447.67 : cum avg qry=0.77 : int avg qry=1.04 : cum avg qps=1,291.77 : int avg qps=958.55 : exceptions=0
|
Thread[Thread-2,5,main]1,400,000 inserts : 112 seconds : cum ips=12,481.79 : int ips=8,777.46 : cum avg qry=0.80 : int avg qry=1.11 : cum avg qps=1,249.74 : int avg qps=901.20 : exceptions=0
|
Thread[Thread-2,5,main]1,500,000 inserts : 124 seconds : cum ips=12,040.51 : int ips=8,056.65 : cum avg qry=0.83 : int avg qry=1.18 : cum avg qps=1,207.51 : int avg qps=846.62 : exceptions=0
|
Thread[Thread-2,5,main]1,600,000 inserts : 136 seconds : cum ips=11,709.83 : int ips=8,282.46 : cum avg qry=0.85 : int avg qry=1.13 : cum avg qps=1,177.11 : int avg qps=877.24 : exceptions=0
|
Thread[Thread-2,5,main]1,700,000 inserts : 149 seconds : cum ips=11,389.02 : int ips=7,918.78 : cum avg qry=0.87 : int avg qry=1.20 : cum avg qps=1,146.59 : int avg qps=830.98 : exceptions=0
|
Thread[Thread-2,5,main]1,800,000 inserts : 162 seconds : cum ips=11,063.16 : int ips=7,441.93 : cum avg qry=0.89 : int avg qry=1.30 : cum avg qps=1,114.08 : int avg qps=767.35 : exceptions=0
|
Thread[Thread-2,5,main]1,900,000 inserts : 176 seconds : cum ips=10,767.13 : int ips=7,283.18 : cum avg qry=0.92 : int avg qry=1.29 : cum avg qps=1,086.25 : int avg qps=770.78 : exceptions=0
|
Thread[Thread-2,5,main]2,000,000 inserts : 191 seconds : cum ips=10,464.74 : int ips=6,817.00 : cum avg qry=0.94 : int avg qry=1.36 : cum avg qps=1,058.35 : int avg qps=733.22 : exceptions=0
|
Thread[Thread-2,5,main]2,100,000 inserts : 206 seconds : cum ips=10,169.64 : int ips=6,491.03 : cum avg qry=0.97 : int avg qry=1.49 : cum avg qps=1,028.60 : int avg qps=669.38 : exceptions=0
|
while :; do ps aux | grep mongod | grep -v grep; sleep 10; done
|
mdcallag 19406 0.6 1.1 232940 47704 ? Sl 17:55 0:00 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 0.5 1.1 232940 47704 ? Sl 17:55 0:00 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 7.4 11.9 1234420 482968 ? Sl 17:55 0:02 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 28.8 29.4 2284024 1188596 ? Sl 17:55 0:11 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 42.9 45.3 4380152 1828508 ? Sl 17:55 0:21 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 52.5 61.2 4380152 2472064 ? Sl 17:55 0:31 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 59.7 76.6 6476280 3093408 ? Sl 17:55 0:41 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 64.2 78.6 6476280 3171240 ? Sl 17:55 0:51 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 66.7 78.0 8573432 3149468 ? Sl 17:55 1:00 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 68.2 77.6 8573432 3131532 ? Sl 17:55 1:08 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 69.3 77.1 8573432 3110248 ? Sl 17:55 1:16 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 70.0 77.2 10669560 3117092 ? Sl 17:55 1:24 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 70.7 76.6 10669560 3091768 ? Sl 17:55 1:32 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 71.3 79.3 10669560 3200628 ? Sl 17:55 1:39 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 71.4 79.9 10669560 3227004 ? Sl 17:55 1:47 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 71.4 78.4 10669560 3164648 ? Sl 17:55 1:54 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 71.1 74.8 12765688 3021380 ? Sl 17:55 2:00 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 70.8 70.2 12765688 2832796 ? Sl 17:55 2:07 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 70.6 63.4 12765688 2560728 ? Sl 17:55 2:14 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 70.4 69.5 12765688 2806968 ? Sl 17:55 2:20 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 70.1 71.8 12765688 2900236 ? Sl 17:55 2:27 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 69.7 69.2 12765688 2792748 ? Sl 17:55 2:33 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
mdcallag 19406 69.4 66.1 12765688 2668852 ? Sl 17:55 2:39 bin/mongod --config /b/mongo280rc5/mongo.conf --wiredTigerCacheSizeGB 1
|
|
|
jmongoiibench.java has local edits, not shared with github yet.
run.simple.bash matches my reproduction except that it has a smaller value for MAX_ROWS because I am running this on my laptop VM to confirm I can reproduce it on modern Ubuntu. With a 1G block cache mongod is up to 1.7B for vsz and still growing:
bin/mongod --config /b/mongo280rc5/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 1
|
|
I used "--wiredTigerCacheSizeGB 4" for 4G block cache.
Looks like you were running the query thread with a 1ms sleep between queries, which is similar to what I used, although my edits to that file are a bit different. Can you confirm that you get between 100 and 1000 queries per second while the test is running? That should be in the test output.
I suspect that block cache misses are the key to reproducing the problem, so the database needs to be larger than the WT block cache. The growth rate for vsz and rss is a problem even when the query thread is not used, but to get the rapid growth I need the query thread. I suggest:
- change run.simple.bash to use "export LENGTH_CHAR_FIELDS=2000" and "export PERCENT_COMPRESSIBLE=50"
- Reduce the block cache from 15G to 4G
- Finally, snappy seemed to have more problems than zlib, so use the snappy block compressor
With this setup (sketched below) I get an OOM after ~57M inserts and ~13,000 seconds.
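A minimal sketch of that setup (assuming the iibench-mongodb variables named above and the 3.0 compressor option; snappy is already the 3.0 default, it is spelled out here for clarity):
# in run.simple.bash
export LENGTH_CHAR_FIELDS=2000
export PERCENT_COMPRESSIBLE=50

# 4G cache, snappy block compressor for collections
bin/mongod --config mongo.conf --storageEngine wiredTiger \
    --wiredTigerCacheSizeGB 4 --wiredTigerCollectionBlockCompressor snappy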
|
|
Dear Mark,
I have tried to repro your issue on a machine with 32GB of physical RAM and a 15GB WT cache (automatically calculated by mongod), and after 1:40 hours with 67,000,000 documents, I see RSS & VSZ stable at about 16.6 GB on RC5. I used the default of snappy collection compression, and the journal on.
Here are the changes to iibench I used to try to repro your issue:
https://github.com/markbenvenuto/iibench-mongodb/commits/SERVER-16902-repro
When you earlier said
4G block caches were used for all tests
Did you set --wiredTigerCacheSizeGB 4 or did you mean something else?
Thanks
mark
|
|
Updated with some results for rc4. The problem appears to be new in rc5.
|
|
I will start tests for rc4. I am not certain this was not a problem there.
|
|
Regardless of allocator, there are periods of crazy growth in vsz and rss with the WT b-tree. This is from "ps" running at 60-second intervals. Here is one example with jemalloc:
root 5782 366 5.7 8984056 8549556 ? Sl 19:18 172:32 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 369 8.1 12416504 12061548 ? Sl 19:18 177:36 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 372 10.2 15635960 15276540 ? Sl 19:18 182:39 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 375 12.3 18658808 18301452 ? Sl 19:18 187:43 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 377 14.2 21501432 21145260 ? Sl 19:18 192:46 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 380 16.0 24212984 23855748 ? Sl 19:18 197:50 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 382 17.7 26764792 26410476 ? Sl 19:18 202:53 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 384 19.4 29193720 28839012 ? Sl 19:18 207:56 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 386 21.0 31528440 31172772 ? Sl 19:18 212:59 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 388 22.5 33785336 33431292 ? Sl 19:18 218:02 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 390 24.0 35993080 35635956 ? Sl 19:18 223:06 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 392 25.4 38090232 37734756 ? Sl 19:18 228:09 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 394 26.7 40113656 39758052 ? Sl 19:18 233:12 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 396 28.0 42055160 41696340 ? Sl 19:18 238:15 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 398 29.3 43963896 43606380 ? Sl 19:18 243:18 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 400 30.6 45807096 45450948 ? Sl 19:18 248:21 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 401 31.8 47597048 47242980 ? Sl 19:18 253:24 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 403 33.0 49354232 48999900 ? Sl 19:18 258:26 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 404 34.1 51070456 50717220 ? Sl 19:18 263:29 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 406 35.3 52753912 52397316 ? Sl 19:18 268:32 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 407 36.4 54404600 54049428 ? Sl 19:18 273:34 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 409 37.5 56026616 55673028 ? Sl 19:18 278:37 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 410 38.6 57636344 57281316 ? Sl 19:18 283:39 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 412 39.6 59233784 58879308 ? Sl 19:18 288:42 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 5782 413 40.7 60806648 60450108 ? Sl 19:18 293:45 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
And here is an example with glibc malloc:
root 3738 481 37.0 56105624 54944764 ? Sl Jan18 2449:34 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 482 37.6 57088664 55926528 ? Sl Jan18 2455:59 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 482 41.6 62986904 61854092 ? Sl Jan18 2462:26 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 482 45.4 68623000 67498608 ? Sl Jan18 2468:51 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 483 49.1 74128024 72958080 ? Sl Jan18 2475:15 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 483 52.6 79305368 78121420 ? Sl Jan18 2481:39 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 483 55.9 84220568 83049600 ? Sl Jan18 2488:02 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 484 59.1 88939160 87764048 ? Sl Jan18 2494:24 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 484 62.1 93461144 92257644 ? Sl Jan18 2500:45 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 484 65.0 97786520 96600920 ? Sl Jan18 2507:06 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 484 67.9 101980824 100784364 ? Sl Jan18 2513:26 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 485 70.5 105912984 104761248 ? Sl Jan18 2519:45 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 485 73.1 109714072 108579396 ? Sl Jan18 2526:03 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 485 75.6 113449624 112284800 ? Sl Jan18 2532:22 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 486 78.0 117054104 115862276 ? Sl Jan18 2538:39 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 486 80.3 120461976 119292492 ? Sl Jan18 2544:56 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 486 82.6 123804312 122591652 ? Sl Jan18 2551:12 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 486 84.7 127015576 125831320 ? Sl Jan18 2557:28 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 487 86.9 130161304 128961388 ? Sl Jan18 2563:44 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 487 88.9 133175960 131997404 ? Sl Jan18 2569:59 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 487 90.9 136125080 134905752 ? Sl Jan18 2576:14 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 487 92.1 138812056 136750932 ? Sl Jan18 2582:27 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 488 93.4 141433496 138678284 ? Sl Jan18 2588:41 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
root 3738 488 93.7 141891588 139181416 ? Sl Jan18 2594:13 bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
|
|
|
I ran iibench-mongodb using 1 insert thread and 1 query thread. The insert thread adds 100M docs to an empty collection with 10 docs per insert call. The query thread sleeps 1 ms between queries; it runs to increase stress on the block cache and to get more fragmentation from the frequent use of malloc. I previously ran tests without the query thread, but it looks like fragmentation from reading data into the block cache is the problem, and the query thread makes that visible faster.
All tests use rc5 or rc4 and I compared:
- tcmalloc, glibc malloc and jemalloc
- WT-btree with zlib (zlib/snappy below), WT-btree with snappy (snappy/snappy below), RocksDB with zlib, RocksDB with snappy
Other notes:
- Journal was enabled for WT with snappy compression. WAL was enabled for RocksDB without compression. Disabling the journal or using the journal without compression doesn't fix this problem.
- 4G block caches were used for all tests
These results show either "OOM" or 2 values – vsz and rss at test end. OOM == out of memory and in that case I show how far the insert thread progressed. The test server has 144G of RAM.
6.2, 4.0 - rc4, WT btree, none/no-journal, jemalloc
6.1, 4.0 - rc4, WT btree, snappy/no-journal, jemalloc
7.6, 4.5 - rc4, WT btree, zlib/no-journal, jemalloc
OOM - rc5, WT btree, snappy/no-journal, jemalloc, 25% done
OOM - rc5, WT btree, none/snappy, jemalloc, 23% done
OOM - rc5, WT btree, none/no-journal, jemalloc, 63% done
91.4, 5.2 - rc5, WT btree, zlib/no-journal, jemalloc
89.7, 81.1 - rc5, WT lsm, zlib/snappy, jemalloc
OOM - rc5, WT btree, snappy/snappy, glibc malloc, 55% done
OOM - rc5, WT btree, snappy/snappy, jemalloc, 57% done
OOM - rc5, WT btree, snappy/snappy, tcmalloc, 72% done
OOM - rc5, WT btree, zlib/snappy, glibc malloc, 89% done
55.8, 5.2 - rc5, WT btree, zlib/snappy, jemalloc
102.1, 10.1 - rc5, WT btree, zlib/snappy, tcmalloc
14.7, 12.9 - rc5, RocksDB, snappy, glibc malloc —> inserted 400M via 10 threads, not 100M via 1 thread
7.0, 6.3 - rc5, RocksDB, snappy, jemalloc
6.6, 6.2 - rc5, RocksDB, snappy, tcmalloc
7.8, 6.5 - rc5, RocksDB, zlib, glibc malloc
7.2, 6.5 - rc5, RocksDB, zlib, jemalloc
6.8, 6.4 - rc5, RocksDB, zlib, tcmalloc
|
|
New theory, with tests in progress: this is just really, really bad fragmentation from malloc. If there is a leak, its impact is much smaller than that of the malloc fragmentation.
|
|
If you run iibench before fixing the write-stall problems, set socketTimeout to 600,000 in src/jmongoiibench.java.
With the WT cache set to 2G, vsz grows to 5.4G
|
|
Hi mdcallag, there are some changes in the WiredTiger pipeline. Once they're stable, I'll run iibench to confirm that they fix this issue.
|
|
--wiredTigerCacheSizeGB 32 has the same problem as --wiredTigerCacheSizeGB 16. Now trying again with 16G, but reducing client threads from 10 to 1.
|
|
Tried with --wiredTigerCacheSizeGB 16 but the test ran too slow and there were many more exceptions from https://jira.mongodb.org/browse/SERVER-16717. Retrying with a 32G cache.
|