|
Hi Feng,
I have been able to reproduce an issue whose signature matches the latest issue you reported under 3.0.6. I opened SERVER-20306 to track this specific issue. At this point I think the current ticket has run its course, having covered several issues, so I will close it now. Please follow SERVER-20306 to track progress on this latest issue.
I want to really thank you for bringing these issues to our attention, and for your patience as we have diagnosed and fixed them. Please don't hesitate at all to open a new ticket if you encounter any new issues.
Thanks,
Bruce
|
|
Feng, thanks for uploading that data. Here's what I'm seeing over the course of the run:

- until the point marked A above, allocated memory and virtual size fairly closely tracked WT cache size, and both were less than the configured 1 GB (specifically, around the target of 80% of the configured size).
- at A there was a jump in virtual memory and heap size so that memory usage was roughly 1.5x configured WT cache size. I'll examine this below.
- but allocated memory and WT cache memory remained bounded as expected at about 80% of the configured WT cache size.
- after A all memory usage stats remained constant, so it appears there is no longer any issue of memory growing without bound.
However, regarding the jump in virtual and heap memory to about 1.5x the configured cache size, here's a detailed view:

It appears that the jump was caused by some activity that required reading data into the cache, marked above. Looking at the numbers (in MB) before and after the jump:
                                before   after   diff
bytes read into cache               51     639    587   (A)
bytes currently in the cache       737     818     81   (B)
current allocated bytes            797     851     55   (C)
virtual                            967    1562    595   (D)
heap size                          812    1383    571   (E)
central cache free bytes             8     499    491   (F)
- A: the activity during this interval required reading about 500 MB into the cache, which in turn required allocating about 500 MB of memory.
- B: but WT has, as expected, kept the number of bytes in the cache about constant, around its target of 80% of the 1 GB cache. To do this it has had to evict about 500 MB of data from the cache.
- C: as it evicted the data it freed the corresponding memory, thus keeping total allocated bytes also about constant.
- D, E, F: however during this time virtual memory, allocator heap size, and allocator central cache free bytes have all grown by about 500 MB.
Thus it appears that
- the 500 MB of new allocations required to read the 500 MB of new data into the cache were not satisfied from the 500 MB of memory freed as WT evicted data from the cache,
- but rather were satisfied from a fresh 500 MB of memory obtained from the OS,
- and the 500 MB of freed memory were returned to the allocator central cache instead.
I suspect this might reflect a memory fragmentation issue: possibly the memory freed by evicting the 500 MB of data from the cache consisted of many small non-contiguous buffers, whereas the data read from disk required larger buffers that could not be accommodated in those small non-contiguous regions of free space, so the new allocations had to be satisfied by obtaining fresh memory from the OS.
michael.cahill, is the scenario as outlined above plausible? If so, do you think there's anything that could be done to improve this? It does seem that, if true, this implies we could in the worst case need something like 2x the configured cache size: 1x for "large buffers" of data freshly read into the cache, and 1x for "small buffers" of data that has accumulated as it ages in the cache. Getting to 2x would probably require a fairly pathological case, but here we seem to have been able to get to about 1.5x or so.
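For reference, here is a minimal mongo shell sketch of how the figures in the table above can be pulled from two serverStatus({tcmalloc:1}) samples taken before and after the jump; the memStats helper and the sampling flow are purely illustrative, not part of any existing tooling:
// Illustrative helper: gather the fields used in the table above (values in MB).
function memStats() {
    var ss = db.serverStatus({tcmalloc: 1});
    var MB = 1024 * 1024;
    return {
        "bytes read into cache":        ss.wiredTiger.cache["bytes read into cache"] / MB,
        "bytes currently in the cache": ss.wiredTiger.cache["bytes currently in the cache"] / MB,
        "current allocated bytes":      ss.tcmalloc.generic.current_allocated_bytes / MB,
        "virtual":                      ss.mem.virtual,    // serverStatus already reports this in MB
        "heap size":                    ss.tcmalloc.generic.heap_size / MB,
        "central cache free bytes":     ss.tcmalloc.tcmalloc.central_cache_free_bytes / MB
    };
}
var before = memStats();
// ... wait for the interval that reads data into the cache ...
var after = memStats();
for (var k in before) {
    print(k, Math.round(before[k]), Math.round(after[k]), Math.round(after[k] - before[k]));
}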
|
|
ben.woodrum@contactive.com, thanks for uploading those logs. I've opened a ticket in our Community Private project, SUPPORT-1434, so that we can continue exchanging information privately about this issue; that ticket will be visible only to us and to you. This will also separate out the discussion of your issue; it will keep things clearer if we don't presume at the outset that you are seeing the same issue as Feng.
|
|
I've uploaded my ss.log and mongod.log. In the end mongodb used 28%-30% of memory.
|
|
Thanks for the update, Ben. Thanks also for getting those DataDog PRs in.
|
|
For what it's worth, we've seen really great memory improvements going from 3.0.4 to 3.0.6. We went from filling up the cache and crashing multiple times per day to completely stable. We tracked down a lot of problems using a patched version of Datadog (see the current pending pull requests https://github.com/DataDog/dd-agent/pulls?utf8=%E2%9C%93&q=%5Bmongo%5D+is%3Apr+is%3Aopen+) that provided us with lots of extra metrics on MongoDB and WiredTiger.
|
|
Hi Ben,
You can upload your data to this endpoint, where it is stored encrypted and viewable only by MongoDB engineering staff:
https://10gen-httpsupload.s3.amazonaws.com/upload_forms/d6377b89-8e83-4529-b3ab-38d5cec57ecd.html
|
|
Hi Bruce,
I've also experienced this issue and have been following this ticket since its creation. I've just generated and captured the crash data for ss.log, syslog and mongodb.log and would like to send it privately to you to help with this investigation. Is there a secure way I can send you our logs?
|
|
Hi Feng,
I've been investigating further, and it looks like the problem I reproduced above on 3.0.5 does not occur on 3.0.6. Since you are still seeing problematic behavior on 3.0.6 we'll need to take a look at your logs and serverStatus time series on 3.0.6. Can you please collect serverStatus time series like you did before (described here) as mongod memory usage grows unacceptably under 3.0.6, then upload that time series data and the corresponding mongod log file?
Thanks,
Bruce
|
|
Hi Feng,
Thanks for uploading the latest information, and sorry for the delay in responding. Based on that information I have been able to reproduce what I assume is the same slow leak you are seeing in your latest tests. The following graphs show total allocated memory, the portion of allocated memory used by the cache, and the portion of allocated memory not accounted for by the cache (in MB):

In my test, cache usage grows during the first part of the run and stops growing when the configured cache limit is reached, but allocated memory continues to grow: the difference between allocated memory and cache usage grows steadily throughout the run, and this is the presumed leak. I will continue to investigate and will update you with my findings.
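(If anyone following along wants to watch the same quantities on their own system, the plotted series correspond roughly to the serverStatus fields below; this is just an illustrative one-off snippet, not the tooling used to produce the graphs.)
var ss = db.serverStatus({tcmalloc: 1});
var MB = 1024 * 1024;
var allocated = ss.tcmalloc.generic.current_allocated_bytes / MB;       // total allocated memory
var cache = ss.wiredTiger.cache["bytes currently in the cache"] / MB;   // portion used by the WT cache
print("allocated:", Math.round(allocated), "cache:", Math.round(cache), "not accounted for by cache:", Math.round(allocated - cache));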
|
|
This issue hasn't been solved yet as of 3.0.6. I'm still following this issue.
|
|
chengas123, I've opened SERVER-19941 to request that tcmalloc:true be enabled by default.
|
|
Is there a reason the tcmalloc stats aren't included by default and you need to specify the extra option in the serverStatus call? Is it expensive or something? I'm wondering if we should always run with it on or make it the default.
We're having this problem or a very similar problem on 3.0.4. We just set up DataDog, which collects hundreds of metrics about our instances (running with this patch to add WiredTiger metrics: https://github.com/DataDog/dd-agent/pull/1825), which let us figure out what was happening. I could add the tcmalloc option to DataDog to allow collecting and graphing those metrics as well, but I want to understand that option better first so that I add support for it correctly.
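In case it helps anyone reading along, a quick way to look at just the extra allocator numbers from the shell (field names as they appear in the serverStatus outputs pasted elsewhere in this ticket); this is only an illustrative snippet:
// Request the extra tcmalloc section and print the main allocator numbers.
var t = db.serverStatus({tcmalloc: true}).tcmalloc;
print("current_allocated_bytes:", t.generic.current_allocated_bytes);
print("heap_size:", t.generic.heap_size);
print("central_cache_free_bytes:", t.tcmalloc.central_cache_free_bytes);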
|
|
Hi, Bruce Lucas. I've uploaded my logs (ss.log.xz, syslog, syslog.1, mongod.log).
Here is mongod.conf:
systemLog:
  destination: file
  path: "/var/log/mongodb/mongod.log"
  logAppend: true
  logRotate: reopen
storage:
  journal:
    enabled: true
  dbPath: "/mnt/mongodb_wt"
  engine: wiredTiger
  wiredTiger.engineConfig.cacheSizeGB: 1
net:
  bindIp: 127.0.0.1
  port: 27017
But mongodb used nearly half of physical memory:
# ps aux | grep mongod
mongodb 12305 6.5 48.9 2813748 1982144 ? S<sl 10:52 51:00 /usr/bin/mongod --config /etc/mongod.conf
And it is still growing slowly.
|
|
Wait, I will run the collection script on the production server.
The production server has 4 GB of physical memory. I've already set storage.wiredTiger.engineConfig.cacheSizeGB to 1, but mongodb grows to more than 50% memory usage within 6 hours. Maybe in the production environment we can get more useful information.
I will restart mongodb and run the collection script now.
|
|
abcfy2, can you please repeat the serverStatus time series data collection that you did before while reproducing the most recent problem you describe above, the mongorestore with indexes? That is:
- start mongod as usual
- start the following data collection script:
mongo --eval "while(true) {print(JSON.stringify(db.serverStatus({tcmalloc:1}))); sleep(1000)}" >ss.log &
|
- after the mongorestore, when memory utilization is higher than you expect, please attach syslog, the ss.log output, and also mongod.log. This will give us a clearer picture of how memory usage varies over the course of the run.
Regarding the mongorestore, is there any possibility you could share the data with us so we can reproduce it in house? I understand that the dump is about 180 GB, which is a large data set for upload, but it might just be feasible if you are willing to do it. If you are, I can provide a secure, private upload location.
arunasf, praveenak - would you be so kind as to open separate tickets? We find that it keeps the analysis clearer if we focus each ticket on one problem. The symptom of excess memory usage is very generic and can have many causes, so we would prefer to separate the issues until and unless analysis shows that they are related. If you could collect and include the serverStatus time series as described above when you open the tickets, that will help speed the process.
Thanks,
Bruce
|
|
I can't get the stack trace log. I've tried many times (setting storage.wiredTiger.engineConfig.cacheSizeGB to 3 or 4), but mongodb just gets killed by the kernel OOM killer, and I find that when "building index using bulk method" the memory usage grows very fast.
So I tried to delete the indexes and restore again.
# rm -f dump/**/{system.indexes.bson,*.metadata.json}
# mongorestore dump/
Without building indexes, the memory usage grows slowly. But after a long time, mongodb had used a lot of memory and caused a denial of service, though not an OOM kill.
# mongorestore failed
|
2015-08-06T02:37:24.060+0800 error: Closed explicitly
|
2015-08-06T02:37:24.111+0800 error: Closed explicitly
|
2015-08-06T02:37:24.111+0800 [#######################.] ksRealtimeDB.ks-861 221.2 MB/223.1 MB (99.2%)
|
2015-08-06T02:37:24.111+0800 [#######################.] ksRealtimeDB.ks-718 222.3 MB/223.0 MB (99.7%)
|
2015-08-06T02:37:24.111+0800 [##################......] ksRealtimeDB.ks-804 172.8 MB/222.2 MB (77.8%)
|
2015-08-06T02:37:24.111+0800 [........................] ksRealtimeDB.ks-860 3.3 MB/218.5 MB (1.5%)
|
2015-08-06T02:37:24.111+0800
|
2015-08-06T02:37:24.203+0800 error: Closed explicitly
|
2015-08-06T02:37:24.229+0800 error: Closed explicitly
|
2015-08-06T02:37:24.229+0800 no indexes to restore
|
2015-08-06T02:37:24.229+0800 finished restoring ksRealtimeDB.ks-718 (660668 documents)
|
2015-08-06T02:37:24.229+0800 restoring ksRealtimeDB.ks-878 from file dump/ksRealtimeDB/ks-878.bson
|
2015-08-06T02:47:23.091+0800 error: Closed explicitly
|
2015-08-06T02:47:23.127+0800 [#######################.] ksRealtimeDB.ks-861 222.6 MB/223.1 MB (99.8%)
|
2015-08-06T02:47:23.128+0800 [##################......] ksRealtimeDB.ks-804 174.4 MB/222.2 MB (78.5%)
|
2015-08-06T02:47:23.128+0800 [........................] ksRealtimeDB.ks-860 3.6 MB/218.5 MB (1.6%)
|
2015-08-06T02:47:23.128+0800
|
2015-08-06T02:47:23.128+0800 [########################] ksRealtimeDB.ks-861 223.1 MB/223.1 MB (100.0%)
|
2015-08-06T02:47:23.128+0800 [##################......] ksRealtimeDB.ks-804 174.6 MB/222.2 MB (78.6%)
|
2015-08-06T02:47:23.128+0800 [........................] ksRealtimeDB.ks-860 4.6 MB/218.5 MB (2.1%)
|
2015-08-06T02:47:23.128+0800
|
2015-08-06T02:47:23.156+0800 no indexes to restore
|
2015-08-06T02:47:23.156+0800 finished restoring ksRealtimeDB.ks-861 (685740 documents)
|
2015-08-06T02:47:23.160+0800 error: Closed explicitly
|
2015-08-06T02:47:23.201+0800 Failed: restore error: ksRealtimeDB.ks-860: error restoring from dump/ksRealtimeDB/ks-860.bson: insertion error: no reachable servers
|
|
# ps aux | grep mongo
|
mongodb 3966 21.8 88.4 3763336 3581324 ? Ssl 8月05 253:38 /usr/bin/mongod --config /etc/mongod.conf
|
|
|
We have upgraded our replicated sharded mongo db setup to the latest 3.0.5 but our memory usage issues didn't go away. We are facing the same issue as described by Arunas F. Memory usage increases over time and only a restart releases the allocated memory. Even if we set cacheSize to 1 GB or 2 GB it uses way more and the system kills the process.
Today one of the four shard servers running on our primary instance failed with an out-of-memory error and a stack trace. It has a 13 GB cacheSize specified. Initially the mongod process took about 5 GB and gradually it grew to 17 GB, using the full memory of the system. Below is the stack trace:
2015-08-04T17:21:56.567+0000 I COMMAND [conn19473] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('55a85f47244866407d69f5f7'), optime: Timestamp 1438366559000|90, memberId: 0, cfgver: 1, config: { _id: 0, host: "XXX.com:27022", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 3.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('558de07e4547ad2a4f5537dd'), optime: Timestamp 1438708912000|290, memberId: 1, cfgver: 1, config: { _id: 1, host: "XXXX.com:27022", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 2.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('558ea820d26ca3ea78419218'), optime: Timestamp 1438708912000|291, memberId: 2, cfgver: 1, config: { _id: 2, host: "XXX.com:27022", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } } ] } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 335ms
|
2015-08-04T17:21:56.847+0000 I COMMAND [conn19518] command admin.$cmd command: replSetHeartbeat { replSetHeartbeat: "sg0s01", pv: 1, v: 1, from: "XXX.com:27022", fromId: 2, checkEmpty: false } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:200 locks:{} 771ms
|
2015-08-04T17:21:57.001+0000 F - [conn19515] out of memory.
|
|
0xf5ba59 0xf5b589 0xede852 0xb9a5dc 0xaae04e 0xab340d 0x80e06d 0xf0e9fb 0x7f2eb2a9b182 0x7f2eb156430d
|
----- BEGIN BACKTRACE -----
|
{"backtrace":[{"b":"400000","o":"B5BA59"},{"b":"400000","o":"B5B589"},{"b":"400000","o":"ADE852"},{"b":"400000","o":"79A5DC"},{"b":"400000","o":"6AE04E"},{"b":"400000","o":"6B340D"},{"b":"400000","o":"40E06D"},{"b":"400000","o":"B0E9FB"},{"b":"7F2EB2A93000","o":"8182"},{"b":"7F2EB1469000","o":"FB30D"}],"processInfo":{ "mongodbVersion" : "3.0.5", "gitVersion" : "8bc4ae20708dbb493cb09338d9e7be6698e4a3a3", "uname" : { "sysname" : "Linux", "release" : "3.13.0-29-generic", "version" : "#53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "695FC6828398A9DB1F99718671147885B5ED116D" }, { "b" : "7FFF2EAFE000", "elfType" : 3, "buildId" : "3D068D088E7EAC15D9DA7C3AC912E783C0897EE7" }, { "b" : "7F2EB2A93000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "FE662C4D7B14EE804E0C1902FB55218A106BC5CB" }, { "b" : "7F2EB2835000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "6C7AE380840DB9034D7763771B55E51B31BCAF14" }, { "b" : "7F2EB245B000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "3D522D8E04F5FD7904AE69B50CA8835A71024490" }, { "b" : "7F2EB2253000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7F2EB204F000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7F2EB1D4B000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "19EFDDAB11B3BF5C71570078C59F91CF6592CE9E" }, { "b" : "7F2EB1A45000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "574C6350381DA194C00FF555E0C1784618C05569" }, { "b" : "7F2EB182F000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "CC0D578C2E0D86237CA7B0CE8913261C506A629A" }, { "b" : "7F2EB1469000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "B571F83A8A6F5BB22D3558CDDDA9F943A2A67FD1" }, { "b" : "7F2EB2CB1000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
|
mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf5ba59]
|
mongod(_ZN5mongo29reportOutOfMemoryErrorAndExitEv+0x49) [0xf5b589]
|
mongod(_ZN5mongo11mongoMallocEm+0x22) [0xede852]
|
mongod(_ZN5mongo7getMoreEPNS_16OperationContextEPKcixRNS_5CurOpEiRbPb+0x24C) [0xb9a5dc]
|
mongod(_ZN5mongo15receivedGetMoreEPNS_16OperationContextERNS_10DbResponseERNS_7MessageERNS_5CurOpE+0x35E) [0xaae04e]
|
mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x13CD) [0xab340d]
|
mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDD) [0x80e06d]
|
mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xf0e9fb]
|
libpthread.so.0(+0x8182) [0x7f2eb2a9b182]
|
libc.so.6(clone+0x6D) [0x7f2eb156430d]
|
----- END BACKTRACE -----
|
|
|
We have the same problem. MongoDB 3.0.5 with the WiredTiger storage engine on CentOS 6. MongoDB runs alone, without a replica set.
Memory usage increases over time, and performance degrades as well (queries run slower and slower), until an OOM event occurs (we have no swap) and the kernel kills the mongo process.
In mongodb.log there are no stack traces or similar things. The kernel just kills the process and that's all.
Restarting the process helps a lot (we do it at least twice a day now) - it releases the allocated memory and performance is great again.
Setting storage.wiredTiger.engineConfig.cacheSizeGB to 1 GB makes no difference; memory usage goes through the roof until the process is killed.
MongoDB usage is light to moderate, mostly read operations (90%). Data size is ~150 GB uncompressed. Some documents are quite large - about 200 KB.
Inspecting serverStatus() showed that the WT "open cursor count" and "open session count" grow boundlessly.
The software uses the Java 2.13 driver to connect to the database.
We were using Mongo 2.x with mmapv1 for several years without any problems.
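A simple way to keep an eye on the two WT counters mentioned above over time (just an illustrative mongo shell loop, along the lines of the collection script used elsewhere in this ticket):
// Print WT open cursor / open session counts once a minute; stop with Ctrl-C.
while (true) {
    var s = db.serverStatus().wiredTiger.session;
    print(new Date().toISOString(), "open cursor count:", s["open cursor count"], "open session count:", s["open session count"]);
    sleep(60 * 1000);
}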
|
|
I've tried setting this value to 3. The first time, mongodb was killed by the kernel OOM killer and did not produce any stack trace in mongod.log. Then I killed all the other processes to release memory and tried again; finally mongodb remained at 75% memory usage.
I will set storage.wiredTiger.engineConfig.cacheSizeGB to 4 and try again.
|
|
It is possible for SERVER-19673 to affect the primary as well. Would you be able to increase storage.wiredTiger.engineConfig.cacheSizeGB in order to hopefully induce mongod to run into an out-of-memory condition and terminate with a stack trace in mongod.log that may help us identify the cause?
|
|
I find that storage.wiredTiger.engineConfig.cacheSizeGB does not reduce the memory usage. By default, after restoring a dump, mongodb used 35% memory.
# ps aux | grep mongo
mongodb 642 133 35.0 1643340 1419756 ? Ssl 09:27 333:17 /usr/bin/mongod --config /etc/mongod.conf
But when I changed storage.wiredTiger.engineConfig.cacheSizeGB to 1, after mongorestore mongodb used 38% memory.
# ps aux | grep mongod
mongodb 30586 129 38.0 1743628 1515648 ? Ssl 16:05 182:38 /usr/bin/mongod --config /etc/mongod.conf
|
|
|
I'm not using a replica set, just a single standalone node. This node is under heavy write load.
|
|
Can you confirm my understanding that this latest issue is occurring on 3.0.5 on a secondary? We have discovered a new issue with memory consumption under WT, SERVER-19673, that is particularly likely to affect the secondary under heavy replication. If you would like to confirm that this is the issue you have hit, you can increase the cache size in order to intentionally allow the secondary to reach an out-of-memory condition, and this may generate stack traces in the log file that will confirm where the out-of-memory condition occurred.
|
|
Maybe there is still a memory leak. I've set storage.wiredTiger.engineConfig.cacheSizeGB to 1, but after restoring from a 180 GB dump directory, mongodb used 44% memory.
# ps aux | grep mongo
mongodb 2987 33.4 44.0 1975648 1784104 ? Ssl 7月29 859:45 /usr/bin/mongod --config /etc/mongod.conf
|
|
|
You can set the cache size in megabytes, rather than in whole gigabytes. This is undocumented in the MongoDB documentation, but is visible in the underlying WiredTiger docs.
For example, the following is a valid startup parameter to set your cache size to 200 MB:
--wiredTigerEngineConfigString="cache_size=200M".
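If you use this, one way to double-check that the setting took effect is to read back the WiredTiger cache ceiling from serverStatus (the field already appears in the outputs pasted in this ticket); a small illustrative check:
// "maximum bytes configured" should reflect the cache_size passed at startup (200M -> 209715200).
var cache = db.serverStatus().wiredTiger.cache;
print("maximum bytes configured:", cache["maximum bytes configured"]);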
|
|
|
I tried setting storage.wiredTiger.engineConfig.cacheSizeGB: 1 in /etc/mongod.conf, but mongodb still uses 47.6% memory. I usually have to restart mongodb to release the memory.
# ps aux | grep mongo
root 4370 0.0 0.0 12156 928 pts/3 S+ 21:12 0:00 grep --color=auto mongo
mongodb 5366 6.1 47.6 2087248 1929408 ? Ssl 12:44 31:24 /usr/bin/mongod --config /etc/mongod.conf

# mongo
MongoDB shell version: 3.0.5
connecting to: test
> show dbs;
ksRealtimeDB 33.053GB
local 0.000GB
metadataMongoDB 0.014GB
> db.serverStatus()
|
{
|
"host" : "ks",
|
"version" : "3.0.5",
|
"process" : "mongod",
|
"pid" : NumberLong(5366),
|
"uptime" : 30633,
|
"uptimeMillis" : NumberLong(30632295),
|
"uptimeEstimate" : 30334,
|
"localTime" : ISODate("2015-07-29T13:14:55.076Z"),
|
"asserts" : {
|
"regular" : 0,
|
"warning" : 0,
|
"msg" : 0,
|
"user" : 0,
|
"rollovers" : 0
|
},
|
"connections" : {
|
"current" : 11,
|
"available" : 51189,
|
"totalCreated" : NumberLong(11)
|
},
|
"cursors" : {
|
"note" : "deprecated, use server status metrics",
|
"clientCursors_size" : 0,
|
"totalOpen" : 0,
|
"pinned" : 0,
|
"totalNoTimeout" : 0,
|
"timedOut" : 0
|
},
|
"extra_info" : {
|
"note" : "fields vary by platform",
|
"heap_usage_bytes" : 1951008568,
|
"page_faults" : 940
|
},
|
"globalLock" : {
|
"totalTime" : NumberLong("30632271000"),
|
"currentQueue" : {
|
"total" : 0,
|
"readers" : 0,
|
"writers" : 0
|
},
|
"activeClients" : {
|
"total" : 16,
|
"readers" : 0,
|
"writers" : 0
|
}
|
},
|
"locks" : {
|
"Global" : {
|
"acquireCount" : {
|
"r" : NumberLong(12778513),
|
"w" : NumberLong(2239505),
|
"W" : NumberLong(4)
|
},
|
"acquireWaitCount" : {
|
"r" : NumberLong(1)
|
},
|
"timeAcquiringMicros" : {
|
"r" : NumberLong(7224)
|
}
|
},
|
"Database" : {
|
"acquireCount" : {
|
"r" : NumberLong(5269498),
|
"w" : NumberLong(1172777),
|
"R" : NumberLong(4),
|
"W" : NumberLong(1066734)
|
},
|
"acquireWaitCount" : {
|
"r" : NumberLong(2822),
|
"W" : NumberLong(1375)
|
},
|
"timeAcquiringMicros" : {
|
"r" : NumberLong(12840574),
|
"W" : NumberLong(12555320)
|
}
|
},
|
"Collection" : {
|
"acquireCount" : {
|
"r" : NumberLong(5617480),
|
"w" : NumberLong(1172777)
|
}
|
}
|
},
|
"network" : {
|
"bytesIn" : 654568095,
|
"bytesOut" : 211110891,
|
"numRequests" : 2252134
|
},
|
"opcounters" : {
|
"insert" : 1172774,
|
"query" : 1398,
|
"update" : 0,
|
"delete" : 0,
|
"getmore" : 33,
|
"command" : 1075299
|
},
|
"opcountersRepl" : {
|
"insert" : 0,
|
"query" : 0,
|
"update" : 0,
|
"delete" : 0,
|
"getmore" : 0,
|
"command" : 0
|
},
|
"storageEngine" : {
|
"name" : "wiredTiger"
|
},
|
"wiredTiger" : {
|
"uri" : "statistics:",
|
"LSM" : {
|
"sleep for LSM checkpoint throttle" : 0,
|
"sleep for LSM merge throttle" : 0,
|
"rows merged in an LSM tree" : 0,
|
"application work units currently queued" : 0,
|
"merge work units currently queued" : 0,
|
"tree queue hit maximum" : 0,
|
"switch work units currently queued" : 0,
|
"tree maintenance operations scheduled" : 0,
|
"tree maintenance operations discarded" : 0,
|
"tree maintenance operations executed" : 0
|
},
|
"async" : {
|
"number of allocation state races" : 0,
|
"number of operation slots viewed for allocation" : 0,
|
"current work queue length" : 0,
|
"number of flush calls" : 0,
|
"number of times operation allocation failed" : 0,
|
"maximum work queue length" : 0,
|
"number of times worker found no work" : 0,
|
"total allocations" : 0,
|
"total compact calls" : 0,
|
"total insert calls" : 0,
|
"total remove calls" : 0,
|
"total search calls" : 0,
|
"total update calls" : 0
|
},
|
"block-manager" : {
|
"mapped bytes read" : 0,
|
"bytes read" : 1964785664,
|
"bytes written" : 20398272512,
|
"mapped blocks read" : 0,
|
"blocks pre-loaded" : 215404,
|
"blocks read" : 471600,
|
"blocks written" : 2791937
|
},
|
"cache" : {
|
"tracked dirty bytes in the cache" : 562019570,
|
"tracked bytes belonging to internal pages in the cache" : 47628364,
|
"bytes currently in the cache" : 851385954,
|
"tracked bytes belonging to leaf pages in the cache" : 1026113460,
|
"maximum bytes configured" : 1073741824,
|
"tracked bytes belonging to overflow pages in the cache" : 0,
|
"bytes read into cache" : 232338402,
|
"bytes written from cache" : 16237644684,
|
"pages evicted by application threads" : 0,
|
"checkpoint blocked page eviction" : 1,
|
"unmodified pages evicted" : 450,
|
"page split during eviction deepened the tree" : 0,
|
"modified pages evicted" : 663,
|
"pages selected for eviction unable to be evicted" : 89,
|
"pages evicted because they exceeded the in-memory maximum" : 0,
|
"pages evicted because they had chains of deleted items" : 0,
|
"failed eviction of pages that exceeded the in-memory maximum" : 0,
|
"hazard pointer blocked page eviction" : 1,
|
"internal pages evicted" : 0,
|
"maximum page size at eviction" : 1496235,
|
"eviction server candidate queue empty when topping up" : 12,
|
"eviction server candidate queue not empty when topping up" : 9,
|
"eviction server evicting pages" : 93,
|
"eviction server populating queue, but not evicting pages" : 14,
|
"eviction server unable to reach eviction goal" : 0,
|
"pages split during eviction" : 588,
|
"pages walked for eviction" : 581490,
|
"eviction worker thread evicting pages" : 1109,
|
"in-memory page splits" : 0,
|
"percentage overhead" : 8,
|
"tracked dirty pages in the cache" : 1763,
|
"pages currently held in the cache" : 10025,
|
"pages read into cache" : 11109,
|
"pages written from cache" : 1873598
|
},
|
"connection" : {
|
"pthread mutex condition wait calls" : 1127704,
|
"files currently open" : 1507,
|
"memory allocations" : 40274294,
|
"memory frees" : 33063359,
|
"memory re-allocations" : 9711,
|
"total read I/Os" : 473140,
|
"pthread mutex shared lock read-lock calls" : 3299221,
|
"pthread mutex shared lock write-lock calls" : 491083,
|
"total write I/Os" : 3966531
|
},
|
"cursor" : {
|
"cursor create calls" : 2029287,
|
"cursor insert calls" : 4158946,
|
"cursor next calls" : 1541183,
|
"cursor prev calls" : 197271,
|
"cursor remove calls" : 0,
|
"cursor reset calls" : 7219504,
|
"cursor search calls" : 5217589,
|
"cursor search near calls" : 35225,
|
"cursor update calls" : 0
|
},
|
"data-handle" : {
|
"connection dhandles swept" : 0,
|
"connection candidate referenced" : 0,
|
"connection sweeps" : 3054,
|
"connection time-of-death sets" : 458879,
|
"session dhandles swept" : 1109,
|
"session sweep attempts" : 537
|
},
|
"log" : {
|
"log buffer size increases" : 0,
|
"total log buffer size" : 8388608,
|
"log bytes of payload data" : 388280660,
|
"log bytes written" : 494305792,
|
"yields waiting for previous log file close" : 0,
|
"total size of compressed records" : 372863683,
|
"total in-memory size of compressed records" : 909213742,
|
"log records too small to compress" : 3207,
|
"log records not compressed" : 88213,
|
"log records compressed" : 1082733,
|
"maximum log file size" : 104857600,
|
"pre-allocated log files prepared" : 5,
|
"number of pre-allocated log files to create" : 1,
|
"pre-allocated log files used" : 4,
|
"log read operations" : 0,
|
"log release advances write LSN" : 1174153,
|
"records processed by log scan" : 11,
|
"log scan records requiring two reads" : 2,
|
"log scan operations" : 5,
|
"consolidated slot closures" : 0,
|
"logging bytes consolidated" : 0,
|
"consolidated slot joins" : 0,
|
"consolidated slot join races" : 0,
|
"slots selected for switching that were unavailable" : 0,
|
"record size exceeded maximum" : 0,
|
"failed to find a slot large enough for record" : 0,
|
"consolidated slot join transitions" : 0,
|
"log sync operations" : 1797,
|
"log sync_dir operations" : 5,
|
"log server thread advances write LSN" : 0,
|
"log write operations" : 1174153
|
},
|
"reconciliation" : {
|
"page reconciliation calls" : 1429966,
|
"page reconciliation calls for eviction" : 393,
|
"split bytes currently awaiting free" : 17288,
|
"split objects currently awaiting free" : 16
|
},
|
"session" : {
|
"open cursor count" : 3400,
|
"open session count" : 17
|
},
|
"thread-yield" : {
|
"page acquire busy blocked" : 0,
|
"page acquire eviction blocked" : 0,
|
"page acquire locked blocked" : 0,
|
"page acquire read blocked" : 4672350,
|
"page acquire time sleeping (usecs)" : 22619981
|
},
|
"transaction" : {
|
"transaction begins" : 2255968,
|
"transaction checkpoints" : 429,
|
"transaction checkpoint generation" : 429,
|
"transaction checkpoint currently running" : 1,
|
"transaction checkpoint max time (msecs)" : 16326,
|
"transaction checkpoint min time (msecs)" : 76,
|
"transaction checkpoint most recent time (msecs)" : 6671,
|
"transaction checkpoint total time (msecs)" : 3036981,
|
"transactions committed" : 1173290,
|
"transaction failures due to cache overflow" : 0,
|
"transaction range of IDs currently pinned by a checkpoint" : 621,
|
"transaction range of IDs currently pinned" : 0,
|
"transactions rolled back" : 1082678
|
},
|
"concurrentTransactions" : {
|
"write" : {
|
"out" : 0,
|
"available" : 128,
|
"totalTickets" : 128
|
},
|
"read" : {
|
"out" : 1,
|
"available" : 127,
|
"totalTickets" : 128
|
}
|
}
|
},
|
"writeBacksQueued" : false,
|
"mem" : {
|
"bits" : 64,
|
"resident" : 2016,
|
"virtual" : 2238,
|
"supported" : true,
|
"mapped" : 0,
|
"mappedWithJournal" : 0
|
},
|
"metrics" : {
|
"commands" : {
|
"buildInfo" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(10)
|
},
|
"count" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(37)
|
},
|
"createIndexes" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(1066044)
|
},
|
"getLastError" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(10)
|
},
|
"getLog" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(1)
|
},
|
"isMaster" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(9192)
|
},
|
"listCollections" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(1)
|
},
|
"listDatabases" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(1)
|
},
|
"replSetGetStatus" : {
|
"failed" : NumberLong(1),
|
"total" : NumberLong(1)
|
},
|
"serverStatus" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(1)
|
},
|
"whatsmyuri" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(1)
|
}
|
},
|
"cursor" : {
|
"timedOut" : NumberLong(0),
|
"open" : {
|
"noTimeout" : NumberLong(0),
|
"pinned" : NumberLong(0),
|
"total" : NumberLong(0)
|
}
|
},
|
"document" : {
|
"deleted" : NumberLong(0),
|
"inserted" : NumberLong(1172774),
|
"returned" : NumberLong(194265),
|
"updated" : NumberLong(0)
|
},
|
"getLastError" : {
|
"wtime" : {
|
"num" : 0,
|
"totalMillis" : 0
|
},
|
"wtimeouts" : NumberLong(0)
|
},
|
"operation" : {
|
"fastmod" : NumberLong(0),
|
"idhack" : NumberLong(0),
|
"scanAndOrder" : NumberLong(0),
|
"writeConflicts" : NumberLong(0)
|
},
|
"queryExecutor" : {
|
"scanned" : NumberLong(1340203),
|
"scannedObjects" : NumberLong(1340203)
|
},
|
"record" : {
|
"moves" : NumberLong(0)
|
},
|
"repl" : {
|
"apply" : {
|
"batches" : {
|
"num" : 0,
|
"totalMillis" : 0
|
},
|
"ops" : NumberLong(0)
|
},
|
"buffer" : {
|
"count" : NumberLong(0),
|
"maxSizeBytes" : 268435456,
|
"sizeBytes" : NumberLong(0)
|
},
|
"network" : {
|
"bytes" : NumberLong(0),
|
"getmores" : {
|
"num" : 0,
|
"totalMillis" : 0
|
},
|
"ops" : NumberLong(0),
|
"readersCreated" : NumberLong(0)
|
},
|
"preload" : {
|
"docs" : {
|
"num" : 0,
|
"totalMillis" : 0
|
},
|
"indexes" : {
|
"num" : 0,
|
"totalMillis" : 0
|
}
|
}
|
},
|
"storage" : {
|
"freelist" : {
|
"search" : {
|
"bucketExhausted" : NumberLong(0),
|
"requests" : NumberLong(0),
|
"scanned" : NumberLong(0)
|
}
|
}
|
},
|
"ttl" : {
|
"deletedDocuments" : NumberLong(0),
|
"passes" : NumberLong(509)
|
}
|
},
|
"ok" : 1
|
}
|
>
|
|
|
Hi Michael Cahill. This server runs many other services, like tomcat, mysql, redis, and so on. Mongodb used so much memory that it caused heavy swap usage and degraded performance.
The storage.wiredTiger.engineConfig.cacheSizeGB setting seems to be a useful option, but it can only be set to an integer; is it possible to set a float value like 0.5?
Lastly, dropping a database does not release the memory, which is not great. I have to restart the mongod service to release memory. Maybe introducing garbage collection would be better, cleaning up garbage data in memory regularly.
|
|
abcfy2, that behavior sounds like what I would expect. The default WiredTiger cache size is half of RAM (so 2GB in your case), and it is not unusual for other memory allocations in mongod to total 400MB, so a total of 2.4GB is not surprising.
I gather that the load completed, so presumably there was no problem with memory growing larger and larger over time.
Is the memory use actually causing a problem? MongoDB has configured WiredTiger not to evict trees from cache unless there is memory pressure, so the RAM use will not go down automatically because a MongoDB workload ends.
The best starting point for memory sizing with WiredTiger is probably: http://docs.mongodb.org/manual/reference/configuration-options/#storage.wiredTiger.engineConfig.cacheSizeGB
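A rough way to see the components discussed above on a running instance (configured cache ceiling vs. total heap in use vs. resident memory); just an illustrative snippet built from serverStatus fields that appear elsewhere in this ticket:
var ss = db.serverStatus();
var MB = 1024 * 1024;
print("configured cache max (MB):", Math.round(ss.wiredTiger.cache["maximum bytes configured"] / MB));
print("heap usage (MB):", Math.round(ss.extra_info.heap_usage_bytes / MB));  // all memory mongod has allocated
print("resident (MB):", ss.mem.resident);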
|
|
Hi developers. Was this issue fixed in 3.0.5? I see the related issue's fix version is 3.0.5.
But when I upgraded to 3.0.5, this issue still appeared. WiredTiger used so much memory that it caused heavy swap usage (over 50%).
I tried to restore from a dump of about 200 GB. After that mongod used over 60% of memory (4 GB physical memory), and db.dropDatabase() did not release the memory.
When will this issue be fixed? And what is the minimum memory mongodb needs?
|
|
I am experiencing the same issue, please help.
Here is the result of db.serverStatus({tcmalloc:true}):
"storageEngine" :
{
"name" : "wiredTiger"
}
,
"tcmalloc" : {
"generic" :
{
"current_allocated_bytes" : NumberLong("10818777200"),
"heap_size" : NumberLong("13169491968")
}
,
"tcmalloc" :
{
"pageheap_free_bytes" : NumberLong(1584193536),
"pageheap_unmapped_bytes" : 283467776,
"max_total_thread_cache_bytes" : NumberLong(1073741824),
"current_total_thread_cache_bytes" : 47305000,
"central_cache_free_bytes" : 421448760,
"transfer_cache_free_bytes" : 14299696,
"thread_cache_free_bytes" : 47305000,
"aggressive_memory_decommit" : 0
}
,
"formattedString" : "-----------------------------------------------\nMALLOC: 10818777200 (10317.6 MiB) Bytes in use by application\nMALLOC: + 1584193536 ( 1510.8 MiB) Bytes in page heap freelist\nMALLOC: + 421448760 ( 401.9 MiB) Bytes in central cache freelist\nMALLOC: + 14299696 ( 13.6 MiB) Bytes in transfer cache freelist\nMALLOC: + 47305000 ( 45.1 MiB) Bytes in thread cache freelists\nMALLOC: + 39727264 ( 37.9 MiB) Bytes in malloc metadata\nMALLOC: ------------\nMALLOC: = 12925751456 (12327.0 MiB) Actual memory used (physical + swap)\nMALLOC: + 283467776 ( 270.3 MiB) Bytes released to OS (aka unmapped)\nMALLOC: ------------\nMALLOC: = 13209219232 (12597.3 MiB) Virtual address space used\nMALLOC:\nMALLOC: 437763 Spans in use\nMALLOC: 221 Thread heaps in use\nMALLOC: 8192 Tcmalloc page size\n-----------------------------------------------\nCall ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).\nBytes released to the OS take up virtual address space but no physical memory.\n"
},
"wiredTiger" : {
"uri" : "statistics:",
"LSM" :
{
"sleep for LSM checkpoint throttle" : 0,
"sleep for LSM merge throttle" : 0,
"rows merged in an LSM tree" : 0,
"application work units currently queued" : 0,
"merge work units currently queued" : 0,
"tree queue hit maximum" : 0,
"switch work units currently queued" : 0,
"tree maintenance operations scheduled" : 0,
"tree maintenance operations discarded" : 0,
"tree maintenance operations executed" : 0
}
,
"async" :
{
"number of allocation state races" : 0,
"number of operation slots viewed for allocation" : 0,
"current work queue length" : 0,
"number of flush calls" : 0,
"number of times operation allocation failed" : 0,
"maximum work queue length" : 0,
"number of times worker found no work" : 0,
"total allocations" : 0,
"total compact calls" : 0,
"total insert calls" : 0,
"total remove calls" : 0,
"total search calls" : 0,
"total update calls" : 0
}
,
"block-manager" :
{
"mapped bytes read" : 0,
"bytes read" : 161093787648,
"bytes written" : 95925149696,
"mapped blocks read" : 0,
"blocks pre-loaded" : 1447,
"blocks read" : 15805695,
"blocks written" : 11314236
}
,
"cache" :
{
"tracked dirty bytes in the cache" : 63008714,
"tracked bytes belonging to internal pages in the cache" : 5224866301,
"bytes currently in the cache" : 11166329226,
"tracked bytes belonging to leaf pages in the cache" : 8733777411,
"maximum bytes configured" : 13958643712,
"tracked bytes belonging to overflow pages in the cache" : 0,
"bytes read into cache" : 403477130712,
"bytes written from cache" : 245260253309,
"pages evicted by application threads" : 10,
"checkpoint blocked page eviction" : 24548,
"unmodified pages evicted" : 7523027,
"page split during eviction deepened the tree" : 0,
"modified pages evicted" : 7972566,
"pages selected for eviction unable to be evicted" : 400587,
"pages evicted because they exceeded the in-memory maximum" : 54,
"pages evicted because they had chains of deleted items" : 6,
"failed eviction of pages that exceeded the in-memory maximum" : 0,
"hazard pointer blocked page eviction" : 332338,
"internal pages evicted" : 326946,
"maximum page size at eviction" : 10485970,
"eviction server candidate queue empty when topping up" : 116131,
"eviction server candidate queue not empty when topping up" : 1005839,
"eviction server evicting pages" : 310014,
"eviction server populating queue, but not evicting pages" : 1108831,
"eviction server unable to reach eviction goal" : 5,
"pages split during eviction" : 616954,
"pages walked for eviction" : 12768203795,
"eviction worker thread evicting pages" : 15592287,
"in-memory page splits" : 51,
"percentage overhead" : 8,
"tracked dirty pages in the cache" : 4018,
"pages currently held in the cache" : 303763,
"pages read into cache" : 15805426,
"pages written from cache" : 11313730
}
,
"connection" :
{
"pthread mutex condition wait calls" : 2891587,
"files currently open" : 27,
"memory allocations" : 489279295,
"memory frees" : 475486585,
"memory re-allocations" : 9801,
"total read I/Os" : 15805728,
"pthread mutex shared lock read-lock calls" : 16867,
"pthread mutex shared lock write-lock calls" : 73414,
"total write I/Os" : 20721524
}
,
"cursor" :
{
"cursor create calls" : 8208,
"cursor insert calls" : 37627405,
"cursor next calls" : 347708668,
"cursor prev calls" : 1824,
"cursor remove calls" : 18813460,
"cursor reset calls" : 215092881,
"cursor search calls" : 98875308,
"cursor search near calls" : 130497814,
"cursor update calls" : 0
}
,
"data-handle" :
{
"connection dhandles swept" : 0,
"connection candidate referenced" : 207,
"connection sweeps" : 14423,
"connection time-of-death sets" : 217,
"session dhandles swept" : 487,
"session sweep attempts" : 481
}
,
"log" :
{
"log buffer size increases" : 0,
"total log buffer size" : 1048576,
"log bytes of payload data" : 10840117536,
"log bytes written" : 11398885120,
"yields waiting for previous log file close" : 0,
"total size of compressed records" : 10840085895,
"total in-memory size of compressed records" : 15422419601,
"log records too small to compress" : 244,
"log records not compressed" : 133,
"log records compressed" : 9406776,
"maximum log file size" : 104857600,
"pre-allocated log files prepared" : 109,
"number of pre-allocated log files to create" : 1,
"pre-allocated log files used" : 108,
"log read operations" : 0,
"log release advances write LSN" : 9407153,
"records processed by log scan" : 10,
"log scan records requiring two reads" : 0,
"log scan operations" : 5,
"consolidated slot closures" : 0,
"logging bytes consolidated" : 0,
"consolidated slot joins" : 0,
"consolidated slot join races" : 0,
"slots selected for switching that were unavailable" : 0,
"record size exceeded maximum" : 0,
"failed to find a slot large enough for record" : 0,
"consolidated slot join transitions" : 0,
"log sync operations" : 422,
"log sync_dir operations" : 102,
"log server thread advances write LSN" : 0,
"log write operations" : 9407153
}
,
"reconciliation" :
{
"page reconciliation calls" : 10364692,
"page reconciliation calls for eviction" : 6135116,
"split bytes currently awaiting free" : 4824,
"split objects currently awaiting free" : 6
}
,
"session" :
{
"open cursor count" : 1374,
"open session count" : 126
}
,
"thread-yield" :
{
"page acquire busy blocked" : 543,
"page acquire eviction blocked" : 0,
"page acquire locked blocked" : 26980285,
"page acquire read blocked" : 14675,
"page acquire time sleeping (usecs)" : 24088026
}
,
"transaction" :
{
"transaction begins" : 14443774,
"transaction checkpoints" : 245,
"transaction checkpoint generation" : 244,
"transaction checkpoint currently running" : 1,
"transaction checkpoint max time (msecs)" : 121972,
"transaction checkpoint min time (msecs)" : 1,
"transaction checkpoint most recent time (msecs)" : 56556,
"transaction checkpoint total time (msecs)" : 2350919,
"transactions committed" : 9406866,
"transaction failures due to cache overflow" : 0,
"transaction range of IDs currently pinned by a checkpoint" : 0,
"transaction range of IDs currently pinned" : 1,
"transactions rolled back" : 5036908
}
,
"concurrentTransactions" : {
"write" :
{
"out" : 0,
"available" : 128,
"totalTickets" : 128
}
,
"read" :
{
"out" : 1,
"available" : 127,
"totalTickets" : 128
}
}
},
"writeBacksQueued" : false,
"mem" :
{
"bits" : 64,
"resident" : 12546,
"virtual" : 13495,
"supported" : true,
"mapped" : 0,
"mappedWithJournal" : 0
}
,
|
|
All,
we believe the root cause of this issue is the same one reported in SERVER-17386, so I'm closing this ticket as a duplicate.
cronos, it could be useful if you could share your reproducer, so when SERVER-17386 is fixed we can verify that the fix addresses your specific scenario as well.
Regards,
Ramón.
|
|
"session" :
{
"open cursor count" : 67390,
"open session count" : 15059
}
I have the same problem; I can reproduce it every time within 30 seconds, so if you would like more data let me know. I am running a simple benchmark using ab and node.js with 2 identical queries on 2 collections that return the same data. Memory usage goes to 4 GB in 10-20 seconds while the benchmark is running. I've attached the status.json file.
|
|
We've been running into this issue too. I've attached ay_mr01a_ss.txt with the output of db.serverStatus({tcmalloc:true}) taken just a few minutes before it was killed. The server had ~190MB of RAM free out of 20GB. This seems to happen about every 30 hours for us.
I'm on CentOS 7
Linux ay-prtp-mr01a.ayetier.systems 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
|
Jun 3 10:36:09 ay-prtp-mr01a kernel: Out of memory: Kill process 2215 (mongod) score 975 or sacrifice child
Jun 3 10:36:09 ay-prtp-mr01a kernel: Killed process 2215 (mongod) total-vm:22956780kB, anon-rss:19821844kB, file-rss:0kB
|
|
|
I am experiencing the same issue and hope this can help you.
uname -a
Linux xxx 2.6.32-504.3.3.el6.centos.plus.x86_64 #1 SMP Wed Dec 17 01:21:03 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
|
server_status.json
|
{
|
"host" : "xxx",
|
"version" : "3.0.3",
|
"process" : "mongod",
|
"pid" : NumberLong(1003),
|
"uptime" : 234783,
|
"uptimeMillis" : NumberLong(234782832),
|
"uptimeEstimate" : 233275,
|
"localTime" : ISODate("2015-05-29T20:42:06.124Z"),
|
"asserts" : {
|
"regular" : 0,
|
"warning" : 0,
|
"msg" : 0,
|
"user" : 87,
|
"rollovers" : 0
|
},
|
"connections" : {
|
"current" : 98,
|
"available" : 81822,
|
"totalCreated" : NumberLong(16881)
|
},
|
"cursors" : {
|
"note" : "deprecated, use server status metrics",
|
"clientCursors_size" : 1,
|
"totalOpen" : 1,
|
"pinned" : 0,
|
"totalNoTimeout" : 0,
|
"timedOut" : 3
|
},
|
"extra_info" : {
|
"note" : "fields vary by platform",
|
"heap_usage_bytes" : 1792327968,
|
"page_faults" : 5287
|
},
|
"globalLock" : {
|
"totalTime" : NumberLong("234782976000"),
|
"currentQueue" : {
|
"total" : 0,
|
"readers" : 0,
|
"writers" : 0
|
},
|
"activeClients" : {
|
"total" : 124,
|
"readers" : 0,
|
"writers" : 0
|
}
|
},
|
"locks" : {
|
"Global" : {
|
"acquireCount" : {
|
"r" : NumberLong(11671352),
|
"w" : NumberLong(78476913),
|
"R" : NumberLong(203),
|
"W" : NumberLong(8)
|
},
|
"acquireWaitCount" : {
|
"r" : NumberLong(1),
|
"w" : NumberLong(58),
|
"R" : NumberLong(1),
|
"W" : NumberLong(1)
|
},
|
"timeAcquiringMicros" : {
|
"r" : NumberLong(1736),
|
"w" : NumberLong(495182),
|
"R" : NumberLong(300591),
|
"W" : NumberLong(4)
|
}
|
},
|
"Database" : {
|
"acquireCount" : {
|
"r" : NumberLong(11671139),
|
"w" : NumberLong(64296245),
|
"R" : NumberLong(213),
|
"W" : NumberLong(14180668)
|
},
|
"acquireWaitCount" : {
|
"r" : NumberLong(217),
|
"w" : NumberLong(726),
|
"W" : NumberLong(230)
|
},
|
"timeAcquiringMicros" : {
|
"r" : NumberLong(15032),
|
"w" : NumberLong(62960),
|
"W" : NumberLong(4285)
|
}
|
},
|
"Collection" : {
|
"acquireCount" : {
|
"r" : NumberLong(2592648),
|
"w" : NumberLong(48184423)
|
}
|
},
|
"oplog" : {
|
"acquireCount" : {
|
"r" : NumberLong(9129204),
|
"w" : NumberLong(16111823)
|
}
|
}
|
},
|
"network" : {
|
"bytesIn" : 7260734541,
|
"bytesOut" : 20284843790,
|
"numRequests" : 20453492
|
},
|
"opcounters" : {
|
"insert" : 1216138,
|
"query" : 971862,
|
"update" : 6429396,
|
"delete" : 177781,
|
"getmore" : 4053392,
|
"command" : 7463080
|
},
|
"opcountersRepl" : {
|
"insert" : 22306881,
|
"query" : 0,
|
"update" : 3202,
|
"delete" : 17996278,
|
"getmore" : 0,
|
"command" : 0
|
},
|
"repl" : {
|
"setName" : "gifshow_comment",
|
"setVersion" : 3,
|
"ismaster" : true,
|
"secondary" : false,
|
"hosts" : [
|
"s3.zw:27018",
|
"sd-d6.zw:27018",
|
"sd-d7.zw:27018"
|
],
|
"primary" : "sd-d7.zw:27018",
|
"me" : "sd-d7.zw:27018",
|
"electionId" : ObjectId("55674b8ec486a02be5c82573"),
|
"rbid" : 404533360
|
},
|
"storageEngine" : {
|
"name" : "wiredTiger"
|
},
|
"tcmalloc" : {
|
"generic" : {
|
"current_allocated_bytes" : NumberLong("49036969696"),
|
"heap_size" : NumberLong("49967849472")
|
},
|
"tcmalloc" : {
|
"pageheap_free_bytes" : 272842752,
|
"pageheap_unmapped_bytes" : 123133952,
|
"max_total_thread_cache_bytes" : NumberLong(1073741824),
|
"current_total_thread_cache_bytes" : 53402688,
|
"central_cache_free_bytes" : 453545296,
|
"transfer_cache_free_bytes" : 27955088,
|
"thread_cache_free_bytes" : 53402688,
|
"aggressive_memory_decommit" : 0
|
},
|
"formattedString" : "------------------------------------------------\nMALLOC: 49036969696 (46765.3 MiB) Bytes in use by application\nMALLOC: + 272842752 ( 260.2 MiB) Bytes in page heap freelist\nMALLOC: + 453545296 ( 432.5 MiB) Bytes in central cache freelist\nMALLOC: + 27955088 ( 26.7 MiB) Bytes in transfer cache freelist\nMALLOC: + 53402688 ( 50.9 MiB) Bytes in thread cache freelists\nMALLOC: + 170029216 ( 162.2 MiB) Bytes in malloc metadata\nMALLOC: ------------\nMALLOC: = 50014744736 (47697.8 MiB) Actual memory used (physical + swap)\nMALLOC: + 123133952 ( 117.4 MiB) Bytes released to OS (aka unmapped)\nMALLOC: ------------\nMALLOC: = 50137878688 (47815.2 MiB) Virtual address space used\nMALLOC:\nMALLOC: 2290756 Spans in use\nMALLOC: 147 Thread heaps in use\nMALLOC: 8192 Tcmalloc page size\n------------------------------------------------\nCall ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).\nBytes released to the OS take up virtual address space but no physical memory.\n"
|
},
|
"wiredTiger" : {
|
"uri" : "statistics:",
|
"LSM" : {
|
"sleep for LSM checkpoint throttle" : 0,
|
"sleep for LSM merge throttle" : 0,
|
"rows merged in an LSM tree" : 0,
|
"application work units currently queued" : 0,
|
"merge work units currently queued" : 0,
|
"tree queue hit maximum" : 0,
|
"switch work units currently queued" : 0,
|
"tree maintenance operations scheduled" : 0,
|
"tree maintenance operations discarded" : 0,
|
"tree maintenance operations executed" : 0
|
},
|
"async" : {
|
"number of allocation state races" : 0,
|
"number of operation slots viewed for allocation" : 0,
|
"current work queue length" : 0,
|
"number of flush calls" : 0,
|
"number of times operation allocation failed" : 0,
|
"maximum work queue length" : 0,
|
"number of times worker found no work" : 0,
|
"total allocations" : 0,
|
"total compact calls" : 0,
|
"total insert calls" : 0,
|
"total remove calls" : 0,
|
"total search calls" : 0,
|
"total update calls" : 0
|
},
|
"block-manager" : {
|
"mapped bytes read" : 0,
|
"bytes read" : 24622915584,
|
"bytes written" : 351526281216,
|
"mapped blocks read" : 0,
|
"blocks pre-loaded" : 2750,
|
"blocks read" : 1470378,
|
"blocks written" : 30992724
|
},
|
"cache" : {
|
"tracked dirty bytes in the cache" : 148551891,
|
"tracked bytes belonging to internal pages in the cache" : 165144664,
|
"bytes currently in the cache" : 50610056164,
|
"tracked bytes belonging to leaf pages in the cache" : 66406848424,
|
"maximum bytes configured" : 66571993088,
|
"tracked bytes belonging to overflow pages in the cache" : 0,
|
"bytes read into cache" : 29445741246,
|
"bytes written from cache" : 313199529011,
|
"pages evicted by application threads" : 0,
|
"checkpoint blocked page eviction" : 13226,
|
"unmodified pages evicted" : 0,
|
"page split during eviction deepened the tree" : 2,
|
"modified pages evicted" : 78115,
|
"pages selected for eviction unable to be evicted" : 2482,
|
"pages evicted because they exceeded the in-memory maximum" : 3169,
|
"pages evicted because they had chains of deleted items" : 9898,
|
"failed eviction of pages that exceeded the in-memory maximum" : 889,
|
"hazard pointer blocked page eviction" : 1705,
|
"internal pages evicted" : 356,
|
"maximum page size at eviction" : 10497085,
|
"eviction server candidate queue empty when topping up" : 3129,
|
"eviction server candidate queue not empty when topping up" : 196,
|
"eviction server evicting pages" : 18630,
|
"eviction server populating queue, but not evicting pages" : 173,
|
"eviction server unable to reach eviction goal" : 0,
|
"pages split during eviction" : 2669,
|
"pages walked for eviction" : 562506307,
|
"eviction worker thread evicting pages" : 50607,
|
"in-memory page splits" : 2587,
|
"percentage overhead" : 8,
|
"tracked dirty pages in the cache" : 205,
|
"pages currently held in the cache" : 1297333,
|
"pages read into cache" : 1368314,
|
"pages written from cache" : 30788552
|
},
|
"connection" : {
|
"pthread mutex condition wait calls" : 16758169,
|
"files currently open" : 48,
|
"memory allocations" : 954581752,
|
"memory frees" : 785342202,
|
"memory re-allocations" : 822914,
|
"total read I/Os" : 1522884,
|
"pthread mutex shared lock read-lock calls" : 765801,
|
"pthread mutex shared lock write-lock calls" : 432862,
|
"total write I/Os" : 89470557
|
},
|
"cursor" : {
|
"cursor create calls" : 459706,
|
"cursor insert calls" : 177472350,
|
"cursor next calls" : 107673340,
|
"cursor prev calls" : 2888903,
|
"cursor remove calls" : 99771867,
|
"cursor reset calls" : 416692862,
|
"cursor search calls" : 96158328,
|
"cursor search near calls" : 72805618,
|
"cursor update calls" : 0
|
},
|
"data-handle" : {
|
"connection dhandles swept" : 0,
|
"connection candidate referenced" : 98524,
|
"connection sweeps" : 36372,
|
"connection time-of-death sets" : 98554,
|
"session dhandles swept" : 98500,
|
"session sweep attempts" : 4381
|
},
|
"log" : {
|
"log buffer size increases" : 1,
|
"total log buffer size" : 7888896,
|
"log bytes of payload data" : 14434150786,
|
"log bytes written" : 18157866880,
|
"yields waiting for previous log file close" : 0,
|
"total size of compressed records" : 4911046605,
|
"total in-memory size of compressed records" : 9835096752,
|
"log records too small to compress" : 25076133,
|
"log records not compressed" : 35432212,
|
"log records compressed" : 9736540,
|
"maximum log file size" : 104857600,
|
"pre-allocated log files prepared" : 178,
|
"number of pre-allocated log files to create" : 1,
|
"pre-allocated log files used" : 177,
|
"log read operations" : 0,
|
"log release advances write LSN" : 11242,
|
"records processed by log scan" : 33067,
|
"log scan records requiring two reads" : 19351,
|
"log scan operations" : 3,
|
"consolidated slot closures" : 58475323,
|
"logging bytes consolidated" : 18569641984,
|
"consolidated slot joins" : 70347430,
|
"consolidated slot join races" : 499137,
|
"slots selected for switching that were unavailable" : 46233890,
|
"record size exceeded maximum" : 0,
|
"failed to find a slot large enough for record" : 1,
|
"consolidated slot join transitions" : 204318,
|
"log sync operations" : 10933,
|
"log sync_dir operations" : 155,
|
"log server thread advances write LSN" : 58464391,
|
"log write operations" : 70262480
|
},
|
"reconciliation" : {
|
"page reconciliation calls" : 22727633,
|
"page reconciliation calls for eviction" : 5662,
|
"split bytes currently awaiting free" : 483823,
|
"split objects currently awaiting free" : 6
|
},
|
"session" : {
|
"open cursor count" : 4080,
|
"open session count" : 140
|
},
|
"thread-yield" : {
|
"page acquire busy blocked" : 5,
|
"page acquire eviction blocked" : 324,
|
"page acquire locked blocked" : 3306197,
|
"page acquire read blocked" : 263169319,
|
"page acquire time sleeping (usecs)" : 795089487
|
},
|
"transaction" : {
|
"transaction begins" : 80889468,
|
"transaction checkpoints" : 3546,
|
"transaction checkpoint generation" : 3546,
|
"transaction checkpoint currently running" : 0,
|
"transaction checkpoint max time (msecs)" : 32295,
|
"transaction checkpoint min time (msecs)" : 2,
|
"transaction checkpoint most recent time (msecs)" : 1551,
|
"transaction checkpoint total time (msecs)" : 8492572,
|
"transactions committed" : 70285520,
|
"transaction failures due to cache overflow" : 0,
|
"transaction range of IDs currently pinned by a checkpoint" : 0,
|
"transaction range of IDs currently pinned" : 39,
|
"transactions rolled back" : 10666794
|
},
|
"concurrentTransactions" : {
|
"write" : {
|
"out" : 0,
|
"available" : 128,
|
"totalTickets" : 128
|
},
|
"read" : {
|
"out" : 1,
|
"available" : 127,
|
"totalTickets" : 128
|
}
|
}
|
},
|
"writeBacksQueued" : false,
|
"mem" : {
|
"bits" : 64,
|
"resident" : 47584,
|
"virtual" : 48632,
|
"supported" : true,
|
"mapped" : 0,
|
"mappedWithJournal" : 0
|
},
|
"metrics" : {
|
"commands" : {
|
"buildInfo" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(1318)
|
},
|
"delete" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(177781)
|
},
|
"getLastError" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(1318)
|
},
|
"getLog" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(4)
|
},
|
"getnonce" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(1)
|
},
|
"insert" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(1196251)
|
},
|
"isMaster" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(1046456)
|
},
|
"listCollections" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(2)
|
},
|
"ping" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(19)
|
},
|
"replSetElect" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(25)
|
},
|
"replSetFresh" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(106)
|
},
|
"replSetGetStatus" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(133)
|
},
|
"replSetHeartbeat" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(232451)
|
},
|
"replSetUpdatePosition" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(6181191)
|
},
|
"serverStatus" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(43)
|
},
|
"update" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(6429396)
|
},
|
"whatsmyuri" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(13)
|
}
|
},
|
"cursor" : {
|
"timedOut" : NumberLong(3),
|
"open" : {
|
"noTimeout" : NumberLong(0),
|
"pinned" : NumberLong(0),
|
"total" : NumberLong(1)
|
}
|
},
|
"document" : {
|
"deleted" : NumberLong(148383),
|
"inserted" : NumberLong(1216138),
|
"returned" : NumberLong(17187205),
|
"updated" : NumberLong(6429396)
|
},
|
"getLastError" : {
|
"wtime" : {
|
"num" : 0,
|
"totalMillis" : 0
|
},
|
"wtimeouts" : NumberLong(0)
|
},
|
"operation" : {
|
"fastmod" : NumberLong(0),
|
"idhack" : NumberLong(721797),
|
"scanAndOrder" : NumberLong(10),
|
"writeConflicts" : NumberLong(17)
|
},
|
"queryExecutor" : {
|
"scanned" : NumberLong(1589246),
|
"scannedObjects" : NumberLong(35525321)
|
},
|
"record" : {
|
"moves" : NumberLong(0)
|
},
|
"repl" : {
|
"apply" : {
|
"batches" : {
|
"num" : 7090319,
|
"totalMillis" : 482498
|
},
|
"ops" : NumberLong(40306361)
|
},
|
"buffer" : {
|
"count" : NumberLong(0),
|
"maxSizeBytes" : 268435456,
|
"sizeBytes" : NumberLong(0)
|
},
|
"network" : {
|
"bytes" : NumberLong("6927059542"),
|
"getmores" : {
|
"num" : 6996686,
|
"totalMillis" : 116533294
|
},
|
"ops" : NumberLong(40306361),
|
"readersCreated" : NumberLong(1)
|
},
|
"preload" : {
|
"docs" : {
|
"num" : 0,
|
"totalMillis" : 0
|
},
|
"indexes" : {
|
"num" : 0,
|
"totalMillis" : 0
|
}
|
}
|
},
|
"storage" : {
|
"freelist" : {
|
"search" : {
|
"bucketExhausted" : NumberLong(0),
|
"requests" : NumberLong(0),
|
"scanned" : NumberLong(0)
|
}
|
}
|
},
|
"ttl" : {
|
"deletedDocuments" : NumberLong(8083424),
|
"passes" : NumberLong(3901)
|
}
|
},
|
"ok" : 1
|
}
|
|
|
mamoos1, to help us determine the cause, we would like to look at serverStatus information.
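For example, something like the following (the same collection approach used elsewhere in this ticket) will capture serverStatus once per second while memory is growing:

mongo --eval "while(true) {print(JSON.stringify(db.serverStatus({tcmalloc:1}))); sleep(1000)}" >ss.log &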
Thanks,
Bruce
|
|
Hello,
I am experiencing the exact same issue; however, I do not have multiple cursors open.
When I look in MMS it looks as if the resident memory is slowly rising until ultimately the process is killed by the OS.
I'm using MongoDB 3.0.3 on Ubuntu 14.04.2, with wiredTiger as the storage engine. It's a secondary member of a replica set where the other members are 2.6.9; not sure if that's relevant.
I have no special memory configuration (probably the cache is configured as half of the available memory - this should be the default, right?).
I'm not sure where to start debugging, since memory is just slowly but steadily rising, while all other metrics (cursors, page faults, opcounters, connections, storage, queues, etc.) remain pretty constant.
I used to run the exact same application against a 2.6.9 node, and it worked perfectly. The application runs different queries, including counts, aggregations and map-reduces.
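(For reference, if I did want to cap the cache explicitly, I believe the 3.0 YAML option is storage.wiredTiger.engineConfig.cacheSizeGB, along these lines; this is just a sketch, I have not actually set it:)

storage:
   engine: wiredTiger
   wiredTiger:
      engineConfig:
         cacheSizeGB: 1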
|
|
Hello, I added manual cursor closing and now it works well (mmapv1).
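In case it helps anyone else, this is roughly what I mean by closing cursors manually (a simplified mongo shell sketch; the collection and query are made up, and my real code does the equivalent through the driver):

var cur = db.mycollection.find({ status: "open" });
var docs = [];
while (cur.hasNext() && docs.length < 10) {
    docs.push(cur.next());
}
cur.close();   // explicitly release the server-side cursor instead of leaving it open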
|
|
Server status when 85% memory is used.
|
> db.serverStatus()
|
{
|
"host" : "ws-9-mongodb",
|
"version" : "3.0.2",
|
"process" : "mongod",
|
"pid" : NumberLong(13182),
|
"uptime" : 457,
|
"uptimeMillis" : NumberLong(457300),
|
"uptimeEstimate" : 446,
|
"localTime" : ISODate("2015-04-30T11:31:45.767Z"),
|
"asserts" : {
|
"regular" : 0,
|
"warning" : 0,
|
"msg" : 0,
|
"user" : 0,
|
"rollovers" : 0
|
},
|
"backgroundFlushing" : {
|
"flushes" : 7,
|
"total_ms" : 53,
|
"average_ms" : 7.571428571428571,
|
"last_ms" : 30,
|
"last_finished" : ISODate("2015-04-30T11:31:08.574Z")
|
},
|
"connections" : {
|
"current" : 1,
|
"available" : 51199,
|
"totalCreated" : NumberLong(606)
|
},
|
"cursors" : {
|
"note" : "deprecated, use server status metrics",
|
"clientCursors_size" : 160332,
|
"totalOpen" : 160332,
|
"pinned" : 0,
|
"totalNoTimeout" : 0,
|
"timedOut" : 0
|
},
|
"dur" : {
|
"commits" : 4,
|
"journaledMB" : 0,
|
"writeToDataFilesMB" : 0,
|
"compression" : 0,
|
"commitsInWriteLock" : 0,
|
"earlyCommits" : 0,
|
"timeMs" : {
|
"dt" : 0,
|
"prepLogBuffer" : 0,
|
"writeToJournal" : 0,
|
"writeToDataFiles" : 0,
|
"remapPrivateView" : 0,
|
"commits" : 0,
|
"commitsInWriteLock" : 0
|
}
|
},
|
"extra_info" : {
|
"note" : "fields vary by platform",
|
"heap_usage_bytes" : 1713573064,
|
"page_faults" : 30
|
},
|
"globalLock" : {
|
"totalTime" : NumberLong(457299000),
|
"currentQueue" : {
|
"total" : 0,
|
"readers" : 0,
|
"writers" : 0
|
},
|
"activeClients" : {
|
"total" : 11,
|
"readers" : 0,
|
"writers" : 0
|
}
|
},
|
"locks" : {
|
"Global" : {
|
"acquireCount" : {
|
"r" : NumberLong(178491),
|
"w" : NumberLong(18031),
|
"W" : NumberLong(5)
|
}
|
},
|
"MMAPV1Journal" : {
|
"acquireCount" : {
|
"r" : NumberLong(209740),
|
"w" : NumberLong(45129),
|
"R" : NumberLong(4426),
|
"W" : NumberLong(98)
|
},
|
"acquireWaitCount" : {
|
"r" : NumberLong(132),
|
"w" : NumberLong(200),
|
"R" : NumberLong(51),
|
"W" : NumberLong(20)
|
},
|
"timeAcquiringMicros" : {
|
"r" : NumberLong(315583),
|
"w" : NumberLong(821871),
|
"R" : NumberLong(84356),
|
"W" : NumberLong(9091)
|
}
|
},
|
"Database" : {
|
"acquireCount" : {
|
"r" : NumberLong(178488),
|
"w" : NumberLong(17813),
|
"R" : NumberLong(3),
|
"W" : NumberLong(218)
|
},
|
"acquireWaitCount" : {
|
"w" : NumberLong(1),
|
"W" : NumberLong(18)
|
},
|
"timeAcquiringMicros" : {
|
"w" : NumberLong(68),
|
"W" : NumberLong(2842)
|
}
|
},
|
"Collection" : {
|
"acquireCount" : {
|
"R" : NumberLong(178565),
|
"W" : NumberLong(17815)
|
},
|
"acquireWaitCount" : {
|
"R" : NumberLong(31251),
|
"W" : NumberLong(9244)
|
},
|
"timeAcquiringMicros" : {
|
"R" : NumberLong(113015872),
|
"W" : NumberLong(19730141)
|
}
|
},
|
"Metadata" : {
|
"acquireCount" : {
|
"W" : NumberLong(1)
|
}
|
}
|
},
|
"network" : {
|
"bytesIn" : 30605343,
|
"bytesOut" : 43043585,
|
"numRequests" : 196383
|
},
|
"opcounters" : {
|
"insert" : 17814,
|
"query" : 176583,
|
"update" : 1,
|
"delete" : 0,
|
"getmore" : 0,
|
"command" : 1983
|
},
|
"opcountersRepl" : {
|
"insert" : 0,
|
"query" : 0,
|
"update" : 0,
|
"delete" : 0,
|
"getmore" : 0,
|
"command" : 0
|
},
|
"storageEngine" : {
|
"name" : "mmapv1"
|
},
|
"writeBacksQueued" : false,
|
"mem" : {
|
"bits" : 64,
|
"resident" : 1716,
|
"virtual" : 6333,
|
"supported" : true,
|
"mapped" : 2160,
|
"mappedWithJournal" : 4320
|
},
|
"metrics" : {
|
"commands" : {
|
"createIndexes" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(200)
|
},
|
"createUser" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(1)
|
},
|
"getLastError" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(2)
|
},
|
"getLog" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(3)
|
},
|
"insert" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(17813)
|
},
|
"isMaster" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(235)
|
},
|
"replSetGetStatus" : {
|
"failed" : NumberLong(3),
|
"total" : NumberLong(3)
|
},
|
"saslContinue" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(1012)
|
},
|
"saslStart" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(506)
|
},
|
"serverStatus" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(13)
|
},
|
"usersInfo" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(2)
|
},
|
"whatsmyuri" : {
|
"failed" : NumberLong(0),
|
"total" : NumberLong(6)
|
}
|
},
|
"cursor" : {
|
"timedOut" : NumberLong(0),
|
"open" : {
|
"noTimeout" : NumberLong(0),
|
"pinned" : NumberLong(0),
|
"total" : NumberLong(160332)
|
}
|
},
|
"document" : {
|
"deleted" : NumberLong(0),
|
"inserted" : NumberLong(17814),
|
"returned" : NumberLong(1603322),
|
"updated" : NumberLong(1)
|
},
|
"getLastError" : {
|
"wtime" : {
|
"num" : 0,
|
"totalMillis" : 0
|
},
|
"wtimeouts" : NumberLong(0)
|
},
|
"operation" : {
|
"fastmod" : NumberLong(0),
|
"idhack" : NumberLong(0),
|
"scanAndOrder" : NumberLong(0),
|
"writeConflicts" : NumberLong(0)
|
},
|
"queryExecutor" : {
|
"scanned" : NumberLong(1683805),
|
"scannedObjects" : NumberLong(1603329)
|
},
|
"record" : {
|
"moves" : NumberLong(0)
|
},
|
"repl" : {
|
"apply" : {
|
"batches" : {
|
"num" : 0,
|
"totalMillis" : 0
|
},
|
"ops" : NumberLong(0)
|
},
|
"buffer" : {
|
"count" : NumberLong(0),
|
"maxSizeBytes" : 268435456,
|
"sizeBytes" : NumberLong(0)
|
},
|
"network" : {
|
"bytes" : NumberLong(0),
|
"getmores" : {
|
"num" : 0,
|
"totalMillis" : 0
|
},
|
"ops" : NumberLong(0),
|
"readersCreated" : NumberLong(0)
|
},
|
"preload" : {
|
"docs" : {
|
"num" : 0,
|
"totalMillis" : 0
|
},
|
"indexes" : {
|
"num" : 0,
|
"totalMillis" : 0
|
}
|
}
|
},
|
"storage" : {
|
"freelist" : {
|
"search" : {
|
"bucketExhausted" : NumberLong(0),
|
"requests" : NumberLong(18147),
|
"scanned" : NumberLong(0)
|
}
|
}
|
},
|
"ttl" : {
|
"deletedDocuments" : NumberLong(0),
|
"passes" : NumberLong(7)
|
}
|
},
|
"ok" : 1
|
}
|
|
|
Now I have caught the same issue on mmapv1, after 3 minutes and ~200,000 insert/find operations.
|
|
After ~10 seconds and 10,000 insert/find operations:
https://yadi.sk/i/YuWf5yebgMCVR
|
|
Hi, Dan.
In the last test the mongo server (3.0.2, engine=wiredTiger) shows a huge memory leak after 10 seconds of execution (~12,000 operations: 10% insert, 90% find).
Code:
|
var bm = require('../libs/basic/basic-mongo');
|
var async = require('async');
|
var _ = require('lodash');
|
var DbState = bm.DbState;
|
var s = { i: 0, e: 0, f: 0 };
|
var generator = {
|
messageRecord: function () {
|
var rnd = _.random(100000000000);
|
var rndTo = _.random(100000000000);
|
return {
|
toId: rndTo + 'to',
|
message: 'one two three four five one two three four five one two three four five one two three four five one two three four five one two three four five one two three four five one two three four five one two three four five one two three four five ',
|
ownerId: rnd + 'user',
|
updatedAt: Date.now(),
|
createdAt: Date.now()
|
};
|
},
|
userRecord: function () {
|
var rnd = _.random(100000000000);
|
var cityRnd = _.random(100000);
|
return {
|
name: 'name ' + rnd,
|
email: 'mail' + rnd + '@gmail.com',
|
phone: rnd,
|
age: _.random(100),
|
gender: _.random(1),
|
cityId: cityRnd,
|
loc: { type: 'Point', coordinates: [180 - Math.random() * 360, 90 - Math.random() * 180] },
|
groups: [],
|
updatedAt: Date.now(),
|
createdAt: Date.now()
|
};
|
},
|
getUserRecord: function () {
|
var fromRnd = _.random(100);
|
var toRnd = _.random(fromRnd, 100);
|
return { query: { age: { $gt: fromRnd, $lt: toRnd }, gender: _.random(1) }, fields: { _id: 1 }, limit: 10 };
|
},
|
locationRecord: function () {
|
var rnd = _.random(100000000000);
|
return {
|
name: 'The Point ' + _.random(1000000),
|
ownerId: rnd + 'user',
|
loc: { type: 'Point', coordinates: [180 - Math.random() * 360, 90 - Math.random() * 180] }
|
};
|
}
|
};
|
var collectionName = 'testUserCollectionToKillMongoIn5seconds';
|
var collectionKeys = [{ age: 1, gender: 1, cityId: 1 }, { email: 1 }];
|
var collectionInsertRecord = generator.userRecord;
|
var collectionFindRecord = generator.getUserRecord;
|
function createServerTestPoint() {
|
var bmi = new bm.BasicMongo('mongodb://user:password@host/database');
|
bmi.on(DbState.CONNECTED, connectHandler);
|
bmi.on(DbState.READY, readyHandler);
|
bmi.connect();
|
function connectHandler() {
|
console.log('mng:connected');
|
bmi.init([{ name: collectionName, keys: collectionKeys }]);
|
}
|
function readyHandler() {
|
console.log('mng:ready');
|
var collection = bmi.getClient().collection(collectionName);
|
function insertOneRecord(cb) {
|
collection.insert(collectionInsertRecord(), function (e, result) {
|
(e ? s.e++ : s.i++);
|
console.log('insert ', s.i, s.e, !!result);
|
cb(e, result);
|
});
|
}
|
function getOneRecord(cb) {
|
var getData = collectionFindRecord();
|
collection.find(getData.query, getData.fields).limit(getData.limit).toArray(function (e, result) {
|
(e ? s.e++ : s.f++);
|
console.log('find ', s.f, s.e, !!result);
|
cb(e, result);
|
});
|
}
|
var insertSeries = _.chain(_.range(1000)).map(function (v) {
|
return insertOneRecord;
|
}).value();
|
var findSeries = _.chain(_.range(10000)).map(function (v) {
|
return getOneRecord;
|
}).value();
|
var resultSeries = _.shuffle([].concat(insertSeries, findSeries));
|
console.log('Start execute %d operations', resultSeries.length, resultSeries);
|
async.series(resultSeries, function (e, results) {
|
console.log('Done', e, results.length, _.first(results), _.last(results));
|
});
|
}
|
return bmi;
|
}
|
_.chain(_.range(100)).each(function (v) { createServerTestPoint(); });
|
//# sourceMappingURL=basic-mongo-test.js.map
|
|
Config:
# mongod.conf
|
|
#where to log
|
logpath=/var/log/mongodb/mongod.log
|
|
logappend=true
|
|
# fork and run in background
|
fork=true
|
|
#port=27017
|
|
dbpath=/var/lib/mongo-wt
|
storageEngine=wiredTiger
|
wiredTigerEngineConfigString=cache_size=1G,direct_io=[data]
|
|
# location of pidfile
|
pidfilepath=/var/run/mongodb/mongod.pid
|
|
# Listen to local interface only. Comment out to listen on all interfaces.
|
#bind_ip=127.0.0.1
|
|
# Disables write-ahead journaling
|
# nojournal=true
|
|
# Enables periodic logging of CPU utilization and I/O wait
|
cpu=true
|
|
# Turn on/off security. Off is currently the default
|
#noauth=true
|
auth=true
|
|
# Verbose logging output.
|
#verbose=true
|
|
# Inspect all client data for validity on receipt (useful for
|
# developing drivers)
|
#objcheck=true
|
|
# Enable db quota management
|
#quota=true
|
|
# Set oplogging level where n is
|
# 0=off (default)
|
# 1=W
|
# 2=R
|
# 3=both
|
# 7=W+some reads
|
#diaglog=0
|
|
# Ignore query hints
|
#nohints=true
|
|
# Enable the HTTP interface (Defaults to port 28017).
|
httpinterface=true
|
|
# Turns off server-side scripting. This will result in greatly limited
|
# functionality
|
#noscripting=true
|
|
# Turns off table scans. Any query that would do a table scan fails.
|
#notablescan=true
|
|
# Disable data file preallocation.
|
#noprealloc=true
|
|
# Specify .ns file size for new databases.
|
# nssize=<size>
|
|
# Replication Options
|
|
# in replicated mongo databases, specify the replica set name here
|
#replSet=setname
|
# maximum size in megabytes for replication operation log
|
#oplogSize=1024
|
# path to a key file storing authentication info for connections
|
# between replica set members
|
#keyFile=/path/to/keyfile
|
|
|
Hi vasily.timofeev@gmail.com, can you post your stress test? Also, which version of MongoDB are you testing with, and what startup parameters are you using?
|
|
I have the same issue with WiredTiger on CentOS 7.0.
I created a simple stress test: it inserts into a collection that has 2 indexes (one string, one numeric).
After 2-3 minutes mongod had used all memory (2 GB) and was killed by the OOM killer.
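Roughly, the test does something like this (a simplified mongo shell sketch; the collection and field names here are made up, the real test goes through a driver):

db.stress.createIndex({ name: 1 });    // string index
db.stress.createIndex({ value: 1 });   // numeric index
for (var i = 0; i < 1000000; i++) {
    db.stress.insert({ name: "item-" + i, value: i });
}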
|
|
Thanks for the additional information. I have not been able to exactly replicate the specific pattern of memory growth, cursor growth, etc. that your runs showed; to do that I think I would need to set up the complete application, as the specific pattern of those metrics depends on the exact sequence of operations.
However I have done some more investigation on the question of the amount of memory used by cursors, and, based on my preliminary results, it seems to be quite a bit higher than I had thought - about 1.4-1.5 kB per cursor in my tests. If this holds up, it would mean that the 580k cursors we see in your runs would account for about 800 MB or so of memory, or about half of the memory utilization you see. So at this point I think that SERVER-17386 accounts for most of the issue you are seeing.
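For clarity, the arithmetic behind that estimate: 580,000 cursors x 1.4 kB is roughly 810 MB, and 580,000 x 1.5 kB is roughly 870 MB, which is where the "about 800 MB or so" figure comes from.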
I will continue to investigate and will keep you posted.
|
|
Thanks. I will give it a try.
I made a stress test. Each collection keeps growing, and memory usage never decreases; eventually the process is killed by OOM.
Only the mmapv1 storage engine uses much less memory.
|
|
Unfortunately I can't find the cause of your issue. Maybe you can move the lines
def mongo = MongoConnectionPoolUtils.instance.mongo
|
def mongoDB = mongo.getDB("testDB")
|
out of the eventbus handler closure? What happens if you remove the ensureIndex() call?
How many objects do you store in each collection, approximately? Maybe you can reuse the collection returned by mongoDB.getCollection:
if (colname != "test-$id") {
|
colname = "test-$id";
|
coll = mongoDB.getCollection(colname);
|
}
|
|
|
@Gabriel
Of course. Please check:
MongoConnectionPoolUtils.groovy
package com.myModule.dataaccess.mongo
|
|
import com.mongodb.Mongo
|
import com.mongodb.MongoOptions
|
import com.mongodb.ServerAddress
|
|
class MongoConnectionPoolUtils {
|
|
private def mongo
|
private static String address = "127.0.0.1"
|
private static String port = "27017"
|
private static def instance = new MongoConnectionPoolUtils()
|
|
private MongoConnectionPoolUtils(){
|
def addr = new ServerAddress(address,port as int)
|
def opt = new MongoOptions()
|
opt.connectionsPerHost = 50
|
opt.threadsAllowedToBlockForConnectionMultiplier = 50
|
mongo = new Mongo(addr, opt)
|
}
|
|
public static synchronized getInstance(){
|
instance
|
}
|
|
def getMongo(){
|
mongo
|
}
|
}
|
Yes, you're right. I see that ensureIndex has been deprecated since 3.0, but this code needs to work from Mongo 2.0 through Mongo 3.0, so I can't change ensureIndex to createIndex.
|
|
Maybe I can help: The code is executed in a single thread (see http://vertx.io/manual.html#concurrency )
LOOP
1) get the db connection via connection pool (class MongoConnectionPoolUtils is not shown here ...)
2) get a collection with the given name ("test-" + $id); if the collection does not exist, a new collection is created
3) call the deprecated ensureIndex on that collection (creates the index if it does not already exist)
4) insert data
5) print the time spent (only if it's more than 5 seconds)
@Fen Yu : Could you give us the code of MongoConnectionPoolUtils?
|
|
Sorry, I don't know the details of the relationship between threads, collection creation, index creation, and inserts. This code is in a vert.x verticle. Here is the starter code:
class MongoWorker extends Verticle{
|
|
def mongodbAddress = "dataaccess.mongodb"
|
def eb
|
def log
|
|
def start(){
|
eb = vertx.eventBus
|
log = container.logger
|
vertx.eventBus.registerHandler(mongodbAddress){msg->
|
def data = msg.body
|
def startTime = System.currentTimeMillis()
|
def mongo = MongoConnectionPoolUtils.instance.mongo
|
def id = data.get("id")
|
def mongoDB = mongo.getDB("testDB")
|
def coll = mongoDB.getCollection("test-$id")
|
coll.ensureIndex([timestamp:-1] as BasicDBObject)
|
coll.insert(data.data as BasicDBObject)
|
def spendTime = System.currentTimeMillis() - startTime
|
if(spendTime > 5000){
|
log.info("mongo save spent ${spendTime} ms")
|
}
|
}
|
}
|
}
|
It runs in a worker. Every JSON object is stored in MongoDB with a timestamp index. When I use the mmapv1 storage engine, memory usage is normal. But when I switch to wiredTiger, the same code causes mongod to run out of memory.
|
|
Thanks for the additional detail. Can you describe the overall flow of the application, perhaps using pseudo-code? In particular I need to know the relationship between threads, collection creation, index creation, and inserts. Is your program doing ensureIndex for each insertion (as the snippet above might suggest), or are you creating each collection, then doing ensureIndex, then inserting some number of documents in a loop? Does the test run you showed me above include creating the collections as part of the run, or are those created prior to the run?
Thanks,
Bruce
|
|
Hi @Bruce Lucas. My program is written in Groovy. Here is some of the related code:
|
test.groovy
|
import com.mongodb.Mongo
|
import com.mongodb.MongoOptions
|
import com.mongodb.ServerAddress
|
private def mongo
|
private static String address = "127.0.0.1"
|
private static String port = "27017"
|
|
def addr = new ServerAddress(address,port as int)
|
def opt = new MongoOptions()
|
opt.connectionsPerHost = 50
|
opt.threadsAllowedToBlockForConnectionMultiplier = 50
|
mongo = new Mongo(addr, opt)
|
|
def mongoDB = mongo.getDB("testDB")
|
def coll = mongoDB.getCollection("test-$id")
|
coll.ensureIndex([timestamp:-1] as BasicDBObject)
|
coll.insert(data.data as BasicDBObject)
|
There are about 3000 collections in testDB. Here is an example of a document:
{ "_id" : ObjectId("54f81f0044f4d4a4dd146777"), "31" : "1.0", "32" : "1.0", "34" : "0.0", "35" : "0.0", "10" : "1.0", "9" : "0.0", "7" : "1.0", "8" : "0.0", "6" : "2118.0", "1" : "4.0", "2" : "1.0", "3" : "84.0", "5" : "2894.0", "4" : "2573.0", "timestamp" : NumberLong("1425546342041"), "a" : 1, "b" : 1, "c" : 169.52 }
You can see I use a thread pool.
|
|
Thanks for providing those logs; they are very helpful. Here are some relevant stats from ss.log:
- The number of allocated bytes grows to 1.6 GB, whereas there are only about 470 MB of data in the WiredTiger cache. We need to account for all the extra memory usage.
- At the point marked A, the number of open cursors begins to climb dramatically, reaching about 580k by the end of the run. I suspect that SERVER-17386 (which involves the way mongod caches WT cursors) is involved here. To confirm that the number of open cursors is accounted for by SERVER-17386, I would like to know: how many threads the application uses, how many total collections plus indexes are involved, and what application behavior changed at A to account for the sudden increasing trend in the number of cursors. (Note: these are WiredTiger cursors, not MongoDB cursors, and they don't map 1-1, so I don't suspect that the application explicitly opened 580k cursors.)
- My back-of-the-envelope calculation suggests, however, that the number of cursors only accounts for 200 MB or so of memory.
Focusing on the early part of the run before A:
- The number of bytes in the cache has risen to about 105 MB, while at the same time the number of allocated bytes has risen to about 561 MB. Because the two rose together, I think you might be encountering SERVER-17424. However, the excessive memory usage in that case was about 20%, whereas here it is about 400%. So to confirm that it is the same issue I would like to do some memory profiling on your app, if you would be willing to share it with us.
To summarize my requests:
- Was there an application behavior change at A (e.g. number of connections, number of collections or indexes, type of operation)?
- How many threads (connections) does the app use, and how many total collections plus indexes?
- Can you share the Java app with us (let me know if you want a private upload location)?
Thanks,
Bruce
|
|
Hi @Bruce Lucas. Please check the log. I used a Java program to run this test, but the Java process hit OOM first.
Even after the Java program crashed with OOM, the mongod memory usage was still not released.
ps aux | grep mongo
|
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
|
mongodb 17896 16.4 41.1 1812976 1666736 ? Ssl 09:48 18:33 /usr/bin/mongod --config /etc/mongod.conf
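(That RSS of 1666736 kB is about 1.6 GB resident, which on this 4 GB test VM is the ~41% shown in the %MEM column.)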
|
I've uploaded my ss.log and syslog. Please check.
|
|
Of course. I did this test on my test server; there is no important data on it.
Please wait; I will upload the log file after I follow your steps.
The test server is a VirtualBox guest with 4 GB of memory and no swap.
I also find that wiredTiger memory usage never goes down. I used mongorestore to restore a 70 GB database; after it finished, mongod memory usage stayed at about 2 GB. But mmapv1 is normal, and releases memory automatically.
|
|
Can you tell me how much physical memory the system has? Also, can you please attach the syslog file (/var/log/syslog* on Ubuntu) that covers the time when the OOM condition occurred? This will contain additional information about system memory usage that will help us understand the incident.
Is this issue readily reproducible? If so, can you please do the following:
- start mongod as usual
- start the following data collection script:
mongo --eval "while(true) {print(JSON.stringify(db.serverStatus({tcmalloc:1}))); sleep(1000)}" >ss.log &
|
- after the OOM condition occurs, please attach both syslog and the ss.log output. This will give us a clearer picture of how memory usage varies over the course of the run.
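If it's useful, here is one rough way to skim the resulting ss.log afterwards (a sketch assuming Node.js is available; each line of the file is a single JSON serverStatus document, so any JSON-aware tool will do):

var fs = require('fs');
fs.readFileSync('ss.log', 'utf8').split('\n').forEach(function (line) {
    if (!line) return;
    var doc;
    try { doc = JSON.parse(line); } catch (e) { return; }   // skip any partial lines
    console.log(doc.localTime, 'residentMB=' + doc.mem.resident, 'virtualMB=' + doc.mem.virtual);
});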
Finally, we may want to try to reproduce this issue in-house. Is there any possibility you would be able to share the load generator with us? We could provide a private upload location if you don't want to share it on this public ticket.
Thanks,
Bruce
|
|
Edit: Here is my /etc/mongod.conf:
cat /etc/mongod.conf
systemLog:
   destination: file
   path: "/var/log/mongodb/mongodb.log"
   logAppend: true
storage:
   journal:
      enabled: true
   dbPath: "/mnt/mongodb"
   engine: wiredTiger
net:
   bindIp: 127.0.0.1
   port: 27017
I tested with mmapv1 and it works very well. The memory usage is normal.
|