Details
-
Bug
-
Resolution: Done
-
Major - P3
-
None
-
4.0.3
-
None
-
ALL
-
Description
HI there,I have a group of mongoDB shard cluster in a docker clusters, with 2 mongos members, 1 config server (total: 3 replica members) and 2 shards(each shard also has 3 replica members.). Each member is running in one docker container. Unclear to know why, my primary config server was killed by OOM someday.
I have to increase much more memory for the config server to prevent it from OOM again, even I have given a wiredTiger cache limit like below in the config file. At first time, the config server running ok with 2G memory, but now, it's 8G and has been used almost 99%. And each of the two shard hold only 28G data at most.
# wiredTiger memory limit in config file
|
wiredTigerCacheSizeGB = 1 |
---
|
# docker stats $dockerId
|
CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
|
d-f2uxxxp41y 3.86% 7.943 GiB / 8 GiB 99.29% 0 B / 0 B 39.52 GB / 1.539 TB 0 |
Here is what I can find in mongod log and db.serverStatus output. And I know the OS may take over all the memory which wiredTiger can't allocate, but I want to know how the OS could use so much and for what purpuse. So where I can figure it out? I hope any "comrade" here can offer some help:
Â
# ps aux
|
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
|
root 1 0.0 0.0 11684 1308 ? Ss 2020 0:00 /bin/bash /entrypoint.sh |
root 160 0.0 0.0 20612 804 ? Ss 2020 12:33 crond |
root 161 10.8 2.8 15042880 7611028 ? SLl 2020 93735:18 /opt/jmiss_mongo/mongod --config /var/jmiss_mongo/config/configsvr.conf |
---
|
# mongostat
|
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
|
*0 9 *0 *0 0 11|0 0.4% 73.3% 0 14.3G 7.24G 0|0 1|0 896b 68.7k 30 mgset-1725952967_configsvr PRI Oct 28 08:16:41.036 |
*0 2 2 *0 11 16|0 0.4% 73.3% 0 14.3G 7.24G 0|0 1|0 12.7k 80.1k 30 mgset-1725952967_configsvr PRI Oct 28 08:16:42.036 |
*0 *0 *0 *0 0 11|0 0.4% 73.3% 0 14.3G 7.24G 0|0 1|0 1.21k 70.4k 30 mgset-1725952967_configsvr PRI Oct 28 08:16:43.036 |
*0 *0 *0 *0 0 9|0 0.4% 73.3% 0 14.3G 7.24G 0|0 1|0 793b 69.6k 30 mgset-1725952967_configsvr PRI Oct 28 08:16:44.036 |
*0 *0 *0 *0 1 11|0 0.4% 73.3% 0 14.3G 7.24G 0|0 1|0 1.15k 68.2k 30 mgset-1725952967_configsvr PRI Oct 28 08:16:45.035 |
*0 *0 *0 *0 5 11|0 0.4% 73.3% 0 14.3G 7.24G 0|0 1|0 5.71k 72.2k 30 mgset-1725952967_configsvr PRI Oct 28 08:16:46.036 |
*0 *0 *0 *0 1 11|0 0.4% 73.3% 0 14.3G 7.24G 0|0 1|0 1.15k 68.5k 30 mgset-1725952967_configsvr PRI Oct 28 08:16:47.035 |
*0 *0 *0 *0 0 7|0 0.4% 73.3% 0 14.3G 7.24G 0|0 1|0 1.17k 69.4k 30 mgset-1725952967_configsvr PRI Oct 28 08:16:48.036 |
*0 *0 *0 *0 1 13|0 0.4% 73.3% 0 14.3G 7.24G 0|0 1|0 1.27k 71.0k 30 mgset-1725952967_configsvr PRI Oct 28 08:16:49.034 |
*0 *0 *0 *0 0 6|0 0.0% 73.3% 1 14.3G 7.24G 0|0 1|0 920b 66.7k 30 mgset-1725952967_configsvr PRI Oct 28 08:16:50.036 |
|
Â
Â
mgset-172xxx_configsvr:PRIMARY> show dbs
|
admin 0.000GB |
config 0.149GB |
local 2.596GB |
mgset-172xxx_configsvr:PRIMARY> db.serverStatus().mem
|
{
|
"bits" : 64, |
"resident" : 7432, |
"virtual" : 14690, |
"supported" : true, |
"mapped" : 0, |
"mappedWithJournal" : 0 |
}
|
mgset-172xxx_configsvr:PRIMARY> db.serverStatus().tcmalloc
|
{
|
"generic" : { |
"current_allocated_bytes" : NumberLong("12324 143616"), |
"heap_size" : NumberLong("13836734464") |
},
|
"tcmalloc" : { |
"pageheap_free_bytes" : 154 181632, |
"pageheap_unmapped_bytes" : 1024 569344, |
"max_total_thread_cache_bytes" : NumberLong(1073 741824), |
"current_total_thread_cache_bytes" : 24 271352, |
"total_free_bytes" : 333 839872, |
"central_cache_free_bytes" : 306 076200, |
"transfer_cache_free_bytes" : 3 492320, |
"thread_cache_free_bytes" : 24 271352, |
"aggressive_memory_decommit" : 0, |
"pageheap_committed_bytes" : NumberLong("12812 165120"), |
"pageheap_scavenge_count" : 20 856825, |
"pageheap_commit_count" : 27 858250, |
"pageheap_total_commit_bytes" : NumberLong("21789588 615168"), |
"pageheap_decommit_count" : 20 856825, |
"pageheap_total_decommit_bytes" : NumberLong("21776776 450048"), |
"pageheap_reserve_count" : 6709, |
"pageheap_total_reserve_bytes" : NumberLong("13836 734464"), |
"spinlock_total_delay_ns" : NumberLong("665771 200416"), |
|
|
------------------------------------------------
|
MALLOC: 12290579360 (11721.2 MiB) Bytes in use by application |
MALLOC: + 187674624 ( 179.0 MiB) Bytes in page heap freelist |
MALLOC: + 306094552 ( 291.9 MiB) Bytes in central cache freelist |
MALLOC: + 3133024 ( 3.0 MiB) Bytes in transfer cache freelist |
MALLOC: + 24618024 ( 23.5 MiB) Bytes in thread cache freelists |
MALLOC: + 112812288 ( 107.6 MiB) Bytes in malloc metadata |
MALLOC: ------------
|
MALLOC: = 12924911872 (12326.2 MiB) Actual memory used (physical + swap) |
MALLOC: + 1024634880 ( 977.2 MiB) Bytes released to OS (aka unmapped) |
MALLOC: ------------
|
MALLOC: = 13949546752 (13303.3 MiB) Virtual address space used |
MALLOC:
|
MALLOC: 1696320 Spans in use |
MALLOC: 96 Thread heaps in use |
MALLOC: 4096 Tcmalloc page size |
------------------------------------------------
|
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
|
Bytes released to the OS take up virtual address space but no physical memory.
|
|
|
}
|
}
|
mgset-172xxx_configsvr:PRIMARY> db.serverStatus().wiredTiger.cache
|
{
|
"application threads page read from disk to cache count" : 23400, |
"application threads page read from disk to cache time (usecs)" : 3238111, |
"application threads page write from cache to disk count" : 44687084, |
"application threads page write from cache to disk time (usecs)" : 1003631736, |
"bytes belonging to page images in the cache" : 261364892, |
"bytes belonging to the cache overflow table in the cache" : 182, |
"bytes currently in the cache" : 773199299, |
"bytes not belonging to page images in the cache" : 511834406, |
"bytes read into cache" : 619883360, |
"bytes written from cache" : 723664179500, |
"cache overflow cursor application thread wait time (usecs)" : 0, |
"cache overflow cursor internal thread wait time (usecs)" : 0, |
"cache overflow score" : 0, |
"cache overflow table entries" : 0, |
"cache overflow table insert calls" : 0, |
"cache overflow table remove calls" : 0, |
"checkpoint blocked page eviction" : 6, |
"eviction calls to get a page" : 4827751, |
"eviction calls to get a page found queue empty" : 4789295, |
"eviction calls to get a page found queue empty after locking" : 1098, |
"eviction currently operating in aggressive mode" : 0, |
"eviction empty score" : 0, |
"eviction passes of a file" : 38655, |
"eviction server candidate queue empty when topping up" : 1661, |
"eviction server candidate queue not empty when topping up" : 1890, |
"eviction server evicting pages" : 0, |
"eviction server slept, because we did not make progress with eviction" : 1709793, |
"eviction server unable to reach eviction goal" : 0, |
"eviction state" : 32, |
"eviction walk target pages histogram - 0-9" : 37762, |
"eviction walk target pages histogram - 10-31" : 598, |
"eviction walk target pages histogram - 128 and higher" : 0, |
"eviction walk target pages histogram - 32-63" : 162, |
"eviction walk target pages histogram - 64-128" : 133, |
"eviction walks abandoned" : 3314, |
"eviction walks gave up because they restarted their walk twice" : 28244, |
"eviction walks gave up because they saw too many pages and found no candidates" : 1131, |
"eviction walks gave up because they saw too many pages and found too few candidates" : 181, |
"eviction walks reached end of tree" : 62763, |
"eviction walks started from root of tree" : 34971, |
"eviction walks started from saved location in tree" : 3684, |
"eviction worker thread active" : 4, |
"eviction worker thread created" : 0, |
"eviction worker thread evicting pages" : 46955, |
"eviction worker thread removed" : 0, |
"eviction worker thread stable number" : 0, |
"failed eviction of pages that exceeded the in-memory maximum count" : 11, |
"failed eviction of pages that exceeded the in-memory maximum time (usecs)" : 44, |
"files with active eviction walks" : 0, |
"files with new eviction walks started" : 34519, |
"force re-tuning of eviction workers once in a while" : 0, |
"hazard pointer blocked page eviction" : 361, |
"hazard pointer check calls" : 49572, |
"hazard pointer check entries walked" : 85708, |
"hazard pointer maximum array length" : 2, |
"in-memory page passed criteria to be split" : 4086, |
"in-memory page splits" : 1977, |
"internal pages evicted" : 2561, |
"internal pages split during eviction" : 26, |
"leaf pages split during eviction" : 2463, |
"maximum bytes configured" : 1073741824, |
"maximum page size at eviction" : 0, |
"modified pages evicted" : 32533, |
"modified pages evicted by application threads" : 0, |
"operations timed out waiting for space in cache" : 0, |
"overflow pages read into cache" : 0, |
"page split during eviction deepened the tree" : 1, |
"page written requiring cache overflow records" : 0, |
"pages currently held in the cache" : 8975, |
"pages evicted because they exceeded the in-memory maximum count" : 1995, |
"pages evicted because they exceeded the in-memory maximum time (usecs)" : 1163695, |
"pages evicted because they had chains of deleted items count" : 285, |
"pages evicted because they had chains of deleted items time (usecs)" : 25953, |
"pages evicted by application threads" : 0, |
"pages queued for eviction" : 37601, |
"pages queued for urgent eviction" : 9950, |
"pages queued for urgent eviction during walk" : 824, |
"pages read into cache" : 23433, |
"pages read into cache after truncate" : 7634, |
"pages read into cache after truncate in prepare state" : 0, |
"pages read into cache requiring cache overflow entries" : 0, |
"pages read into cache requiring cache overflow for checkpoint" : 0, |
"pages read into cache skipping older cache overflow entries" : 0, |
"pages read into cache with skipped cache overflow entries needed later" : 0, |
"pages read into cache with skipped cache overflow entries needed later by checkpoint" : 0, |
"pages requested from the cache" : 2622634339, |
"pages seen by eviction walk" : 356810, |
"pages selected for eviction unable to be evicted" : 478, |
"pages walked for eviction" : 9543938, |
"pages written from cache" : 44692387, |
"pages written requiring in-memory restoration" : 235, |
"percentage overhead" : 8, |
"tracked bytes belonging to internal pages in the cache" : 2325120, |
"tracked bytes belonging to leaf pages in the cache" : 770874179, |
"tracked dirty bytes in the cache" : 4434563, |
"tracked dirty pages in the cache" : 5, |
"unmodified pages evicted" : 14605 |
}
|
Can anyone give some help?