[SERVER-16571] Use Actual Memory Constraint vs. Total System Memory When They Differ Created: 17/Dec/14  Updated: 08/Jan/24  Resolved: 08/Mar/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.8.0-rc2
Fix Version/s: 3.6.13, 4.0.9, 4.1.9

Type: Bug Priority: Major - P3
Reporter: Asya Kamsky Assignee: Matt Lord (Inactive)
Resolution: Done Votes: 9
Labels: containers, docker, kubernetes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-39966 Report both total actual system memor... Closed
Documented
is documented by DOCS-12599 Docs for SERVER-16571: Use Actual Mem... Closed
Problem/Incident
Related
related to DOCS-4645 Production Notes for WiredTiger Closed
related to DOCS-13072 SELinux and actual memory usage warnings Closed
is related to SERVER-60412 Host memory limit check does not hono... Closed
is related to SERVER-52596 Detect memLimitMB in K8S pod and info... Closed
Backwards Compatibility: Minor Change
Operating System: Linux
Backport Requested:
v4.0, v3.6
Participants:
Case:

 Description   

The memory size is taken into account within various components today:

  1. We set the default WiredTiger cache_size to approximately 1/2 of the memory size
  2. When ephemeral storage is in use, we use the memory limit in place to determine the default oplog size
  3. We set the tcmalloc cache size to 1/8th of the available memory

In all of these cases we should take into account the actual memory constraint we're operating under rather than the total system memory, for example when running MongoDB within containers.
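
As a rough illustration of the gap, assuming the ratios listed above and entirely hypothetical names (this is not MongoDB source), a container limit far below the host's physical RAM produces very different defaults:

    #include <algorithm>
    #include <cstdint>
    #include <iostream>

    // Hypothetical defaults derived from a memory size, using the ratios above.
    struct MemoryDerivedDefaults {
        uint64_t wiredTigerCacheBytes;  // ~1/2 of the memory size (item 1)
        uint64_t tcmallocCacheBytes;    // ~1/8 of the memory size (item 3)
    };

    MemoryDerivedDefaults defaultsFor(uint64_t memBytes) {
        return {memBytes / 2, memBytes / 8};
    }

    int main() {
        const uint64_t GB = 1024ULL * 1024 * 1024;
        const uint64_t systemTotal = 64 * GB;    // physical RAM on the host
        const uint64_t containerLimit = 4 * GB;  // e.g. docker run --memory=4g

        // Today: sized from total system memory -> a ~32 GB WT cache inside a 4 GB container.
        auto today = defaultsFor(systemTotal);
        // Proposed: sized from the tighter constraint -> a ~2 GB WT cache.
        auto proposed = defaultsFor(std::min(systemTotal, containerLimit));

        std::cout << "WT cache today: " << today.wiredTigerCacheBytes / GB << " GB, "
                  << "proposed: " << proposed.wiredTigerCacheBytes / GB << " GB\n";
    }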



 Comments   
Comment by Billy Donahue [ 13/Feb/20 ]

I believe the way this was done (reading from "/sys/fs/cgroup/memory/memory.limit_in_bytes") is good enough for Docker, but not necessarily for other container systems or for the daemonization/isolation/jailing scripts that might boot mongod. Reading that limit file at the top of the cgroup/memory hierarchy relies on a Docker-specific VFS namespacing trick.

I think the more general approach is to read /proc/self/cgroup to figure out which cgroup the process's memory is controlled by, and then read /sys/fs/cgroup/memory/${cgroup}/memory.limit_in_bytes.

However, this (I think correct) algorithm doesn't work in Docker, because Docker replaces the deep /sys/fs/cgroup hierarchy with a flat one containing only the container's limits. It does not replace /proc/self/cgroup, though, so you're left with inconsistent isolation. Docker isn't the only container system, so it feels incorrect, or at least incomplete, to rely on its particular choices.
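
A minimal sketch of that more general lookup, assuming cgroup v1 and illustrative names rather than the server's actual helpers, might look like this:

    #include <cstdint>
    #include <fstream>
    #include <initializer_list>
    #include <iostream>
    #include <optional>
    #include <sstream>
    #include <string>

    // Parse /proc/self/cgroup for the memory controller's path,
    // e.g. "5:memory:/docker/3fd4..." -> "/docker/3fd4...".
    std::optional<std::string> memoryCgroupPath() {
        std::ifstream cgroups("/proc/self/cgroup");
        std::string line;
        while (std::getline(cgroups, line)) {
            std::istringstream fields(line);
            std::string id, controllers, path;
            if (std::getline(fields, id, ':') && std::getline(fields, controllers, ':') &&
                std::getline(fields, path)) {
                // The controller field may list several controllers, e.g. "cpu,memory".
                if (("," + controllers + ",").find(",memory,") != std::string::npos)
                    return path;
            }
        }
        return std::nullopt;
    }

    // Read /sys/fs/cgroup/memory/<path>/memory.limit_in_bytes. Under Docker's flat
    // hierarchy that per-cgroup path may not exist, so fall back to the top-level
    // file (the inconsistency described above).
    std::optional<uint64_t> memoryCgroupLimitBytes() {
        auto path = memoryCgroupPath();
        if (!path)
            return std::nullopt;
        for (const std::string& candidate :
             {"/sys/fs/cgroup/memory" + *path + "/memory.limit_in_bytes",
              std::string("/sys/fs/cgroup/memory/memory.limit_in_bytes")}) {
            std::ifstream limitFile(candidate);
            uint64_t bytes = 0;
            if (limitFile >> bytes)
                return bytes;
        }
        return std::nullopt;
    }

    int main() {
        if (auto limit = memoryCgroupLimitBytes())
            std::cout << "cgroup memory limit: " << *limit << " bytes\n";
        else
            std::cout << "no cgroup memory limit found\n";
    }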

Comment by Githook User [ 05/Apr/19 ]

Author:

{'email': 'mattalord@gmail.com', 'name': 'Matt Lord', 'username': 'mattlord'}

Message: SERVER-16571 Use Actual Memory Constraint vs. Total System Memory When They Differ

(cherry picked from commit d535bce1bb7df20158fad965142d6b802ea95c60)
Branch: v3.6
https://github.com/mongodb/mongo/commit/bee2203bbaa4899f496b142259a8f6b95b65dd95

Comment by Githook User [ 03/Apr/19 ]

Author:

{'name': 'Matt Lord', 'username': 'mattlord', 'email': 'mattalord@gmail.com'}

Message: SERVER-16571 Use Actual Memory Constraint vs. Total System Memory When They Differ

(cherry picked from commit d535bce1bb7df20158fad965142d6b802ea95c60)
Branch: v4.0
https://github.com/mongodb/mongo/commit/e5998f6c628adcb9b82fd70839ab892e1d01f265

Comment by Githook User [ 08/Mar/19 ]

Author:

{'name': 'Matt Lord', 'email': 'mattalord@gmail.com', 'username': 'mattlord'}

Message: SERVER-16571 Use Actual Memory Constraint vs. Total System Memory When They Differ
Branch: master
https://github.com/mongodb/mongo/commit/fafe4d03edd877e4c022cb3dd714ab1ea6ae4fcd

Comment by Githook User [ 06/Mar/19 ]

Author:

{'name': 'Matt Lord', 'username': 'mattlord', 'email': 'mattalord@gmail.com'}

Message: Revert "SERVER-16571 cache_size should be set based on cgroup available RAM not physical RAM"

This reverts commit a39875e4e060d42a7ce70ec82b07af2850d3bab7.
Branch: master
https://github.com/mongodb/mongo/commit/dbef14de7332bb910200518e80a2b130d2b973f4

Comment by Githook User [ 06/Mar/19 ]

Author:

{'name': 'Matt Lord', 'username': 'mattlord', 'email': 'mattalord@gmail.com'}

Message: Revert "SERVER-16571 Apply linter fixes to original fix"

This reverts commit 602bfb9c52b2274d55492f73eeac8513d9048d10.
Branch: master
https://github.com/mongodb/mongo/commit/d654fe781da630ce7f354ec31931efc58757e690

Comment by Githook User [ 05/Mar/19 ]

Author:

{'name': 'Eric Milkie', 'email': 'milkie@10gen.com', 'username': 'milkie'}

Message: SERVER-16571 Apply linter fixes to original fix
Branch: master
https://github.com/mongodb/mongo/commit/602bfb9c52b2274d55492f73eeac8513d9048d10

Comment by Githook User [ 05/Mar/19 ]

Author:

{'name': 'Matt Lord', 'email': 'mattalord@gmail.com', 'username': 'mattlord'}

Message: SERVER-16571 cache_size should be set based on cgroup available RAM not physical RAM
Branch: master
https://github.com/mongodb/mongo/commit/a39875e4e060d42a7ce70ec82b07af2850d3bab7

Comment by James Broadhead (Inactive) [ 01/Mar/19 ]

acm / matt.lord, just noticed that this one ended up on the backlog; is there any chance of getting it into MDB 4.2?
(we have a workaround in the Operator, but it's not ideal)

Comment by Matt Lord (Inactive) [ 10/Jul/18 ]

Hi All,

WiredTiger uses the output of ProcessInfo.getMemSizeMB() when determining the cache size; getMemSizeMB() in turn calls SystemInfo.memSize.

On Linux, we use LinuxSysHelper::getSystemMemorySize(), which is implemented using /proc/meminfo.

If we focus on Linux for now (Windows is the only other OS with native kernel [Docker] containers today), then the correct approach would seem to be:

  1. Check to see if our execution context / shell has a memory cgroup applied:
    NO:

    grep memory /proc/self/cgroup
    8:memory:/
    

    YES:

    grep memory /proc/self/cgroup
    5:memory:/docker/3fd47f54d37af11b6e4144706bb38509adacb10bd08fa419eb4b1e599c016022
    

  2. If NO: then we use the byte value derived from /proc/meminfo as we always do today.
  3. If YES: use the cgroup's memory limit rather than the /proc/meminfo value (see the sketch below).
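
A rough sketch of this flow (cgroup v1, Linux only, illustrative names). The YES branch here, capping MemTotal from /proc/meminfo with /sys/fs/cgroup/memory/memory.limit_in_bytes, is only one possible reading of the step above rather than a description of the final fix:

    #include <algorithm>
    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <limits>
    #include <string>

    // MemTotal from /proc/meminfo, in bytes (reported there in kB).
    uint64_t memTotalBytes() {
        std::ifstream meminfo("/proc/meminfo");
        std::string key;
        uint64_t kb = 0;
        while (meminfo >> key >> kb) {
            if (key == "MemTotal:")
                return kb * 1024;
            meminfo.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        }
        return 0;
    }

    // Step 1: does the process have a non-root memory cgroup applied?
    // Root looks like "8:memory:/", a container like "5:memory:/docker/<id>".
    bool hasMemoryCgroup() {
        std::ifstream cgroups("/proc/self/cgroup");
        std::string line;
        while (std::getline(cgroups, line)) {
            if (line.find("memory") != std::string::npos)
                return line.substr(line.find_last_of(':') + 1) != "/";
        }
        return false;
    }

    // Steps 2 and 3: NO -> MemTotal as today; YES -> min(MemTotal, cgroup limit).
    uint64_t effectiveMemSizeBytes() {
        const uint64_t total = memTotalBytes();
        if (!hasMemoryCgroup())
            return total;
        std::ifstream limitFile("/sys/fs/cgroup/memory/memory.limit_in_bytes");
        uint64_t limit = 0;
        if (limitFile >> limit && limit != 0)
            return std::min(total, limit);
        return total;
    }

    int main() {
        std::cout << "effective memory: " << effectiveMemSizeBytes() << " bytes\n";
    }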

Does anyone disagree or have any other comments?

This would be relatively easy to implement; my concerns here revolve more around testing it on all combinations of:

  • Supported Linux platforms (distro+version+arch)
  • Docker version

Am I missing anything?

Thanks!

Matt

Comment by Anton Lisovenko (Inactive) [ 02/Jul/18 ]

This ticket has become more important now that the MongoDB Kubernetes Operator has been released to beta and more customers will be trying MongoDB in containers. Is it possible to revisit the ticket and schedule it to be fixed now?

Comment by Ramon Fernandez Marina [ 29/Jul/15 ]

daldoyle, having a way to work with cgroups is definitely desirable. I'd recommend you post on the mongodb-dev group with an outline of the approach you'd like to take, so other developers can comment on it, ask questions, etc.

I'm told that the libcgroup license may be incompatible with MongoDB's, so that would need to be sorted out as well.

Regards,
Ramón.

Comment by Dan Doyle [ 23/Jul/15 ]

If a patch were provided that made this work only with cgroups on Linux, would it be accepted and merged into the mainstream? The lack of resource isolation is starting to make running mongo in production problematic for us: the OOM killer starts killing mongod processes when other things on the box grab memory, so we are trying to figure out our options.

I realize this might not be perfect and wouldn't solve the problems with ulimits, but with the growing popularity of cgroups and of things like Docker that use them, we're hoping this might be attractive enough to warrant inclusion.

Comment by Daniel Pasette (Inactive) [ 12/Jan/15 ]

It seems it's not currently possible to detect the container memory limit in a foolproof way, but I'll leave this open for if and when that becomes possible. See: http://fabiokung.com/2014/03/13/memory-inside-linux-containers/
