[SERVER-16571] Use Actual Memory Constraint vs. Total System Memory When They Differ Created: 17/Dec/14 Updated: 08/Jan/24 Resolved: 08/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | 2.8.0-rc2 |
| Fix Version/s: | 3.6.13, 4.0.9, 4.1.9 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Asya Kamsky | Assignee: | Matt Lord (Inactive) |
| Resolution: | Done | Votes: | 9 |
| Labels: | containers, docker, kubernetes |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Minor Change |
| Operating System: | Linux |
| Backport Requested: | v4.0, v3.6 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
The memory size is taken into account within various components today:
In all of these cases we should take the actual memory constraint that we're operating under into account rather than the total system memory – for example when running MongoDB within containers. |
| Comments |
| Comment by Billy Donahue [ 13/Feb/20 ] | ||||
|
I believe the way this was done, reading from "/sys/fs/cgroup/memory/memory.limit_in_bytes", is good enough for Docker but not necessarily for other container systems or for daemonization/isolation/jailing scripts that might boot mongod. Reading the limit file at the top of the cgroup/memory hierarchy relies on a Docker-specific VFS namespacing trick. I think the more general approach is to read /proc/self/cgroup to find out which cgroup controls the process's memory. However, this (I think) correct algorithm doesn't work in Docker, because Docker replaces the deep /sys/fs/cgroup hierarchy with a flat one containing only the container's limits. Docker does not replace /proc/self/cgroup, though, so you're left with inconsistent isolation. Docker isn't the only container system, so it feels incorrect, or at least incomplete, to rely on its particular choices. | ||||
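A minimal sketch of the lookup described in this comment, assuming cgroup v1 and its "hierarchy-ID:controller-list:cgroup-path" line format in /proc/self/cgroup. The function names, the default mount point argument, and the fallback order are illustrative assumptions, not what mongod does:

```python
import posixpath


def memory_cgroup_path(proc_self_cgroup_text):
    """Return the cgroup path of the v1 memory controller, or None."""
    for line in proc_self_cgroup_text.splitlines():
        # Each line looks like "hierarchy-ID:controller-list:cgroup-path".
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        if "memory" in parts[1].split(","):
            return parts[2]
    return None


def candidate_limit_files(cgroup_path, mount="/sys/fs/cgroup/memory"):
    """List limit files to try, most specific first.

    The deep path is the general case; the file at the top of the
    hierarchy is the flat layout that the comment attributes to Docker.
    """
    top = posixpath.join(mount, "memory.limit_in_bytes")
    candidates = []
    if cgroup_path:
        deep = posixpath.join(
            mount, cgroup_path.lstrip("/"), "memory.limit_in_bytes"
        )
        candidates.append(deep)
    if top not in candidates:
        candidates.append(top)
    return candidates


def read_limit_bytes(paths):
    """Return the first readable limit as an int, or None if none can be read."""
    for path in paths:
        try:
            with open(path) as f:
                return int(f.read().strip())
        except (OSError, ValueError):
            continue
    return None
```

On a Linux host this could be called as `read_limit_bytes(candidate_limit_files(memory_cgroup_path(open("/proc/self/cgroup").read())))`. Trying the deep path first and falling back to the top-level file is one way to cope with the inconsistency between /proc/self/cgroup and the flattened /sys/fs/cgroup inside Docker.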
| Comment by Githook User [ 05/Apr/19 ] | ||||
|
Author: {'email': 'mattalord@gmail.com', 'name': 'Matt Lord', 'username': 'mattlord'} Message: (cherry picked from commit d535bce1bb7df20158fad965142d6b802ea95c60) | ||||
| Comment by Githook User [ 03/Apr/19 ] | ||||
|
Author: {'name': 'Matt Lord', 'username': 'mattlord', 'email': 'mattalord@gmail.com'} Message: (cherry picked from commit d535bce1bb7df20158fad965142d6b802ea95c60) | ||||
| Comment by Githook User [ 08/Mar/19 ] | ||||
|
Author: {'name': 'Matt Lord', 'email': 'mattalord@gmail.com', 'username': 'mattlord'}Message: | ||||
| Comment by Githook User [ 06/Mar/19 ] | ||||
|
Author: {'name': 'Matt Lord', 'username': 'mattlord', 'email': 'mattalord@gmail.com'}Message: Revert " This reverts commit a39875e4e060d42a7ce70ec82b07af2850d3bab7. | ||||
| Comment by Githook User [ 06/Mar/19 ] | ||||
|
Author: {'name': 'Matt Lord', 'username': 'mattlord', 'email': 'mattalord@gmail.com'}Message: Revert " This reverts commit 602bfb9c52b2274d55492f73eeac8513d9048d10. | ||||
| Comment by Githook User [ 05/Mar/19 ] | ||||
|
Author: {'name': 'Eric Milkie', 'email': 'milkie@10gen.com', 'username': 'milkie'}Message: | ||||
| Comment by Githook User [ 05/Mar/19 ] | ||||
|
Author: {'name': 'Matt Lord', 'email': 'mattalord@gmail.com', 'username': 'mattlord'}Message: | ||||
| Comment by James Broadhead (Inactive) [ 01/Mar/19 ] | ||||
|
acm / matt.lord: I just noticed that this one ended up on the backlog. Is there any chance of getting it into MDB 4.2? | ||||
| Comment by Matt Lord (Inactive) [ 10/Jul/18 ] | ||||
|
Hi All, when determining the cache size, WiredTiger uses the output of ProcessInfo.getMemSizeMB(), which in turn calls SystemInfo.memSize. On Linux, we use LinuxSysHelper::getSystemMemorySize(), which is implemented using /proc/meminfo. If we focus on Linux for now (Windows is the only other OS with native kernel [docker] containers today), then the correct way would seem to be:
Does anyone disagree or have any other comments? This particular approach would be relatively easy to implement; my concerns here revolve more around testing this on all combinations of:
Am I missing anything? Thanks! Matt | ||||
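The steps listed in this comment did not survive in this export, so the following is not a reconstruction of them. It is only a rough sketch of the idea in the ticket title (use the actual constraint when it differs from total system memory), assuming the total comes from the MemTotal line of /proc/meminfo and the constraint from a cgroup v1 memory.limit_in_bytes value. Both function names are hypothetical:

```python
def parse_mem_total_bytes(meminfo_text):
    """Extract MemTotal from /proc/meminfo content and convert kB to bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Example line: "MemTotal:       16384256 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found in meminfo text")


def effective_memory_bytes(system_bytes, cgroup_limit_bytes):
    """Return the smaller of total system memory and the cgroup limit.

    An unlimited cgroup v1 reports a very large number, so taking the
    minimum falls back to the system total in that case. A missing or
    non-positive limit is treated as "no constraint".
    """
    if cgroup_limit_bytes is None or cgroup_limit_bytes <= 0:
        return system_bytes
    return min(system_bytes, cgroup_limit_bytes)
```

For example, a host with 16 GiB of RAM and a 2 GiB container limit would yield 2 GiB, which is the figure a cache-sizing calculation would then start from.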
| Comment by Anton Lisovenko (Inactive) [ 02/Jul/18 ] | ||||
|
This ticket is more important now that the MongoDB Kubernetes Operator has been released to beta and more customers will try MongoDB in containers. Is it possible to revisit the ticket and schedule a fix now? | ||||
| Comment by Ramon Fernandez Marina [ 29/Jul/15 ] | ||||
|
daldoyle, having a way to work with cgroups is definitely desirable. I'd recommend you post on the mongodb-dev group with an outline of the approach you'd like to take, so other developers can comment on it, ask questions, etc. I'm told that the libcgroup license may be incompatible with MongoDB's, so this issue should be sorted out as well. Regards, | ||||
| Comment by Dan Doyle [ 23/Jul/15 ] | ||||
|
If a patch were provided that made this work only with cgroups on Linux, would it be accepted and merged into the mainline? The lack of resource isolation is starting to make running mongo in production problematic for us: the OOM killer starts killing instances off when other things on the box grab memory, so we are trying to figure out some options. I realize this might not be perfect and wouldn't solve the problems with ulimits, but with the growing popularity of cgroups and things like Docker that use them, we're hoping that this might be attractive enough to warrant including. | ||||
| Comment by Daniel Pasette (Inactive) [ 12/Jan/15 ] | ||||
|
It seems it is not currently possible to detect the container memory limit in a foolproof way, but I will leave this open for when and if it does become possible. See: http://fabiokung.com/2014/03/13/memory-inside-linux-containers/ |