[SERVER-46070] getMemorySizeLimit: incorrect constrained memory limits Created: 10/Feb/20  Updated: 29/Aug/23

Status: Backlog
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Billy Donahue Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Service Arch
Operating System: ALL
Sprint: Dev Tools 2020-02-24, Service Arch 2022-05-30
Participants:

 Description   

We read "/sys/fs/cgroup/memory/memory.limit_in_bytes".

Someone can check me on this, but I believe this is incorrect. This file describes the root cgroup, i.e. the resources of the entire system, and not the cgroup whose memory controller manages the mongod process.

I think you have to read "/proc/self/cgroup" to find out which group our "memory" controller is bound to, and read the stats files under the

/sys/fs/cgroup/memory/{group}/

subdir instead.

e.g.:

$ cat /proc/self/cgroup
12:perf_event:/
11:devices:/user.slice
10:blkio:/user.slice
9:memory:/user.slice    // <== memory is bound to group /user.slice
8:rdma:/
7:pids:/user.slice/user-1000.slice/session-796.scope
6:cpu,cpuacct:/user.slice
5:cpuset:/
4:net_cls,net_prio:/
3:freezer:/
2:hugetlb:/
1:name=systemd:/user.slice/user-1000.slice/session-796.scope
0::/user.slice/user-1000.slice/session-796.scope
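
A minimal sketch of the lookup described above, assuming C++17 and a cgroup v1 memory hierarchy mounted at /sys/fs/cgroup/memory. The helper names findMemoryCgroup and memoryLimitPath are hypothetical and do not exist in the MongoDB source; they only illustrate parsing the "hierarchy-ID:controller-list:cgroup-path" lines shown in the listing.

```cpp
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper (not from the MongoDB source): scans text in the
// /proc/self/cgroup format ("hierarchy-ID:controller-list:cgroup-path") and
// returns the cgroup path of the hierarchy that has "memory" attached.
std::optional<std::string> findMemoryCgroup(std::istream& in) {
    std::string line;
    while (std::getline(in, line)) {
        const auto first = line.find(':');
        if (first == std::string::npos) continue;
        const auto second = line.find(':', first + 1);
        if (second == std::string::npos) continue;

        // The controller list is comma-separated, e.g. "cpu,cpuacct".
        std::istringstream controllers(line.substr(first + 1, second - first - 1));
        std::string controller;
        while (std::getline(controllers, controller, ',')) {
            if (controller == "memory") return line.substr(second + 1);
        }
    }
    return std::nullopt;  // no v1 memory controller listed
}

// Hypothetical helper: builds the limit file path for a cgroup.
// Assumption: the v1 memory hierarchy is mounted at /sys/fs/cgroup/memory.
std::string memoryLimitPath(const std::string& group) {
    std::string path = "/sys/fs/cgroup/memory";
    if (group != "/") path += group;  // the root group maps to the mount point itself
    return path + "/memory.limit_in_bytes";
}
```

Fed the listing above, findMemoryCgroup would yield /user.slice, and memoryLimitPath would then point at /sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes rather than the root file.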

src/mongo/util/processinfo_linux.cpp:

    static unsigned long long getMemorySizeLimit() {
        unsigned long long systemMemBytes = getSystemMemorySize();
        unsigned long long cgroupMemBytes = 0;
        std::string cgmemlimit = readLineFromFile("/sys/fs/cgroup/memory/memory.limit_in_bytes");
        if (!cgmemlimit.empty() && mongo::NumberParser{}(cgmemlimit, &cgroupMemBytes).isOK()) {
            return std::min(systemMemBytes, cgroupMemBytes);
        }
        return systemMemBytes;
    }



 Comments   
Comment by Lauren Lewis (Inactive) [ 21/Dec/21 ]

We haven’t heard back from you in at least 1 year, so I'm going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Comment by Billy Donahue [ 13/Feb/20 ]

Reconsidered, I think it might still be a bug.

I think the only reason it looks like it's not a bug is that Docker, in a namespacing trick, replaces the root of the /sys/fs/cgroup/memory/ FS hierarchy with what would normally be at the leaf cgroup directory. If Docker and its namespace trick weren't involved, we could still be constrained to a cgroup limit and not read it correctly.

(It does NOT, however, replace /proc/self/cgroup, so you can tell you're in Docker, and it's actually an inconsistent state to be in, but that's not the point here.)

Comment by Billy Donahue [ 11/Feb/20 ]

not a bug

Comment by Billy Donahue [ 11/Feb/20 ]

My mistake. I just figured out how to run with a docker memory limit. I guess within docker, these /sys/fs/cgroup/memory root files are the inside view and not system-wide stats. Sorry for the confusion. I could not tell this from the cgroups man page, but docker must be using more isolation mechanisms than just cgroups to present an environment that is more comprehensive.

So:

$ printf '%#x\n' $(docker run -m 16mb busybox cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
0x1000000

$ printf '%#x\n' $(docker run -m 32mb busybox cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
0x2000000

$ printf '%#x\n' $(docker run busybox cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
0x7ffffffffffff000
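
The last value is what the file reports when no limit is set. A small sketch of why that case is harmless for getMemorySizeLimit: the value matches the largest signed 64-bit integer rounded down to a page boundary (assuming 4 KiB pages, which is an assumption about the host), so the std::min against system memory always picks the system size. The names kReportedUnlimited, kPageSize and effectiveLimit are hypothetical and used only for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Value printed by the unconstrained `docker run` above.
constexpr std::uint64_t kReportedUnlimited = 0x7ffffffffffff000ULL;

// Assumption: 4 KiB pages on the host.
constexpr std::uint64_t kPageSize = 4096;

// INT64_MAX rounded down to a page boundary reproduces the reported value.
static_assert((static_cast<std::uint64_t>(INT64_MAX) / kPageSize) * kPageSize ==
                  kReportedUnlimited,
              "unlimited value is INT64_MAX rounded down to a page");

// Hypothetical stand-in for the clamping step in getMemorySizeLimit().
constexpr std::uint64_t effectiveLimit(std::uint64_t systemMemBytes,
                                       std::uint64_t cgroupMemBytes) {
    return std::min(systemMemBytes, cgroupMemBytes);
}
```

With 8 GiB of system memory, effectiveLimit returns 8 GiB for the unlimited case and 0x1000000 for the `-m 16mb` case.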

Comment by Billy Donahue [ 10/Feb/20 ]

We should look over all our cgroup resource code and make sure we're generating correct stat and limit reports.
It seems very easy to get this stuff wrong.

Generated at Thu Feb 08 05:10:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.