Memory leak on secondary associated with open cursor count


    • Type: Bug
    • Resolution: Gone away
    • Priority: Major - P3
    • Affects Version/s: 8.0.17
    • Component/s: None
    • Operating System: ALL

      Setup: a sharded cluster with 24 shards spread across 9 VMs (sh.status attached), running mongodb-org-server 8.0.17:

      be09-be rxxxx:/# dpkg -l | grep mongodb-org
      ii  mongodb-org-mongos             8.0.17                         amd64        MongoDB sharded cluster query router
      ii  mongodb-org-server             8.0.17 

      There is a repeated pattern where a secondary that is otherwise consuming stable levels of memory starts consuming memory linearly, up until it exhausts the VM's available memory (see the attached screenshot plotting the RSS of an affected mongod process).

      The time this increase starts aligns with daily VM snapshots taken on the underlying infrastructure.

       

      Note: This only occurs on secondaries and is random (i.e. it does not happen every day, nor always to the same secondaries).

       

      We have ruled out application load, as the RSS is much higher than the WiredTiger cache (db.serverStatus output attached):

      BE09   
      
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      2603361 mongodb   20   0   12.4g   8.5g  76672 S  27.7  13.7    4d+23h /usr/bin/mongod --config /etc/mongod-cpe_shard19.conf
      2606513 mongodb   20   0 9968136   3.9g  75136 S  32.7   6.2     5d+2h /usr/bin/mongod --config /etc/mongod-cpe_shard24.conf
      
      db.serverStatus().wiredTiger.cache["bytes currently in the cache"] = 1020866997
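The gap between the two figures above can be quantified with a quick sketch; the awk arithmetic only restates the numbers already shown (8.5g RES from top, the cache bytes from serverStatus):

```shell
# Sketch: quantify the RSS vs WiredTiger-cache gap using the figures above.
rss_gib=8.5                     # RES column from top for pid 2603361
cache_bytes=1020866997          # "bytes currently in the cache" from serverStatus

cache_gib=$(awk -v b="$cache_bytes" 'BEGIN { printf "%.2f", b / (1024 ^ 3) }')
ratio=$(awk -v r="$rss_gib" -v c="$cache_gib" 'BEGIN { printf "%.1f", r / c }')

echo "WT cache: ${cache_gib} GiB; RSS is ${ratio}x the cache size"
# → WT cache: 0.95 GiB; RSS is 8.9x the cache size
```

With the cache at roughly 1 GiB and a default-sized cache expected to dominate mongod memory, an RSS nearly 9x the cache points at memory held outside the WT cache.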

      The correlation we found for this sudden memory increase is an abnormal increase in db.serverStatus().wiredTiger.cursor['open cursor count'].

      This value is abnormally high on the secondary during the linear-increase phase, and it continues to grow together with used memory.
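One way to track this over time is a small sampler (a sketch, not part of the report; it assumes mongosh is on the PATH and a mongod listening on port 37039, matching the port seen in the logs):

```shell
# Sketch: sample the open cursor count once a minute into a log file.
PORT=37039          # assumption: shard port as seen in the replica set logs
LOG=open_cursors.log

sample_cursors() {
  mongosh --quiet --port "$PORT" --eval \
    'print(db.serverStatus().wiredTiger.cursor["open cursor count"])'
}

# Uncomment to run the sampler loop:
# while true; do
#   printf '%s %s\n' "$(date -u +%FT%TZ)" "$(sample_cursors)" >> "$LOG"
#   sleep 60
# done

# Per-minute growth between consecutive samples:
awk '{ if (prev != "") print $1, $2 - prev; prev = $2 }' "$LOG"
```

Plotting the deltas alongside RSS would show whether cursor growth and memory growth start at the same moment (e.g. at the snapshot window).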

      This might not be correlated, but looking at this mongod process's logs (log timestamps are UTC+8 vs. the chart in UTC), it seems that around the time the memory increase starts there were changes of the replica set primary (likely caused by the VM snapshots).

      mongod-cpe_shard19.log.1:{"t":{"$date":"2026-01-29T00:09:27.357+08:00"},"s":"I",  "c":"NETWORK",  "id":6006301, "svc":"-", "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Replica set primary server change detected","attr":{"replicaSet":"lrs19","topologyType":"ReplicaSetWithPrimary","primary":"be09-internal:37039","durationMillis":167232791}}
      mongod-cpe_shard19.log.1:{"t":{"$date":"2026-01-29T00:12:55.429+08:00"},"s":"I",  "c":"NETWORK",  "id":6006301, "svc":"-", "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Replica set primary server change detected","attr":{"replicaSet":"lrs19","topologyType":"ReplicaSetWithPrimary","primary":"be08-internal:37039","durationMillis":208072}}
      mongod-cpe_shard19.log.1:{"t":{"$date":"2026-01-29T00:13:17.200+08:00"},"s":"I",  "c":"NETWORK",  "id":6006301, "svc":"-", "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Replica set primary server change detected","attr":{"replicaSet":"lrs19","topologyType":"ReplicaSetNoPrimary","primary":"Unknown","durationMillis":21771}}
      mongod-cpe_shard19.log.1:{"t":{"$date":"2026-01-29T00:13:18.448+08:00"},"s":"I",  "c":"NETWORK",  "id":6006301, "svc":"-", "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Replica set primary server change detected","attr":{"replicaSet":"lrs19","topologyType":"ReplicaSetWithPrimary","primary":"be07-internal:37039","durationMillis":1247}} 
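For reference, a way to pull just the timestamp and new primary out of these 6006301 events across the rotated logs (a sketch; the log glob is an assumption based on the filenames quoted above):

```shell
# Sketch: list primary-change events (log id 6006301) with timestamp and primary.
# The log path/glob is an assumption based on the filenames quoted above.
grep -h '"id":6006301' mongod-cpe_shard19.log* \
  | sed -E 's/.*"\$date":"([^"]+)".*"primary":"([^"]+)".*/\1 \2/'
```

Running this over all shard logs would show whether every memory-increase episode lines up with an election window.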

      Attached:

      • sh.status;
      • top on all servers sorted by memory;
      • db.serverStatus and db.currentOp(true) from the affected mongod node during the memory-increase phase;
      • Grafana chart showing the process RSS increase;
      • pmap output of the affected process (pid 2603361);
      • mongod-cpe_shard19.conf.

       

        1. sh_status.txt (27 kB)
        2. server_status_be09_lrs19_29012026.txt (135 kB)
        3. mongod-cpe_shard19.conf (0.7 kB)
        4. mem_29012026.txt (10 kB)
        5. lrs39_port37039.png (163 kB)
        6. currentops_be09_lrs19_29012026.txt (96 kB)
        7. baseOS_pid_memstats.txt (63 kB)

            Assignee:
            Unassigned
            Reporter:
            Diogo Leite
            Votes:
            1
            Watchers:
            6
