Core Server / SERVER-84145

MongoDB 5.0.20 process crashes due to high OS cache memory utilization

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: Ubuntu 20.04 and MongoDB 5.0.20
    • Server Triage

      The MongoDB process crashes after upgrading from 4.4.18 to 5.0.20. OS cache memory utilization rises within a few hours, and then the MongoDB process crashes.

      Setup details:
      Each replica set has 7 members: 1 primary, 3 secondaries and 3 arbiters. The primary and secondary members are distributed across sites A, B, C and D, and the arbiters are also distributed across sites A, B, C and D. Each site has DB VMs, and each VM runs MongoDB containers. One VM hosts 1 primary, 3 secondary and 2 arbiter members.
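      To make the majority arithmetic for this topology concrete, here is a minimal Python sketch. It assumes all seven members are voting and, hypothetically, that each site hosts exactly one data-bearing member of a given replica set; the arbiter placement in the sketch is invented, because the report only says the arbiters are spread across sites A to D. It applies the rule from the MongoDB write concern documentation that the calculated majority for w: "majority" is the smaller of the majority of all voting members and the number of data-bearing voting members. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    site: str
    arbiter: bool  # arbiters vote but hold no data


def majority(n_voting: int) -> int:
    """Strict majority of n voting members (needed to elect or keep a primary)."""
    return n_voting // 2 + 1


def write_majority(members) -> int:
    """Calculated majority for w:"majority": the smaller of the voting majority
    and the number of data-bearing voting members (all members assumed voting)."""
    data_bearing = sum(1 for m in members if not m.arbiter)
    return min(majority(len(members)), data_bearing)


def status_with_sites_down(members, down_sites):
    """Check both majorities when every member at the given sites is unreachable."""
    up = [m for m in members if m.site not in down_sites]
    up_data_bearing = sum(1 for m in up if not m.arbiter)
    return {
        "voting_majority_up": len(up) >= majority(len(members)),
        "majority_writes_ok": up_data_bearing >= write_majority(members),
    }


# HYPOTHETICAL placement for one replica set: one data-bearing member per site,
# arbiters at A, C and D. The report does not say which sites host the arbiters.
REPLICA_SET = [
    Member("A", False), Member("B", False), Member("C", False), Member("D", False),
    Member("A", True), Member("C", True), Member("D", True),
]

if __name__ == "__main__":
    print(write_majority(REPLICA_SET))  # 4
    print(status_with_sites_down(REPLICA_SET, {"B"}))
    # {'voting_majority_up': True, 'majority_writes_ok': False}
```

      Under these assumptions, losing Site B still leaves a voting majority to keep a primary, but only three of the four data-bearing members that a "majority" write needs, so writes cannot become majority-committed until Site B returns. This holds for any arbiter placement that leaves at least four voting members up; whether it explains the cache growth in this report is not established here.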

      Scenario:
      1. Bring down one site (say, Site B) and start the traffic. Increase the traffic linearly up to a limit X. After reaching X, sustain the same traffic for up to 48 hours.
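      The traffic profile in this step can be sketched as a simple piecewise function: a linear ramp to X, then a flat plateau for 48 hours. The ramp duration is not given in the report, so `ramp_hours` below is a hypothetical parameter, and `target_rate` is an illustrative name rather than part of any load tool.

```python
def target_rate(t_hours, peak_x, ramp_hours, hold_hours=48.0):
    """Offered load at time t (hours since traffic start): a linear ramp to
    peak_x over ramp_hours (hypothetical), then a constant plateau for hold_hours."""
    if ramp_hours <= 0:
        raise ValueError("ramp_hours must be positive")
    if t_hours < 0 or t_hours > ramp_hours + hold_hours:
        return 0.0  # outside the test window
    if t_hours < ramp_hours:
        return peak_x * t_hours / ramp_hours  # linear ramp toward X
    return float(peak_x)  # plateau at X


if __name__ == "__main__":
    # Example: X = 1000 ops/s reached after an assumed 4-hour ramp.
    for t in (0, 2, 4, 30, 52):
        print(t, target_rate(t, 1000, 4))
```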

      Crash Info:

      {"t":{"$date":"2023-11-28T15:30:56.755+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"Checkpointer","msg":"WiredTiger error","attr":{"error":22,"message":"[1701185456:755161][12787:0x7fd437302700], file:collection-45-1301129529321625809.wt, WT_SESSION.checkpoint: __wt_block_checkpoint_resolve, 928: collection-45-1301129529321625809.wt: the checkpoint failed, the system must restart: Invalid argument"}}
      {"t":{"$date":"2023-11-28T15:30:56.755+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"Checkpointer","msg":"WiredTiger error","attr":{"error":-31804,"message":"[1701185456:755177][12787:0x7fd437302700], file:collection-45-1301129529321625809.wt, WT_SESSION.checkpoint: __wt_block_checkpoint_resolve, 928: the process must exit and restart: WT_PANIC: WiredTiger library panic"}}
      {"t":{"$date":"2023-11-28T15:30:56.755+00:00"},"s":"F",  "c":"-",        "id":23089,   "ctx":"Checkpointer","msg":"Fatal assertion","attr":{"msgid":50853,"file":"src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp","line":574}}
      {"t":{"$date":"2023-11-28T15:30:56.755+00:00"},"s":"F",  "c":"-",        "id":23090,   "ctx":"Checkpointer","msg":"\n\n***aborting after fassert() failure\n\n"}
      {"t":{"$date":"2023-11-28T15:30:56.755+00:00"},"s":"F",  "c":"CONTROL",  "id":6384300, "ctx":"Checkpointer","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
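      The crash info above is in MongoDB's structured JSON log format, one object per line. A minimal Python sketch (standard library only) can pull out the error ("E") and fatal ("F") entries and name the `attr.error` codes. The function names are hypothetical; positive codes are mapped through Python's `errno` table (22 is EINVAL, matching "Invalid argument" in the log), and the only WiredTiger code mapped is -31804, whose name WT_PANIC is taken from the log message itself.

```python
import errno
import json

# The five crash-log lines from this report, one JSON object per line.
CRASH_LOG = r"""
{"t":{"$date":"2023-11-28T15:30:56.755+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"Checkpointer","msg":"WiredTiger error","attr":{"error":22,"message":"[1701185456:755161][12787:0x7fd437302700], file:collection-45-1301129529321625809.wt, WT_SESSION.checkpoint: __wt_block_checkpoint_resolve, 928: collection-45-1301129529321625809.wt: the checkpoint failed, the system must restart: Invalid argument"}}
{"t":{"$date":"2023-11-28T15:30:56.755+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"Checkpointer","msg":"WiredTiger error","attr":{"error":-31804,"message":"[1701185456:755177][12787:0x7fd437302700], file:collection-45-1301129529321625809.wt, WT_SESSION.checkpoint: __wt_block_checkpoint_resolve, 928: the process must exit and restart: WT_PANIC: WiredTiger library panic"}}
{"t":{"$date":"2023-11-28T15:30:56.755+00:00"},"s":"F",  "c":"-",        "id":23089,   "ctx":"Checkpointer","msg":"Fatal assertion","attr":{"msgid":50853,"file":"src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp","line":574}}
{"t":{"$date":"2023-11-28T15:30:56.755+00:00"},"s":"F",  "c":"-",        "id":23090,   "ctx":"Checkpointer","msg":"\n\n***aborting after fassert() failure\n\n"}
{"t":{"$date":"2023-11-28T15:30:56.755+00:00"},"s":"F",  "c":"CONTROL",  "id":6384300, "ctx":"Checkpointer","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
"""

# WiredTiger return code named in the log message itself.
WT_CODES = {-31804: "WT_PANIC"}


def error_name(code):
    """Map an attr.error value to a symbolic name, if one is known."""
    if code is None:
        return None
    if code < 0:
        return WT_CODES.get(code, f"WT({code})")
    return errno.errorcode.get(code, f"errno({code})")


def severe_entries(lines, severities=("E", "F")):
    """Return error and fatal entries from MongoDB structured JSON log lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("s") not in severities:
            continue
        code = entry.get("attr", {}).get("error")
        entries.append({
            "time": entry["t"]["$date"],
            "severity": entry["s"],
            "id": entry["id"],
            "ctx": entry["ctx"],
            "msg": entry["msg"].strip(),
            "error": code,
            "error_name": error_name(code),
        })
    return entries


if __name__ == "__main__":
    for e in severe_entries(CRASH_LOG.splitlines()):
        print(e["time"], e["severity"], e["id"], e["ctx"], e["msg"], e["error_name"])
```

      Run against this report's log, every entry comes from the Checkpointer thread, and the sequence reads EINVAL from the checkpoint, then WT_PANIC, then the fatal assertion and abort.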

      Please note that we ran the same scenario on MongoDB 4.4.18 and did not see the crash; OS cache memory utilization stayed constant.
      In MongoDB 4.4.18, mongod was started with --enableMajorityReadConcern false.
      In MongoDB 5.0.20, the --enableMajorityReadConcern false flag was removed from startup and the flowControl feature was disabled.
      In 5.0.20, once traffic reaches the maximum limit, cache memory grows exponentially and MongoDB processes crash across the cluster. Only the primary members crash.
      If Site B is brought back up, MongoDB does not crash and runs fine for more than 48 hours.

            Assignee: Chris Kelly (chris.kelly@mongodb.com)
            Reporter: Sreedhar N (sreedhar.nalgonda@gmail.com)
            Votes: 51
            Watchers: 8
