Core Server / SERVER-91466

Memory Usage in noPassthrough Suite increased significantly after upgrading to Windows Server 2022

    • Storage Execution
    • ALL

      While updating tests to run on Windows Server 2022 for MongoDB 8.0 platform support, several issues were discovered in the noPassthrough suite:

      https://spruce.mongodb.com/task/mongodb_mongo_master_enterprise_windows_all_feature_flags_required_noPassthrough_1_windows_enterprise_patch_d60231163ae986719f5b012c47fb065331fabdab_6669f1b564e1ae0007c8514b_24_06_12_19_07_21?execution=2&sortBy=STATUS&sortDir=ASC

      https://spruce.mongodb.com/task/mongodb_mongo_master_enterprise_windows_all_feature_flags_required_noPassthrough_1_windows_enterprise_patch_d60231163ae986719f5b012c47fb065331fabdab_6669f1b564e1ae0007c8514b_24_06_12_19_07_21/tests?execution=1&sortBy=STATUS&sortDir=ASC

      https://spruce.mongodb.com/task/mongodb_mongo_master_enterprise_windows_all_feature_flags_required_noPassthrough_1_windows_enterprise_patch_d60231163ae986719f5b012c47fb065331fabdab_6669f1b564e1ae0007c8514b_24_06_12_19_07_21/tests?execution=0&sortBy=STATUS&sortDir=ASC

      The commit this branch is based on does not have this issue; the only change is switching the Evergreen host distro from "windows-vsCurrent-large" (Windows Server 2019) to "windows-2022-large" (Windows Server 2022).

      The Windows Server version upgrade will use a workaround that decreases resmoke concurrency to avoid exhausting the system's memory, but it is still unclear why the upgrade caused memory usage to increase.
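
      For reference, a minimal sketch of what that workaround could look like, assuming resmoke's --jobs flag is used to cap concurrency (the Evergreen task may instead lower an expansion such as resmoke_jobs_max; the suite name and job count below are placeholders, not the actual change):

      # Run the suite with fewer concurrent resmoke jobs, so fewer mongod/mongos
      # processes are alive at once and peak memory stays lower.
      python buildscripts/resmoke.py run --suites=no_passthrough --jobs=4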

      max.hirschhorn@mongodb.com's analysis:

      The Evergreen timeout in execution #3 appears to be caused by slow resmoke logging, which led to the primary of the replica set relinquishing primary and then hitting fassert(7152000) because it could not complete the step-down quickly enough while the mongod was fsyncLocked.

      [js_test:sharded_pit_backup_restore_simple] d20846| 2024-06-13T01:49:41.751+01:00 I REPL 21809 [S] [ReplCoord-0] "Can't see a majority of the set, relinquishing primary"
      ...
      [js_test:sharded_pit_backup_restore_simple] d20846| 2024-06-13T01:50:11.832+01:00 F REPL 5675600 [S] [ReplCoord-0] "Time out exceeded waiting for RSTL, stepUp/stepDown is not possible thus calling abort() to allow cluster to progress","attr":{"lockRep":{"ReplicationStateTransition":{"acquireCount":{"W":1},"acquireWaitCount":{"W":1},"timeAcquiringMicros":{"W":30079690}}}}
      [js_test:sharded_pit_backup_restore_simple] d20846| 2024-06-13T01:50:11.832+01:00 F ASSERT 23089 [S] [ReplCoord-0] "Fatal assertion","attr":{"msgid":7152000,"file":"src\\mongo\\db\\repl\\replication_coordinator_impl.cpp","line":2964}

      https://parsley.mongodb.com/test/mongodb_mongo_master_enterprise_windows_all_feature_flags_required_noPassthrough_1_windows_enterprise_patch_d60231163ae986719f5b012c47fb065331fabdab_6669f1b564e1ae0007c8514b_24_06_12_19_07_21/2/af21249a209a8a57122acbfa50b9bb32?bookmarks=0,118966,137712,239798,242772&filters=10020846%255C%257C.%2A%255C%255BReplCoord-0%255C%255D&shareLine=0
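
      To make the fsyncLocked state above concrete, here is a hypothetical sketch of how a backup test holds the fsync lock around its file copy (pymongo is assumed; the port is taken from the d20846 prefix in the log and is illustrative only):

      # Assumed illustration of the fsync / fsyncUnlock admin commands a backup test
      # issues. While the node remains locked, the step-down in the excerpt above
      # cannot acquire the RSTL in time and the node eventually fasserts.
      from pymongo import MongoClient

      client = MongoClient("mongodb://localhost:20846")
      client.admin.command("fsync", lock=True)    # flush and block writes (fsyncLock)
      try:
          pass  # copy data files / take the backup here
      finally:
          client.admin.command("fsyncUnlock")     # release so a step-down can proceed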

      The Evergreen timeout in execution #2 appears to be caused by out_timeseries_cleans_up_bucket_collections.js, though I couldn't say why. The logs are incomplete for the other tests because the flush thread hit a MemoryError exception. Memory usage reaches ~100% at 22:36 UTC, but neither the system logs nor system_resource_info.json identifies what is consuming the excess memory. Notably, the memory of the processes listed sums to only 10-13 GB of the 33 GB available.
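
      As a rough illustration of that per-process accounting, a minimal sketch assuming the psutil package (the actual figures came from system_resource_info.json, not this script):

      # Sum per-process memory and compare it with total RAM; on Windows, rss
      # corresponds to the process working set. Memory that is not attributable to a
      # listed process (e.g. kernel/nonpaged pool, file cache) will not show up here.
      import psutil

      total = psutil.virtual_memory().total
      per_process = []
      for proc in psutil.process_iter(["name", "memory_info"]):
          mem = proc.info["memory_info"]
          if mem is None:
              continue
          per_process.append((proc.info["name"] or "?", mem.rss))

      accounted = sum(rss for _, rss in per_process)
      print(f"listed processes: {accounted / 2**30:.1f} GiB of {total / 2**30:.1f} GiB total")
      for name, rss in sorted(per_process, key=lambda p: p[1], reverse=True)[:10]:
          print(f"{name:30s} {rss / 2**20:8.0f} MiB")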

      The Evergreen failure in execution #1 has 7 of the 8 tests failing with "out of memory".

            Assignee: Unassigned
            Reporter: zack.winter@mongodb.com (Zack Winter)
            Votes: 0
            Watchers: 8
