histogramCE: Memory leak during histogram construction, server consumes more than storageEngineCacheSizeGB

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization

      The query_cbr_histogram_aggregation_passthrough suite is OOM-ing in Evergreen. The apparent cause is that the passthrough constructs a large number of histograms, and the aggregation pipeline used during histogram construction leaks memory.
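      For context, the passthrough does not build histograms explicitly: the implicit_histograms.js override drives histogram construction, which the stack trace below shows going through the analyze command (CmdAnalyze) and an aggregation that ends in a $merge. A minimal sketch of triggering a single histogram build by hand from mongosh, with illustrative database/collection/field names, looks like this:

      // Minimal sketch only: assumes a standalone mongod started with
      // enableTestCommands=1 (and whatever CE-related parameters the suite
      // sets); the database, collection, and field names are illustrative.
      const testDB = db.getSiblingDB("test");
      const coll = testDB.histogram_repro;
      coll.drop();
      coll.insertMany(Array.from({length: 10000}, (_, i) => ({a: i % 100})));

      // analyze is the command that builds a CE histogram for a key path; in
      // the passthrough it is issued by the implicit_histograms.js override.
      printjson(testDB.runCommand({analyze: "histogram_repro", key: "a"}));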

      To reproduce:

      1. Run:

      buildscripts/resmoke.py run \
        --suites=query_cbr_histogram_aggregation_passthrough \
        "--mongodSetParameters={heapProfilingEnabled: true, heapProfilingSampleIntervalBytes: 10000}" \
        --storageEngineCacheSizeGB=1 \
        > out-with-histogramCE.log 2>&1
      

      and wait for it to finish, or run it for at least 10 minutes.
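      Before letting the suite run, it can be useful to confirm the heap profiler is actually active on the fixture's mongod (heapProfilingEnabled is a startup-only parameter, so it must be passed on the command line as resmoke does above). A small mongosh check, assuming getParameter reports these parameters back:

      // Sketch: connect mongosh to the fixture's mongod and read back the
      // heap-profiling parameters that resmoke passed on the command line.
      printjson(db.adminCommand({
          getParameter: 1,
          heapProfilingEnabled: 1,
          heapProfilingSampleIntervalBytes: 1,
      }));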

      2. Download the t2 tool (https://github.com/10gen/t2/releases) and run it. You will get an empty window.

      3. Drag the diagnostics file from /data/db/job0/resmoke/diagnostic.data and drop it onto the t2 window. The tool will load the diagnostics and display it.

      4. Scroll down to the "heap profiler" section and observe that the top two most-consuming stack traces show a gradual memory increase. Each stack will have a number associated with it, different on each execution, such as stack230.

      5. In the resmoke.py log captured above, search for the definition of stack 230 (substitute the stack number from your run):

      grep 'stackNum":230,'  out-with-histogramCE.log 
      

      6. Using a JSON beautifier such as https://jsonformatter.org/, obtain the readable stack trace:

      {
        "0": "tcmalloc::tcmalloc_internal::SampleifyAllocation()",
        "1": "tcmalloc::tcmalloc_internal::alloc_small_sampled_hooks_or_perthread<>()",
        "2": "calloc",
        "3": "__wt_calloc",
        "4": "__wt_stat_dsrc_init",
        "5": "__wt_conn_dhandle_open",
        "6": "__wt_session_get_dhandle",
        "7": "__wt_schema_create",
        "8": "__create_colgroup",
        "9": "__create_table",
        "10": "__wt_schema_create",
        "11": "__wt_session_create",
        "12": "__session_create",
        "13": "mongo::WiredTigerSession::create<>()",
        "14": "mongo::WiredTigerIndex::create()",
        "15": "mongo::WiredTigerKVEngine::createSortedDataInterface()",
        "16": "mongo::durable_catalog::createIndex()",
        "17": "mongo::CollectionImpl::prepareForIndexBuild()",
        "18": "mongo::IndexBuildBlock::init()",
        "19": "mongo::IndexBuildBlock::buildEmptyIndex()",
        "20": "mongo::DatabaseImpl::_createCollection()",
        "21": "mongo::DatabaseImpl::userCreateNS()",
        "22": "mongo::write_ops_exec::(anonymous namespace)::makeCollection()::$_0::operator()()",
        "23": "mongo::write_ops_exec::(anonymous namespace)::makeCollection()",
        "24": "mongo::write_ops_exec::performUpdates()",
        "25": "mongo::NonShardServerProcessInterface::update()",
        "26": "std::_Function_handler<>::_M_invoke()",
        "27": "mongo::MergeProcessor::flush()",
        "28": "mongo::DocumentSourceMerge::flush()",
        "29": "mongo::DocumentSourceWriter<>::doGetNext()",
        "30": "mongo::exec::agg::Pipeline::getNext()",
        "31": "mongo::PlanExecutorPipeline::_tryGetNext()",
        "32": "mongo::PlanExecutorPipeline::_getNext()",
        "33": "mongo::PlanExecutorPipeline::getNext()",
        "34": "mongo::(anonymous namespace)::executeUntilFirstBatch()",
        "35": "mongo::(anonymous namespace)::_runAggregate()",
        "36": "mongo::runAggregate()",
        "37": "mongo::(anonymous namespace)::PipelineCommand::Invocation::run()",
        "38": "mongo::CommandHelpers::runCommandInvocation()",
        "39": "mongo::(anonymous namespace)::RunCommandAndWaitForWriteConcern::_runImpl()",
        "40": "mongo::(anonymous namespace)::RunCommandImpl::run()",
        "41": "mongo::(anonymous namespace)::ExecCommandDatabase::_commandExec()",
        "42": "mongo::(anonymous namespace)::executeCommand()",
        "43": "mongo::ServiceEntryPointShardRole::handleRequest()",
        "44": "mongo::(anonymous namespace)::loopbackBuildResponse()",
        "45": "mongo::DBDirectClient::_call()",
        "46": "mongo::DBClientBase::runCommandWithTarget()",
        "47": "mongo::DBClientBase::runCommandWithTarget()",
        "48": "mongo::DBClientBase::runCommand()",
        "49": "mongo::(anonymous namespace)::CmdAnalyze::Invocation::typedRun()",
        "50": "mongo::CommandHelpers::runCommandInvocation()",
        "51": "mongo::(anonymous namespace)::RunCommandAndWaitForWriteConcern::_runImpl()",
        "52": "mongo::(anonymous namespace)::RunCommandImpl::run()",
        "53": "mongo::(anonymous namespace)::ExecCommandDatabase::_commandExec()",
        "54": "mongo::(anonymous namespace)::executeCommand()",
        "55": "mongo::ServiceEntryPointShardRole::handleRequest()",
        "56": "mongo::transport::SessionWorkflow::Impl::_dispatchWork()",
        "57": "mongo::future_details::FutureImpl<>::then<>()::{lambda()#1}::operator()()",
        "58": "mongo::transport::SessionWorkflow::Impl::_doOneIteration()",
        "59": "mongo::unique_function<>::makeImpl<>()::SpecificImpl::call()",
        "60": "mongo::ClientStrand::run<>()",
        "61": "mongo::unique_function<>::makeImpl<>()::SpecificImpl::call()",
        "62": "mongo::transport::service_executor_synchronous_detail::ServiceExecutorSyncImpl::SharedState::WorkerThreadInfo::run()",
        "63": "mongo::unique_function<>::makeImpl<>()::SpecificImpl::call()"
      }
      

      In other words, the output of the $merge stage used during histogram construction appears to be retained in memory.
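      To see what that $merge output looks like on disk, the buckets can be inspected after an analyze call. The "system.statistics.<collName>" namespace below is an assumption (based on where histogram CE persists its output), not something confirmed by this ticket:

      // Sketch: inspect the documents $merge wrote for the illustrative
      // collection from the sketch above. The statistics namespace is an
      // assumption; adjust it if your build stores histograms elsewhere.
      const statsColl = db.getSiblingDB("test")
                          .getCollection("system.statistics.histogram_repro");
      print("stats documents: " + statsColl.countDocuments({}));
      printjson(statsColl.findOne());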

      7. Disable histogram construction while keeping everything else in the passthrough the same (diff below). Observe that memory usage stays flat, both as reported by top and as reported by t2; a quick shell spot-check is sketched after the diff.

      diff --git a/buildscripts/resmokeconfig/suites/query_cbr_histogram_aggregation_passthrough.yml b/buildscripts/resmokeconfig/suites/query_cbr_histogram_aggregation_passthrough.yml
      index bbed244c1ad..ed9db4ee3dc 100644
      --- a/buildscripts/resmokeconfig/suites/query_cbr_histogram_aggregation_passthrough.yml
      +++ b/buildscripts/resmokeconfig/suites/query_cbr_histogram_aggregation_passthrough.yml
      @@ -42,11 +42,11 @@ executor:
             objcheck: ""
             eval: |
               await import("jstests/libs/override_methods/detect_spawning_own_mongod.js");
      -        await import("jstests/libs/override_methods/implicit_histograms.js");
      +#        await import("jstests/libs/override_methods/implicit_histograms.js");
         fixture:
           class: MongoDFixture
           mongod_options:
             set_parameters:
               enableTestCommands: 1
               # Automatic mode will fallback when a predicate can't be estimated via histogram
      -        planRankerMode: "automaticCE"
      +#        planRankerMode: "automaticCE"
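
      For the with/without comparison, a rough spot-check that does not require t2 is to poll memory from the shell while the suite runs. This is a sketch only; the exact serverStatus layout, in particular the tcmalloc section, varies by version and allocator:

      // Sketch: poll resident memory and tcmalloc's allocated bytes once a
      // minute from mongosh against the fixture's mongod; compare a run with
      // implicit_histograms.js enabled against one with it commented out.
      for (let i = 0; i < 10; i++) {
          const ss = db.serverStatus();
          const tc = (ss.tcmalloc && ss.tcmalloc.generic)
              ? ss.tcmalloc.generic.current_allocated_bytes
              : "n/a";
          print(`resident MB: ${ss.mem.resident}, tcmalloc allocated: ${tc}`);
          sleep(60 * 1000);  // sleep() takes milliseconds in the mongo shell
      }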
      

        1. screenshot-1.png (22 kB, Philip Stoev)

            Assignee: Matt Olma
            Reporter: Philip Stoev
            Votes: 0
            Watchers: 4
