[SERVER-45137] Increasing memory allocation in Top::record with high rate of collection creates and drops Created: 13/Dec/19  Updated: 29/Oct/23  Resolved: 31/Jan/20

Status: Closed
Project: Core Server
Component/s: Catalog
Affects Version/s: 4.2.1
Fix Version/s: 4.2.4, 4.3.4

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Gregory Wlodarek
Resolution: Fixed Votes: 0
Labels: KP42
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File sec.png    
Issue Links:
Backports
Depends
Related
related to SERVER-46196 Failed collection creation leaves an ... Closed
related to SERVER-44230 Move top_drop.js from noPassthrough t... Closed
is related to SERVER-45855 The 'Top' command managing its own da... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Sprint: Execution Team 2020-01-27, Execution Team 2020-02-10
Participants:
Case:

 Description   

Heap profiler shows memory usage increasing over time on the secondary. The profile is characteristic of a buffer that grows by doubling each time.

Allocated here:

heapProfile stack155: {
  0: "tcmalloc::allocate_full_cpp_throw_oom",
  1: "absl::container_internal::raw_hash_set<absl::container_internal::FlatHashMapPolicy<std::__cxx11::basic_string<char, std::char_traits<char>, std::alloc...",
  2: "mongo::Top::record",
  3: "mongo::AutoStatsTracker::~AutoStatsTracker",
  4: "0x5623f753bc58",
  5: "0x5623f753caae",
  6: "mongo::createCollectionForApplyOps",
  7: "0x5623f74c7cc6",
  8: "0x5623f74c7ff2",
  9: "mongo::repl::applyCommand_inlock",
  10: "0x5623f6fa06a4",
  11: "mongo::repl::SyncTail::syncApply",
  12: "mongo::repl::multiSyncApply",
  13: "std::_Function_handler<mongo::Status ",
  14: "0x5623f6f9fb5d",
  15: "mongo::ThreadPool::_doOneTask",
  16: "mongo::ThreadPool::_consumeTasks",
  17: "mongo::ThreadPool::_workerThreadBody",
  18: "0x5623f87e0a1f",
  19: "0x7fa6cc9d26db",
  20: "clone"
}

Reproduced by a high rate of collection creates and drops using mapReduce directed to the primary:

function repro() {
    // Seed collection for the mapReduce jobs to read from.
    db.c.insert({})

    var nthreads = 10
    var threads = []
    for (var t = 0; t < nthreads; t++) {
        var thread = new ScopedThread(function(t) {
            // Each mapReduce run creates a temporary collection and drops it,
            // driving a high rate of collection creates and drops.
            var count = 1000000
            for (var i = 0; i < count; i++) {
                db.c.mapReduce(
                    function () {},
                    function () {},
                    {out: {merge: "d"}}
                )
            }
        }, t)
        threads.push(thread)
        thread.start()
    }
    for (var t = 0; t < nthreads; t++)
        threads[t].join()
}

In this repro the mapReduce commands are done on the primary, but the memory increase only seems to happen on the secondary.
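The unbounded growth can be sketched in plain JavaScript (a stand-in model of Top's _usage table, not actual server code; the tmp.mr.* naming mirrors the temporary collections mapReduce creates, and record() here is only an approximation of Top::record()):

```javascript
// Stand-in for Top's per-namespace _usage map: an entry is added the first
// time any operation touches a namespace, keyed by the full namespace string.
const usage = new Map();

// Approximation of Top::record(): insert the namespace on first use.
function record(ns) {
  if (!usage.has(ns)) usage.set(ns, { ops: 0 });
  usage.get(ns).ops += 1;
}

// Each mapReduce run creates a uniquely named temporary collection,
// operates on it, then drops it -- but nothing erases its usage entry.
for (let i = 0; i < 1000; i++) {
  record(`test.tmp.mr.c_${i}`); // unique namespace per create/drop cycle
  // the collection is dropped here, but its entry is left behind
}

console.log(usage.size); // one stale entry per cycle: grows without bound
```

Under this model the map gains one entry per create/drop cycle, which matches the doubling-growth allocation profile above as the hash table repeatedly rehashes.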

I don't know whether this is specific to collections created and dropped by mapReduce, nor whether it is a 4.2 regression (though I note there appear to have been considerable code changes in this area).



 Comments   
Comment by Githook User [ 11/Feb/20 ]

Author:

{'username': 'GWlodarek', 'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com'}

Message: SERVER-45137 Remove namespaces from Top when collections are renamed

create mode 100644 jstests/noPassthroughWithMongod/top_rename.js

(cherry picked from commit c2d35dd6214978959a9cfc5dcb813d62ae8981ef)

Branch: v4.2
https://github.com/mongodb/mongo/commit/e94778714638cb20f93ae94d4e16c38ed2d987bc

Comment by Githook User [ 31/Jan/20 ]

Author:

{'username': 'GWlodarek', 'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com'}

Message: SERVER-45137 Remove namespaces from Top when collections are renamed

create mode 100644 jstests/noPassthroughWithMongod/top_rename.js
Branch: master
https://github.com/mongodb/mongo/commit/c2d35dd6214978959a9cfc5dcb813d62ae8981ef

Comment by Gregory Wlodarek [ 27/Jan/20 ]

After taking another look at this, I've figured out what was causing the _usage map in Top to keep growing in size. 

Recently, we refactored the rename collection helpers and replaced OldClientContext with AutoStatsTracker. In the rename code, we create a top-level AutoStatsTracker, and once the renaming logic has finished we remove the source collection namespace from the _usage map in Top. However, the destructor of that top-level AutoStatsTracker runs afterwards; it calls Top::record(), which inserts the source collection namespace back into the _usage map.
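The ordering can be illustrated with a small plain-JavaScript sketch (a try/finally stands in for the C++ destructor of the top-level AutoStatsTracker; the function names and namespaces are illustrative, not the actual server code):

```javascript
const usage = new Map(); // stand-in for Top's _usage map

// Approximation of Top::record(): (re)insert the namespace if absent.
function record(ns) {
  if (!usage.has(ns)) usage.set(ns, { ops: 0 });
  usage.get(ns).ops += 1;
}

function renameCollection(source, target) {
  // The top-level stats tracker records the source on construction...
  record(source);
  try {
    // ...the rename logic runs, then the source namespace is
    // deliberately removed from the usage map:
    usage.delete(source);
    record(target);
  } finally {
    // ...but the tracker's "destructor" runs last and calls record()
    // again, resurrecting the just-erased source entry.
    record(source);
  }
}

renameCollection("test.tmp.mr.c_1", "test.d");
console.log(usage.has("test.tmp.mr.c_1")); // true: the stale entry survives
```

In this model the erase is simply undone by the later record() call, so every renamed temporary collection leaves a permanent entry behind, matching the observed growth.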

Comment by Gregory Wlodarek [ 21/Jan/20 ]

After taking an initial look at this, I believe we never erase from the _usage map in Top, which causes the map's memory footprint to keep growing until the server OOMs.

We perform inserts into the map on a hashed namespace string here: https://github.com/mongodb/mongo/blob/b7846ff4dceec36e344b0f87c48783dffa2c6a32/src/mongo/db/stats/top.cpp#L85-L95

But when we perform erase operations on the map, we use an unhashed namespace string here: https://github.com/mongodb/mongo/blob/b7846ff4dceec36e344b0f87c48783dffa2c6a32/src/mongo/db/stats/top.cpp#L145

 

Edit: this assumption turned out to be false because the StringMap will automatically hash strings if they weren't already hashed.

Comment by Eric Milkie [ 04/Jan/20 ]

We scheduled some investigation work for this starting Jan 13 (next week).

Comment by Kuan Huang [ 03/Jan/20 ]

Hi, I'd like to follow up on this ticket. To keep our operations going, we are doing a lot of extra work to bypass this issue. Is there any ETA for when it can be resolved? Thank you; we would really appreciate your response.

Generated at Thu Feb 08 05:07:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.