[CXX-2306] mongocxx::cursor won't free cache memory Created: 15/Jul/21  Updated: 27/Oct/23  Resolved: 26/Aug/21

Status: Closed
Project: C++ Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: norbert NNN Assignee: Jesse Williamson (Inactive)
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Problem description:

In our application, incoming requests are queued and served by a thread from a thread pool.

In this case, if we iterate over a result set with mongocxx::cursor, the cursor does not seem to free its allocated memory: process memory consumption keeps increasing, even though the cursor is created locally and destroyed when it goes out of scope.

(Note: we developed a counterexample in which every request spawns a new thread and is served on it (a new thread each time, not a reused one); in that case we did not observe such high memory consumption.)

The sample code that performs a find and iteration:

vector<Result> Query() {
  initializePoolOneTime();

  auto client = _pool->acquire();
  mongocxx::database db = (*client)["Database"];
  mongocxx::collection collection = db["keyword"];
  mongocxx::cursor cursor = collection.find(document{} << "XXX" << finalize);

  vector<Result> result;
  for (auto&& m : cursor) {
      Result c;
      c.chunk = getIntValue(m["YYY"]);
      result.push_back(c);
  }
  return result;
}

Running our application under valgrind shows the following stack trace:

==20124== 524,288 bytes in 1 blocks are still reachable in loss record 1,065 of 1,065
==20124== at 0x4C33D2F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20124== by 0x77C999: bson_realloc (bson-memory.c:154)
==20124== by 0x77CA04: bson_realloc_ctx (bson-memory.c:194)
==20124== by 0x75E162: _mongoc_buffer_fill (mongoc-buffer.c:254)
==20124== by 0x73324D: mongoc_stream_buffered_readv (mongoc-stream-buffered.c:240)
==20124== by 0x733D09: mongoc_stream_readv (mongoc-stream.c:237)
==20124== by 0x733E45: mongoc_stream_read (mongoc-stream.c:281)
==20124== by 0x75DF38: _mongoc_buffer_append_from_stream (mongoc-buffer.c:200)
==20124== by 0x6FB71C: mongoc_cluster_run_opmsg (mongoc-cluster.c:3468)
==20124== by 0x6F5A7B: mongoc_cluster_run_command_monitored (mongoc-cluster.c:544)
==20124== by 0x70A27A: _mongoc_cursor_run_command (mongoc-cursor.c:1052)
==20124== by 0x70BAA1: _mongoc_cursor_response_refresh (mongoc-cursor.c:1673)
==20124== by 0x70D09C: _prime (mongoc-cursor-find-cmd.c:36)
==20124== by 0x70CEA4: _prime (mongoc-cursor-find.c:61)
==20124== by 0x70A73E: _call_transition (mongoc-cursor.c:1204)
==20124== by 0x70A962: mongoc_cursor_next (mongoc-cursor.c:1280)
==20124== by 0x6528F0: mongocxx::v_noabi::cursor::iterator::operator++() (cursor.cpp:45)
==20124== by 0x652C02: mongocxx::v_noabi::cursor::iterator::iterator(mongocxx::v_noabi::cursor*) (cursor.cpp:80)
==20124== by 0x652B2D: mongocxx::v_noabi::cursor::begin() (cursor.cpp:67)
==20124== ...

Valgrind command:

valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --track-origins=yes --num-callers=200 ./executable

 

Mongocxx version:

  • custom build from c315129c7b70c304d894ea60b7df71d1f3a71acf (roughly 3.6.3)
  • related cmake flags:
    • -DBUILD_SHARED_AND_STATIC_LIBS:BOOL=ON
    • -DBUILD_SHARED_LIBS_WITH_STATIC_MONGOC:BOOL=ON
    • -DMONGOCXX_ENABLE_SSL:BOOL=ON
    • -DBSONCXX_POLY_USE_MNMLSTC:BOOL=ON

Mongoc version:

  • custom build from 1.17.2
  • related cmake flags:
    • -DENABLE_AUTOMATIC_INIT_AND_CLEANUP:BOOL=OFF


 Comments   
Comment by Jesse Williamson (Inactive) [ 26/Aug/21 ]

The issue appears to be related to memory being handled differently than the user expected, but in accordance with the driver's design.

Comment by Jesse Williamson (Inactive) [ 26/Aug/21 ]

As it sounds like the original issue is resolved, I'm going to go ahead and close this ticket-- please reach out with any further issues.

Comment by Jesse Williamson (Inactive) [ 23/Aug/21 ]

Hi Mihaly,

It looks like you're trying to set the batch_size parameter via the projection, which I think is why it's being ignored. You should be able to
set it on the aggregate options object:
http://mongocxx.org/api/mongocxx-3.6.5/classmongocxx_1_1options_1_1aggregate.html

I hope that's helpful! Thanks again for your patience, and good luck with your project.

-Jesse

Comment by Jesse Williamson (Inactive) [ 20/Aug/21 ]

Hi Mihaly,

I am investigating this and gathering more information-- thank you for reporting it! I'll reply
when I've discovered more.

-Jesse

Comment by Mihály Benda [ 18/Aug/21 ]

Hi again,

I ran some tests, and the problem persists with version r3.6.5.

The batch size test was also done with version 3.6.5.

Regarding batch size, I ran into an issue I hope you can help me with. I tried something along these lines:

// Create the find options with the projection
mongocxx::options::find opts{};
opts.projection(document{} << "batch_size" << 2 << finalize);
 
mongocxx::stdx::optional<mongocxx::cursor> cursor = collection->find(document{} << "m" << filterString << finalize, opts);

The issue is that, while all documents are found, only the first field of each document (the ObjectId) is returned, regardless of whether batch size is set. I cannot figure out how to get the rest of the fields. So, as in the example a couple of comments above, I cannot get m["data"] because that field is not in the cursor's view.
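For reference, batch_size is a cursor option rather than a document key, so passing it through projection() makes the driver treat it as a projection spec (which would explain both the ignored batch size and the missing fields). A minimal sketch of setting it through mongocxx::options::find::batch_size, assuming the same collection handle and filterString as above:

```cpp
#include <mongocxx/options/find.hpp>

// Sketch only: collection and filterString as in the snippet above.
mongocxx::options::find opts{};
opts.batch_size(2);  // at most 2 documents fetched from the server per batch

auto cursor = collection->find(
    document{} << "m" << filterString << finalize, opts);
// No projection set, so each returned document keeps all of its fields.
```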

 

Best,

Mihaly

Comment by Mihály Benda [ 12/Aug/21 ]

Hi Jesse, sorry for the long wait!

 

I assume that by "particular version of the C++ library" you mean the mongocxx driver version; in that case we are using the commit with the hash:

c315129c7b70c304d894ea60b7df71d1f3a71acf

which was from around February 2021, if I recall correctly.

Also, I think I tried upgrading to version 3.6.5 and still got the same issue, but I will need to verify that again.

 

Regarding the local cursors' behavior: we experienced an increase in memory usage when we started iterating through the cursor, but no further increase before it went out of scope.

 

"It sounds from your original description that you're also seeing the memory get released after the cursor's lifetime ends, "

Exactly, this is the issue: memory usage does not decrease to the level it should after the cursor is deleted, and the local cursors are still reachable several minutes after the mongo-related functions are done.

 

Regarding resource consumption, I have some thoughts to share as well: we ran several tests, doing the exact same task (reading from mongo) 4-5 times one after the other, with 4 worker threads. All read tasks fill a global variable shared between the threads, so our assumption was that memory usage should not increase, since the loaded object is the same size every time. However, there was a gradual memory increase after each run. The increase remains even if we delete the loaded object after loading it (so nothing is stored in the application after reading from mongo).

We figured out that the memory increase correlates with the number of worker threads and the size of the loaded mongo object, and valgrind still showed the cursor object as reachable. We therefore assume that each thread created a local cursor on one of the repetitions of the mongo read task, and that either the cache was not freed, the cursor was not deleted correctly, or we are doing something wrong with mongocxx.

 

Thanks for the batchSize suggestion, I will be able to test it Monday/Tuesday, then I will get back to you with the results!

Thanks for the help!

Best,

Mihaly

 

 

Comment by Jesse Williamson (Inactive) [ 11/Aug/21 ]

Hi mihaly.benda@gmail.com, I have some further thoughts to share.

The behavior of the cursor objects is to locally cache data from MongoDB, and when the cursor object is destroyed the associated resources should also be released-- I've been able to check that this is the case with the C++ and C clients. I'm wondering if you could be seeing higher than expected memory usage before the cursor object itself is deleted? It sounds from your original description that you're also seeing the memory get released after the cursor's lifetime ends, so I'm wondering if maybe there is just more resource consumption than anticipated.

You might try tuning batchSize using mongocxx::options::find::batch_size, especially if you have many active cursors, and see if this lowers the memory consumption to a more acceptable level.

http://mongocxx.org/api/mongocxx-3.6.5/classmongocxx_1_1options_1_1find.html#ad158811d7d71e905dcbe9d2eb527bdd8

I hope that helps!

-Jesse

Comment by Jesse Williamson (Inactive) [ 05/Aug/21 ]

Hi mihaly.benda@gmail.com, thank you for the extra information!

It looks to me like the resources are indeed tied to the cursor (as opposed to any of the cursor iterators), and that
when the cursor itself is destroyed, the correct underlying C library function is called to delete the internal data
structures related to the cursor. It's puzzling that after your getDataFromMongo() function you would still be seeing
reachable data; I'll continue to investigate, but so far I'm not seeing this behavior.

Apologies if you've already answered this above, but is there a particular version of the C++ client library that you're using?

Comment by Mihály Benda [ 02/Aug/21 ]

Hi Jesse, I can provide some additional info on our application.
By direct calls do you mean creating the pool and getting the entry & cursor from the same thread? It is not feasible in our case, but I will check it anyway.

As my colleague mentioned, valgrind is not listing the issue as a memory leak, but as still reachable. The issue comes from the fact that this still-reachable cursor uses a lot of memory (we have large collections loaded from mongo). To elaborate a bit, our application is structured like this:

int main() {
  // Create a static pool
  createPool();

  // Create a couple of worker threads
  vector<std::thread> threads(4);
  for (std::thread& th : threads) {
    th = std::thread([&]() { /* worker loop */ });
  }

  // These threads handle everything.
  // The app is not closed normally; these threads keep on running.
  while (!appClosed) {
    // Get a thread randomly, whichever is available
    th.handleCommand();
  }

  return 0;
}

void createPool() {
  // Set up a pool once; all threads acquire an entry from this later on
  static mongocxx::uri mongo_uri("mongodb://localhost:27017");
  static auto _pool = mongocxx::pool(mongo_uri);
}
 
void handleCommand() {
  // Based on the command do whatever; most of the time this means
  // getting a collection from mongo and doing something with it
  vector<struct loadedData> data = getDataFromMongo();
  // Do various stuff with "data"
  ...
  return;
}
 
vector<struct loadedData> getDataFromMongo() {
  auto client = _pool.acquire();

  mongocxx::database db = (*client)["database"];
  mongocxx::collection collection = db["coll"];
  auto cursor = db["coll"].find({});

  vector<struct loadedData> requestedData;
  for (bsoncxx::document::view m : cursor) {
    struct loadedData d;
    d.data = m["data"].get_string().value.to_string();
    d.key = m["key"].get_int64();
    requestedData.push_back(d);
  }

  return requestedData;
}

 

 

The memory remains allocated to the cursor even after the "handleCommand" function has finished, even though the client, database, collection, and cursor are all local variables of the "getDataFromMongo" function.
Unless the threads are deleted, the memory remains allocated, and its size is somehow related to the largest mongo collection loaded. If the threads are deleted, the allocated memory is freed.

 

Edit:

I just remembered that I have already tried the direct calls, where the pool and cursor were created right after each other by the same thread. The problem remained unless either the pool or the calling thread was destroyed after accessing the database.

Comment by Jesse Williamson (Inactive) [ 30/Jul/21 ]

Hi moravas8051@gmail.com, I've been working on this, but so far I am not able to reproduce the leak. Since I'm not sure what your application's functions are doing, I have approximated it as below; valgrind is not reporting any leaking resources. Are you able to get this to occur with direct calls to mongocxx?

struct Result {};

std::vector<Result> Query() {
  std::vector<Result> result;

  static mongocxx::uri mongo_uri("mongodb://localhost:27017");
  static auto _pool = mongocxx::pool(mongo_uri);

  auto client = _pool.acquire();

  mongocxx::database db = (*client)["database"];
  mongocxx::collection collection = db["coll"];
  auto cursor = db["coll"].find({});

  for (auto&& doc : cursor)
    std::cout << bsoncxx::to_json(doc) << std::endl;

  return result;
}

Comment by Kevin Albertson [ 16/Jul/21 ]

Thank you for the detailed report moravas8051@gmail.com! We will look into this soon.

Comment by norbert NNN [ 15/Jul/21 ]

We tried to release the memory by guarding the cursor in a mongocxx::stdx::optional<> (as mentioned in CXX-1217), without success.

Generated at Wed Feb 07 22:05:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.