[CDRIVER-3879] Pointer being freed not allocated in mongoc_cursor_destroy Created: 28/Jan/21  Updated: 27/Oct/23  Resolved: 10/Feb/21

Status: Closed
Project: C Driver
Component/s: libmongoc
Affects Version/s: 1.17.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Thijs Cadier Assignee: Clyde Bazile III (Inactive)
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Tried on MacOs Catalina.



 Description   

I'm upgrading the Mongoc version used in the Mongo Rust driver: https://github.com/thijsc/mongo-rust-driver/pull/59.

It seems something changed. When running the cleanup logic in the correct order, as enforced by Rusts ownership system there is now a double free:

 collection-4a23c99cd16c3bd9(28304,0x700001eb0000) malloc: *** error for object 0x7fc6ca40d3e0: pointer being freed was not allocated
collection-4a23c99cd16c3bd9(28304,0x700001eb0000) malloc: *** set a breakpoint in malloc_error_break to debug

This code worked correctly in earlier C driver versions. I don't see any notice about a change in the docs. Is this a bug?



 Comments   
Comment by Thijs Cadier [ 15/Feb/21 ]

Thanks for the help! That would have been tricky to figure out. Fix based on this comment is here.

Comment by Clyde Bazile III (Inactive) [ 10/Feb/21 ]

thijs@appsignal.com, closing this ticket but feel free to re-open if this problem persists.

Comment by Clyde Bazile III (Inactive) [ 10/Feb/21 ]

I was able to reproduce this issue in the C driver by freeing the BSON object passed to mongoc_cursor_next. That is, result in the example above.

int
main (int argc, char *argv[])
{
   mongoc_init();
 
   mongoc_uri_t* uri = mongoc_uri_new("mongodb://localhost:27017/");
   mongoc_client_pool_t* pool = mongoc_client_pool_new(uri);
   mongoc_client_t* client = mongoc_client_pool_pop(pool);
   mongoc_collection_t* collection = mongoc_client_get_collection(client, "rust_driver_test", "items");
   mongoc_cursor_t* cursor = NULL;
 
   bson_t* command = BCON_NEW ("ping", BCON_INT32 (1));
 
   bson_t* result = NULL; // bad: should be "const"
   mongoc_query_flags_t flags = 0;
   cursor = mongoc_collection_command(collection, flags,0,0,0,command,NULL,NULL);
 
   BSON_ASSERT(mongoc_cursor_next(cursor, &result));
   BSON_ASSERT(bson_has_field(result, "ok"));
   printf("result: %s\n", bson_as_json(result,0));
 
   mongoc_client_pool_push(pool, client);
 
   bson_destroy(result); // bad: do not free "result"
   mongoc_cursor_destroy(cursor); // this will also attempt to free "result"
   mongoc_collection_destroy(collection);
   mongoc_client_pool_destroy(pool);
   mongoc_uri_destroy(uri);
 
   mongoc_cleanup();
 
   return EXIT_SUCCESS;
}

The BSON object passed to mongoc_cursor_next is ephemeral and only good until the next call to mongoc_cursor_next or the cursor is destroyed with mongoc_cursor_destroy. In other words, the lifecycle of the BSON object is handled by the cursor. This is why this function takes a const bson_t*.

In cursor.rs:106, the BSON object passed to mongoc_cursor_next is used to construct a Bsonc object (here) and consequently destroyed at the end of the function by Bsonc::Drop. Commenting out the code within Bsonc::Drop (making it a no-op function) fixes this test since the ephemeral BSON object is no longer being freed twice.

I'm not sure how this should be handled Rust, but ultimately you should avoid freeing the BSON object passed to mongoc_cursor_next here and allow mongoc_cursor_t to manage its lifetime.

Comment by Thijs Cadier [ 10/Feb/21 ]

I do push the client back on the pool here. The Drop trait in Rust ensures that this cleanup logic always runs at the right moment.

I'm happy to provide a stack trace, but it's much easier to run the test yourself. Run this with a recent Rust stable installed and a Mongo server running on localhost:

cargo test test_command

On the main branch this test is green, on the mongoc_1_17 branch the double-free happens.

Comment by Clyde Bazile III (Inactive) [ 08/Feb/21 ]

thijs@appsignal.com I was not able to reproduce this issue locally using a similar example (below)

int
main (int argc, char *argv[])
{
 mongoc_init();
 
mongoc_uri_t* uri = mongoc_uri_new("mongodb://localhost:27017/");
 mongoc_client_pool_t* pool = mongoc_client_pool_new(uri);
 mongoc_client_t* client = mongoc_client_pool_pop(pool);
 mongoc_collection_t* collection = mongoc_client_get_collection(client, "rust_driver_test", "items");
 mongoc_cursor_t* cursor = NULL;
 
bson_t* command = BCON_NEW ("ping", BCON_INT32 (1));
 
const bson_t* result = NULL;
 mongoc_query_flags_t flags = 0;
 cursor = mongoc_collection_command(collection, flags,0,0,0,command,NULL,NULL);
 
BSON_ASSERT(mongoc_cursor_next(cursor, &result));
 BSON_ASSERT(bson_has_field(result, "ok"));
 printf("result: %s\n", bson_as_json(result,0));
 
mongoc_client_pool_push(pool, client);
 
mongoc_cursor_destroy(cursor);
 mongoc_collection_destroy(collection);
 mongoc_client_pool_destroy(pool);
 mongoc_uri_destroy(uri);
 
mongoc_cleanup();
 
return EXIT_SUCCESS;
}

However, it isn't obvious from your example (here) if the client is being returned to the pool (as required, see this and my example above). That could certainly cause issues, but it's difficult to see where the issue lies without a complete stack trace.

Comment by Thijs Cadier [ 01/Feb/21 ]

So the crash happens here for example.

This triggers the cleanup logic in the drop of the cursor here. The order of cleanup is ensured by some lifetime relations that are present, starting here.

So a client's destroy cannot be executed before a cursor is. Is there a scenario in which cleanup would happen automatically now?

Comment by Clyde Bazile III (Inactive) [ 29/Jan/21 ]

Hey thijs@appsignal.com ! I'm currently investigating this issue. To answer your question, pointing us to the problematic test could help us better understand what Rust defines as the "correct order" of the cleanup logic and possibly point out anything that's obviously fishy.

Additionally, a stack trace or debugging info could help us see what was happening under our driver's hood just before the crash. Both are helpful, but the latter (the stack trace) more so.

Comment by Thijs Cadier [ 28/Jan/21 ]

Thanks for your reply! I'm indeed going from 1.8.2 all the way up to the latest version. I know exactly where this call is made in a test. Is that helpful? You could run the tests of this project.

Comment by Kevin Albertson [ 28/Jan/21 ]

Thank you for the report thijs@appsignal.com! We are unaware of an existing double free in mongoc_cursor_destroy, though we will investigate.

Correct me if I am wrong, but from the PR, it looks like this is upgrading from 1.8.2.

If there is a stack trace available, or debug info, that would be helpful in our investigation.

Sincerely,
Kevin

Generated at Wed Feb 07 21:19:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.