[CDRIVER-3674] _mongoc_handshake_build_doc_with_application core dumps with strlen call Created: 15/May/20 Updated: 27/Oct/23 Resolved: 26/May/20 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | bsd, libmongoc |
| Affects Version/s: | 1.14.0, 1.15.0, 1.16.2, 1.17.0-beta |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Sergey Baranov | Assignee: | Kevin Albertson |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Openbsd 6.6. driver 1.16.2 |
||
| Description |
|
Hi, I migrated to 1.16.2 (with MongoDB 3.2) from the very old 1.0.2 release (with MongoDB 2.6). I have used the driver for years with a simple setup, so I completed the migration without changing any mongoc snippets. The build is compiled from source with cmake / gcc, with no additional cmake options. As soon as I ran the new build, my application core dumped. It happens when I call mongoc_collection_remove() or mongoc_collection_insert(), but not always: the same query may or may not crash, and it crashes about once in ten times.
gdb trace here #0 strlen () at /usr/src/lib/libc/arch/amd64/string/strlen.S:125
gcc -v CVS to this strlen
PS: I'm new here, so I could be missing something in the rules. You're welcome to ask me. |
| Comments |
| Comment by Sergey Baranov [ 26/May/20 ] | ||||||||||||||||||||||
|
Thank you very much for the investigation and focused support. | ||||||||||||||||||||||
| Comment by Kevin Albertson [ 26/May/20 ] | ||||||||||||||||||||||
|
Great, glad to hear it is working now! | ||||||||||||||||||||||
| Comment by Sergey Baranov [ 26/May/20 ] | ||||||||||||||||||||||
|
I think you are right! I did a code review and noticed that one contributor committed a new function with mongoc_init/mongoc_cleanup calls, and it is called a few steps before my mongoc routine, which contains these too. I knew that init and cleanup must be called once, when the application starts and terminates, but since the new function actually does nothing with the database yet (its I just moved the mongoc_init/cleanup calls outside of those local functions and I get 100+ application cycles with zero segfaults. So I would ask you to close the issue. | ||||||||||||||||||||||
| Comment by Kevin Albertson [ 26/May/20 ] | ||||||||||||||||||||||
|
Thanks for the response.
That's true. mongoc_collection_insert is documented as being superseded by mongoc_collection_insert_one and mongoc_collection_insert_many. But mongoc_collection_insert is a light wrapper around mongoc_collection_insert_one. Though it's probably best to change those calls to the newer API, that should not change behavior. One other guess... is it possible mongoc_cleanup is called before the application terminates? The snippets do not include those calls, but perhaps it is elsewhere. mongoc_cleanup cleans up all global state. If it was called (from any thread) that would invalidate libmongoc's global state (including the handshake). | ||||||||||||||||||||||
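The effect of an early cleanup can be modeled without libmongoc. The following is a minimal, self-contained sketch in plain C and is not libmongoc's actual implementation: `toy_init`, `toy_cleanup`, `toy_state_is_valid`, and `helper_with_own_init_cleanup` are hypothetical names, and the global string only stands in for library-wide state such as the handshake. The toy resets its pointer to NULL so the check stays well-defined; in a real library the freed state may simply be left dangling, which is the kind of condition under which a later strlen could crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for library-wide global state (e.g. a handshake). */
static char *g_platform = NULL;

/* Toy counterpart of a global init: allocates the shared state once. */
void toy_init(void) {
    if (g_platform == NULL) {
        g_platform = malloc(sizeof "toy-platform");
        assert(g_platform != NULL);
        strcpy(g_platform, "toy-platform");
    }
}

/* Toy counterpart of a global cleanup: frees the state for every caller. */
void toy_cleanup(void) {
    free(g_platform);
    g_platform = NULL; /* the toy resets the pointer so checks are safe */
}

/* Returns true when the global state can safely be handed to strlen(). */
bool toy_state_is_valid(void) {
    return g_platform != NULL && strlen(g_platform) > 0;
}

/* Problem pattern: a helper that brackets its own work with init/cleanup. */
void helper_with_own_init_cleanup(void) {
    toy_init();    /* no-op if the application already initialized */
    /* ... unrelated work that never touches the database ... */
    toy_cleanup(); /* tears down the state the rest of the program relies on */
}
```

In this model, calling `toy_init()` at startup and then `helper_with_own_init_cleanup()` leaves `toy_state_is_valid()` returning false for every later caller. Keeping a single init at program start and a single cleanup at program exit avoids that.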
| Comment by Sergey Baranov [ 26/May/20 ] | ||||||||||||||||||||||
|
At first glance, it seems that the simple looped snippets do not segfault for me... Kevin, let's put this issue on hold; I hope to catch something new if the root cause turns out to be in my surrounding code. | ||||||||||||||||||||||
| Comment by Sergey Baranov [ 26/May/20 ] | ||||||||||||||||||||||
|
I do use multiple threads, but the threads never work with mongoc (neither mongoc nor the data I use with mongoc); all the mongoc operations are in the parent thread only. Moreover, I thought there might be some kind of race condition when the system is running multiple processes (multiple parent threads working with mongoc, and one of them segfaults), but a single process segfaulted too. I don't know; probably I should try to make a simple program that loops my snippet to check whether it crashes or not. No, I do not call `mongoc_handshake_data_append`. | ||||||||||||||||||||||
| Comment by Kevin Albertson [ 25/May/20 ] | ||||||||||||||||||||||
|
Thank you for the snippets. Those seem reasonable to me. Modifying them slightly and running them on my end did not reproduce the same segfault. So I may need more information to diagnose. Given that this does not reproduce consistently, is your application creating multiple threads? Looking through mongoc-handshake.c, I do see a minor data race if multiple single-threaded mongoc_client_t were to be running in separate threads (as opposed to multiple mongoc_client_t obtained from a mongoc_client_pool_t). Sure enough running an example creating 100 threads with single threaded mongoc_client_t produces a warning from a thread-sanitizer:
That is due to a boolean being written by both threads. That seems worthwhile to fix in its own right, so I filed a separate ticket: CDRIVER-3685. But I don't see how that connects to the crash in strlen you are observing. Is it possible to include a compilable example? Including the cmake/make commands and their output may additionally help diagnose. Additionally, I suspect the answer is "no", but is any part of your code calling `mongoc_handshake_data_append`? | ||||||||||||||||||||||
| Comment by Sergey Baranov [ 21/May/20 ] | ||||||||||||||||||||||
|
And, as I said, these two may work fine most of the time. | ||||||||||||||||||||||
| Comment by Sergey Baranov [ 21/May/20 ] | ||||||||||||||||||||||
|
Hi Kevin! Sure, for example these two snippets get a segfault:
mongoc_client_t *mongoclient = NULL;
mongoc_init();
if(mongoclient) {
");
    if(!bson_init_from_json(doc, cidstr, -1, &error)) {
        printf("BSON init error: %s\n", error.message);
    }
    bson_oid_init(&oid, NULL);
    if(options&F_INTERNAL) {
        sstr = bson_as_json (doc, &len);
        printf("bson_as_json %s\n", sstr);
        bson_free (sstr);
    }//this ends with core dump
    bson_destroy(doc);
}
#ONE MORE SNIPPET
cursor = mongoc_collection_find_with_opts(mongocollection, query, opts, NULL);
//this ends with core dump
}
mongoc_cursor_destroy(cursor); | ||||||||||||||||||||||
| Comment by Kevin Albertson [ 21/May/20 ] | ||||||||||||||||||||||
|
Hello asuwish.def@gmail.com , apologies for the delayed response. The handshake pointed to in the included stack trace is initialized globally upon calling mongoc_init, which must be called at the start of the application (see http://mongoc.org/libmongoc/current/mongoc_init.html). Is it possible there is a missing call to mongoc_init? Though, that is just a speculation. If that does not resolve the issue, can you include any relevant snippet of your application code?
That is not unexpected based on the stack trace. The cursor returned by mongoc_collection_find_with_opts is lazy. It will not send the find command until the first call to mongoc_cursor_next. | ||||||||||||||||||||||
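The lazy behavior can be illustrated with a small stand-alone model. This is only a sketch in plain C under stated assumptions, not libmongoc code: `toy_cursor_t`, `toy_find`, `toy_cursor_next`, and `toy_commands_sent` are hypothetical names, and the "two results" are made up for the illustration. It shows that creating the cursor sends nothing, and that the command goes out on the first call to the next function, which would be the point where a failure while sending a command becomes visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy cursor; not libmongoc's mongoc_cursor_t. */
typedef struct {
    const char *query; /* remembered at creation, not yet sent */
    bool sent;         /* whether the command has gone out */
    int remaining;     /* pretend number of results left */
} toy_cursor_t;

/* Counts how many commands the toy has "sent to the server". */
static int g_commands_sent = 0;

int toy_commands_sent(void) {
    return g_commands_sent;
}

/* Creating the cursor only records the query; nothing is sent here. */
toy_cursor_t toy_find(const char *query) {
    toy_cursor_t cursor = { query, false, 0 };
    return cursor;
}

/* The first call sends the command; later calls only walk the results. */
bool toy_cursor_next(toy_cursor_t *cursor) {
    if (!cursor->sent) {
        g_commands_sent++;     /* the command goes out here, not in toy_find */
        cursor->sent = true;
        cursor->remaining = 2; /* made-up result count for the illustration */
    }
    if (cursor->remaining == 0) {
        return false;
    }
    cursor->remaining--;
    return true;
}
```

In this model `toy_commands_sent()` is still 0 right after `toy_find(...)` and becomes 1 only after the first `toy_cursor_next(...)`, staying at 1 for the remaining iterations.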
| Comment by Sergey Baranov [ 20/May/20 ] | ||||||||||||||||||||||
|
Is anyone here? Just to update. | ||||||||||||||||||||||
| Comment by Sergey Baranov [ 15/May/20 ] | ||||||||||||||||||||||
|
The gdb trace for insertion is similar. | ||||||||||||||||||||||
| Comment by Sergey Baranov [ 15/May/20 ] | ||||||||||||||||||||||
|
Insert query (I replaced the real data): { "key1" : "val1", "key2" : int_val1, "key3" : "val3", "key4" : { "key1_1" : intval1_1, "key2_2" : "val2_2", "key3_3" : "val3_3", "key4_4" : int_val4_4, "key5_5" : "val5_5", "key6_6" : [ "val7_1", "val7_2", int_val7_3, "val7_4", "val7_5", "val7_6", "val7_7" ] }, "_id" : { "$oid" : "5ebe90e36359dc3d01255953" } } and the remove query is simple: { "key1" : "val1" }
| ||||||||||||||||||||||
| Comment by Sergey Baranov [ 15/May/20 ] | ||||||||||||||||||||||
|
I also tried 1.15.0, 1.14.0, and 1.17.0-beta, with the same issue. |