[CDRIVER-1211] Get Segmentation fault (11) when using mongoc_bulk_operation_execute Created: 27/Apr/16 Updated: 03/May/17 Resolved: 17/May/16 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | libmongoc |
| Affects Version/s: | 1.3.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | tianlei.shi | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
In CentOS7.0, RAID5, xfs filesystem, NUMA system. |
||
| Attachments: |
|
| Description |
|
Occasionally we get this stack trace when using MongoDB, and my application keeps crashing: Got signal: Segmentation fault (11), address is 0xb0 from 0x7f9ea3a010a2 The size of each bulk is limited to 1000 in my application, and I have 4 bulks which insert into 4 different collections in the same database. |
| Comments |
| Comment by A. Jesse Jiryu Davis [ 17/May/16 ] | |||
|
This code cannot compile because you have redefined "collection" in insert_bulk. There are too many other bugs for me to determine which is actually the cause of your crash. For example, when insert_bulk does:
... that does not affect the caller. Therefore each call to insert_bulk creates a new bulk operation, and only executes the bulk for one document out of 999. I'm not able to diagnose your issue without code that I can compile and run, but I'm convinced it is not a bug in the driver. | |||
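The snippet referred to above is not reproduced in this ticket, so the following is only a minimal sketch of the general C pitfall the comment appears to describe, assuming the function assigned a newly created handle to a pointer parameter. It makes no libmongoc calls; `fake_bulk_t`, `fake_bulk_new`, `insert_by_value`, and `insert_by_reference` are hypothetical names introduced purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver handle such as a bulk operation. */
typedef struct {
    int queued; /* number of documents queued so far */
} fake_bulk_t;

/* Counts allocations so the effect of each variant is observable. */
int bulks_created = 0;

fake_bulk_t *fake_bulk_new(void) {
    fake_bulk_t *b = calloc(1, sizeof *b);
    assert(b != NULL);
    bulks_created++;
    return b;
}

/* Broken: `bulk` is a copy of the caller's pointer, so assigning to it
 * changes only the local copy. The caller's pointer stays NULL. */
void insert_by_value(fake_bulk_t *bulk) {
    int created_here = (bulk == NULL);
    if (created_here) {
        bulk = fake_bulk_new(); /* invisible to the caller */
    }
    bulk->queued++;
    if (created_here) {
        free(bulk); /* released here only so the demo does not leak */
    }
}

/* Fixed: take the address of the caller's pointer and assign through it. */
void insert_by_reference(fake_bulk_t **bulk) {
    assert(bulk != NULL);
    if (*bulk == NULL) {
        *bulk = fake_bulk_new(); /* visible to the caller */
    }
    (*bulk)->queued++;
}

/* Returns how many handles were created for n inserts, broken variant. */
int count_bulks_by_value(int n) {
    fake_bulk_t *bulk = NULL;
    bulks_created = 0;
    for (int i = 0; i < n; i++) {
        insert_by_value(bulk); /* bulk is still NULL on every call */
    }
    return bulks_created;
}

/* Returns how many handles were created for n inserts, fixed variant. */
int count_bulks_by_reference(int n) {
    fake_bulk_t *bulk = NULL;
    bulks_created = 0;
    for (int i = 0; i < n; i++) {
        insert_by_reference(&bulk); /* reuses the handle after the first call */
    }
    free(bulk);
    return bulks_created;
}
```

With the by-value variant, every call sees a NULL handle and creates a fresh one; with the by-reference variant, the handle is created once and reused. Calling `count_bulks_by_value(999)` and `count_bulks_by_reference(999)` from a small `main` makes the difference visible.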
| Comment by tianlei.shi [ 17/May/16 ] | |||
|
I think stream->writev points to _mongoc_stream_socket_writev, am I right? | |||
| Comment by tianlei.shi [ 17/May/16 ] | |||
|
The attached code is more or less the same logic as the code in our software. | |||
| Comment by tianlei.shi [ 17/May/16 ] | |||
|
This can only be reproduced in a special environment, and it happens at random. The bulk is created in this way: mongoc_collection_create_bulk_operation(dbOpt->collection, false, NULL); According to the backtrace, the crash seems to happen in the call ret = stream->writev(stream, iov, iovcnt, timeout_msec); | |||
| Comment by A. Jesse Jiryu Davis [ 17/May/16 ] | |||
|
Thanks for the backtrace, but I still need to see your code in order to diagnose the crash. Can you please provide a Short, Self Contained, Compilable Example (http://sscce.org/) of code that reproduces this crash? Otherwise I cannot diagnose it. Thanks! | |||
| Comment by tianlei.shi [ 17/May/16 ] | |||
|
Hi Davis, having recompiled libmongoc in debug mode, I got this stack trace from gdb. Would you please take a look? (gdb) bt | |||
| Comment by tianlei.shi [ 29/Apr/16 ] | |||
|
Thanks, I'll try that when I reproduce it again. | |||
| Comment by A. Jesse Jiryu Davis [ 28/Apr/16 ] | |||
|
Hi, sending me an .so file won't help, thanks. If you can send me a program that reproduces the error, which I can compile and run, then I can diagnose the error. If you're unable to do that, please at least recompile libmongoc in debug mode:
Then the stack trace will include function names. | |||
| Comment by tianlei.shi [ 28/Apr/16 ] | |||
|
I was using mongoc_bulk_operation_execute in a large system, and this issue isn't 100% reproducible. Sometimes it happens, and after rebooting the OS it works fine again. Could you find which line of code corresponds to /opt/NetSensor/lib/libmongoc-1.0.so.0(+0x27ea1) [0x7f9ea3a02ea1]? | |||
| Comment by A. Jesse Jiryu Davis [ 27/Apr/16 ] | |||
|
Can you please provide a Short, Self Contained, Compilable Example (http://sscce.org/) of code that reproduces this crash? Otherwise I cannot diagnose it. Thanks! |