[SERVER-8891] Simple client fails with segmentation fault in mongoclient library
Created: 07/Mar/13 Updated: 11/Jul/16 Resolved: 25/Jul/13
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Client |
| Affects Version/s: | 2.2.2, 2.2.3 |
| Fix Version/s: | 2.4.6, 2.5.2 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Stanislav Ievlev | Assignee: | Tad Marshall |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Linux |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Description |

The example code below fails with a segmentation fault at exit:

| Comments |
| Comment by Gerry F [ 22/Aug/13 ] |

The official 2.4.6 release has fixed this bug.
| Comment by Gerry F [ 29/Jul/13 ] |

The above fix appears to solve this specific crash in my testing.
| Comment by auto [ 29/Jul/13 ] |

Author: Tad Marshall (tadmarshall, tad@10gen.com)
Message: Change the order of some static objects and group them near the There may be other cases that are not solved by this fix, and there
| Comment by auto [ 25/Jul/13 ] |

Author: Tad Marshall (tadmarshall, tad@10gen.com)
Message: Change the order of some static objects and group them near the There may be other cases that are not solved by this fix, and there
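The commit message above is truncated, so the exact reordering is not visible here. The rule such a change relies on is standard C++: within one translation unit, namespace-scope objects with dynamic initialization are constructed in declaration order and destroyed in the reverse order, while across translation units the relative initialization order is unspecified. The following is a hypothetical sketch of that rule; `ConnectionPool`, `PoolMonitor`, and `liveObjects` are illustrative names, not MongoDB code.

```cpp
#include <cassert>

// Hypothetical objects for illustration; these names are not from the MongoDB sources.
int liveObjects = 0;  // constant-initialized: valid before any constructor below runs

struct ConnectionPool {
    bool open;
    ConnectionPool() : open(true) { ++liveObjects; }
    ~ConnectionPool() { open = false; --liveObjects; }
};

// Declared first in this translation unit: constructed first, destroyed last.
ConnectionPool pool;

struct PoolMonitor {
    PoolMonitor() { assert(pool.open); ++liveObjects; }   // pool is already constructed
    ~PoolMonitor() { assert(pool.open); --liveObjects; }  // pool has not been destroyed yet
};

// Declared second: constructed after pool and destroyed before it, so both
// assertions in PoolMonitor hold during startup and during exit().
PoolMonitor monitor;
```

Because `pool` is declared before `monitor` in the same file, `monitor` is destroyed first during `exit()`, so its destructor still sees a live `pool`. If the two objects were defined in different source files, there would be no such guarantee.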
| Comment by Gerry F [ 22/Jul/13 ] |

I have this bug in the 2.2.0, 2.2.4, and 2.4.5 C++ clients. In my debugger, the first reference to the ReplicaSetMonitor is destroyed correctly, but when the second 'shared' reference tries to destroy the same object, the memory has already been freed. The second reference is made when the ReplicaSetMonitor's BackgroundJob creates a heartbeat thread. Another way to exercise this bug is to call mongo::ReplicaSetMonitor::remove() to destroy the ReplicaSetMonitor and then wait up to 10 seconds for the heartbeat thread to reference the freed memory. It is a race condition: the process occasionally exits successfully, more often on Windows than on Linux. My first idea for a fix would be to eliminate the BackgroundJob and the stray thread.
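Gerry's description is a use-after-free between the owner of the monitor and a background heartbeat thread. One standard way to rule out that class of bug, shown here as a hypothetical sketch rather than the fix that was actually committed (the commits above reorder static objects), is to let the background thread hold only a `std::weak_ptr`: on each tick it either locks a live object for the duration of one heartbeat, or finds that the owner has released it and exits. `Monitor` and `runHeartbeat` are illustrative names, not the `mongo::ReplicaSetMonitor` API.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for a monitored object; this is NOT the mongo::ReplicaSetMonitor API.
struct Monitor {
    std::atomic<int> checks{0};
    void check() { ++checks; }  // placeholder for one heartbeat probe
};

// Heartbeat loop for a background thread. It holds only a weak_ptr, so it never
// owns the Monitor and never touches freed memory: lock() either returns a
// shared_ptr that keeps the object alive for this one iteration, or an empty
// pointer once the owner has released it, at which point the loop exits.
// Returns the number of heartbeats performed.
int runHeartbeat(std::weak_ptr<Monitor> weak, std::chrono::milliseconds period) {
    int beats = 0;
    while (std::shared_ptr<Monitor> m = weak.lock()) {
        m->check();
        ++beats;
        m.reset();  // drop temporary ownership before sleeping
        std::this_thread::sleep_for(period);
    }
    return beats;
}
```

With this design the owner can drop its last `shared_ptr` at any time, and the heartbeat stops on its next tick instead of reading freed memory. Gerry's own suggestion, removing the BackgroundJob thread entirely, would avoid the race in a different way.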
| Comment by Erik Snider [ 17/Jul/13 ] |

I just tested it on 2.5.1-pre-, and the problem still exists. I ran the example program through valgrind, and this excerpt of its output looks like it may be useful:
| Comment by Stanislav Ievlev [ 27/May/13 ] |

I've tested on 2.4.3. The problem still exists.
| Comment by Stanislav Ievlev [ 28/Mar/13 ] |

Sorry, I have no time to do it.
| Comment by Ian Whalen (Inactive) [ 26/Mar/13 ] |

Hi Stanislav, have you been able to try our most recent stable release (2.4.1) to see if the problem goes away?
| Comment by Tad Marshall [ 10/Mar/13 ] |

The call to _exit() works around a problem with static constructors and destructors. Because the C++ standard does not specify the relative order in which static objects defined in different translation units are constructed (and therefore destroyed), we can segfault on exit when objects get destroyed in the "wrong" order. This is a separate problem from making sure that the "global initializers" have run before the DBReplicaSet class is used. The symptom of this other problem is that everything works fine right up until you call exit() (no underscore) or return from main(); at that point, instead of exiting cleanly, you get a segfault. This is a bug, and the calls to _exit() are just a way to avoid hitting it.

Can you try version 2.4.0-rc1 or newer (such as 2.4.0-rc2) and see if your problem goes away? Does your program then work fine but segfault on exit?
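A common mitigation for the destruction-order problem Tad describes is the construct-on-first-use idiom with an intentionally leaked object: the object is created the first time it is needed and never destroyed, so static destructors that run late during `exit()` can still use it. The sketch below is hypothetical; `Registry`, `registry()`, and `Client` are illustrative names that do not come from the MongoDB client, and this is not how the issue was fixed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical shared service that other static objects use from their destructors.
class Registry {
public:
    void add(const std::string& name) {
        std::lock_guard<std::mutex> lock(mu_);
        ++counts_[name];
    }
    void remove(const std::string& name) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = counts_.find(name);
        if (it != counts_.end() && --it->second == 0) counts_.erase(it);
    }
    std::size_t size() {
        std::lock_guard<std::mutex> lock(mu_);
        return counts_.size();
    }

private:
    std::mutex mu_;
    std::map<std::string, int> counts_;
};

// Construct-on-first-use: created on the first call (thread-safe since C++11)
// and deliberately never deleted, so it stays valid while other static
// objects are being destroyed during exit().
Registry& registry() {
    static Registry* instance = new Registry();  // intentionally leaked
    return *instance;
}

// A static object whose constructor and destructor both use the registry.
// Its position relative to the registry no longer matters.
struct Client {
    explicit Client(std::string n) : name(std::move(n)) { registry().add(name); }
    ~Client() { registry().remove(name); }
    std::string name;
};

Client globalClient("example");
```

The trade-off is that the `Registry` destructor never runs, so it should not own resources that must be released at exit, and leak checkers may report the allocation. Unlike the `_exit()` workaround, this keeps normal `exit()` behavior for every other static object.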
| Comment by Stanislav Ievlev [ 10/Mar/13 ] |

I've tested this sample with both 2.2.2 and 2.2.3. I haven't tried calling runGlobalInitializersOrDie(), but according to the last comment in Where can I find an example of the correct usage of the DBReplicaSet class? I've seen that you use an ugly hack with _exit() in your command-line utilities, but I cannot use the same hack in my application.
| Comment by sam.helman@10gen.com [ 08/Mar/13 ] |

Hello, have you tried calling runGlobalInitializersOrDie, as specified in
| Comment by Ian Whalen (Inactive) [ 07/Mar/13 ] |

Moving from Blocker to Critical, as this does not appear to be a release-blocking issue.