[SERVER-10372] ReplicaSetMonitor creates a thread that references memory it does not own Created: 29/Jul/13 Updated: 10/Dec/14 Resolved: 06/Aug/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Client |
| Affects Version/s: | 2.2.4, 2.4.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Gerry F | Assignee: | Tad Marshall |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Windows, Linux confirmed |
||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Operating System: | ALL | ||||||||||||
| Steps To Reproduce: | The simplest approach is to call mongo::ReplicaSetMonitor::remove() to destroy the referenced memory in a running process. The new thread will reference the freed memory within 10 seconds. Using Microsoft's ODBCtest, start and end a connection to a replica set. When the last connection to that driver .dll is closed, the program will drop the .dll and crash the next time the new thread wakes up. |
||||||||||||
| Participants: | |||||||||||||
| Description |
|
The ReplicaSetMonitor constructor creates a new thread that references memory belonging to the constructor thread. If the constructor thread exits or frees the ReplicaSetMonitor, the new thread references freed memory. This is likely related to We need this fixed in 2.4. The 2.4.5 C++ API is working well with older Mongo servers so we don't need this backported into 2.2. |
| Comments |
| Comment by Tad Marshall [ 06/Aug/13 ] | |||||||||||||||||||||
|
This was fixed by the commits made for linked tickets | |||||||||||||||||||||
| Comment by Tad Marshall [ 06/Aug/13 ] | |||||||||||||||||||||
|
Hi Gerry, Thanks! I'll resolve this ticket. Can you file tickets against the C++ driver documentation and a suggestion to make ReplicaSetMonitorWatcher more manageable? I think that Randolph is also hoping to address ReplicaSetMonitorWatcher ... it's the thread that crashes in the remaining problem in Tad | |||||||||||||||||||||
| Comment by Gerry F [ 06/Aug/13 ] | |||||||||||||||||||||
|
Hi Tad, I think this ticket can be closed. There are two issues remaining, but maybe they should get their own tickets:
Looking forward to 2.4.6, | |||||||||||||||||||||
| Comment by Tad Marshall [ 06/Aug/13 ] | |||||||||||||||||||||
|
Hi Gerry, The remaining issues in All of the fixes found so far have been backported to the 2.4 branch and will be in version 2.4.6. If you don't call ReplicaSetMonitor::remove() (which I don't think is something that users of the C++ driver should do), are there still issues to be fixed for this ticket? Tad | |||||||||||||||||||||
| Comment by Gerry F [ 31/Jul/13 ] | |||||||||||||||||||||
|
I was using ReplicaSetMonitor::remove() based on some older sample code in the hopes that it would prevent our ReplicaSetMonitor related crashes. In testing today, it has become clear that the ReplicaSetMonitor was not the problem. Removing the ReplicaSetMonitor caused the Watcher thread to do nothing every ten seconds, instead of referencing damaged memory every ten seconds. Basically, it hid the symptoms of an unrelated bug. Sometimes when I started a second connection to the same replica server, I would get memory trashing. This doesn't show up in Dr Memory or valgrind, because it is valid memory being changed. If I copy the dbclient_rs.cpp and dbclient_rs.h fixes for In summary, the | |||||||||||||||||||||
| Comment by Tad Marshall [ 31/Jul/13 ] | |||||||||||||||||||||
|
Hi Gerry, Thanks for the sample program! I discussed the correct usage of ReplicaSetMonitor with another developer here, and he shares my feeling that this API (ReplicaSetMonitor::remove) was never intended to be called by C++ driver users. I also talked to people in our documentation group and confirmed my impression that the C++ driver was more-or-less undocumented, and so users are probably reading MongoDB source code to try to figure out how to use the C++ driver. I apologize for the poor state of information on proper use of callable functions in the C++ driver. I'll try your sample program and see if I can make MongoDB more defensive against calls like this. In the meantime, can you describe what you are trying to accomplish in removing the replica set that is being used by the DBClientReplicaSet that you created? There may be a better way to do it that will not require additional defensive code in the C++ driver. Thanks! Tad | |||||||||||||||||||||
| Comment by Gerry F [ 31/Jul/13 ] | |||||||||||||||||||||
|
This simple test case works correctly, except for a rare crash on exit (1 crash in over 30 runs). Running in valgrind seems to get luckier, at over 60 runs without an exit crash or invalid memory access. Clearly, the added complexity of the real application matters.
| |||||||||||||||||||||
| Comment by Gerry F [ 31/Jul/13 ] | |||||||||||||||||||||
|
My current test case is too embedded in a tool environment to share nicely. While I am trying to cut my test case down, you might try this: If I am right about what the problem is, this simple test will read freed memory and be detectable in valgrind. It will take me a while to convert my test case to something you can run, but it is priority one. | |||||||||||||||||||||
| Comment by Tad Marshall [ 31/Jul/13 ] | |||||||||||||||||||||
|
Hi Gerry, Thanks for this bug report. As you know from There is a remaining problem with the ReplicaSetMonitorWatcher thread, because the thread doesn't terminate until after the object containing the data it uses has been destroyed. This only causes problems intermittently, because of the 10 second sleep that this thread does on each pass; if the process termination happens while the thread is sleeping, nothing bad happens. When the termination happens when the thread is not in the sleep, we get the problems seen in I think that you are describing a different problem. You mentioned calling remove() on a ReplicaSetMonitor and then waiting 10 seconds (the duration of the sleep in ReplicaSetMonitorWatcher's loop) so it sounds like this is not the process termination case described in Can you describe the steps to reproduce the problem you are seeing, or post sample code that we can use to reproduce the problem? Thanks! Tad |