[SERVER-6839] SECONDARY keeps crashing Created: 24/Aug/12 Updated: 21/Sep/12 Resolved: 21/Sep/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.0.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kerim Satirli | Assignee: | Unassigned |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Operating System: | Linux |
| Participants: |
| Description |
|
A SECONDARY in our replica set keeps crashing for an unknown reason. Our setup looks like this:
The first secondary (2.0.6) keeps crashing at seemingly random moments. The log file shows only this as its final part:
Here's some more information:
|
| Comments |
| Comment by Ian Whalen (Inactive) [ 21/Sep/12 ] |
|
Closing, but please reopen if you have the core dumps (or other info) available to attach. |
| Comment by Jeremy Mikola [ 05/Sep/12 ] |
|
Is there a core dump available for either crash? |
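Whether a core dump can exist at all depends on the core file size limit of the process that started mongod. The following is a minimal sketch of that check, assuming a Linux host and assuming it is run as the same user and from the same environment that launches mongod (the limit is per process and inherited, so a different shell may report a different value). `core_dumps_enabled` is a hypothetical helper name, not part of any MongoDB tooling.

```python
import resource


def core_dumps_enabled() -> bool:
    """Return True if the soft core-file size limit allows a core dump."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    # A soft limit of 0 suppresses core files; RLIM_INFINITY means unlimited.
    return soft == resource.RLIM_INFINITY or soft > 0


if __name__ == "__main__":
    print("core dumps enabled:", core_dumps_enabled())
```

If this reports `False`, the absence of a core file says nothing about whether mongod actually crashed.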
| Comment by Kerim Satirli [ 27/Aug/12 ] |
|
Attached is the full log of the failing SECONDARY. Disregard the filename; it stems from a point when we used to use replicasets. We have not yet gotten around to updating the names of the config file and log file. |
| Comment by Kerim Satirli [ 27/Aug/12 ] |
|
Sure can. Here's rs.conf();
I will attach another log from when it crashed. |
| Comment by Jeremy Mikola [ 24/Aug/12 ] |
|
Can you share rs.conf() as well? Based on the log you shared, it appears that index creation (for the collection with 44589575 documents) runs from 1-99% and presumably completes, but then restarts again, ultimately hanging at 95% from "Thu Aug 23 13:00:07" until the time of the crash at "Thu Aug 23 13:19:08". Was there a core dump from the crash, or is the log the only evidence we have? |
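As a quick check on the timeline above, the gap between the last progress line and the crash can be computed from the two quoted timestamps. This is a minimal sketch, assuming the year is 2012 (the log timestamps omit it, so it is taken from the ticket dates) and an English/C locale for the day and month names. `stall_duration` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Timestamp layout as quoted in the comment above, e.g. "Thu Aug 23 13:00:07",
# with the assumed year appended so the value parses to a full date.
LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def stall_duration(last_progress: str, crash: str, year: int = 2012) -> timedelta:
    """Return the time between the last progress line and the crash.

    The year is an assumption: the log timestamps do not carry one.
    """
    start = datetime.strptime(f"{last_progress} {year}", LOG_TIME_FORMAT)
    end = datetime.strptime(f"{crash} {year}", LOG_TIME_FORMAT)
    return end - start


if __name__ == "__main__":
    gap = stall_duration("Thu Aug 23 13:00:07", "Thu Aug 23 13:19:08")
    print(gap)  # 0:19:01
```

So the index build sat at 95% for roughly 19 minutes before the process disappeared.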
| Comment by Kerim Satirli [ 24/Aug/12 ] |
|
Hey Scott, thanks for getting this. What I mean by "crash" is an actual application crash. As in: mongod will not show up in the current process list and, naturally, will be marked as unhealthy / unreachable in the rs.status() call. Here's the output from "free -ltm":
I was indeed building an index on the primary, and the process finished there just fine. Tests afterwards also showed that queries were faster, which leads me to believe that everything worked out. |
| Comment by Scott Hernandez (Inactive) [ 24/Aug/12 ] |
|
Can you please include the full logs showing the time the "crash" happens and the restart? Can you define what you mean by "crash" if there is no error in the logs? Also, what does free -ltm show? From the logs, it looks like you are building an index. Is this something you did on the primary, which is replicating to the secondaries and being applied at the time of the "crash"? |
| Comment by Kerim Satirli [ 24/Aug/12 ] |
|
For the sake of completeness, here's an output of rs.status();
I also forgot to mention that the 2.2 SECONDARY is set up to never become a primary based on priorities. The reason for that is that we want to test 2.2 mainly in terms of playing nice with others. |
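A member that should never become primary is normally configured with `priority: 0` in the replica set configuration. The sketch below shows one way to list such members from a document shaped like `rs.conf()` output. The set name and hostnames are placeholders, not values from this ticket, and `non_electable_members` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def non_electable_members(conf: Dict[str, Any]) -> List[str]:
    """Return hosts whose priority is 0, i.e. members that can never become primary."""
    return [
        m["host"]
        for m in conf.get("members", [])
        # A member without an explicit priority uses the default of 1.
        if m.get("priority", 1) == 0
    ]


if __name__ == "__main__":
    # Hypothetical configuration; hostnames are placeholders, not from the ticket.
    example_conf = {
        "_id": "rs0",
        "members": [
            {"_id": 0, "host": "node-a.example:27017"},
            {"_id": 1, "host": "node-b.example:27017"},
            {"_id": 2, "host": "node-c.example:27017", "priority": 0},
        ],
    }
    print(non_electable_members(example_conf))  # ['node-c.example:27017']
```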