[SERVER-2113] repeatable server crash (fixed by --repair) Created: 17/Nov/10 Updated: 29/May/12 Resolved: 02/Sep/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Drew Perttula | Assignee: | Unassigned |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
1.6.3 64bit ubuntu linux |
||
| Operating System: | ALL |
| Participants: |
| Description |
|
FYI, I was getting this crash when my {t: {$gte: someTime}} query ran at just the wrong time, i.e. when a certain document was going to be in the result set. I tried --repair; the first attempt froze my whole machine during indexing (bad disk? power?), but a later successful --repair run seems to have fixed the problem.
Tue Nov 16 23:25:24 Backtrace:
Tue Nov 16 23:25:24 dbexit: |
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 02/Sep/11 ] |
|
Just a note that for this setup you probably want to run with journaling.
| Comment by Drew Perttula [ 21/Nov/10 ] |
|
I have nothing auto-removing lock files. When we get leftover lock files, we usually run repair. But the crash shown in this ticket never left a lock file. "mongod --dbpath /db --port 11021" would get auto-restarted and be back up within a second or two. |
| Comment by Eliot Horowitz (Inactive) [ 21/Nov/10 ] |
|
Until 1.8, you have to run --repair after a crash. |
| Comment by Drew Perttula [ 21/Nov/10 ] |
|
Nope. I don't have separate startup commands for 'normal' and 'after a crash', so I didn't include --repair since it would slow down all the normal startups. |
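The missing piece here is a separate "after a crash" startup path. Below is a minimal sketch, in Python, of one way to get one under supervisord: a wrapper that prepends a mongod --repair pass only when the previous run did not exit cleanly, so normal startups stay fast. Everything in it is an assumption for illustration: the marker path and the function names plan_commands and main are hypothetical. It uses its own marker file rather than mongod.lock because, as noted above, this crash did not leave a lock file. It also assumes that mongod --repair exits once the repair finishes and that mongod exits with status 0 after a clean SIGTERM shutdown.

```python
import os
import signal
import subprocess
import sys

# Hypothetical marker path: created and removed by this wrapper, never by mongod.
MARKER = "/var/run/mongod-wrapper.running"


def plan_commands(dbpath, port, marker_path):
    """Return the commands to run in order. A repair pass is prepended
    when the marker shows the previous mongod run did not exit with 0."""
    serve = ["mongod", "--dbpath", dbpath, "--port", str(port)]
    if os.path.exists(marker_path):
        # Marker left behind: the last run crashed or was killed.
        return [["mongod", "--dbpath", dbpath, "--repair"], serve]
    return [serve]


def main(dbpath="/db", port=11021):
    *repairs, serve = plan_commands(dbpath, port, MARKER)
    for cmd in repairs:
        # Assumed to exit when the repair completes; raises if it fails,
        # leaving the marker in place so the next start retries the repair.
        subprocess.run(cmd, check=True)
    open(MARKER, "w").close()  # record "mongod is running"
    child = subprocess.Popen(serve)
    # Forward supervisord's stop signal so mongod can shut down cleanly.
    signal.signal(signal.SIGTERM, lambda signum, frame: child.send_signal(signum))
    code = child.wait()
    if code == 0:
        os.remove(MARKER)  # clean exit: no repair needed next time
    sys.exit(code)
```

The supervisord command would point at a small script that calls main(). The trade-off is that only the first start after an unclean exit pays for the repair pass. With journaling (1.8 and later), as suggested in the comments, this repair pass should no longer be needed.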
| Comment by Eliot Horowitz (Inactive) [ 18/Nov/10 ] |
|
Does the automatic restarter run --repair? |
| Comment by Drew Perttula [ 18/Nov/10 ] |
|
I'm not exactly sure what you mean. The system was crashing a lot over the weekend for unrelated reasons (attempted hardware upgrades). Since I run mongod under supervisord with automatic restarts, even total mongod crashes like the above are actually pretty unnoticeable to me unless I go digging. I think the crashes had been happening every 30 minutes for more than a day by the time I started investigating why my one query always seemed to fail with "could not find master/primary". I mostly posted this trace so you could notice if you were getting a lot of reports in that one method, or if the bug was obvious enough that it could be fixed just from the trace. I don't think there's much to be done about my particular situation. Sorry I didn't clone the corrupt db before the repair. |
| Comment by Eliot Horowitz (Inactive) [ 17/Nov/10 ] |
|
Had this system/mongod ever crashed without a full repair before? |