[SERVER-6488] Server crash Created: 17/Jul/12 Updated: 15/Aug/12 Resolved: 23/Jul/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | 2.0.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Nic Cottrell (Personal) | Assignee: | Eric Milkie |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | RHEL5 |
| Attachments: | |
| Issue Links: | |
| Operating System: | Linux |
| Participants: | |
| Description |
|
Server has been running well and had an uptime of 60 days or so. Then suddenly a crash. Mongod.log shows: {{ }} and later...
{{
Tue Jul 17 17:19:20 Got signal: 6 (Aborted).
Tue Jul 17 17:19:20 Backtrace:
Tue Jul 17 17:19:20 Invalid access at address: 0x4
Tue Jul 17 17:19:20 Got signal: 11 (Segmentation fault).
Tue Jul 17 17:19:20 Backtrace:
Logstream::get called in uninitialized state
, f: { $gte: 0 }, t: { $in: [ "un cadre", "digitale sans", "Excerpt", "fil", "communiquant son", "photo numérique", "photo digitale", "recherche un", "cadre", "propose", "fil et", "pour", "qui recherche", "cadre photo", "qui", "numérique", "et communicant", "communicant", "propose pour", "numérique communiquant", "Telefunken propose", "son", "DPF", "un", "photo", "recherche", "sans", "sans fil", "ceux", "communiquant", "Telefunken", "Wifi", "et", "tous", "digitale" ] }, lc: { $in: [ "ukr", "slk", "cat", "zho", "deu", "fra", "por", "fin", "hin", "ces", "slv", "nld", "kor", "est", "jap", "rus", "ara", "pol", "eng", "vie", "sve", "esl", "hun", "isl", "lav", "ell", "tur", "bel", "nor", "bul", "dan", "sqi", "ita", "mkd" ] } } nscanned:371 nreturned:101 reslen:5961 634ms
}} |
| Comments |
| Comment by Eric Milkie [ 23/Jul/12 ] |
|
Oh of course, I should have noticed that. I didn't even release 2.0.6 until June 4th, so there's no way a server that's been up for 60 days could be running that version. This bug was definitely fixed in 2.0.6 though, so just make sure you bounce all your old servers to get the new version with the bugfix. |
| Comment by Nic Cottrell (Personal) [ 23/Jul/12 ] |
|
Hmm... pretty sure 2.0.6 was installed on the machine, since we have yum upgrade running nightly via cron. But since the mongo process wasn't restarted, it might have been running off the 2.0.5 version. |
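One way to confirm which version a long-running mongod actually started as is to read the version from the startup portion of its log rather than from the installed package. The sketch below assumes the startup banner contains a fragment of the form `db version v2.0.5`; that format, the helper names, and the sample line in the usage note are assumptions for illustration and are not taken from this ticket's log.

```python
import re

# Assumed banner fragment, e.g. "db version v2.0.5" (hypothetical, not from this ticket).
VERSION_RE = re.compile(r"db version v(\d+)\.(\d+)\.(\d+)")


def running_version(log_lines):
    """Return (major, minor, patch) from the first startup banner found, or None."""
    for line in log_lines:
        match = VERSION_RE.search(line)
        if match:
            return tuple(int(part) for part in match.groups())
    return None


def has_fix(version, fixed_in=(2, 0, 6)):
    """True if the version the process started as is at least the fixed release."""
    return version is not None and version >= fixed_in
```

For example, feeding it a hypothetical startup line such as `[initandlisten] db version v2.0.5, pdfile version 4.5` would report that the process predates 2.0.6 even if a newer package is installed on disk.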
| Comment by Eric Milkie [ 23/Jul/12 ] |
|
The server crashed when trying to translate a query into a string, for a log entry for a long-running query. There was a bug that was fixed in 2.0.6 that may have overrun the buffer when doing translations for numbers with large numbers of digits, so I would check to make sure you are indeed using 2.0.6. |
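As a rough illustration of why converting numbers to text can overrun a fixed-size buffer, the sketch below measures how long a double becomes under plain `%f` formatting. The ticket does not say which buffer or format the server used, so the buffer size and helper names here are hypothetical; this is not MongoDB's code.

```python
import sys

# Hypothetical buffer size, chosen only for illustration.
HYPOTHETICAL_BUFFER_SIZE = 128


def formatted_length(value):
    """Number of characters produced by C-style "%f" formatting of a double."""
    return len("%f" % value)


def fits_in_buffer(value, size=HYPOTHETICAL_BUFFER_SIZE):
    """True if the formatted text plus a C NUL terminator fits in `size` bytes."""
    return formatted_length(value) + 1 <= size


# Small numbers are short, but the largest double expands to hundreds of digits.
small = formatted_length(0.0)
large = formatted_length(sys.float_info.max)
```

A writer that assumes every number fits in a small fixed buffer would be safe for `small` but not for `large`, which is why bounded formatting calls that check the required length are the usual defence.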
| Comment by Nic Cottrell (Personal) [ 23/Jul/12 ] |
|
Unfortunately no. The server had been up for about 60 days and the logs had been rotated away. And no, I'm not sure which query triggered this. It's on a live server and so could have been anything... |
| Comment by Eric Milkie [ 23/Jul/12 ] |
|
OK, do you have the beginning of the log from when the server was started? And also the query that crashed the server, if you have it. Thanks! |
| Comment by Nic Cottrell (Personal) [ 23/Jul/12 ] |
|
Here's the log file for the day of the crash. |
| Comment by Eric Milkie [ 23/Jul/12 ] |
|
Also, can you attach the full server log, if you still have it? |
| Comment by Eric Milkie [ 23/Jul/12 ] |
|
Hi Nic, |