[SERVER-4926] segfault in mongo shell Created: 10/Feb/12 Updated: 15/Aug/12 Resolved: 04/Mar/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Shell |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Zac Witte | Assignee: | Tad Marshall |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
ubuntu 11.04 |
||
| Issue Links: |
| Participants: | |||||||||
| Description |
|
I issued a query in a sharded environment, hit Ctrl+C in the shell and got this strange error message about "unknown shell/collection.js". Hit Ctrl+C again and got this segfault.
|
| Comments |
| Comment by Tad Marshall [ 03/Mar/12 ] | |||||||||||
|
@Zac, Are you able to answer the questions I asked in the final paragraph of my earlier response? It would be helpful in reproducing this problem, which in turn would help us fix it. Thanks! Tad Marshall | |||||||||||
| Comment by Tad Marshall [ 10/Feb/12 ] | |||||||||||
|
The message "error doing query: unknown shell/collection.js:151" consists of an error and a location: "error doing query: unknown" is the error and "shell/collection.js:151" is the location. shell/collection.js is a file of JavaScript helper functions compiled into the code as text, and 151 is the line number in that source file. Here is the relevant bit of the file shell/collection.js:
So, the error happened in the findOne() helper function when it had called the .find() function and that call was interrupted by the ^C. Ideally, the ^C should have caused a clean abort of the operation and more polite and sensible messages, ending with a return to the prompt. Apparently, it continued to process something, and the second ^C interrupted the MessagingPort::call() function.

The actual crash happened after control had returned to the main prompt loop, where the sayReplSetMemberState() routine is used to generate text to display at the prompt (e.g. "ReplSetName:PRIMARY> "). As a guess, it seems like whatever was making the findOne() query take too long was also making the query to get the replication state take too long (hence the second ^C), but the shell's internal state was not consistent due to incorrect handling of the first ^C, leading to the crash.

I assume that the findOne() operation was taking a long time and that's why you hit ^C. To help us reproduce this, can you tell us a little about the configuration of the database and the hourly_stats collection? How long had you been waiting for a response, and do you know why the response would have been delayed (e.g. known problems at the time of the query)? Have you used ^C on slow responses like this before and had it work properly? Anything else you can think of that might help us set up an environment to reproduce this? Thanks! |
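For anyone triaging similar reports, the "error plus location" reading of the message described above can be sketched in a few lines of JavaScript. This is only an illustration: `splitShellError` is a hypothetical helper, not part of the mongo shell, and it assumes the location always appears as a trailing `file.js:line` token.

```javascript
// Hypothetical helper (not part of the mongo shell): splits a shell error
// string into its error text and its trailing "file.js:line" location.
function splitShellError(text) {
  // Assumption: the location is the last whitespace-separated token and
  // has the form "<path>.js:<digits>".
  const match = /^(.*?)\s+(\S+\.js):(\d+)$/.exec(text);
  if (match === null) {
    // No recognizable location suffix; treat the whole string as the error.
    return { message: text, file: null, line: null };
  }
  return { message: match[1], file: match[2], line: Number(match[3]) };
}

// The message from this report, split into its two parts.
const parsed = splitShellError(
  "error doing query: unknown shell/collection.js:151"
);
console.log(parsed.message);                  // the error text
console.log(parsed.file + ":" + parsed.line); // the source location
```

Strings without a trailing location are returned unchanged as the message, so the helper can be applied to arbitrary shell output without throwing.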