[SERVER-20741] Primary crash after N hours of running as primary Created: 02/Oct/15 Updated: 08/Jan/24 Resolved: 12/Oct/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | JavaScript |
| Affects Version/s: | 3.0.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Julien Durillon | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
I manage a sharded cluster for my company. That cluster is offered to clients as a free cluster: they provision a db and can use it (with some limitations) from their applications. I moved from 2.6 to 3.0.6 a week ago (on Thursday 2015-09-24), and ever since I have seen this strange behavior: after being elected primary, a node lasts a few hours (between 2 and 5) and then crashes. We have systemd restarting the node automatically; in the meantime, a new node is elected primary, runs for a few more hours, then crashes, and another one is elected primary, etc. The cluster is composed of 3 config servers, 3 mongos, and 5 mongod all within a single RS and handling a single shard. All 3 data nodes crash a few hours after being elected master. I attached the log of a primary starting 30 seconds before the segfault happens. /sys/kernel/mm/transparent_hugepage/defrag does not exist on 2 of the 3 servers, and I set it to "never" on the third one. |
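For anyone repeating the transparent hugepage check on several hosts, here is a minimal sketch of how the active setting could be read. It assumes the usual Linux sysfs convention that the active option is the one in square brackets (for example `always madvise [never]`); the helper names `active_thp_value` and `read_thp_setting` are made up for this illustration and are not part of MongoDB or any tool mentioned in this ticket.

```python
import re
from pathlib import Path

# Standard sysfs location on kernels built with THP support; the files may be
# absent on other kernels, as reported for two of the servers above.
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def active_thp_value(raw):
    """Return the bracketed (active) option from a THP sysfs line, or None."""
    match = re.search(r"\[([^\]]+)\]", raw)
    return match.group(1) if match else None


def read_thp_setting(name, base=THP_DIR):
    """Read one THP knob (e.g. 'enabled' or 'defrag'); None if the file is missing."""
    path = Path(base) / name
    if not path.exists():
        return None
    return active_thp_value(path.read_text())


if __name__ == "__main__":
    for knob in ("enabled", "defrag"):
        print(knob, "->", read_thp_setting(knob))
```

A result of `None` for `defrag` corresponds to the "does not exist" case described above.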
| Comments |
| Comment by Ramon Fernandez Marina [ 29/Jan/16 ] | ||
|
For the record, | ||
| Comment by Ramon Fernandez Marina [ 12/Oct/15 ] | ||
|
Thanks for the additional information judu. Since the issue you describe does not point to a bug in the server I'm going to close this ticket. If you need assistance building MongoDB from sources you can post in the mongodb-dev group; please make sure to provide information about the version of the tools and libraries you're using. In particular, we don't yet support compiling with gcc 4.9 or older. For user support discussions please post on the mongodb-user group. See also our Technical Support page for additional support resources. Regards, | ||
| Comment by Julien Durillon [ 12/Oct/15 ] | ||
|
Ok, so here is the build log of my currently running instances, which are still encountering the same crash stacktrace. | ||
| Comment by Julien Durillon [ 09/Oct/15 ] | ||
|
Build log of mongodb. | ||
| Comment by Julien Durillon [ 08/Oct/15 ] | ||
|
Sorry, I'm testing something about the build, so the build log is coming. I haven't forgotten! | ||
| Comment by Julien Durillon [ 05/Oct/15 ] | ||
|
Crash log for conn1606 with all the logs. (Same as crash.log, but with all the logs from all the connections in case I erased a bit too much in crash.log.) | ||
| Comment by Julien Durillon [ 05/Oct/15 ] | ||
|
Crash log with all (and only) the [conn1606] in it. | ||
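A per-connection extract like this one can be produced with a small filter. The sketch below is only an illustration: it assumes each mongod log line carries its connection tag in square brackets (such as `[conn1606]`), and the function name `lines_for_connection` and the sample lines are hypothetical, not taken from the attached logs.

```python
import re


def lines_for_connection(lines, conn_id):
    """Keep only the log lines tagged with the given connection number."""
    # The closing bracket in the pattern keeps conn1606 from matching conn16060.
    tag = re.compile(r"\[conn%d\]" % conn_id)
    return [line for line in lines if tag.search(line)]


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "I COMMAND  [conn1606] command example.$cmd command: mapReduce",
        "I NETWORK  [conn16060] end connection",
        "I QUERY    [conn447] query example.coll",
    ]
    for line in lines_for_connection(sample, 1606):
        print(line)
```

Keeping the unfiltered log alongside the extract, as done in the following attachment, guards against filtering out relevant context.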
| Comment by Ramon Fernandez Marina [ 05/Oct/15 ] | ||
|
judu, can you please send a longer part of the log? In particular I'm looking for more details about conn447, which is the one involved in the segfault:
Also, how did you install this mongod instance? Did you use a package manager or did you build it from sources? If the latter, can you please send the command line used to build it? Thanks, | ||
| Comment by Julien Durillon [ 02/Oct/15 ] | ||
|
Thanks for your quick answer. OK, first: we use neither SELinux nor grsecurity. I'm attaching the output of `ldd mongod` in case you can find anything strange in it. In the log I attached, there is a mapReduce at the beginning. It's the last operation of that kind. I ran that mapReduce again, and succeeded in crashing the primary node by doing so. I included the log starting at the first mapReduce command. | ||
| Comment by Ramon Fernandez Marina [ 02/Oct/15 ] | ||
|
Sorry you've run into this judu. The crash shows the problem is happening inside the V8 engine:
We've seen similar issues in V8 when the machine is configured with SELinux or grsecurity, or imposes other limitations that affect V8's memory management. Can you please elaborate on the configuration for the affected node? Also, can you provide details of what operations this node was running when it crashed? I'm looking for JavaScript-related operations like using $where or mapReduce. Thanks, |