- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 3.0.6
- Component/s: JavaScript
- Operating System: ALL
I manage a sharded cluster for my company. It is offered to clients as a free cluster: they provision a database and can use it (with some limitations) from their applications.
I moved from 2.6 to 3.0.6 a week ago (on Thursday 2015-09-24), and ever since I have seen this strange behavior: after being elected primary, a node lasts a few hours (between 2 and 5) and then crashes.
The crash is a segmentation fault.
We have systemd restart the node automatically; in the meantime, a new node is elected primary, runs for a few more hours, then crashes, another one is elected primary, and so on.
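For reference, this is roughly how we check which node is currently primary from the mongo shell (a sketch, not our actual monitoring; it only assumes a connection to any member of the replica set):

{code:javascript}
// Sketch: print the current primary as reported by the replica set.
// rs.status() can be run from a connection to any member.
var primary = rs.status().members.filter(function (m) {
    return m.stateStr === "PRIMARY";
})[0];
print("current primary: " + (primary ? primary.name : "none"));
{code}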
The cluster is composed of 3 config servers, 3 mongos, and 5 mongod, with all 5 mongod in a single replica set serving a single shard.
The 5 mongod are 2 arbiters and 3 data nodes.
The 3 data nodes are 1 MMAPv1 node and 2 WiredTiger nodes.
All 3 data nodes crash a few hours after being elected primary.
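For reference, a sketch of what our replica set configuration looks like (the set name and hostnames are placeholders, not our real ones):

{code:javascript}
// Hypothetical rs.conf() output: 3 data nodes plus 2 arbiters.
{
    "_id" : "rs0",
    "version" : 1,
    "members" : [
        { "_id" : 0, "host" : "data1.example.com:27017" },  // MMAPv1 data node
        { "_id" : 1, "host" : "data2.example.com:27017" },  // WiredTiger data node
        { "_id" : 2, "host" : "data3.example.com:27017" },  // WiredTiger data node
        { "_id" : 3, "host" : "arb1.example.com:27017", "arbiterOnly" : true },
        { "_id" : 4, "host" : "arb2.example.com:27017", "arbiterOnly" : true }
    ]
}
{code}

(The storage engine is a mongod startup option, --storageEngine, so it does not appear in rs.conf; the comments above just note which node runs which engine.)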
I attached the log of a primary, starting 30 seconds before the segfault happens.
/sys/kernel/mm/transparent_hugepage/defrag does not exist on 2 of the 3 servers, and I set it to "never" on the third one.
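If it helps, this is how I checked the setting from the legacy mongo shell running locally on each server (cat() is a shell-native helper; writing "never" was done as root outside of mongo):

{code:javascript}
// Sketch: read the THP defrag setting from the mongo shell on the local host.
// The file is absent on 2 of the 3 servers, so guard against a read error.
var defragPath = "/sys/kernel/mm/transparent_hugepage/defrag";
try {
    print(defragPath + ": " + cat(defragPath));
} catch (e) {
    print(defragPath + " does not exist on this host");
}
{code}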