[SERVER-29152] Segfault in multiple shard primaries under regular load Created: 12/May/17 Updated: 30/Oct/23 Resolved: 30/May/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking |
| Affects Version/s: | 3.2.13 |
| Fix Version/s: | 3.2.14, 3.4.5, 3.5.9 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Meni Livne | Assignee: | Samantha Ritter (Inactive) |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: |
v3.4, v3.2
|
| Participants: |
| Description |
|
Our database is divided into 4 shards, each consisting of one primary, one secondary and one arbiter. The primaries are r4.2xlarge servers on AWS EC2, and the secondaries are r4.xlarge. Our workload is intensive in both reads and writes, but these servers usually handle it without a problem. However, during normal operation, the primaries of 3 of the 4 shards suddenly crashed within a very short time of each other. We don't know what could have caused this. Attached are the segfault logs from the primary servers. The one from shard1 seems different from the other two. |
| Comments |
| Comment by Githook User [ 30/May/17 ] |
|
Author: samantharitter (samantha.ritter@10gen.com) Message: |
| Comment by Samantha Ritter (Inactive) [ 26/May/17 ] |
|
It appears our hook did not catch the 3.2 commit; it's here: Author: samantharitter |
| Comment by Githook User [ 26/May/17 ] |
|
Author: samantharitter (samantha.ritter@10gen.com) Message: |
| Comment by Samantha Ritter (Inactive) [ 22/May/17 ] |
|
Hi Meni, I wanted to update you on the status of this bug. New logging code that was added by … As to what event may have triggered the thread to exit in your case, could you provide the complete log files from these crashes? The stack traces you've linked have been very helpful, and it would also help us to see what the system was doing up until things went south. Thank you, |
| Comment by Meni Livne [ 13/May/17 ] |
|
We're using the mongodb-org-server packages for Ubuntu from the official MongoDB repositories. As far as we know, these packages don't add any log rotation settings. We haven't configured any ourselves, and we have never noticed the log file being rotated. |
| Comment by Samantha Ritter (Inactive) [ 12/May/17 ] |
|
Hi there, Thanks for opening this ticket, I'm sorry you experienced these crashes. I'm looking into what might have happened on these servers. Given the stack traces, it's possible that we have a bug in our logging subsystem. Are you running with rotating log files? If so, is there any chance that these servers' log files were being rotated around the time the crash occurred? Thanks, |