[SERVER-29318] Crash node with Segmentation fault error Created: 22/May/17  Updated: 12/Jul/17  Resolved: 16/Jun/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Emmanuel Guidez Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File diagnostic_data.tar.gz, Text File ompd1rep1.log, HTML File sa22
Operating System: ALL
Participants:

 Description   

In our replica set environments, a standby node sometimes crashes with the following error:
2017-05-22T14:04:18.973+0200 F - [conn140242] Invalid access at address: 0x3cabb4904251
2017-05-22T14:04:19.286+0200 F - [conn140242] Got signal: 11 (Segmentation fault).
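
Fatal entries like the two above can be pulled out of a mongod log with a short script. This is only a minimal sketch, assuming the plain-text line layout visible in those two lines (timestamp, one-letter severity, component, bracketed context, message). The names `LINE_RE` and `fatal_entries` are illustrative and not part of MongoDB.

```python
import re
from datetime import datetime

# Layout assumed from the two log lines quoted above:
# <timestamp> <severity> <component> [<context>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<sev>[A-Z])\s+(?P<comp>\S+)\s+"
    r"\[(?P<ctx>[^\]]+)\]\s+(?P<msg>.*)$"
)


def fatal_entries(lines):
    """Yield (timestamp, context, message) for every severity-F line."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("sev") == "F":
            # Timestamps look like 2017-05-22T14:04:18.973+0200
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
            yield ts, m.group("ctx"), m.group("msg")


# The two lines from this ticket, used as sample input.
sample = [
    "2017-05-22T14:04:18.973+0200 F - [conn140242] Invalid access at address: 0x3cabb4904251",
    "2017-05-22T14:04:19.286+0200 F - [conn140242] Got signal: 11 (Segmentation fault).",
]

for ts, ctx, msg in fatal_entries(sample):
    print(ts.isoformat(), ctx, msg)
```

To scan a real log, pass an open file instead of `sample`, for example `fatal_entries(open("ompd1rep1.log"))`. The timestamps it returns can then be matched against the sar metrics for the same minute.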

Example
Replica set with 3 nodes, Proxmox virtual servers with 2 CPUs and 6 GB RAM.
Version 3.2.12, WiredTiger (WT) storage engine
We had this error today in our acceptance-test (recette) environment at 14:04.
I have attached to this ticket:

  • the node logfile,
  • the diagnostic data
  • the sar file for that day, which contains the system metrics.


 Comments   
Comment by Kelsey Schubert [ 16/Jun/17 ]

Thank you for the update, eguidez@voyages-sncf.com.

Comment by Emmanuel Guidez [ 16/Jun/17 ]

Hi Thomas,

We have a data corruption problem involving our Proxmox VMs and our SAN storage.
The same problem also occurred on the Proxmox VM running MySQL.
So it is not a Mongo issue.

We can close the ticket.

Emmanuel

Comment by Kelsey Schubert [ 09/Jun/17 ]

Hi eguidez@voyages-sncf.com,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the mongod logs, diagnostic.data, and syslogs when the segfault occurred?

Thank you,
Thomas

Comment by Alexander Gorrod [ 22/May/17 ]

On replicaSet environments we have sometimes a crash standby node with the following error

Have you seen a similar error in the past as well, or is this the only instance? We haven't seen any errors with similar characteristics before, so if you have seen other instances it would be very valuable to get the same set of logs and diagnostic information from them. Was there a core file?

The only thing I can see that could cause a failure around that particular code is a memory allocation failure, though I'd expect to see different symptoms from a memory allocation failure. Could you review the content of the system log and let us know if there were any issues logged around the time of the segfault?

Comment by Kelsey Schubert [ 22/May/17 ]

Hi eguidez@voyages-sncf.com,

Thank you for the detailed bug report. We're investigating and will update this ticket when we know more.

Kind regards,
Thomas
