[SERVER-46192] mongodb crash with Got signal: 11 (Segmentation fault). Created: 14/Feb/20 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Manan Shah | Assignee: | Backlog - Storage Engines Team |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Storage Engines
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
Hello. We didn't get a clear resolution in
|
| Comments |
| Comment by Manan Shah [ 28/Mar/22 ] | |
|
Maybe set up a job to auto-start the instance every few minutes if its PID is not running? | |
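A minimal sketch of the kind of watchdog job suggested above, assuming a POSIX host where mongod writes a PID file. The PID file path and the start command in the usage comment are hypothetical placeholders, not values taken from this ticket. A job like this only restarts the process after a crash; it does nothing about the underlying segmentation fault.

```python
import os
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence


def read_pid(pid_file: str) -> Optional[int]:
    """Return the PID stored in pid_file, or None if the file is missing or malformed."""
    try:
        return int(Path(pid_file).read_text().strip())
    except (OSError, ValueError):
        return None


def pid_is_running(pid: Optional[int]) -> bool:
    """Check whether a process with this PID currently exists (POSIX only)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def ensure_running(
    pid_file: str,
    start_cmd: Sequence[str],
    runner: Callable[..., object] = subprocess.run,
) -> bool:
    """Start the service if its recorded PID is not alive.

    Returns True if a start was attempted, False if the process was already running.
    """
    if pid_is_running(read_pid(pid_file)):
        return False
    runner(list(start_cmd), check=False)
    return True


# Hypothetical usage from a scheduled job (path and command are assumptions):
# ensure_running("/var/run/mongodb/mongod.pid", ["systemctl", "start", "mongod"])
```

Run from cron or a systemd timer every few minutes, this would approximate the suggestion. Note that a stale PID file can point at an unrelated process that reused the PID, so a service manager's own restart policy is usually the more robust choice where one is available.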
| Comment by Mahesh Birajdar [ 28/Mar/22 ] | |
|
@Manan or @Sanjeeth, do you have a fix or workaround for this issue? | |
| Comment by Sanjeeth Mallesh [ 02/Sep/21 ] | |
|
We're encountering the same error on mongod versions `4.2.12` and `4.2.15`. Signal 11 is raised and the mongod service enters a failed/crashed state. The segmentation fault is observed on all other related nodes running the same mongod versions, `4.2.12` and `4.2.15`.
| |
| Comment by Manan Shah [ 21/Aug/20 ] | |
|
@alexander / @kelsey, did you get a chance to look through the diagnostic files I uploaded here previously? The issue still recurs once every 2-3 weeks on a few specific hosts of this replica set. I just want to know, with evidence, whether this is specific to a hardware issue on these hosts or something else. For some reason, it hasn't occurred on the three other replicas of this same set, which are in two other data centers. The only difference is that read traffic is extremely light on the replicas in those two data centers (1/1000th of the read qps). | |
| Comment by Manan Shah [ 26/Apr/20 ] | |
|
While this does not happen every day, it is still a problem in production. It happened again today, after 20 days. I hope you can get me some information from the diagnostic data files. Do you think re-syncing data from scratch can mitigate the problem? | |
| Comment by Manan Shah [ 06/Apr/20 ] | |
|
Alexander, this is still happening. Can you please give us a clue or a workaround? Was anything found in the diagnostic data or the other files I uploaded on Feb 16? | |
| Comment by Manan Shah [ 21/Feb/20 ] | |
|
I verified we are not using $sample in any of the application code. | |
| Comment by Alexander Gorrod [ 21/Feb/20 ] | |
|
manan@indeed.com I wonder: do you use $sample in your application? There is one known failure that could be related to this, but it's specific to cases where $sample (which uses WiredTiger random cursors underneath) is used. | |
| Comment by Manan Shah [ 16/Feb/20 ] | |
|
Hi there, I've uploaded the requested files on the upload portal directly via curl. Please let me know if you notice anything is missing. | |
| Comment by Kelsey Schubert [ 14/Feb/20 ] | |
|
Thanks for opening a new ticket. To continue to investigate, would you please provide the following:
I've created an upload portal for you to use here. Thanks again, | |
| Comment by Manan Shah [ 14/Feb/20 ] | |
|
cc @alexander Gorrod |