[SERVER-52926] [3.6] Mongo db crash with Got signal: 11 - KVDatabaseCatalogEntryBase::AddCollectionChange::rollback
Created: 18/Nov/20 Updated: 16/Oct/21 Resolved: 04/Dec/20
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Catalog |
| Affects Version/s: | 3.6.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Chao Xu | Assignee: | Benety Goh |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Sprint: | Execution Team 2020-12-14 |
| Description |
MongoDB shell version v3.6.10
| Comments |
| Comment by apocarteres [ 10/May/21 ] |
BTW, this also affects 3.6.18.
| Comment by Benety Goh [ 04/Dec/20 ] |
Symptoms match those reported in
| Comment by Chao Xu [ 20/Nov/20 ] |
Hi Dima, I'm so sorry about this case. At the time, my colleague only preserved the node's state and did not try to restart it, but I mistakenly thought they had restarted it and that this error log was the result of the restart. So you have already explained the root cause of this crash. I restarted the node this morning. I don't think it's a bug, so we can close this case. Thanks again, and have a good life. Chao
| Comment by Chao Xu [ 19/Nov/20 ] |
Hi Dmitry, thanks a lot. But that's so weird: I don't think we have such a huge amount of data, so how did this happen? Unfortunately, a few hours ago another node in this cluster crashed too, so getting this cluster back to work is my priority. Could you suggest some solutions? Thanks again. mongodb.log Have a good day.
| Comment by Dmitry Agranat [ 19/Nov/20 ] |
xuchao528610@gmail.com, I will keep looking into the potential cause of the segmentation fault. Can you upload the full mongod log covering the time of the reported event to the same secure location?
| Comment by Dmitry Agranat [ 19/Nov/20 ] |
I think this is a rare case where the segmentation fault you are reporting is a symptom rather than the cause. Looking at your cluster, you have a PSA (Primary-Secondary-Arbiter) deployment with read concern majority enabled, and the Secondary member has been in recovery for the last 70 days. Because the Arbiter holds no data, the majority commit point cannot advance while the Secondary is not replicating, so the Primary must keep all history since that point. This creates enormous cache pressure on the Primary, which is barely operational with its cache almost 100% full. Another indication that the system is struggling is the number of cache overflow table entries: 6 billion. I do not believe a system under such extreme conditions can be expected to operate without issues, some of which you have experienced. For this issue, and for the overall sizing and tuning of your cluster, we'd encourage you to start by asking our community for help on the MongoDB Developer Community Forums. Thanks,
| Comment by Chao Xu [ 19/Nov/20 ] |
@Dmitry Agranat Hi Dmitry, thanks for your help. I uploaded two files (metrics.2020-11-17T21-49-16Z-00000 and metrics.interim); they may help with your investigation. Thanks, Chao
| Comment by Dmitry Agranat [ 18/Nov/20 ] |
I think I understand what's going on here, but to validate my theory we'll also need the archived diagnostic.data directory located under the dbpath. You can upload it to this secure uploader. Thanks,