[SERVER-23615] Mongod crashes with a signal 11 (Segmentation fault) Created: 08/Apr/16 Updated: 06/Dec/22 Resolved: 19/Jul/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | shiyingwang | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Assigned Teams: | Query |
| Operating System: | ALL |
| Sprint: | Query 14 (05/13/16) |
| Participants: |
| Description |
|
| Comments |
| Comment by Ian Whalen (Inactive) [ 18/Jul/16 ] | ||
|
Given the lack of activity on this issue we're going to have to resolve as Gone Away, but please do feel free to re-open with the additional requested logs and information if you'd like us to continue this investigation. | ||
| Comment by Kelsey Schubert [ 21/Jun/16 ] | ||
|
Hi shiyingwang, We still need logs with a higher level of verbosity to diagnose the problem. If this is still an issue for you, would you please clarify how frequently this issue occurs and increase the logging level per Max's suggestion so we can continue to investigate? Thank you, | ||
| Comment by Max Hirschhorn [ 02/Jun/16 ] | ||
|
Hi shiyingwang, The entire query team spent an hour investigating the behavior of the update subsystem in the areas surrounding those in the reported backtrace. At this time we don't have much more information about the underlying cause of the memory corruption. However, regarding Rassi's earlier comment, we've realized that since the mmapv1 storage engine is being used, the BSONObj representing the document should be "unowned". This means that the memory for the object is backed by the memory-mapped data file rather than by a separate copy of that data. This is peculiar, because segfaulting when destructing the SharedBuffer underlying the BSONObj would suggest that the object was in fact owned. We have yet to determine whether it's possible for a serialized element not at the root to be owned. Additionally, based on where we are within the UpdateStage::transformAndUpdate() function, we can tell that an in-place update to the document isn't being performed. This means that the update subsystem will be building up a new in-memory representation of the document instead of modifying bytes within the existing object. We have two theories about what is causing the segmentation fault:
From the logs you've uploaded, we see that you've experienced this same segmentation fault multiple times, with occurrences on March 25th, April 2nd, and April 8th.
If not, then we'd like you to increase the verbosity of your mongod to log level 1 and the log level for commands to 3. You can do so by running the following commands in the mongo shell:
This way the mongod process will log what update operation is about to be performed prior to actually reading and modifying the document (and thus potentially triggering the segfault). Once we determine the offending document and update operation, we'll have a much better chance of determining what the issue in the code is. We appreciate your patience. Thanks, | ||
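To make the owned/unowned distinction in Max's comment concrete, here is a minimal C++ model. `ObjView`, `unowned`, `getOwned`, `isOwned`, and `useCount` are hypothetical names chosen to echo BSONObj's vocabulary; this is not MongoDB's implementation, and `std::shared_ptr` stands in for the refcounted SharedBuffer. The point it illustrates: an unowned view merely points into memory kept alive elsewhere (such as an MMAPv1 memory-mapped file) and never touches a reference count, while an owned view holds a refcounted copy whose destruction runs a decrement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical model of an owned vs. unowned BSON-like object. This is NOT
// MongoDB's BSONObj; std::shared_ptr stands in for the refcounted SharedBuffer.
class ObjView {
public:
    // Unowned: borrow bytes kept alive elsewhere (e.g. a memory-mapped data
    // file). No reference count is involved in copying or destroying this view.
    static ObjView unowned(const char* data, std::size_t size) {
        return ObjView(data, size, nullptr);
    }

    // Owned: copy the bytes into a new reference-counted heap buffer.
    // An already-owned view simply shares its existing buffer.
    ObjView getOwned() const {
        if (isOwned()) return *this;
        auto buf = std::make_shared<std::vector<char>>(data_, data_ + size_);
        const char* p = buf->data();  // read before buf is moved from
        return ObjView(p, size_, std::move(buf));
    }

    bool isOwned() const { return holder_ != nullptr; }

    // Number of views sharing this buffer; 0 for unowned views.
    long useCount() const { return holder_ ? holder_.use_count() : 0; }

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    ObjView(const char* d, std::size_t n, std::shared_ptr<std::vector<char>> h)
        : data_(d), size_(n), holder_(std::move(h)) {}

    const char* data_;
    std::size_t size_;
    // Destroying an owned view decrements this count (freeing the copy at
    // zero); an unowned view has no holder and skips the decrement entirely.
    std::shared_ptr<std::vector<char>> holder_;
};
```

Under this model, a crash inside the refcount decrement can only come from an owned view, which is why a segfault while destructing the SharedBuffer is surprising when the document is expected to be unowned.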
| Comment by Ramon Fernandez Marina [ 14/Apr/16 ] | ||
|
Thanks for the additional information shiyingwang, this ticket is being investigated by the Query team. | ||
| Comment by shiyingwang [ 14/Apr/16 ] | ||
|
storage engine is MMAPv1 | ||
| Comment by shiyingwang [ 14/Apr/16 ] | ||
|
the complete logs | ||
| Comment by J Rassi [ 08/Apr/16 ] | ||
|
It looks like memory corruption is being encountered during a user update here, when serializing the updated document BSONObj from the corresponding MutableDocument. I believe that MutableDocument is holding on to a copy of the update pre-image BSONObj, and my working theory is that something happens to this BSONObj before the MutableDocument is serialized. In particular, I think that the serialization process makes a copy of this BSONObj here (which happens to be owned, in this case), and then segfaults when trying to decrement the refcount for this BSONObj's shared buffer when it goes out of scope here. shiyingwang: please provide the above requested information when you get a chance. It will be particularly helpful for the debugging process. | ||
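Rassi's theory hinges on the copy-then-release sequence of a refcounted buffer. The sketch below models it with an intrusive refcount (the count stored in a header at the front of the same allocation as the payload) and a holder that keeps a pre-image alive while serialization takes a temporary copy. `IntrusiveBuffer`, `PreImageHolder`, and `serializeAndReport` are hypothetical names, and the layout is an assumption for illustration, not a description of MongoDB's SharedBuffer or MutableDocument.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical intrusively refcounted buffer: the count sits in a header at the
// front of the same allocation as the payload. A simplified model only.
class IntrusiveBuffer {
public:
    static IntrusiveBuffer allocate(std::size_t bytes) {
        void* raw = std::malloc(sizeof(Holder) + bytes);
        if (raw == nullptr) throw std::bad_alloc();
        return IntrusiveBuffer(new (raw) Holder());
    }

    // Copying shares the allocation and bumps the count.
    IntrusiveBuffer(const IntrusiveBuffer& other) : h_(other.h_) {
        if (h_) h_->refs.fetch_add(1);
    }

    IntrusiveBuffer& operator=(IntrusiveBuffer other) {
        std::swap(h_, other.h_);
        return *this;
    }

    // The decrement reads and writes through h_. If h_, or the header it points
    // to, has been overwritten or freed, this is where a fault would surface.
    ~IntrusiveBuffer() {
        if (h_ && h_->refs.fetch_sub(1) == 1) {
            h_->~Holder();
            std::free(h_);
        }
    }

    char* data() { return reinterpret_cast<char*>(h_ + 1); }  // payload follows header
    int refs() const { return h_ ? h_->refs.load() : 0; }

private:
    struct Holder {
        std::atomic<int> refs{1};
    };
    explicit IntrusiveBuffer(Holder* h) : h_(h) {}
    Holder* h_;
};

// Hypothetical stand-in for the update path: a document object keeps the
// pre-image alive, and serialization takes a temporary copy of it.
struct PreImageHolder {
    IntrusiveBuffer preImage;
};

// Returns the count observed while the temporary copy is alive; the copy's
// destructor then runs the decrement as the function returns.
int serializeAndReport(const PreImageHolder& doc) {
    IntrusiveBuffer copy = doc.preImage;
    return copy.refs();
}
```

In a healthy run the count goes 1, 2, 1. If something corrupted the header or the holder pointer between the copy and its destruction, the `fetch_sub` in the destructor is the first place that memory is touched again, which matches where the reported segfault occurs.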
| Comment by Kelsey Schubert [ 08/Apr/16 ] | ||
|
shiyingwang, in addition, can you please answer the following questions:
Thank you, | ||
| Comment by Kelsey Schubert [ 08/Apr/16 ] | ||
|
Hi shiyingwang, We are investigating this issue. Can you please clarify which storage engine you are using, MMAPv1 or WiredTiger? Thank you, |