[SERVER-19573] MongoDb crash due to segfault Created: 24/Jul/15 Updated: 24/Aug/15 Resolved: 12/Aug/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Admin, WiredTiger |
| Affects Version/s: | 3.0.4 |
| Fix Version/s: | 3.0.6, 3.1.7 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Shankar Karuppiah | Assignee: | Ramon Fernandez Marina |
| Resolution: | Done | Votes: | 0 |
| Labels: | RF |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | Linux |
| Backport Completed: | |
| Participants: | |
| Description |
|
MongoDb crash due to segfault
|
| Comments |
| Comment by Githook User [ 06/Aug/15 ] |
|
Author: {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}
Message: Merge pull request #2094 from wiredtiger/row-insert-nolock
| Comment by Githook User [ 04/Aug/15 ] |
|
Author: {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}
Message: Merge pull request #2094 from wiredtiger/row-insert-nolock
| Comment by Githook User [ 04/Aug/15 ] |
|
Author: {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}
Message: Also switch to always using an atomic operation to swap inserts into the skiplist. This fixes a potential race where readers may see a partially initialized WT_INSERT object, because the spinlock we used in the past for all inserts did not imply a write barrier.
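The idea in the commit message above can be illustrated with a minimal sketch. This is not WiredTiger's code: it assumes C11 `<stdatomic.h>`, uses a single-level, insert-only sorted list in place of a full skiplist, and the names `node`, `list`, `list_init`, `list_insert`, and `list_lookup` are hypothetical. The point it shows is that the new entry is fully initialized first and then published with a compare-and-swap that has release semantics, so a reader that loads the pointer with acquire semantics cannot observe a partially initialized entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an insert entry; not WiredTiger's WT_INSERT. */
struct node {
    int key;
    int value;
    _Atomic(struct node *) next;
};

/* One level of a skiplist; a real skiplist keeps one head pointer per level. */
struct list {
    _Atomic(struct node *) head;
};

void list_init(struct list *l)
{
    assert(l != NULL);
    atomic_init(&l->head, NULL);
}

/*
 * Insert in ascending key order without taking a lock.
 * Returns false on allocation failure or if the key already exists.
 * Assumption: entries are never removed, so no ABA or reclamation handling.
 */
bool list_insert(struct list *l, int key, int value)
{
    struct node *n = malloc(sizeof(*n));
    if (n == NULL)
        return false;

    /* Fully initialize the entry before any other thread can reach it. */
    n->key = key;
    n->value = value;
    atomic_init(&n->next, NULL);

    for (;;) {
        /* Find the link that should point at the new entry. */
        _Atomic(struct node *) *link = &l->head;
        struct node *cur = atomic_load_explicit(link, memory_order_acquire);
        while (cur != NULL && cur->key < key) {
            link = &cur->next;
            cur = atomic_load_explicit(link, memory_order_acquire);
        }
        if (cur != NULL && cur->key == key) {
            free(n);
            return false;
        }

        /* Not yet visible to readers, so a relaxed store is enough here. */
        atomic_store_explicit(&n->next, cur, memory_order_relaxed);

        /*
         * Publish with a release CAS: all writes to *n above become visible
         * before the pointer does. On failure another writer changed the
         * link, so search again from the head.
         */
        if (atomic_compare_exchange_strong_explicit(
                link, &cur, n, memory_order_release, memory_order_relaxed))
            return true;
    }
}

/* Reader side: acquire loads pair with the release CAS in list_insert. */
bool list_lookup(struct list *l, int key, int *value_out)
{
    struct node *cur = atomic_load_explicit(&l->head, memory_order_acquire);
    while (cur != NULL && cur->key < key)
        cur = atomic_load_explicit(&cur->next, memory_order_acquire);
    if (cur == NULL || cur->key != key)
        return false;
    *value_out = cur->value;
    return true;
}
```

By contrast, a design that fills in the entry and links it with plain stores relies on whatever ordering the lock implementation happens to provide, which is the gap the commit message describes.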
| Comment by Ramon Fernandez Marina [ 30/Jul/15 ] |
|
shankar.k, thank you for uploading the log file. Unfortunately, it doesn't contain information about the segfault. If you manage to reproduce the problem and can send logs, those may help us understand what's happening. In the meantime, we're working on a theory on our end; please continue to watch the ticket for updates.
Thanks,
| Comment by Shankar Karuppiah [ 28/Jul/15 ] |
|
Hello Ramon,
I have uploaded the log for that day; please confirm that you have received it.
About the operations: based on the log for the connection that caused the crash, it came from one of our batch processors.
Schema for the collection:
The collection has one index, { "sdt.eo": -1, "apid": 1 }, besides the default index on _id.
Documents are added to the collection in batches of 50. Each batch process then fetches one document at a time from the collection using the following command:
Each document goes through several processing steps; after each step, the state is updated using the following command:
Once all the processing steps are completed, the document is removed using the following command:
On average, each document is processed within 300 ms, and 100 documents are processed per second at peak.
Side note:
| Comment by Ramon Fernandez Marina [ 28/Jul/15 ] |
|
shankar.k, after further examination of the stack trace you sent, we haven't been able to find the root cause of this issue, so we'll need to try to reproduce it on our end. Are you able to provide more details on the kind of operations you were running on this node? Can you perhaps share the application code or mongo shell scripts you were using when you triggered this error?
Thanks,
| Comment by Ramon Fernandez Marina [ 28/Jul/15 ] |
|
shankar.k, I've created an upload portal for the log file. Have you been able to reproduce the segfault by any chance? We're still looking at the stack trace, but if you happen to have found a reliable way to trigger this problem, that would speed up the analysis.
Thanks,
| Comment by Shankar Karuppiah [ 24/Jul/15 ] |
|
With this node, we went directly from 2.6.8 to 3.0.4. The log file is pretty big, ~1GB. If you need it, what would be the best way to get it to you?
Thank you,
| Comment by Ramon Fernandez Marina [ 24/Jul/15 ] |
|
Thanks for the additional information, shankar.k. Apologies if I wasn't clear about the "data from other versions" question: when you migrated from 2.6.8, did you go to 3.0.4 directly or via some other version? The dump for the connection shows some operations with large writeConflicts numbers, but I believe that should not be an issue. We'll keep looking, but we may need full logs for the affected node; we'll let you know.
Thanks,
| Comment by Shankar Karuppiah [ 24/Jul/15 ] |
|
Hi Ramon,
Details on the setup: we are running a 5-member replica set (1 hidden member and 1 arbiter), hosted on Google Compute Engine on the n1-standard-16 instance type, running Debian wheezy.
The particular connection was running a findAndModify command before the segfault. I have attached part of the log dump for that specific connection.
This instance did not contain any data from a previous version; when we migrated from 2.6.8, we used the initial sync method to copy the data from another replica set member.
Thank you for looking into this issue.
| Comment by Shankar Karuppiah [ 24/Jul/15 ] |
|
startup.txt - Startup log snippet
| Comment by Ramon Fernandez Marina [ 24/Jul/15 ] |
|
shankar.k, in addition to the information requested above, can you please specify whether this instance contains any data created with earlier versions of MongoDB, or whether it is a new 3.0.4 installation?
Thanks,
| Comment by Ramon Fernandez Marina [ 24/Jul/15 ] |
|
Thanks for your report, shankar.k. Can you please send us the full logs since the last restart? I'm looking for startup options. Also, can you provide any details on the kind of setup you have and what operations you were performing when you saw this segfault?
Thanks,