[SERVER-28084] WT_PANIC: WiredTiger library panic during bulk update Created: 23/Feb/17 Updated: 27/Jul/18 Resolved: 06/Mar/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.4.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Henri-Maxime Ducoulombier | Assignee: | Mark Agarunov |
| Resolution: | Done | Votes: | 0 |
| Labels: | envns, rns, wtc | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu Server 16.04 / LUKS-encrypted filesystem |
||
| Attachments: |
|
| Operating System: | Linux |
| Participants: |
| Description |
|
When running a large number of update operations, mongod processes occasionally crash at random (not always, and not often). This is a development shard server (with core sharding):
I suspect LUKS is the source of the problem. The running updates use $addToSet / $elemMatch. I have attached the error information and backtrace; let me know if you need more information. |
| Comments |
| Comment by Henri-Maxime Ducoulombier [ 28/Mar/17 ] | |
|
A short follow-up on this issue. We tweaked the server and lowered the RAM frequency (it is 2400 MHz DDR3 RAM), because the motherboard treats frequencies of 1800 MHz and above as overclocking, and it has been working fine so far (a full memtest helped us identify instability in the RAM). Thanks again for the time you spent investigating the issue. Henri-Maxime | |
| Comment by Henri-Maxime Ducoulombier [ 06/Mar/17 ] | |
|
Hello @Mark Agarunov, Thanks for investigating. I hope to build a new development server with more hardware (and not three volumes per disk), and this time to format the LUKS-encrypted volumes with XFS instead of ext4 (does this issue also happen with unencrypted ext4?). I can't do that right now, since we would need to invest both in hardware and in the time to migrate the data, but it should happen in 2017. If the problem still occurs then, I'll file a new bug report. I've been searching the web for information on LUKS and possible problems like this one, but it seems that few people build systems this way, and the literature is quite thin. Last but not least, the problem occurred today while only reading data, whereas before it occurred during massive updates/writes; this is why I thought it was important to post the logs this time. Thx | |
| Comment by Mark Agarunov [ 06/Mar/17 ] | |
|
Hello hmducoulombier@marketing1by1.com, Thank you for providing these logs. Looking over the output, it appears that the corruption originates outside of MongoDB. Errors of this type frequently indicate an underlying hardware or storage-layer issue. The error messages indicate that the data changed between the time it was written by MongoDB and the time it was read:
You are correct that, with journaling enabled, a repair may not be needed to get MongoDB running again. However, a portion of the database may have been corrupted earlier, with the errors occurring only when that portion is accessed rather than on startup. A repair may fix this issue. Thanks, | |
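Errors of the kind described above are typically raised by checksum verification: a checksum is stored alongside a block when it is written and recomputed when the block is read back, so a mismatch means the bytes changed somewhere below the database (for example in faulty RAM or storage). The following is a minimal Python sketch of that idea only. It assumes zlib's CRC-32 as a stand-in checksum; it does not reproduce WiredTiger's actual block format or checksum algorithm, and the helper names `write_block` and `read_block` are hypothetical.

```python
import zlib


def write_block(payload: bytes):
    """Hypothetical helper: store a block together with a checksum computed at write time."""
    return payload, zlib.crc32(payload)


def read_block(payload: bytes, stored_checksum: int) -> bytes:
    """Hypothetical helper: recompute the checksum on read and reject the block on mismatch."""
    if zlib.crc32(payload) != stored_checksum:
        raise ValueError("checksum mismatch: block changed between write and read")
    return payload


block, checksum = write_block(b"{'_id': 1, 'tags': ['a', 'b']}")

# Simulate a single flipped bit, the kind of error unstable RAM can introduce.
corrupted = bytes([block[0] ^ 0x01]) + block[1:]
```

Reading `block` back succeeds, while reading `corrupted` with the same stored checksum raises `ValueError`; CRC-32 detects every single-bit error, so this simulated flip is always caught.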
| Comment by Henri-Maxime Ducoulombier [ 06/Mar/17 ] | |
|
Here is a recent backtrace (it just happened) from running a "simple" aggregation on the data (same environment, three shards reading from the same disk at the same time). Hope it helps. | |
| Comment by Henri-Maxime Ducoulombier [ 23/Feb/17 ] | |
|
Another backtrace from a different (yet similar) issue. | |
| Comment by Henri-Maxime Ducoulombier [ 23/Feb/17 ] | |
|
Hello Mark, here are the answers to your questions:
I'll post the complete log as soon as I can. Thanks for investigating this issue. | |
| Comment by Mark Agarunov [ 23/Feb/17 ] | |
|
Hello hmducoulombier@marketing1by1.com, Thank you for the report. After looking over the output you've provided, I have a few questions and requests so that we can better investigate this:
Additionally, please send the complete logs from any mongod instances affected by this behavior. Thanks, |