[SERVER-12257] improve mongod behavior on storage write errors during msync Created: 06/Jan/14 Updated: 21/Sep/17 Resolved: 09/Jan/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 2.5.5 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Eliot Horowitz (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Major Change |
| Operating System: | ALL |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
On Linux, storage errors, specifically failed writes, can result in unnecessary and (mostly) unsignaled data loss and database corruption. This is due to a combination of two factors:
The net result is that after a storage write error, a customer may continue operating for some time with data loss and a corrupted database without being aware of it. The problem surfaces only when the data in question is next accessed, at which point they may see an error, typically a BSON object size error. Possible mongod improvements:
KERNEL BEHAVIOR ON FAILED WRITES

The following behavior was observed by injecting SCSI errors with SystemTap on Linux (CentOS 6.5, kernel 2.6.32-431.el6.centos.plus.x86_64).
Observed that
On the other hand, if the C program is modified to re-dirty the pages by rewriting them before the second msync, the second msync triggers a new attempt to write the previously failed pages, and inspection shows that the data for the initially failed pages does make it to disk on the second attempt. (Similar behavior was also reproduced using mongod and two fsync db commands: the msync error log message appeared for the first fsync but not the second, the failed page writes were not reattempted, and inspection showed corrupted db file content.)

LOGGING

The mongod log message is as follows (it can appear on different threads depending on how msync is triggered):
Suggest improving this message, at least in the non-recoverable case, to indicate likely data loss and database corruption due to storage errors, and the need for corrective action.

The kernel logs messages such as the following (details depend on the type of error) to syslog and dmesg:
|
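A minimal sketch of the logging improvement suggested in the description, written in C (the language of the reproduction program). The function format_msync_failure, its parameters, and the message wording are hypothetical; they are not mongod's actual code or log format. The sketch only illustrates the idea: name the file and the errno, and when the failure is not recoverable, state plainly that data loss and corruption are likely and that corrective action is needed.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter for a clearer msync failure message.
 * file:        data file being flushed
 * err:         errno saved right after msync() failed (e.g. EIO after a
 *              failed storage write)
 * recoverable: caller's judgement of whether the write can still be retried
 * Returns snprintf's result (length the full message would have). */
int format_msync_failure(char *out, size_t outlen, const char *file,
                         int err, int recoverable)
{
    if (recoverable)
        return snprintf(out, outlen, "msync failed for %s: %s (errno %d)",
                        file, strerror(err), err);

    /* Non-recoverable: say what the failure means for the data, not just
       that a system call failed, and point at the kernel log for details. */
    return snprintf(out, outlen,
                    "msync failed for %s: %s (errno %d). Storage write error: "
                    "data loss and database corruption are likely; check "
                    "syslog/dmesg and take corrective action",
                    file, strerror(err), err);
}
```

Saving errno immediately after the failing msync() and passing it in keeps the message accurate. The recoverable flag is left to the caller because the report does not define when a failure counts as recoverable.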
| Comments |
| Comment by Githook User [ 09/Jan/14 ] |
|
Author: Eliot Horowitz (erh) <eliot@10gen.com>
Message: |
| Comment by Eric Milkie [ 09/Jan/14 ] |
|
I think the closeall.js test is now failing because there is a race between unmapping the region and msyncing the region, so msync() is returning an error. |
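The error side of this race can be reproduced in isolation. A minimal sketch, assuming Linux, whose msync(2) reports ENOMEM when the indicated range is not mapped: a single thread maps a page, unmaps it, and then calls msync() on the stale address, which is the ordering the race produces when the unmap wins. The function name is hypothetical; this is neither the closeall.js test nor mongod code.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: returns the errno from msync() on an unmapped range,
 * 0 if msync() unexpectedly succeeded, or -1 if the setup itself failed. */
int msync_after_munmap(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        return -1;

    /* In the real race another thread unmaps the region here, between the
       flusher obtaining the address and calling msync() on it. */
    if (munmap(addr, len) != 0)
        return -1;

    if (msync(addr, len, MS_SYNC) == 0)
        return 0;
    return errno; /* expected: ENOMEM, range not mapped */
}
```

When the unmap and the flush run on different threads, the ordering depends on scheduling, so an error like this would be expected only in runs where the unmap happens first.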
| Comment by Githook User [ 09/Jan/14 ] |
|
Author: Eliot Horowitz (erh) <eliot@10gen.com>
Message: |