[SERVER-18640] Wiredtiger does not recover from unclean shutdown Created: 23/May/15 Updated: 26/Aug/15 Resolved: 29/May/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.0.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Dharshan Rangegowda | Assignee: | Ramon Fernandez Marina |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Operating System: | ALL |
| Participants: |
| Description |
|
My server was shut down uncleanly. Now the database does not start.
|
| Comments |
| Comment by Ramon Fernandez Marina [ 26/Aug/15 ] |
|
dixenon, currently there's no automated way to recover from this scenario. We've created Regards,
| Comment by Denys [ 02/Aug/15 ] |
|
Hi Ramon, regardless of why the files are missing, is there any way to restore the other databases from the data directory? Thanks,
| Comment by Ramon Fernandez Marina [ 29/May/15 ] |
|
dharshanr@scalegrid.net, as per Michael's explanation above, we believe this issue is not a bug in MongoDB but something related to the storage subsystem and how fsync() calls are handled. If the virtualized storage does not properly implement SCSI commands or does not flush data correctly, that could easily explain why some files are missing despite WiredTiger calling fsync() on file creation. I'm going to resolve this issue, but if you have additional information that points to a problem in MongoDB, please comment back and we can re-open the ticket. Regards,
| Comment by Michael Cahill (Inactive) [ 29/May/15 ] |
|
Hi Dharshan, WiredTiger calls fsync on the file when it is first created, plus fsync on the directory just after the file is created. If the filesystem and storage system are working properly, these calls should guarantee that the files will exist with a valid header after a crash. Repair does not automatically drop collections if the underlying files are missing, so in this case there is not much repair can do because the expected files don't exist.
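For readers unfamiliar with the pattern described above (fsync on the new file, then fsync on its directory), here is a minimal Python sketch for POSIX systems. It is an illustration only, not WiredTiger's actual code; the function name `create_file_durably` and its parameters are hypothetical.

```python
import os


def create_file_durably(directory: str, name: str, header: bytes) -> str:
    """Create a file, write a header, and flush the file and its directory entry."""
    path = os.path.join(directory, name)
    # O_EXCL makes creation fail if the file already exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(header)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the OS to flush the file's data and metadata to storage.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flush the directory too, so the new directory entry is persisted.
    # Opening a directory this way works on POSIX systems, not on Windows.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return path
```

Even with both calls, the guarantee only holds if every layer below (filesystem, hypervisor, disk cache) honors the flush, which is the point being made in this comment.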
| Comment by Dharshan Rangegowda [ 28/May/15 ] |
|
Hi Michael, it is ext4 running on RAID 0 with fairly standard options. I will provide more details soon. When is the fsync call made? Is it made immediately, or only when there is memory pressure on the instance? Also, why was the repair not able to recover the remaining data?
| Comment by Michael Cahill (Inactive) [ 28/May/15 ] |
|
Hi dharshanr@scalegrid.net, can you please tell us more about the filesystem you are using, including mount options? At the moment, it looks to us as if either the fsync calls that WiredTiger makes are not being respected by the filesystem across a crash and restart, or that some files were somehow removed after they were created.
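One way to collect the filesystem type and mount options requested above on a Linux host is to read `/proc/mounts`. The Python sketch below is a hedged illustration: the helper name `mount_options_for` is hypothetical, and it does not handle mount points containing escaped characters such as `\040`.

```python
import os


def mount_options_for(path: str, mounts_text: str):
    """Return (mount_point, fs_type, options) for the mount that contains path.

    mounts_text is the content of /proc/mounts; the longest matching
    mount point wins. Returns None if nothing matches.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed lines
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",")
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type, options)
    return best
```

For example, `mount_options_for("/var/lib/mongodb", open("/proc/mounts").read())` would report the mount backing that (assumed) dbpath. On ext4, options that disable write barriers, such as `nobarrier` or `barrier=0`, are worth mentioning in the reply because they weaken durability after a power loss.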
| Comment by Dharshan Rangegowda [ 26/May/15 ] |
|
Hi Ramon, the mongod process is the only one that accesses this directory, so there was no restore. So you mean the database recorded the presence of the file but did not complete the write operation? Is there a way to force mongod with WiredTiger to flush fully to disk? The only thing that happened was that the servers were stopped from outside. The server is a VM in Azure and the disk is a RAID 0 disk with two underlying disks.
| Comment by Ramon Fernandez Marina [ 26/May/15 ] |
|
The error message above says that file collection-2--5585092568808516308.wt, which is mentioned in the WiredTiger metadata, is not present in your data path, which seems to be confirmed by the fact that the file is not present in the path you sent. Since WiredTiger fsyncs the database directory when files are created, this scenario can be triggered when the storage layer is not providing enough durability guarantees (i.e., the fsync is ignored and MongoDB crashes before the file is written). Another possibility is a restore operation that affected the files in dbpath. Can you please provide more information as to the type and configuration of the storage layer and clarify whether some other process was accessing this dbpath?
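The check described here (a file named in the metadata is absent from the data path) can be reproduced by hand. The Python sketch below assumes you already have a list of expected file names, for example copied from the error messages; it does not read the WiredTiger metadata itself, and `missing_files` is a hypothetical helper.

```python
import os


def missing_files(dbpath: str, expected_names):
    """Return the expected file names that are not present in dbpath, sorted."""
    present = set(os.listdir(dbpath))
    return sorted(name for name in expected_names if name not in present)
```

Calling `missing_files(dbpath, ["collection-2--5585092568808516308.wt"])` would return that name if the file is absent, matching the situation reported in this ticket.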
| Comment by Ramon Fernandez Marina [ 25/May/15 ] |
|
Hi dharshanr@scalegrid.net, thanks for opening this ticket and uploading the data files. I think this could be a duplicate of
| Comment by Dharshan Rangegowda [ 23/May/15 ] |
|
I tried to do a repair and it failed as well
| Comment by Dharshan Rangegowda [ 23/May/15 ] |
|
A zip of the data files is attached. |