[SERVER-21540] Cannot obtain a clean shutdown on a device mapper device Created: 18/Nov/15 Updated: 09/Jun/16 Resolved: 10/Feb/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.0.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Michele Franceschini | Assignee: | Kelsey Schubert |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Steps To Reproduce: | Set up a MongoDB cluster with a replica set. On one machine, encrypt a disk with DM (device mapper), prepare it for snapshotting (also with DM), format it as ext4, mount it, and run the replica set member on it. |
| Participants: |
| Description |
|
We have a replicated MongoDB cluster (no sharding). On one member, mongod runs on a snapshot target device that is based on a DM-encrypted device. The snapshot procedure consists of shutting down mongod with the proper command, then stopping and removing the container. At this stage mongod.lock is empty but the WiredTiger lock file is not. Restarting the member reports file corruption but proceeds (I'm assuming because it's a replica set member). If I take the same snapshot and run a standalone mongod on it, it fails, reporting corrupted files (often including WiredTiger.wt). |
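The lock-file observation above can be recorded with a small script before each snapshot is taken. This is only a sketch: the helper name `lock_file_sizes` is hypothetical, and the file names `mongod.lock` and `WiredTiger.lock` are assumed defaults for a WiredTiger dbpath. An empty `mongod.lock` is what a clean mongod exit normally leaves behind; this ticket does not establish whether a non-empty WiredTiger lock file indicates a problem.

```python
import os


def lock_file_sizes(dbpath, names=("mongod.lock", "WiredTiger.lock")):
    """Map each lock file name to its size in bytes, or None if it is missing.

    The default file names are assumptions; pass `names` to override them.
    """
    sizes = {}
    for name in names:
        full = os.path.join(dbpath, name)
        # Only regular files are measured; anything else counts as missing.
        sizes[name] = os.path.getsize(full) if os.path.isfile(full) else None
    return sizes


# Example (the path is a placeholder for the member's dbpath):
#     print(lock_file_sizes("/data/db"))
```

Running this after the shutdown command and again after the container is removed would show whether the files change between the two steps.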
| Comments |
| Comment by Ramon Fernandez Marina [ 09/Jan/16 ] |
|
michele, are you still seeing this behavior on your end? If this is still an issue for you, can you please follow up with the information requested by Thomas above for further investigation? Thanks, |
| Comment by Kelsey Schubert [ 16/Dec/15 ] |
|
Hi michele, I haven't been able to reproduce an unclean shutdown on my local machine yet. To continue to investigate, can you please upload the logs including the shutdown command and subsequent restart? Were you able to look into reproducing this behavior without encryption? Thank you for your help, |
| Comment by Michele Franceschini [ 01/Dec/15 ] |
|
Hi Thomas, 1) I'm not using LVM, just DM. The snapshot-ready device is formatted with ext4, which afaik supports fsync. There are a bunch of threads out there on fsync being tricky to use, e.g.: |
| Comment by Kelsey Schubert [ 24/Nov/15 ] |
|
Hi michele, Thank you for the report. Can you please ensure that the file system supports fsync() on directories? As noted here, this operation must be supported. If that is not the cause of the problem we will need a few more details to continue to investigate this behavior.
Kind regards, |
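
One way to check the directory fsync() requirement on the mounted device is to open the directory itself and call fsync() on its descriptor. The following is a minimal sketch for Linux, not an official MongoDB diagnostic; the function name `directory_fsync_supported` is hypothetical, and a successful call only shows that the file system accepts the operation, not that the underlying DM stack persists the data.

```python
import os


def directory_fsync_supported(path):
    """Return True if fsync() succeeds on a descriptor opened on the directory."""
    # On Linux a directory can be opened read-only and passed to fsync().
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
        return True
    except OSError:
        # The file system rejected fsync() on the directory.
        return False
    finally:
        os.close(fd)


# Example (the path is a placeholder for the mounted dbpath):
#     print(directory_fsync_supported("/data/db"))
```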