[SERVER-21540] Cannot obtain a clean shutdown on a device mapper device Created: 18/Nov/15  Updated: 09/Jun/16  Resolved: 10/Feb/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.7
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Michele Franceschini
Assignee: Kelsey Schubert
Resolution: Incomplete
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Set up a mongo cluster with a replica set. On one machine, encrypt a disk with DM (device mapper), prepare it for snapshotting (also with DM), format it as ext4, mount it, and run the mongo RS member on it.
Shut down mongo and restart it standalone.

Participants:

 Description   

We have a replicated mongo cluster (no sharding). On one member, mongo runs on a snapshot target device that is based on a DM encrypted device.
System: x86 server with Ubuntu 14.04 on bare metal, mongo running inside a docker container, docker 1.9, mongodb 3.0.7 (3.0.4 shows the same issue). The mongo dbpath is bind mounted on the above-mentioned snapshotting device (formatted with ext4).

The snapshot procedure consists of shutting down mongo with the proper command, then stopping and removing the container.

At this stage the mongod.lock file is empty but the WiredTiger lock file is not. Restarting the member reports file corruption but proceeds (I'm assuming because it's an RS member). If I take the same snapshot and run a standalone mongo on it, it fails, reporting corrupted files (often including WiredTiger.wt).
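
The lock-file state described above can be checked before taking a snapshot with a short script. This is a minimal sketch, not something from the original report: it assumes the two files are named mongod.lock and WiredTiger.lock and sit directly under the dbpath, and the helper names lock_state and report_locks are hypothetical.

```shell
#!/bin/sh
# lock_state FILE
# Prints "missing", "empty", or "non-empty" for the given lock file.
lock_state() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ -s "$1" ]; then
        echo "non-empty"
    else
        echo "empty"
    fi
}

# report_locks DBPATH
# Reports the state of both lock files under DBPATH.
# Assumption: the file names below match a default MongoDB 3.0
# WiredTiger dbpath; adjust them if your layout differs.
report_locks() {
    for name in mongod.lock WiredTiger.lock; do
        printf '%s: %s\n' "$name" "$(lock_state "$1/$name")"
    done
}
```

Calling report_locks with the bind-mounted dbpath on the host, after the container has been removed, prints one line per lock file, which makes it easy to attach the state to a report like this one.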



 Comments   
Comment by Ramon Fernandez Marina [ 09/Jan/16 ]

michele, are you still seeing this behavior on your end? If this is still an issue for you, can you please follow up with the information requested by Thomas above for further investigation?

Thanks,
Ramón.

Comment by Kelsey Schubert [ 16/Dec/15 ]

Hi michele,

I haven't been able to reproduce an unclean shutdown on my local machine yet.

To continue to investigate, can you please upload the logs including the shutdown command and subsequent restart?

Were you able to look into reproducing this behavior without encryption?

Thank you for your help,
Thomas

Comment by Michele Franceschini [ 01/Dec/15 ]

Hi Thomas,

1) I'm not using LVM, just DM. The snapshot-ready device is formatted with ext4, which afaik supports fsync. There are a bunch of threads out there on fsync being tricky to use, e.g.:
https://lwn.net/Articles/322823/
2)
sudo docker exec -i <container name> mongo -u <username> -p <password> --ssl --sslAllowInvalidCertificates admin --eval "db.getSiblingDB('admin').shutdownServer()"
sleep 10
sudo docker stop <container name>
sudo docker rm <container name>
3) I'll work on it
4) I have not tried; I will look into it. My experience is that on a device without DM encryption the system starts fine (but this was in a non-sharded env).
5) It's a docker run with net=host, and mongo is run from a script supervised by supervisord. Please let me know if you need more info.
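
One thing worth ruling out in the procedure from item 2: a fixed sleep 10 may not leave mongod enough time to finish shutting down, and docker stop escalates from SIGTERM to SIGKILL once its grace period (10 seconds by default) expires. Whether that is what happens here is not established in this ticket. The sketch below polls for the mongod process instead of sleeping for a fixed time. The helper names wait_until and mongod_stopped are hypothetical, and the check assumes pgrep is available inside the container image.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND about once per second until it succeeds or
# TIMEOUT_SECONDS have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
    timeout_s=$1
    shift
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# mongod_stopped CONTAINER
# Succeeds when no process named exactly "mongod" is found in the container.
# Assumption: pgrep exists in the image. If it does not, or if the container
# is not running, docker exec fails and this reports "stopped", so verify
# the check by hand once before relying on it.
mongod_stopped() {
    ! sudo docker exec "$1" pgrep -x mongod >/dev/null 2>&1
}
```

A possible use, after issuing shutdownServer() as in item 2: run wait_until 120 mongod_stopped <container name>, and only call docker stop and docker rm if it returns 0. The 120 second limit is an arbitrary choice. Since mongod is started from a script under supervisord, it is also worth checking that supervisord does not restart mongod after the shutdown command.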

Comment by Kelsey Schubert [ 24/Nov/15 ]

Hi michele,

Thank you for the report. Can you please ensure that the file system supports fsync() on directories? As noted here, this operation must be supported.

If that is not the cause of the problem we will need a few more details to continue to investigate this behavior.

  1. I assume you are using LVM, can you confirm?
  2. What specific command are you using to shut down the mongod instance and the container?
  3. Can you provide the logs starting when you initiate the shutdown command as well as the log when you restart the node that includes the error message?
  4. Does this behavior appear without encryption?
  5. Can you provide the invocation that you used to start the docker container, along with an example of how you created it?

Kind regards,
Thomas
