[SERVER-35267] db on mounted GlusterFS remote filesystem causes kernel panic (task stuck for 120 seconds) during mongod shutdown Created: 29/May/18 Updated: 23/Jul/18 Resolved: 21/Jun/18 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.6.5 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Ferhat Savcı | Assignee: | Ramon Fernandez Marina |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Kubernetes 1.9.5 |
| Participants: |
| Description |
Running a mongo pod on Kubernetes (6 CentOS 7.4 nodes) with the preStop command ["mongod", "--shutdown"]. The db volume is mounted on a remote GlusterFS volume (with a subdir-mount option) via a PersistentVolume and a PersistentVolumeClaim in the mongo deployment. The Gluster volume is striped across 3 hosts without replicas and is mounted using the user-space (FUSE) client. All of the nodes and the Gluster hosts are set up to synchronize their time via NTP from the same time servers.

When the remote mount is empty, mongod starts, populates the directory, operates without errors, and shuts down without errors. On the second start, mongod starts and operates without errors, but on shutdown it logs that WiredTiger is shutting down and, after two minutes, the host's kernel panics with "task mongod stuck for more than 120 seconds".

MongoDB 3.6.0 using MMAPv1 works as intended. I previously had problems using GlusterFS volumes with replication for Elasticsearch storage (regarding creation and access times on lock files), but none with striped volumes.
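For reference, here is a minimal sketch of the kind of Deployment plus GlusterFS PersistentVolume/PersistentVolumeClaim wiring described above. All names, sizes, and the Gluster endpoints/volume are illustrative assumptions rather than values from the reporter's cluster; only the preStop command comes from the report, and the subdir-mount option is not shown.

```yaml
# Illustrative sketch only; names, sizes, and the Gluster endpoints/volume are
# assumptions, not values from the reporter's cluster.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-data-pv            # assumed name
spec:
  capacity:
    storage: 10Gi                # assumed size
  accessModes: ["ReadWriteOnce"]
  glusterfs:
    endpoints: glusterfs-cluster # assumed Endpoints object for the Gluster hosts
    path: gv0                    # assumed Gluster volume (subdir mount not shown)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-data-pvc           # assumed name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo:3.6.5
          lifecycle:
            preStop:
              exec:
                # preStop command from the report: ask mongod for a clean
                # shutdown before the pod is terminated.
                command: ["mongod", "--shutdown"]
          volumeMounts:
            - name: data
              mountPath: /data/db
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: mongo-data-pvc
```

Since the preStop hook runs inside the same container, mongod --shutdown sees the same mounted dbpath (/data/db here) and asks the running server to shut down cleanly before the pod is terminated.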
| Comments |
| Comment by Kelsey Schubert [ 21/Jun/18 ] |

Hi ferhat.savci,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
| Comment by Ramon Fernandez Marina [ 07/Jun/18 ] |

ferhat.savci, there should not be any difference on the MongoDB side between the two shutdown events you mention, so this points to an issue with the underlying storage layer. Can you please collect gdb stack traces during the shutdown process to see if they provide any useful information? I'm attaching a shell script below that collects top, iostat, and gdb stack traces; run it at the same time you initiate the shutdown on your mongod.

Thanks,
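The attached collection script is not reproduced in this export. As a rough illustration (not the actual attachment), a loop along the following lines, using standard tools (top, iostat from the sysstat package, gdb, pgrep) and an assumed output directory of /tmp/mongod-shutdown-diag, could capture the requested samples while the shutdown hangs:

```sh
#!/bin/sh
# Hypothetical collection loop (not the script attached to this ticket).
# Assumes gdb, iostat (sysstat), and pgrep are installed on the affected node.
OUT=/tmp/mongod-shutdown-diag    # assumed output directory
mkdir -p "$OUT"

PID=$(pgrep -x mongod | head -n 1)
[ -n "$PID" ] || { echo "mongod is not running" >&2; exit 1; }

# Sample roughly every 5 seconds for ~2.5 minutes, covering the 120-second
# hung-task window reported above.
i=0
while [ "$i" -lt 30 ]; do
    ts=$(date +%Y%m%d-%H%M%S)
    top -b -n 1        > "$OUT/top.$ts.txt"
    iostat -xm 1 2     > "$OUT/iostat.$ts.txt"
    # Attach to the running mongod and dump backtraces of all its threads.
    gdb -p "$PID" -batch -ex "thread apply all bt" > "$OUT/gdb.$ts.txt" 2>&1
    sleep 5
    i=$((i + 1))
done
```

Running the loop with sufficient privileges to attach gdb, for the duration of the hang, can show whether the mongod threads are blocked in FUSE/GlusterFS I/O calls.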
|