[SERVER-37473] mongo can't start in kubernetes after pod restart Created: 04/Oct/18 Updated: 16/Oct/18 Resolved: 12/Oct/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Henrique Barcellos | Assignee: | Kelsey Schubert |
| Resolution: | Done | Votes: | 0 |
| Labels: | RF | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Operating System: | ALL |
| Participants: | Henrique Barcellos, Nick Brewer, Kelsey Schubert |
| Description |
|
I use mongo 4.0.3 and start mongod in Kubernetes with the following arguments:
The volume contains the old data, copied from another instance, but I get the following error:
The .strace file is in the attachments.
The PersistentVolume is an NFS volume with the following flags:
|
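The mongod arguments, the error text, and the NFS flags did not survive in this export. Purely for context, a minimal sketch of the kind of StatefulSet described here might look like the following; the names, storage class, and sizes are assumptions for illustration, not the reporter's actual manifest.

```yaml
# Hypothetical sketch of the setup described in this ticket (names/values are illustrative).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - name: mongod
          image: mongo:4.0.3
          # dbpath lives on the NFS-backed PersistentVolume
          args: ["--dbpath", "/data/db", "--bind_ip_all"]
          volumeMounts:
            - name: data
              mountPath: /data/db
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: nfs   # assumed; the ticket says nfs-server-provisioner manages the volume
        resources:
          requests:
            storage: 10Gi
```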
| Comments |
| Comment by Henrique Barcellos [ 16/Oct/18 ] | |
|
What I posted is only a temporary workaround. Since mongod starts and tries to create lock files with flags that open(2) does not support on this filesystem, I really think this is a bug that should be fixed: mongod should run on any filesystem, even if that means handling those incompatible flags. | |
| Comment by Kelsey Schubert [ 12/Oct/18 ] | |
|
Thanks for the follow-up, henrique.barcellos. Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion, please post on the mongodb-user group or Stack Overflow with the mongodb tag; additional discussion like this would be best posted there. Kind regards, | |
| Comment by Henrique Barcellos [ 09/Oct/18 ] | |
|
I have increased spec.template.spec.terminationGracePeriodSeconds to 120 seconds, and the pod now performs a graceful restart, but the error still occurs, so what I did to solve the problem is add a lifecycle hook: lifecycle: | |
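The actual lifecycle snippet did not survive the export. A minimal sketch of what such a change could look like is below, assuming the hook simply asks mongod for a clean shutdown so the dbpath (and its lock files) is left in a consistent state; the exact preStop command is an assumption, not necessarily what the reporter used.

```yaml
# Hypothetical sketch: give mongod time to shut down cleanly before the pod is killed.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 120
      containers:
        - name: mongod
          image: mongo:4.0.3
          lifecycle:
            preStop:
              exec:
                # Request a clean shutdown so the WiredTiger lock files are released (assumed command).
                command: ["mongod", "--shutdown", "--dbpath", "/data/db"]
```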
| Comment by Nick Brewer [ 08/Oct/18 ] | |
|
henrique.barcellos After looking at the strace information and the other details provided in this ticket, we do not believe this is a bug on the MongoDB end - the error you're encountering simply indicates that a requested file is not accessible. I still suspect that this is a problem related to unexpected shutdown and subsequent reuse of the same dbpath, in particular due to the fact that this works once the lock file is deleted. -Nick | |
| Comment by Henrique Barcellos [ 08/Oct/18 ] | |
|
Yes, I just delete the pod and k8s recreates it. When the pod starts again I get the error. I'm using kubectl delete to delete the pod. | |
| Comment by Nick Brewer [ 05/Oct/18 ] | |
|
henrique.barcellos I suspect that the mongod is not being shut down correctly, which is leaving the dbpath in a state that is causing these errors - can you elaborate on the specific method (kubectl delete, for example) that you're using to delete the pod? Thanks, | |
| Comment by Henrique Barcellos [ 05/Oct/18 ] | |
|
I have tried to use the following flags on NFS: bg,nolock,noatime. Still no success. With these flags it mounts the NFS PersistentVolume with these options:
I just created a StatefulSet, and the first time it works because there are no locks. When I delete the pod, k8s re-creates it using the same NFS volume, and it fails with the error that I put in this issue's description.
| |
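For reference, when an NFS PersistentVolume is defined statically, the mount options discussed above can be set via mountOptions. The sketch below is only an illustration under that assumption (server address and export path are made up); the volume in this ticket is actually created dynamically by nfs-server-provisioner, where mount options would typically be configured on the StorageClass instead.

```yaml
# Hypothetical sketch: statically defined NFS PersistentVolume with the mount options discussed above.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  mountOptions:   # applied when the kubelet mounts the volume on the node
    - bg
    - nolock
    - noatime
  nfs:
    server: nfs.example.internal   # hypothetical server
    path: /exports/mongo
```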
| Comment by Nick Brewer [ 04/Oct/18 ] | |
|
henrique.barcellos Sorry, I overlooked that line previously - it does appear that the mount command is using different fstab settings (rw, relatime) than what we recommend in our production notes: bg, nolock, and noatime. However, I believe the use of local_lock=none covers the nolock requirement. I'm glad to hear you were able to get it working. Was the data copied from another mongod while that instance was running? Thanks, | |
| Comment by Henrique Barcellos [ 04/Oct/18 ] | |
|
Deleting WiredTiger and WiredTiger.lock and re-creating the pod makes mongod start successfully. Maybe it is a permission problem with these files, or an NFS flag that is incompatible with some parameter used in open(2). | |
| Comment by Henrique Barcellos [ 04/Oct/18 ] | |
|
The version is the same. Since the persistent volume is managed by a Kubernetes deployment (nfs-server-provisioner), there is nothing in fstab about this NFS mount. I can provide the following info from the mount command (mount | grep pvc-ff3):
I've attached WiredTiger.wt and WiredTiger.turtle. | |
| Comment by Nick Brewer [ 04/Oct/18 ] | |
|
henrique.barcellos Thanks for your report. I'd like to confirm a few details:
Thanks, |