[SERVER-30709] mongo can't start in kubernetes when start from last clean shutdown Created: 17/Aug/17 Updated: 04/Oct/18 Resolved: 07/Sep/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jennings | Assignee: | Mark Agarunov |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | RF |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Issue Links: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
I use mongo:3.4 to start mongo in Kubernetes with arguments. The volume contains old data copied from another instance (which was shut down cleanly), but I still get the following error:
and the contents of WiredTiger.turtle are:
|
| Comments |
| Comment by Henrique Barcellos [ 04/Oct/18 ] |
|
Hello, I have the same problem running mongo on k8s as a StatefulSet with an NFS PersistentVolume, and I can provide the .strace file: mongod.SERVER-30709.strace
| Comment by Mark Agarunov [ 07/Sep/17 ] |
|
Hello jenningsloy318, Thank you for the additional information. Unfortunately, without the strace I can't proceed with a diagnosis of this issue. However, if adding bg,nolock,noatime to the mount options helps, this is a good indicator that the underlying cause of this issue is outside of MongoDB. As we cannot diagnose or reproduce this issue, I've closed this ticket as "Cannot Reproduce" for the time being. If any additional information comes to light, please let me know and we will continue investigating this. Thanks,
| Comment by Jennings [ 22/Aug/17 ] |
|
Hi Mark Agarunov, since I am running mongodb in a pod, it is not convenient to use strace inside the pod. But after I added "bg,nolock,noatime" to the NFS mount options, things got better: I can restart/recreate mongo via a Deployment/ReplicaSet, but a StatefulSet is still not working. I will continue to investigate it. If you have experience running it inside a StatefulSet, can you share something about it?
| Comment by Mark Agarunov [ 18/Aug/17 ] |
|
Hello jenningsloy318, Thank you for providing the additional information. As you mentioned the filesystem is over NFS, I suspect there may be a flag causing open(2) to fail. If possible, please strace the mongod process and provide the resulting file:
This should let us see which calls are being made with which flags, and if any are failing. Thanks,
| Comment by Jennings [ 18/Aug/17 ] |
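Where attaching strace to the pod is impractical, a rough first check of the same suspicion (some open(2) flag or file lock failing on the NFS mount) can be sketched in Python. This is only an illustration: the function name `probe_open_flags`, the probe file name, and the list of flags are assumptions made here, not the set of calls mongod actually issues, and a clean result does not rule out an NFS problem. Since `nolock` was among the mount options discussed in this ticket, the sketch also tries an advisory fcntl lock; whether the failure was lock-related is not established here.

```python
import errno
import fcntl
import os
import tempfile


def probe_open_flags(directory):
    """Try a few open(2) flag combinations in `directory` and report the outcome.

    Returns a dict mapping a label to "ok" or to the errno name of the failure.
    The flag list is illustrative only, not what mongod really uses.
    """
    candidates = {
        "O_RDWR|O_CREAT": os.O_RDWR | os.O_CREAT,
        "O_RDWR|O_CREAT|O_SYNC": os.O_RDWR | os.O_CREAT | os.O_SYNC,
    }
    # O_DIRECT and O_NOATIME are Linux-specific, so add them only if present.
    for name in ("O_DIRECT", "O_NOATIME"):
        flag = getattr(os, name, None)
        if flag is not None:
            candidates["O_RDWR|O_CREAT|" + name] = os.O_RDWR | os.O_CREAT | flag

    results = {}
    path = os.path.join(directory, "open_probe.tmp")  # hypothetical probe file
    try:
        for label, flags in candidates.items():
            try:
                fd = os.open(path, flags, 0o600)
            except OSError as exc:
                results[label] = errno.errorcode.get(exc.errno, str(exc.errno))
            else:
                os.close(fd)
                results[label] = "ok"

        # Try a non-blocking exclusive advisory lock, then release it.
        try:
            with open(path, "a") as handle:
                fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
                fcntl.lockf(handle, fcntl.LOCK_UN)
        except OSError as exc:
            results["fcntl lock"] = errno.errorcode.get(exc.errno, str(exc.errno))
        else:
            results["fcntl lock"] = "ok"
    finally:
        # Leave the directory as it was found.
        if os.path.exists(path):
            os.remove(path)
    return results


if __name__ == "__main__":
    # Demo against a temporary directory; point it at a scratch
    # directory on the NFS mount (not the live dbpath) for a real check.
    with tempfile.TemporaryDirectory() as scratch:
        for label, outcome in probe_open_flags(scratch).items():
            print(label, "->", outcome)
```

Any entry that reports an errno name instead of "ok" is a candidate to look for in the strace output.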
|
Hi @Mark Agarunov, thanks for your reply. I still get the same error:
As for your questions, here are my replies:
1. I am running Kubernetes, and the backend storage is an exported NFS volume backed by a raid1 btree fs on HDD.
2. Currently the disk is working well, since we have other applications running on these shared NFS volumes.
3. Yes, we always run the same version.
4. No, I didn't modify it.
5. No restore.
6. None.
7. Since this filesystem has been in use for only 2 months, I think it is healthy.
I may also hit the same situation when running mongodb in a StatefulSet in a Kubernetes cluster: when some pods encounter problems, they can't restart anymore. I enabled a preStop hook in each pod so that each shutdown of the pod is clean. Since mongod.lock is 0, I take it that the shutdown was clean, but I am not sure if that is correct. I used the following three methods to shut down mongodb in the preStop hook, and I am not sure if other parameters are needed. And if we want to start from the last shutdown, which parameter should we use?
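The lock-file check described in this comment can be sketched as a small script. It assumes that "mongod.lock is 0" means the file is zero bytes long, and that the lock file is named `mongod.lock` directly under the dbpath; the function name `lock_file_looks_clean` is made up for illustration. This is a heuristic only: an empty lock file says nothing about the integrity of the WiredTiger files themselves.

```python
import os


def lock_file_looks_clean(dbpath):
    """Return True if mongod.lock exists and is empty, False if it has
    content, and None if there is no lock file to inspect.

    Assumption: an empty lock file is read as a sign of a clean shutdown.
    """
    lock_path = os.path.join(dbpath, "mongod.lock")
    if not os.path.isfile(lock_path):
        return None
    return os.path.getsize(lock_path) == 0
```

For example, `lock_file_looks_clean("/data/db")` could be run against the copied volume before starting mongod; `/data/db` here is only a placeholder path.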
| Comment by Mark Agarunov [ 17/Aug/17 ] |
|
Hello jenningsloy318, Thank you for the report. I've attached a repair attempt of the files you've provided. Would you please extract these files and replace them in your $dbpath and let us know if it resolves the issue? If you are still seeing errors after replacing these files, please provide the complete logs from mongod so that we can further investigate. Additionally, if this issue persists, please provide the following information:
Thanks, |