[SERVER-40088] WiredTiger has failed to open its metadata Created: 12/Mar/19 Updated: 12/Mar/19 Resolved: 12/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.2.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Christian Cremer | Assignee: | Danny Hatcher (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Steps To Reproduce: | 2019-03-12T11:58:57.249+0000 I CONTROL [initandlisten] MongoDB starting : pid=27 port=27017 dbpath=/var/lib/mongodb/data 64-bit host=frame-4-bcfcl, options: { replication: { oplogSizeMB: 64 }, security: { authorization: "disabled" }, storage: { dbPath: "/var/lib/mongodb/data", wiredTiger: { engineConfig: { cacheSizeGB: 1 } } }, systemLog: { quiet: true } } |
| Participants: |
| Description |
|
We run MongoDB 3.2 in an OpenShift cluster using the official Red Hat pod (hence the not-quite-up-to-date version). For reasons as yet unknown, the pod failed to stop cleanly, corrupting the WiredTiger data files. The error is basically the same as described in other tickets reporting this issue. We have tried running a 3.6 image, which did not help. When running a 4.0 image, the output is more verbose and a repair is attempted, but it fails with "** IMPORTANT: UPGRADE PROBLEM: The data files need to be fully upgraded to version 3.6 before attempting an upgrade to 4.0". A file-based restore does not help either; mongod fails to start with checksum errors. Our customer did not create dumps. Can you help us? The database is currently down... Going by the other tickets, I have attached some files so that you can generate repair files. Please indicate whether those are enough. Best regards, Christian Cremer |
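For reference, a minimal sketch of the kind of standalone repair attempt described above, assuming the dbpath from the startup log (/var/lib/mongodb/data) and a mongod binary that matches the on-disk data file version (3.2.x here); as the reporter notes, running the repair with a newer binary stops on the upgrade error quoted above:

```sh
# Take a copy of the data directory first, so a failed repair can be rolled back.
cp -a /var/lib/mongodb/data /var/lib/mongodb/data.bak

# Run the repair with a matching-version mongod while no other mongod is using the dbpath.
mongod --dbpath /var/lib/mongodb/data --repair
```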
| Comments |
| Comment by Danny Hatcher (Inactive) [ 12/Mar/19 ] |
|
Chris, thanks for letting us know the repair worked! Have a great day, Danny |
| Comment by Christian Cremer [ 12/Mar/19 ] |
|
Hello Daniel, Thank you very much for the super-fast response and the repair files! The repair worked and the server is now up again. We're waiting for the customer to verify the data, but I don't expect any more trouble in this regard. Yes, that, or running databases outside of OpenShift altogether. We have had database corruption with PostgreSQL and other systems on GlusterFS in the past. However, since we run OpenShift as a public cluster, we don't always know what our customers are running. Nevertheless, thanks again! Cheers, Chris |
| Comment by Danny Hatcher (Inactive) [ 12/Mar/19 ] |
|
Hello Christian, We recommend using replica sets for situations like this: if one node goes down due to corrupt data, you can perform a clean sync from a healthy node without a service interruption. I've attached repair-attempt.tar. Thank you, Danny |
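As a rough illustration of the resync approach recommended above (a sketch only; the replica set name rs0 is hypothetical, the dbpath is the one from this ticket, and the affected mongod must be stopped before its data directory is cleared):

```sh
# On the affected member only: with mongod stopped, remove its data files so it
# performs a fresh initial sync from a healthy member when it rejoins the set.
rm -rf /var/lib/mongodb/data/*

# Restart the member with its usual replica set configuration.
mongod --dbpath /var/lib/mongodb/data --replSet rs0 --port 27017

# From a mongo shell connected to the primary, rs.status() shows the member
# move through STARTUP2 (initial sync) and back to SECONDARY.
```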