[SERVER-38047] Mongo 3.4.17 crash (WiredTiger error) Created: 09/Nov/18 Updated: 30/Nov/18 Resolved: 30/Nov/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.4.17 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Tzach Yarimi | Assignee: | Danny Hatcher (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Ubuntu 16.04.5 LTS |
| Participants: | Tzach Yarimi, Danny Hatcher |
| Description |
|
Log: 2018-11-08T22:22:49.397+0000 E STORAGE [thread2] WiredTiger error (22) [1541715769:397467][110545:0x7f3a43f80700], file:impl_condeco_group_l_30329/collection-8059533--382801488 ,{"b":"55B6FA334000","o":"158B4F9"},{"b":"55B6FA334000","o":"158B9DD"},{"b":"7F3A49355000","}} |
| Comments |
| Comment by Danny Hatcher (Inactive) [ 30/Nov/18 ] |
|
Hello Tzach, As I have not heard back from you and this appears to have been resolved, I am closing this ticket. Thank you, Danny |
| Comment by Danny Hatcher (Inactive) [ 15/Nov/18 ] |
|
Hello Tzach, Unfortunately, due to the nature of the issue, it may be difficult to prove beyond all doubt that you will never encounter this issue again. The safest method would be to perform a rolling initial sync across your cluster to ensure that none of the data files have any chance of carrying the problem. That being said, if your other nodes have been running on 3.4.x for a significant period of time without failing, then you will most likely be fine with the snapshot. Would it be possible to bring your replica set back up to full strength using the snapshot and then schedule some maintenance time over the next few weeks to initial sync the nodes one at a time? Thank you, Danny |
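A minimal mongo shell sketch, assuming the shell is connected to the primary, of one way to watch a member work through such a rolling initial sync; the output fields come from rs.status():

```
// Run from a mongo shell connected to the primary (or any reachable member).
// A member doing initial sync reports STARTUP2; once it has caught up it
// returns to SECONDARY and is safe to leave before moving to the next node.
rs.status().members.forEach(function (m) {
    print(m.name + "  state=" + m.stateStr + "  optimeDate=" + m.optimeDate);
});
```

Waiting until each member reports SECONDARY again before wiping the next one keeps a majority of the set available throughout the maintenance window.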
| Comment by Tzach Yarimi [ 15/Nov/18 ] |
|
Thanks Danny, We did create the failing node from snapshot, however we don't have a way of knowing which of our existing nodes is "healthy", since they were all created a long time ago on Mongo 3.2.8 and were upgraded to 3.4.X. Is there a check we can run to validate that a node is healthy? |
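One check that does exist for this purpose is the validate command, which scans a collection's data files and indexes for structural problems. A minimal sketch follows, with "mydb" as a placeholder database name; a clean result is strong but not absolute evidence of health, in line with the advice elsewhere in this ticket that only a fresh initial sync removes all doubt.

```
// Validate every collection in one database; repeat per database as needed.
// { full: true } reads all data and indexes, so it is expensive on 2TB and
// best run on a node temporarily taken out of rotation.
var dbName = "mydb";                          // placeholder database name
var target = db.getSiblingDB(dbName);
target.getCollectionNames().forEach(function (coll) {
    var res = target.runCommand({ validate: coll, full: true });
    print(coll + "  valid=" + res.valid +
          (res.errors && res.errors.length ? "  errors=" + res.errors : ""));
});
```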
| Comment by Danny Hatcher (Inactive) [ 14/Nov/18 ] |
|
Hello Tzach, If you are only experiencing the problem on one node and the rest of the nodes in the replica set are fine, you should be able to use your normal instance creation procedure as long as the snapshot is taken from a healthy node. Thank you, Danny |
| Comment by Tzach Yarimi [ 11/Nov/18 ] |
|
Hi Danny, Yes, this instance was created on 3.2.8, then upgraded to 3.2.13, then 3.4.X. In our use case, doing an initial sync requires a long downtime, as the DB is 2TB and we are write heavy. Usually when we need a new instance, we create one from an AWS EBS snapshot. I guess that this won't fix the issue as the data files are not cleared, correct? Is there a different solution that will not require an initial sync? Thanks, Tzach |
| Comment by Danny Hatcher (Inactive) [ 09/Nov/18 ] |
|
Hello Tzach, This looks similar to a previously fixed issue. If you have healthy replica set nodes, I recommend clearing the data files and performing an initial sync. Thank you, Danny |
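Because an initial sync of a 2TB, write-heavy data set can take many hours, a related sanity check (a sketch, assuming the shell is connected to the primary) is to confirm that the oplog window comfortably exceeds the expected sync time, so the resyncing member does not fall off the back of the oplog:

```
// Run from a mongo shell connected to the primary. timeDiffHours is the span
// between the oldest and newest oplog entries -- roughly how long a member can
// spend in initial sync before it can no longer catch up incrementally.
var info = db.getReplicationInfo();
print("configured oplog size (MB): " + info.logSizeMB);
print("current oplog window (h):   " + info.timeDiffHours);
```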
| Comment by Tzach Yarimi [ 09/Nov/18 ] |
|
Restarting the mongo service didn't help - it kept crashing repeatedly. |