[SERVER-16023] Open file limit causes shutdown of WT but mongod still running Created: 15/Oct/14 Updated: 04/Dec/14 Resolved: 10/Nov/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 2.8.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Alvin Richards (Inactive) | Assignee: | Mark Benvenuto |
| Resolution: | Done | Votes: | 0 |
| Labels: | 28qa | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
Problem: On a long run, the open file limit was reached, with this in the logs
However, mongod was left running. |
| Comments |
| Comment by Keith Bostic [ 10/Nov/14 ] | ||||||||||
|
Yes, that looks correct to me. | ||||||||||
| Comment by Eliot Horowitz (Inactive) [ 10/Nov/14 ] | ||||||||||
|
Keith - take a look at this: https://github.com/mongodb/mongo/commit/4ab029e3e68ea5f7aa89b90a47fd814d17b142bf | ||||||||||
| Comment by Keith Bostic [ 10/Nov/14 ] | ||||||||||
|
Mark, we made a minor change to pass WT_PANIC as the error to the error handler, which means applications can figure out if a panic has happened by testing the error value. Is that sufficient for your needs? Or are there additional advantages to having a separate panic handler? | ||||||||||
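As a rough illustration of the approach Keith describes, an application's error callback could test the error value to detect a panic. This is a simplified, self-contained sketch and not the WiredTiger API: the real `handle_error` callback in `wiredtiger.h` takes additional handler and session arguments, and `SKETCH_WT_PANIC`, `struct app_state`, and `sketch_handle_error` are hypothetical names introduced here. The numeric value is an assumption; real code should use the `WT_PANIC` constant from the header.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed stand-in for WT_PANIC; use the constant from wiredtiger.h in real code. */
#define SKETCH_WT_PANIC (-31804)

/* Hypothetical application state, not part of WiredTiger or mongod. */
struct app_state {
    int panicked;    /* set once the engine has reported a panic */
    int errors_seen; /* number of error callbacks received */
};

/*
 * Simplified error callback: log the message, then record whether the
 * engine has panicked by testing the error value.
 */
int sketch_handle_error(struct app_state *app, int error, const char *message)
{
    assert(app != NULL);
    app->errors_seen++;
    fprintf(stderr, "storage engine error %d: %s\n", error,
            message != NULL ? message : "(no message)");
    if (error == SKETCH_WT_PANIC)
        app->panicked = 1;
    return 0;
}
```

The sketch only sets a flag so that it stays testable. Following the discussion in this ticket, a real application would more likely log and then exit abruptly without calling storage engine shutdown.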
| Comment by Keith Bostic [ 10/Nov/14 ] | ||||||||||
|
We've pushed changes into the WiredTiger develop branch that panic the engine if a thread fails, causing all future calls to return failure, reference https://github.com/wiredtiger/wiredtiger/pull/1356. | ||||||||||
| Comment by Githook User [ 10/Nov/14 ] | ||||||||||
|
Author: Eliot Horowitz (erh) <eliot@10gen.com> Message: | ||||||||||
| Comment by Keith Bostic [ 07/Nov/14 ] | ||||||||||
|
> A handler_panic function is what we are looking for.

OK, let us think about that a little and get back to you.

> In this state, would you expect us to call storage engine shutdown, or would it be a problem if we just called _exit()?

It's safer not to call storage engine shutdown, something very, very bad has happened and continuing to run is risky at best. | ||||||||||
| Comment by Mark Benvenuto [ 07/Nov/14 ] | ||||||||||
|
A handler_panic function is what we are looking for. Something like (mocked up from handle_error)
We just want the callback for logging purposes, and so that we can go through our own abrupt shutdown routines. We were not going to continue the process at this point. In this state, would you expect us to call storage engine shutdown, or would it be a problem if we just called _exit()? | ||||||||||
| Comment by Keith Bostic [ 07/Nov/14 ] | ||||||||||
|
Generally, there's no choice about what to do (for example, a panic might indicate corrupted memory, unexpected data format, or lack of resources); the only option is to exit and restart. We could probably offer a special "we're going to panic" notification that would allow the application to immediately throw up its hands, but once we decide to panic, no future calls to the engine will succeed, they will all immediately return WT_PANIC. | ||||||||||
| Comment by Eric Milkie [ 07/Nov/14 ] | ||||||||||
|
Could we have a special fatal or severe message callback function, in which we could abort the process? Then you could allow engine consumers the choice of what to do when such things happen. | ||||||||||
| Comment by Keith Bostic [ 07/Nov/14 ] | ||||||||||
|
I suspect that for now, the right approach for us is to panic the engine if a thread fails so that all future calls return failure. It sounds to me like that is acceptable to you? Working threads that see resource failure won't fail the engine (for example, an attempt to open a new table might fail); only a failure in one of the backing server threads will panic it. Does that make sense? | ||||||||||
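The behaviour proposed here can be sketched with a toy model: a failure in a worker call is returned to that caller only, while a failure in a background server thread latches a panic flag, after which every call fails. This is not WiredTiger code. `struct toy_engine`, `toy_open_table`, `toy_server_thread_step`, and `TOY_PANIC` are hypothetical, and the model is single-threaded for brevity, whereas a real engine would need an atomic or lock-protected flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed stand-in for WT_PANIC; real code would use wiredtiger.h. */
#define TOY_PANIC (-31804)

/* Hypothetical toy engine state. */
struct toy_engine {
    int panicked;        /* latched once a server thread fails */
    int files_available; /* remaining file descriptors in this model */
    int open_tables;     /* tables opened successfully so far */
};

/* Worker call: a resource failure is reported to the caller only. */
int toy_open_table(struct toy_engine *engine)
{
    assert(engine != NULL);
    if (engine->panicked)
        return TOY_PANIC;
    if (engine->files_available == 0)
        return EMFILE; /* fails this call, does not panic the engine */
    engine->files_available--;
    engine->open_tables++;
    return 0;
}

/* Background server thread step: a resource failure panics the engine. */
int toy_server_thread_step(struct toy_engine *engine)
{
    assert(engine != NULL);
    if (engine->panicked)
        return TOY_PANIC;
    if (engine->files_available == 0) {
        engine->panicked = 1; /* latch: all later calls now fail */
        return TOY_PANIC;
    }
    return 0;
}
```

Once the flag is latched, freeing resources does not help in this model: `toy_open_table` keeps returning `TOY_PANIC`, which mirrors the point that the only option after a panic is to exit and restart.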
| Comment by Mark Benvenuto [ 07/Nov/14 ] | ||||||||||
|
A simple repro
In mongo shell, run the following command
Here are the errors I see in the log:
There are errors in both the archive thread, and the thread allocating the table that backs the collection. |