[CDRIVER-386] Support recovery from Out of Memory condition Created: 27/Jun/14 Updated: 03/Jan/18 Resolved: 24/Apr/17 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Alex Lerner | Assignee: | Backlog - C Driver Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Case: | (copied to CRM) |
| Comments |
| Comment by A. Jesse Jiryu Davis [ 07/Mar/17 ] |
|
This improvement will not, realistically, ever reach a priority level that will justify the effort. We'll reopen this ticket if we see convincing evidence otherwise. |
| Comment by Christian Hergert [ 17/Jul/14 ] |
|
Can you throw a minimal example patch together so I can see what that would look like? It sounds reasonable to me, but I don't exactly know what I'm saying yes to! |
| Comment by Itay Neeman [ 17/Jul/14 ] |
|
Thanks for the update. I saw the vtable additions for memory, and they are useful, though they don't fully solve the problem. One thing I was wondering: how difficult would it be to add an option to compile the driver in C++ mode? That might allow a custom allocation function (via the vtable) to throw a std::bad_alloc exception, so we could at least unwind the stack. Is this totally nuts? |
| Comment by Christian Hergert [ 16/Jul/14 ] |
|
Hi Itay, I think fully recovering from OOM in every case is out of scope for the C driver. It would add significant complexity to the code base for something quite special-purpose (libdbus is still the only library I've seen handle OOM successfully on Linux; if you read its source, you'll see why I'm concerned). However, we do have a new vtable for memory operations in libbson. This means you can keep a separate heap around for OOM situations and manage it yourself. If you want to take it a step further, you can perform driver work inside a setjmp/longjmp context and, from a custom malloc handler, jump back out when malloc fails. This would let you recover later, once more memory is available. Probably not the answer you are looking for, but I think this will handle more cases than you'd expect. |
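The reserve-plus-longjmp approach described above can be sketched as follows. This is a minimal illustration, not driver code: `mem_vtable_t`, `oom_aware_malloc`, `run_with_oom_recovery`, and the `fail_next` test hook are all hypothetical names. libbson's real `bson_mem_vtable_t` also has `calloc` and `realloc` slots and is installed with `bson_mem_set_vtable()`; the stand-in struct here keeps the sketch compilable without libbson. Because libbson's `bson_malloc()` aborts when the allocator returns NULL, the handler never returns NULL: it first releases an emergency reserve and retries, then `longjmp`s to the application's recovery point.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for libbson's bson_mem_vtable_t (malloc/free only). */
typedef struct {
   void *(*malloc_fn) (size_t num_bytes);
   void (*free_fn) (void *mem);
} mem_vtable_t;

static void *emergency_reserve = NULL; /* block held back for OOM situations */
static jmp_buf *oom_jump = NULL;       /* active recovery point, if any */
static int fail_next = 0;              /* test hook: fail the next N mallocs */

static void *
system_malloc (size_t num_bytes)
{
   if (fail_next > 0) {
      fail_next--; /* simulated allocation failure */
      return NULL;
   }
   return malloc (num_bytes);
}

static void
oom_reserve_init (size_t bytes)
{
   emergency_reserve = malloc (bytes);
}

static void *
oom_aware_malloc (size_t num_bytes)
{
   void *mem = system_malloc (num_bytes);

   if (!mem && emergency_reserve) {
      /* First failure: hand the reserve back to the heap and retry. */
      free (emergency_reserve);
      emergency_reserve = NULL;
      mem = system_malloc (num_bytes);
   }
   if (!mem && oom_jump) {
      /* Still failing: unwind to the application's recovery point. */
      longjmp (*oom_jump, 1);
   }
   if (!mem) {
      abort (); /* no reserve and no recovery point: same as today */
   }
   return mem;
}

static const mem_vtable_t driver_vtable = {oom_aware_malloc, free};

/* Hypothetical driver operation that allocates through the vtable. */
static int
do_driver_work (const mem_vtable_t *vt)
{
   char *buf = vt->malloc_fn (64);
   buf[0] = 'x';
   vt->free_fn (buf);
   return 0;
}

/* Runs driver work under a recovery point; returns -1 if an OOM unwound it. */
static int
run_with_oom_recovery (const mem_vtable_t *vt)
{
   jmp_buf env;

   if (setjmp (env) != 0) {
      oom_jump = NULL; /* arrived here via longjmp from oom_aware_malloc */
      return -1;
   }
   oom_jump = &env;
   int rc = do_driver_work (vt);
   oom_jump = NULL;
   return rc;
}
```

Jumping out of driver code this way likely leaks whatever was allocated before the failure and may leave driver state inconsistent, so after a jump the safest course is probably to discard the objects involved in the interrupted operation.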
| Comment by Itay Neeman [ 02/Jul/14 ] |
|
Christian, thanks for responding. You obviously maintain the library, so it's your call, but I'd like to offer my (uninvited) two cents. While the suggestion above (a memory-pressure-relieving "valve") is good and would be a welcome addition, I don't think it's sufficient. It's very hard to use and link against a library that will abort your program without recourse in this condition; it takes control away from the application and puts it in the library. I agree that a full "scrubbing" that checks every malloc (and its tree of usages) will likely miss some cases, but I don't think that puts us in a worse position or means "all the work is lost". If the work is done and we miss some things, those things are bugs that can be fixed. As it stands today, there is no way for us to fix the issue at all; we're at the mercy of the library. You could still keep the current "abort" semantics for those who want them with a compile-time switch. However, it would be very nice to have a safe malloc option, with every call to it (and its tree of usages) checked as well. I realize this is a big undertaking, but better now than later, and better now than never. Hope this makes sense; happy to explain more. |
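The compile-time switch proposed above could be sketched like this, assuming a hypothetical `DRIVER_ABORT_ON_OOM` macro and hypothetical `driver_try_malloc` / `driver_build_ns` functions (`fail_next` again simulates failures). With the macro undefined, allocation failure returns NULL and every caller up the tree must check and propagate it; `driver_build_ns` shows the kind of "did formatting a newly allocated string work?" check discussed in this thread. Defining the macro restores abort-on-failure.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical build switch: compile with -DDRIVER_ABORT_ON_OOM to keep
 * the current abort semantics instead of returning NULL. */

static int fail_next = 0; /* test hook: fail the next N allocations */

static void *
driver_try_malloc (size_t num_bytes)
{
   void *mem;

   if (fail_next > 0) {
      fail_next--; /* simulated allocation failure */
      mem = NULL;
   } else {
      mem = malloc (num_bytes);
   }
#ifdef DRIVER_ABORT_ON_OOM
   if (!mem) {
      abort (); /* legacy behavior: no recourse for the application */
   }
#endif
   return mem;
}

/* Hypothetical caller: builds "db.collection" and reports OOM as an error. */
static int
driver_build_ns (const char *db, const char *coll, char **ns_out)
{
   size_t len = strlen (db) + 1 + strlen (coll) + 1;
   char *ns = driver_try_malloc (len);

   if (!ns) {
      return -1; /* propagate the failure instead of aborting */
   }
   snprintf (ns, len, "%s.%s", db, coll);
   *ns_out = ns;
   return 0;
}
```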
| Comment by Christian Hergert [ 01/Jul/14 ] |
|
This is going to need some planning. What worries me is littering the codebase with checks such as whether formatting a newly allocated string worked. I really don't trust code that does that, because one slip and all your work is lost. What seems more tenable is adding a callback on memory pressure that lets the host application release malloc'd memory so the library can continue to make progress. This could be plumbed into the malloc wrappers fairly easily. |
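A memory-pressure callback plumbed into a malloc wrapper might look roughly like this. It is a sketch under assumptions: `driver_set_memory_pressure_callback`, `driver_malloc`, and the callback signature are hypothetical, not part of the C driver's API, and `fail_next` is a test hook that simulates allocation failures. The wrapper retries as long as the callback reports that it released something, and falls back to `abort()` once the callback has nothing left to give, so the existing abort semantics are preserved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback: asked to release memory; returns true if it freed
 * something, so the allocation is worth retrying. */
typedef bool (*mem_pressure_cb_t) (size_t requested, void *user_data);

static mem_pressure_cb_t pressure_cb = NULL;
static void *pressure_user_data = NULL;
static int fail_next = 0; /* test hook: fail the next N mallocs */

void
driver_set_memory_pressure_callback (mem_pressure_cb_t cb, void *user_data)
{
   pressure_cb = cb;
   pressure_user_data = user_data;
}

static void *
system_malloc (size_t num_bytes)
{
   if (fail_next > 0) {
      fail_next--; /* simulated allocation failure */
      return NULL;
   }
   return malloc (num_bytes);
}

void *
driver_malloc (size_t num_bytes)
{
   void *mem;

   while (!(mem = system_malloc (num_bytes))) {
      /* Ask the host application to release memory, then retry. */
      if (!pressure_cb || !pressure_cb (num_bytes, pressure_user_data)) {
         abort (); /* nothing more to release: keep today's behavior */
      }
   }
   return mem;
}
```

An application would register a callback that, for example, frees a cache it can rebuild later and returns `false` once that cache is empty; a callback that keeps returning `true` without freeing anything would make the wrapper loop forever.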