[SERVER-5152] Windows unhandled exception filter should report thread and fault address
Created: 01/Mar/12 Updated: 11/Jul/16 Resolved: 04/Mar/12
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code, Logging |
| Affects Version/s: | None |
| Fix Version/s: | 2.1.1 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Tad Marshall | Assignee: | Tad Marshall |
| Resolution: | Done | Votes: | 0 |
| Labels: | Windows |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Windows |
| Backwards Compatibility: | Fully Compatible |
| Participants: |
| Description |
The Windows version of mongod sets up an "unhandled exception filter", which is called when an exception occurs that is not caught by any of our regular try/catch code. The main reason it exists, and the main way it gets to run, is access violations (the Windows term for a segfault): an attempt to read from address 0, or from 0 plus a structure offset, passes through this filter on its way to a quick exit.

All the filter does is record that the exception happened, and the existing code does even that poorly. Its output is garbled: "unhandled Windows ex" is printed, followed by a timestamp and "access violation" with no newline, so it looks really bad. Worse, it doesn't tell us which thread had the access violation or what the faulting address was, so we have absolutely nothing to go on when debugging it.

The code should instead write a single readable line, make sure that line goes to the log file (so we get it when mongod.exe is running as a service), and report which thread faulted and what the faulting address was. That would at least give us a starting point for finding out how a crash happened.

I am filing this because of an access violation in buildbot on the 32-bit Windows build that did not reproduce when I tested it on my machine, and the test passed on the next buildbot run. So all we know is that the 32-bit Windows version can crash, but nothing about how it happens.
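The behavior asked for above (one readable, newline-terminated line naming the faulting thread and the faulting address) can be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not the change that was committed for this ticket: the helper names `formatUnhandledException` and `accessVerb` and the message wording are hypothetical, and stderr stands in for mongod's log file. The Windows pieces are the documented Win32 API: `SetUnhandledExceptionFilter`, `GetCurrentThreadId`, and `EXCEPTION_RECORD`, whose `ExceptionInformation[0]` and `ExceptionInformation[1]` hold the read/write flag and the inaccessible address for an access violation.

```cpp
#include <cinttypes>  // PRIxPTR
#include <cstdint>    // std::uintptr_t
#include <cstdio>     // std::snprintf, std::fputs, std::fflush
#include <string>

// Hypothetical helper (not mongod's actual code): maps the access-violation
// flag from EXCEPTION_RECORD::ExceptionInformation[0] to a verb.
// Documented values: 0 = read, 1 = write, 8 = DEP (execute) violation.
static const char* accessVerb(unsigned long accessType) {
    switch (accessType) {
        case 0:  return "reading";
        case 1:  return "writing";
        case 8:  return "executing";
        default: return "accessing";
    }
}

// Hypothetical helper: builds one complete, newline-terminated report line
// naming the faulting thread, the exception code, and the instruction
// address; for access violations it also names the inaccessible address.
// It takes plain integers so it can be compiled and tested off Windows.
std::string formatUnhandledException(unsigned long threadId,
                                     unsigned long exceptionCode,
                                     std::uintptr_t faultAddress,
                                     bool isAccessViolation,
                                     unsigned long accessType,
                                     std::uintptr_t accessedAddress) {
    char buf[160];
    std::snprintf(buf, sizeof(buf),
                  "*** unhandled exception 0x%08lX at 0x%" PRIxPTR ", thread %lu",
                  exceptionCode, faultAddress, threadId);
    std::string line = buf;
    if (isAccessViolation) {
        std::snprintf(buf, sizeof(buf),
                      ": access violation %s address 0x%" PRIxPTR,
                      accessVerb(accessType), accessedAddress);
        line += buf;
    }
    line += '\n';  // always end the line so later log output starts cleanly
    return line;
}

#ifdef _WIN32
#include <windows.h>

// Top-level filter matching LPTOP_LEVEL_EXCEPTION_FILTER; compiled only on
// Windows. stderr stands in for the server's log file in this sketch.
static LONG WINAPI reportUnhandledException(EXCEPTION_POINTERS* info) {
    const EXCEPTION_RECORD* rec = info->ExceptionRecord;
    const bool av = rec->ExceptionCode == EXCEPTION_ACCESS_VIOLATION &&
                    rec->NumberParameters >= 2;
    const std::string line = formatUnhandledException(
        GetCurrentThreadId(), rec->ExceptionCode,
        reinterpret_cast<std::uintptr_t>(rec->ExceptionAddress), av,
        av ? static_cast<unsigned long>(rec->ExceptionInformation[0]) : 0,
        av ? static_cast<std::uintptr_t>(rec->ExceptionInformation[1]) : 0);
    std::fputs(line.c_str(), stderr);
    std::fflush(stderr);
    return EXCEPTION_EXECUTE_HANDLER;  // usually ends the process
}

// Install once during startup:
//     SetUnhandledExceptionFilter(reportUnhandledException);
#endif
```

Keeping the formatting in a plain function means the message layout can be checked on any platform, while only the thin filter depends on Windows headers. One caveat: `std::string` allocates, and the heap may itself be damaged when an access violation reaches this filter, so a production version might format into a fixed stack buffer instead.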
| Comments |
| Comment by auto [ 04/Mar/12 ] |
Author: Tad Marshall (tadmarshall) <tad@10gen.com>
Message: This change improves the reporting of unhandled exceptions in …
| Comment by Tad Marshall [ 03/Mar/12 ] |
I edited the description to remove my incorrect claim about catch (...) and set the Fix Version to 2.1.1, since the code is written and moving through code review. This will be very helpful for debugging access violations in the Windows version in the field.
| Comment by Tad Marshall [ 01/Mar/12 ] |
Good question, and what I said may be wrong. I have used __try { } __finally { } and noticed that C++ exceptions on Windows seem to use SEH, but I have no practical experience mixing SEH and C++ exceptions, so I am probably wrong. Update: I tested it, and I am wrong.
| Comment by Andy Schwerin [ 01/Mar/12 ] |
Interesting. Does catch (...) catch access violations if you don't have structured exception handling (SEH) enabled?