[SERVER-6705] 32-bit Windows hits access violation running test.exe (Buildbot) Created: 03/Aug/12 Updated: 11/Jul/16 Resolved: 05/Aug/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 2.2.0-rc1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Tad Marshall | Assignee: | Tad Marshall |
| Resolution: | Done | Votes: | 0 |
| Labels: | buildbot | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Crash in 32-bit Windows; stack traces but no crash in 32-bit Linux |
||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | Windows | ||||||||
| Participants: | |||||||||
| Description |
|
The 32-bit Windows version of test.exe crashes at the same place every time on Buildbot and the crash is reproducible on my machine. The debug version does not crash. http://buildbot.mongodb.org/builders/Windows%2032-bit/builds/5306/steps/test/logs/stdio
http://buildlogs.mongodb.org/build/501b311bd2a60f1ffc0007a5/test/501b311cd2a60f1ffc0007a7/
The log above ends there, with no message about what happened. The error code reported in the other log file above is -1073741819, which is 0xC0000005, the Windows code for EXCEPTION_ACCESS_VIOLATION (i.e. a segfault). We have no unhandled exception filter in test.exe, so an access violation terminates the program without any error message saying what happened. The 32-bit Linux version of test.exe looks unhappy running the same test, but it doesn't hit a segfault. http://buildlogs.mongodb.org/build/501b15ced2a60f4e26000424/test/501b177ad2a60f4ab500071f/
64-bit Windows behaves like 32-bit Linux. From a local test:
|
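The exit-code arithmetic in the description can be checked with a short sketch. The helper names below (`exitCodeToStatus`, `isAccessViolation`, `kAccessViolation`) are hypothetical and are not part of test.exe; the only facts taken from the report are the logged value -1073741819 and its meaning, 0xC0000005.

```cpp
#include <cstdint>

// Windows status code for EXCEPTION_ACCESS_VIOLATION, as quoted in the report.
constexpr std::uint32_t kAccessViolation = 0xC0000005u;

// Reinterpret the signed 32-bit exit code printed in the build log as the
// unsigned status value. Conversion to an unsigned type is modulo 2^32.
// (Hypothetical helper, for illustration only.)
constexpr std::uint32_t exitCodeToStatus(std::int32_t exitCode) {
    return static_cast<std::uint32_t>(exitCode);
}

// True when the exit code is the access-violation status.
constexpr bool isAccessViolation(std::int32_t exitCode) {
    return exitCodeToStatus(exitCode) == kAccessViolation;
}
```

Since 2^32 - 1073741819 = 3221225477 = 0xC0000005, `isAccessViolation(-1073741819)` evaluates to true.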
| Comments |
| Comment by Erich Siedler [ 05/Aug/12 ] |
|
Great! Thanks for the explanation. |
| Comment by auto [ 05/Aug/12 ] |
|
Author: Tad Marshall <tad@10gen.com> Date: 2012-08-04T13:52:10-07:00 Message: Disable Frame Pointer Omission/Optimization in Windows |
| Comment by Tad Marshall [ 05/Aug/12 ] |
|
Hi Erich, Thanks for your note. Yes, it was the change to the stack trace code that caused this problem.
The fundamental problem is that the RtlCaptureContext() Windows API uses the ebp register in 32-bit builds to read parameters from the stack, but it doesn't set up this frame pointer itself. That means that this API cannot be used safely if the code was built with the /Oy compiler switch (omit frame pointer) or if this switch is set implicitly (and not disabled with /Oy-) by an optimization switch such as the /O2 that we use in release builds. Somewhat contrary to the documentation, enabling FPO (frame pointer omission or optimization) does not mean that frame pointers (using ebp to point to a stack frame) will not be generated, just that the compiler is free to omit the frame pointer and use ebp as a general purpose register in generated code.
What happened with my code change (that you referenced above) is that I changed printStackTrace() from a large routine to a small one, and this led to the compiler and linker inlining it into a routine that was using ebp as a scratch register. When the smaller printStackTrace() routine called RtlCaptureContext(), the ebp register had the value zero, causing an access violation when RtlCaptureContext() tried to read from [ebp+4].
I tried disabling FPO in selected routines, but printStackTrace() is called from so many places (especially now that it is inline-able) that it is not a safe approach. The better solution is to just disable frame pointer optimization altogether for the 32-bit release build. It turns out that this doesn't increase code size, and the effect on performance is negligible (undetectable, actually) in the tests that I have run. This only affects the 32-bit builds; the 64-bit version of RtlCaptureContext() doesn't have this weakness, and FPO was turned off (by /Od) in the debug 32-bit build anyway.
Thank you for your support, for reproducing the problem and for letting us know what you learned about it; this is greatly appreciated! Tad |
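As a rough illustration of the failure described in the comment above, the following toy model computes the address a [ebp+4] read would touch. It is not Windows code and does not call RtlCaptureContext(). The names are hypothetical, and the 64 KB bound is an assumption drawn from the usual description of the low, never-mapped region of a Windows user-mode address space.

```cpp
#include <cstdint>

// Assumption: the lowest 64 KB of a Windows user-mode address space is never
// mapped, so any read there raises an access violation.
constexpr std::uint32_t kNullPartitionEnd = 0x00010000u;

// With a conventional 32-bit frame, [ebp] holds the caller's saved ebp and
// [ebp+4] holds the return address. This returns the address of that slot.
constexpr std::uint32_t returnAddressSlot(std::uint32_t ebp) {
    return ebp + 4u;
}

// True when reading [ebp+4] is certain to fault because the slot lies in the
// unmapped low region. A false result does not prove the read is safe.
constexpr bool faultsInNullPartition(std::uint32_t ebp) {
    return returnAddressSlot(ebp) < kNullPartitionEnd;
}
```

With ebp equal to zero, as in the crash described above, the slot address is 4 and `faultsInNullPartition(0)` is true. A plausible stack address such as 0x0012FF00 (an arbitrary illustrative value) falls outside the low region.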
| Comment by Erich Siedler [ 04/Aug/12 ] |
|
Reproduced every time here too, with Vista SP2 x64. The 64-bit release runs fine. I reverted the following commit, and then the 32-bit release runs the same as the others: 5300139ceb188361f773c7a6f3c24f9fe9affd6a ( I hope this helps, best regards. |