[SERVER-6705] 32-bit Windows hits access violation running test.exe (Buildbot) Created: 03/Aug/12  Updated: 11/Jul/16  Resolved: 05/Aug/12

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 2.2.0-rc1

Type: Bug Priority: Major - P3
Reporter: Tad Marshall Assignee: Tad Marshall
Resolution: Done Votes: 0
Labels: buildbot
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Crash in 32-bit Windows, stack traces and no crash in 32-bit Linux


Issue Links:
Related
related to SERVER-6715 The Windows version of the 'test' pro... Closed
Backwards Compatibility: Fully Compatible
Operating System: Windows
Participants:

 Description   

The 32-bit Windows version of test.exe crashes at the same place every time on Buildbot and the crash is reproducible on my machine. The debug version does not crash.

http://buildbot.mongodb.org/builders/Windows%2032-bit/builds/5306/steps/test/logs/stdio

running C:\10gen\buildslaves\mongo\Windows_32bit\mongo\mongod.exe --port 27999 --dbpath /data/db/sconsTests/
 *******************************************
         Test : test.exe ...
      Command : C:\10gen\buildslaves\mongo\Windows_32bit\mongo\test.exe
         Date : Thu Aug  2 22:02:04 2012
                (output suppressed; see http://buildlogs.mongodb.org/build/501b311bd2a60f1ffc0007a5/test/501b311cd2a60f1ffc0007a7/)
                32413.000107ms
test C:\10gen\buildslaves\mongo\Windows_32bit\mongo\test.exe exited with status -1073741819
0 tests succeeded
The following tests failed (with exit code):
C:\10gen\buildslaves\mongo\Windows_32bit\mongo\test.exe	-1073741819
scons: building terminated because of errors.
Traceback (most recent call last):
  File "C:\10gen\buildslaves\mongo\Windows_32bit\mongo\buildscripts\smoke.py", line 861, in <module>
    main()
  File "C:\10gen\buildslaves\mongo\Windows_32bit\mongo\buildscripts\smoke.py", line 829, in main
    run_old_fails()
  File "C:\10gen\buildslaves\mongo\Windows_32bit\mongo\buildscripts\smoke.py", line 722, in run_old_fails
    report() # exits with failure code if there is an error
  File "C:\10gen\buildslaves\mongo\Windows_32bit\mongo\buildscripts\smoke.py", line 546, in report
    raise Exception("Test failures")
Exception: Test failures
scons: *** [smokeFailingTests] Error 1
program finished with exit code 2
elapsedTime=200.678000

http://buildlogs.mongodb.org/build/501b311bd2a60f1ffc0007a5/test/501b311cd2a60f1ffc0007a7/

Thu Aug 02 22:02:05 [testsuite] build info: windows sys.getwindowsversion(major=6, minor=0, build=6002, platform=2, service_pack='Service Pack 2') BOOST_LIB_VERSION=1_49
Thu Aug 02 22:02:05 [testsuite] 32bit
Thu Aug 02 22:02:05 [testsuite] git version: 4aaed4ccfb608db56a269e0422f6d48385c28445
Thu Aug 02 22:02:05 [testsuite] random seed: 1343959325
Thu Aug 02 22:02:05 [testsuite] going to run suite: accumulator
Thu Aug 02 22:02:05 [testsuite] going to run test: class AccumulatorTests::Avg::None
Thu Aug 02 22:02:05 [testsuite] going to run test: class AccumulatorTests::Avg::OneInt
Thu Aug 02 22:02:05 [testsuite] going to run test: class AccumulatorTests::Avg::OneDouble
// ... snip ...
Thu Aug 02 22:02:30 [testsuite] going to run test: class CursorTests::BtreeCursor::RangeEq
Thu Aug 02 22:02:30 [testsuite] going to run test: class CursorTests::BtreeCursor::RangeIn
Thu Aug 02 22:02:30 [testsuite] going to run test: class CursorTests::BtreeCursor::AbortImplicitScan
Thu Aug 02 22:02:30 [testsuite] going to run test: class CursorTests::ClientCursor::HandleDelete
Thu Aug 02 22:02:30 [testsuite] going to run test: class CursorTests::ClientCursor::AboutToDelete
Thu Aug 02 22:02:30 [testsuite] going to run test: class CursorTests::ClientCursor::AboutToDeleteDuplicate
Thu Aug 02 22:02:30 [testsuite] going to run test: class CursorTests::ClientCursor::AboutToDeleteDuplicateNextClause
Thu Aug 02 22:02:30 [testsuite] going to run test: class CursorTests::ClientCursor::Pin::PinCursor

The log above ends there ... no message about what happened. The error code reported in the other log file above is -1073741819, which is 0xC0000005, the Windows code for EXCEPTION_ACCESS_VIOLATION (i.e. segfault). We have no unhandled exception filter in test.exe, so access violations cause program exit and no error message saying what happened.

The 32-bit Linux version of test.exe looks unhappy running the same test, but it doesn't hit a segfault.

http://buildlogs.mongodb.org/build/501b15ced2a60f4e26000424/test/501b177ad2a60f4ab500071f/

Thu Aug 2 20:12:43 [testsuite] build info: Linux domU-12-31-39-01-70-B4 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:39:36 EST 2008 i686 BOOST_LIB_VERSION=1_49
Thu Aug 2 20:12:43 [testsuite] git version: 4aaed4ccfb608db56a269e0422f6d48385c28445
Thu Aug 2 20:12:43 [testsuite] 32bit
Thu Aug 2 20:12:43 [testsuite] going to run suite: accumulator
Thu Aug 2 20:12:43 [testsuite] going to run test: AccumulatorTests::Avg::None
Thu Aug 2 20:12:43 [testsuite] random seed: 1343952762
Thu Aug 2 20:12:43 [testsuite] going to run test: AccumulatorTests::Avg::OneInt
Thu Aug 2 20:12:43 [testsuite] going to run test: AccumulatorTests::Avg::OneDouble
// ... snip ...
Thu Aug 2 20:13:57 [testsuite] going to run test: CursorTests::BtreeCursor::RangeEq
Thu Aug 2 20:13:57 [testsuite] going to run test: CursorTests::BtreeCursor::RangeIn
Thu Aug 2 20:13:57 [testsuite] going to run test: CursorTests::BtreeCursor::AbortImplicitScan
Thu Aug 2 20:13:57 [testsuite] going to run test: CursorTests::ClientCursor::HandleDelete
Thu Aug 2 20:13:57 [testsuite] going to run test: CursorTests::ClientCursor::AboutToDelete
Thu Aug 2 20:13:57 [testsuite] going to run test: CursorTests::ClientCursor::AboutToDeleteDuplicate
Thu Aug 2 20:13:57 [testsuite] going to run test: CursorTests::ClientCursor::AboutToDeleteDuplicateNextClause
0x8d65eba 0x90442c5 0x8c134f7 0x83f04fe 0x83f198a 0x8403dd6 0x8294f1b 0x8a46baf 0x8a47b12 0x901643b 0x8426972 0x132390 0x8294b61 
Thu Aug 2 20:13:57 [testsuite] going to run test: CursorTests::ClientCursor::Pin::PinCursor
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo15printStackTraceERSo+0x2a) [0x8d65eba]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo10logContextEPKc+0xa5) [0x90442c5]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo11msgassertedEiPKc+0xc7) [0x8c134f7]
/mnt/slaves/Linux_32bit/mongo/test(_ZN11CursorTests12ClientCursor3Pin9PinCursor3runEv+0x7a) [0x83f198a]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo12ClientCursor4findExb+0x14e) [0x83f04fe]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5boost6detail8function22void_function_invoker0IPFvvEvE6invokeERNS1_15function_bufferE+0xb) [0x8294f1b]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo8unittest5Suite13runTestObjectIN11CursorTests12ClientCursor3Pin9PinCursorEEEvv+0x36) [0x8403dd6]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo8unittest5Suite3runERKSsi+0x41f) [0x8a46baf]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo7dbtests10runDbTestsEiPPcSs+0x1afb) [0x901643b]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo8unittest5Suite3runERKSt6vectorISsSaISsEERKSsi+0x4c2) [0x8a47b12]
/mnt/slaves/Linux_32bit/mongo/test(main+0x72) [0x8426972]
/lib/i686/nosegneg/libc.so.6(__libc_start_main+0xe0) [0x132390]
/mnt/slaves/Linux_32bit/mongo/test [0x8294b61]
0x8d65eba 0x90442c5 0x8c134f7 0x8c136a6 0x8d912f3 0x83f1adb 0x8403dd6 0x8294f1b 0x8a46baf 0x8a47b12 0x901643b 0x8426972 0x132390 0x8294b61 
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo10logContextEPKc+0xa5) [0x90442c5]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo15printStackTraceERSo+0x2a) [0x8d65eba]
/mnt/slaves/Linux_32bit/mongo/test [0x8c136a6]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo11msgassertedEiPKc+0xc7) [0x8c134f7]
/mnt/slaves/Linux_32bit/mongo/test(_ZN11CursorTests12ClientCursor3Pin9PinCursor3runEv+0x1cb) [0x83f1adb]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo12ClientCursor5eraseEx+0x3c3) [0x8d912f3]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5boost6detail8function22void_function_invoker0IPFvvEvE6invokeERNS1_15function_bufferE+0xb) [0x8294f1b]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo8unittest5Suite13runTestObjectIN11CursorTests12ClientCursor3Pin9PinCursorEEEvv+0x36) [0x8403dd6]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo8unittest5Suite3runERKSsi+0x41f) [0x8a46baf]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo7dbtests10runDbTestsEiPPcSs+0x1afb) [0x901643b]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo8unittest5Suite3runERKSt6vectorISsSaISsEERKSsi+0x4c2) [0x8a47b12]
/lib/i686/nosegneg/libc.so.6(__libc_start_main+0xe0) [0x132390]
/mnt/slaves/Linux_32bit/mongo/test(main+0x72) [0x8426972]
/mnt/slaves/Linux_32bit/mongo/test [0x8294b61]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo15printStackTraceERSo+0x2a) [0x8d65eba]
0x8d65eba 0x90442c5 0x8c134f7 0x83f04fe 0x83f1b7e 0x8403dd6 0x8294f1b 0x8a46baf 0x8a47b12 0x901643b 0x8426972 0x132390 0x8294b61 
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo10logContextEPKc+0xa5) [0x90442c5]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo11msgassertedEiPKc+0xc7) [0x8c134f7]
/mnt/slaves/Linux_32bit/mongo/test(_ZN11CursorTests12ClientCursor3Pin9PinCursor3runEv+0x26e) [0x83f1b7e]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo12ClientCursor4findExb+0x14e) [0x83f04fe]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5boost6detail8function22void_function_invoker0IPFvvEvE6invokeERNS1_15function_bufferE+0xb) [0x8294f1b]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo8unittest5Suite13runTestObjectIN11CursorTests12ClientCursor3Pin9PinCursorEEEvv+0x36) [0x8403dd6]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo8unittest5Suite3runERKSsi+0x41f) [0x8a46baf]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo7dbtests10runDbTestsEiPPcSs+0x1afb) [0x901643b]
/mnt/slaves/Linux_32bit/mongo/test(_ZN5mongo8unittest5Suite3runERKSt6vectorISsSaISsEERKSsi+0x4c2) [0x8a47b12]
/mnt/slaves/Linux_32bit/mongo/test(main+0x72) [0x8426972]
/lib/i686/nosegneg/libc.so.6(__libc_start_main+0xe0) [0x132390]
/mnt/slaves/Linux_32bit/mongo/test [0x8294b61]
Thu Aug 2 20:13:57 [testsuite] going to run test: CursorTests::ClientCursor::Pin::CursorDeleted
Thu Aug 2 20:13:57 [testsuite] going to run test: CursorTests::ClientCursor::Pin::PinTwice
Thu Aug 2 20:13:57 [testsuite] DONE running tests

64-bit Windows behaves like 32-bit Linux. From a local test:

13:52:34.31 G:\MongoDB\tadmarshall\mongo> test
Fri Aug 03 13:52:41 [testsuite] git version: 4aaed4ccfb608db56a269e0422f6d48385c28445
Fri Aug 03 13:52:41 [testsuite] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
Fri Aug 03 13:52:41 [testsuite] random seed: 1344016361
Fri Aug 03 13:52:41 [testsuite] journal dir=/tmp/unittest/journal
Fri Aug 03 13:52:41 [testsuite] recover : no journal files present, no recovery needed
Fri Aug 03 13:52:41 [testsuite] going to run suite: accumulator
Fri Aug 03 13:52:41 [testsuite]          going to run test: class AccumulatorTests::Avg::None
Fri Aug 03 13:52:41 [testsuite]          going to run test: class AccumulatorTests::Avg::OneInt
Fri Aug 03 13:52:41 [testsuite]          going to run test: class AccumulatorTests::Avg::OneLong
Fri Aug 03 13:52:41 [testsuite]          going to run test: class AccumulatorTests::Avg::OneDouble
// ... snip ...
Fri Aug 03 13:53:05 [testsuite]          going to run test: class CursorTests::BtreeCursor::RangeEq
Fri Aug 03 13:53:05 [testsuite]          going to run test: class CursorTests::BtreeCursor::RangeIn
Fri Aug 03 13:53:05 [testsuite]          going to run test: class CursorTests::BtreeCursor::AbortImplicitScan
Fri Aug 03 13:53:05 [testsuite]          going to run test: class CursorTests::ClientCursor::HandleDelete
Fri Aug 03 13:53:05 [testsuite]          going to run test: class CursorTests::ClientCursor::AboutToDelete
Fri Aug 03 13:53:05 [testsuite]          going to run test: class CursorTests::ClientCursor::AboutToDeleteDuplicate
Fri Aug 03 13:53:05 [testsuite]          going to run test: class CursorTests::ClientCursor::AboutToDeleteDuplicateNextClause
Fri Aug 03 13:53:05 [testsuite]          going to run test: class CursorTests::ClientCursor::Pin::PinCursor
test.exe      ...\src\mongo\util\stacktrace.cpp(158)                   mongo::printStackTrace+0x3e
test.exe      ...\src\mongo\util\assert_util.cpp(154)                  mongo::msgasserted+0xe3
test.exe      ...\src\mongo\db\clientcursor.h(328)                     mongo::ClientCursor::find+0x99
test.exe      ...\src\mongo\dbtests\cursortests.cpp(466)               CursorTests::ClientCursor::Pin::PinCursor::assertNotPinned+0x163
test.exe      ...\src\mongo\dbtests\cursortests.cpp(454)               CursorTests::ClientCursor::Pin::PinCursor::run+0x40
test.exe      ...\src\mongo\unittest\unittest.h(269)                   mongo::unittest::Suite::runTestObject<CursorTests::ClientCursor::Pin::PinCursor>+0x41
test.exe      ...\src\mongo\unittest\unittest.cpp(142)                 mongo::unittest::Suite::run+0x43f
test.exe      ...\src\mongo\unittest\unittest.cpp(207)                 mongo::unittest::Suite::run+0x4ba
test.exe      ...\src\mongo\dbtests\framework.cpp(258)                 mongo::dbtests::runDbTests+0x1875
test.exe      ...\src\mongo\dbtests\dbtests.cpp(29)                    main+0xae
test.exe      f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crt0.c(278)  __tmainCRTStartup+0x13b
kernel32.dll                                                           BaseThreadInitThunk+0xd
test.exe      ...\src\mongo\util\stacktrace.cpp(158)                   mongo::printStackTrace+0x3e
test.exe      ...\src\mongo\util\assert_util.cpp(154)                  mongo::msgasserted+0xe3
test.exe      ...\src\mongo\db\clientcursor.cpp(798)                   mongo::ClientCursor::erase+0x255
test.exe      ...\src\mongo\dbtests\cursortests.cpp(456)               CursorTests::ClientCursor::Pin::PinCursor::run+0x191
test.exe      ...\src\mongo\unittest\unittest.h(269)                   mongo::unittest::Suite::runTestObject<CursorTests::ClientCursor::Pin::PinCursor>+0x41
test.exe      ...\src\mongo\unittest\unittest.cpp(142)                 mongo::unittest::Suite::run+0x43f
test.exe      ...\src\mongo\unittest\unittest.cpp(207)                 mongo::unittest::Suite::run+0x4ba
test.exe      ...\src\mongo\dbtests\framework.cpp(258)                 mongo::dbtests::runDbTests+0x1875
test.exe      ...\src\mongo\dbtests\dbtests.cpp(29)                    main+0xae
test.exe      f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crt0.c(278)  __tmainCRTStartup+0x13b
kernel32.dll                                                           BaseThreadInitThunk+0xd
test.exe      ...\src\mongo\util\stacktrace.cpp(158)                   mongo::printStackTrace+0x3e
test.exe      ...\src\mongo\util\assert_util.cpp(154)                  mongo::msgasserted+0xe3
test.exe      ...\src\mongo\db\clientcursor.h(328)                     mongo::ClientCursor::find+0x99
test.exe      ...\src\mongo\dbtests\cursortests.cpp(466)               CursorTests::ClientCursor::Pin::PinCursor::assertNotPinned+0x163
test.exe      ...\src\mongo\dbtests\cursortests.cpp(459)               CursorTests::ClientCursor::Pin::PinCursor::run+0x294
test.exe      ...\src\mongo\unittest\unittest.h(269)                   mongo::unittest::Suite::runTestObject<CursorTests::ClientCursor::Pin::PinCursor>+0x41
test.exe      ...\src\mongo\unittest\unittest.cpp(142)                 mongo::unittest::Suite::run+0x43f
test.exe      ...\src\mongo\unittest\unittest.cpp(207)                 mongo::unittest::Suite::run+0x4ba
test.exe      ...\src\mongo\dbtests\framework.cpp(258)                 mongo::dbtests::runDbTests+0x1875
test.exe      ...\src\mongo\dbtests\dbtests.cpp(29)                    main+0xae
test.exe      f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crt0.c(278)  __tmainCRTStartup+0x13b
kernel32.dll                                                           BaseThreadInitThunk+0xd
Fri Aug 03 13:53:06 [testsuite]          going to run test: class CursorTests::ClientCursor::Pin::PinTwice
Fri Aug 03 13:53:07 [testsuite]          going to run test: class CursorTests::ClientCursor::Pin::CursorDeleted
Fri Aug 03 13:53:07 [testsuite]          DONE running tests



 Comments   
Comment by Erich Siedler [ 05/Aug/12 ]

Great! Thanks for the explanation.

Comment by auto [ 05/Aug/12 ]

Author:

{u'date': u'2012-08-04T13:52:10-07:00', u'email': u'tad@10gen.com', u'name': u'Tad Marshall'}

Message: SERVER-6705 disable FPO for 32-bit Windows

Disable Frame Pointer Omission/Optimization in Windows
release builds. This is already disabled in debug builds,
and only affects the 32-bit version.
Branch: master
https://github.com/mongodb/mongo/commit/0ff6b7aeb373c82d70cd5a34ce1c2d85f4a725cb

Comment by Tad Marshall [ 05/Aug/12 ]

Hi Erich,

Thanks for your note. Yes, it was the change to the stack trace code that caused this problem.

The fundamental problem is that the RtlCaptureContext() Windows API uses the ebp register in 32-bit builds to read parameters from the stack, but it doesn't set up this frame pointer itself. That means that this API is unusable (safely) if the code was built with the /Oy linker switch (omit frame pointer) or if this switch is set implicitly (and not disabled with /Oy-) by an optimization switch such as the /O2 that we use in release builds.

Somewhat contrary to the documentation, enabling FPO (frame pointer omission or optimization) does not mean that frame pointers (using ebp to point to a stack frame) will not be generated, just that the compiler is free to omit the frame pointer and use ebp as a general purpose register in generated code.

What happened with my code change (that you referenced above) is that I changed printStackTrace() from a large routine to a small one, and this led to the compiler and linker inlining it with a routine that was using ebp as a scratch register. When the smaller printStackTrace() routine called RtlCaptureContext(), the ebp register had the value zero, causing an access violation when RtlCaptureContext() tried to read from [ebp+4].

I tried disabling FPO optimization in selected routines, but printStackTrace() is called from so many places (especially now that it is inline-able) that it is not a safe approach. The better solution is to just disable frame pointer optimization altogether for the 32-bit release build. It turns out that this doesn't increase code size, and the effect on performance is negligible (undetectable, actually) in the tests that I have run. This only affects the 32-bit builds; the 64-bit version of RtlCaptureContext() doesn't have this weakness, and FPO was turned off (by /Od) in the debug 32-bit build anyway.

Thank you for your support, for reproducing the problem and for letting us know what you learned about it; this is greatly appreciated!

Tad

Comment by Erich Siedler [ 04/Aug/12 ]

Reproduced every time here too, with Vista SP2 x64. 64bit release runs fine.

I reverted the following and then the 32bit release runs the same as the others: 5300139ceb188361f773c7a6f3c24f9fe9affd6a (SERVER-6554 Windows unhandled exception filter print stack trace). No more crash, no failed test, lots os stack traces, which I suppose are expected.

I hope this helps, best regards.

Generated at Thu Feb 08 03:12:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.