[SERVER-4334] _DEBUG Windows version of mongod.exe crashes on ctrl-C on Windows XP and Server 2003 Created: 20/Nov/11  Updated: 11/Jul/16  Resolved: 28/Mar/12

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: None
Fix Version/s: 2.1.1

Type: Bug Priority: Minor - P4
Reporter: Tad Marshall Assignee: Tad Marshall
Resolution: Done Votes: 0
Labels: Windows, crash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

32-bit debug version of mongod on 32-bit Windows XP


Backwards Compatibility: Fully Compatible
Operating System: Windows
Participants:

 Description   

If I build a 32-bit debug version of mongod.exe for Windows and run it on 32-bit Windows XP and then press ctrl-C, the ctrl-C is reported and immediately followed by an "unhandled windows exception" with reported value "ec=0xc00000fd". This turns out to be "stack overflow". I put breakpoints in the code and stepped through it in assembly language (required in this case) and I think I've got a decent handle on the problem now.

We start out on-track to do our normal cleanup-and-exit on ctrl-C, but we blow up instantly when db/db.cpp routine ctrlCTerminate() calls Client::initThread( "ctrlCTerminate" ) in db/client.cpp. It really is a stack overflow, but it is debug-only diagnostic code that doesn't even get used in this build that is killing us.

On Windows, the ctrl-C handler function we register by calling SetConsoleCtrlHandler() gets called on a new thread. Before I hit ctrl-C, I can see that we have 6 threads running. When we hit the breakpoint I set in the code after hitting ctrl-C, we are running on a new 7th thread. In order for exitCleanly( EXIT_KILL ) to work correctly, the new thread has to be set up as a "client" thread, hence the call to Client::initThread(). But we never get to execute the first source code line in initThread(), because the 256K buffer we allocate for StackChecker is too big and the _chkstk() (aka _alloca_probe()) routine that tests for stack overflow triggers. Apparently, the designers of Windows XP figured that the default Windows thread stack size of 1MB was excessive for a simple ctrl-C handler thread, so they picked 256K as the reserved stack size. A 256K buffer cannot fit in a 256K stack that is already partly in use. _alloca_probe() touches memory every 4K (page size) until it either gets to the end of the requested allocation (you win) or gets an "unable to expand the stack" exception (you lose).
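The page-by-page probe described above can be modeled with a small portable sketch. This is not the real _chkstk() / _alloca_probe() code; probeFits() and its parameters are hypothetical names, the 4K of stack already in use is an assumed figure, and the model ignores the guard pages Windows keeps at the end of the stack, so the real limit is somewhat lower.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of a stack probe: walk the requested allocation one page
// at a time and fail as soon as a page would fall outside the reserved stack.
// All names here are hypothetical; this is not the CRT implementation.
bool probeFits(std::size_t reserve,      // reserved stack size of the thread
               std::size_t alreadyUsed,  // stack consumed before the allocation
               std::size_t request,      // size of the requested stack buffer
               std::size_t pageSize = 4096) {
    assert(pageSize > 0);
    std::size_t remaining = reserve > alreadyUsed ? reserve - alreadyUsed : 0;
    std::size_t touched = 0;
    while (touched < request) {
        std::size_t left = request - touched;
        std::size_t step = left < pageSize ? left : pageSize;
        if (step > remaining) {
            return false;  // models "unable to expand the stack"
        }
        remaining -= step;
        touched += step;
    }
    return true;  // reached the end of the requested allocation
}
```

Under these assumptions, a 256K request on a 256K stack fails, while the same request on a 1MB stack, or a 192K request on a 256K stack, succeeds.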

The somewhat ironic part is that the 256K allocated by the StackChecker object is only used if ( sizeof(void*) == 8 ) and so no stack checking takes place in this build but we overflow the stack anyway.

I don't know how much value this stack checking (when used) is providing, but I can stop the crashing by changing 'enum { SZ = 256 * 1024 };' to 'enum { SZ = 192 * 1024 };'. I am not sure why we only test and report on stack usage in 64-bit builds but consume stack space (forcing memory to be allocated) in both 32-bit and 64-bit builds. If limiting ourselves to checking for up to 192K of usage instead of 256K of usage is acceptable, this is a one line fix. And no actual customer is running a 32-bit debug version on Windows XP, it's just developers (like me) who will hit this bug.
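For readers unfamiliar with the padding technique, here is a minimal sketch of the idea: fill a region with a known value, let the thread run, then see how much of the region was overwritten. It is not the actual StackChecker code. PaddingTracker, kFill, and simulateUse() are hypothetical names, and the region lives in a heap-backed vector rather than on the real stack so that the example is safe to run anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Arbitrary sentinel byte used to pad the region (an assumption).
const unsigned char kFill = 0x42;

// Hypothetical stand-in for a stack-usage tracker; not MongoDB's StackChecker.
class PaddingTracker {
public:
    enum { SZ = 192 * 1024 };  // the reduced size proposed in this ticket

    PaddingTracker() : _pad(static_cast<std::size_t>(SZ), kFill) {}

    // Simulate a thread consuming n bytes of stack. The stack grows downward
    // on x86, so usage overwrites the region starting from its high end.
    void simulateUse(std::size_t n) {
        assert(n <= _pad.size());
        for (std::size_t i = 0; i < n; ++i) {
            _pad[_pad.size() - 1 - i] = 0;
        }
    }

    // Scan up from the low end until the first overwritten byte; everything
    // above that point counts as used.
    std::size_t bytesUsed() const {
        std::size_t untouched = 0;
        while (untouched < _pad.size() && _pad[untouched] == kFill) {
            ++untouched;
        }
        return _pad.size() - untouched;
    }

private:
    std::vector<unsigned char> _pad;
};
```

The trade-off in the ticket follows from this shape: the padding can only report usage up to SZ, and on the real stack the whole region has to fit inside the thread's reserved stack before any checking can happen.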



 Comments   
Comment by auto [ 28/Mar/12 ]

Author: Tad Marshall (tadmarshall) <tad@10gen.com>

Message: SERVER-4334 Change StackChecker size from 256 KB to 192 KB

The stack size in Windows XP and Windows Server 2003 R2 for the thread
that is created to handle a ctrl-C is 256 KB, so we overflow the stack
when we pad it with 256 KB of dummy values to track stack usage. Drop
the padding to 192 KB.
Branch: master
https://github.com/mongodb/mongo/commit/38aaee4cd3ad67aa963eedf01297319dd4b18546

Comment by Tad Marshall [ 28/Mar/12 ]

Now that I'm testing 64-bit debug builds on 64-bit Windows Server 2003 R2, I can see the extent of this issue. We get the same stack overflow in 64-bit builds of mongod.exe on a 64-bit OS if the Windows OS is pre-Vista. This needs to be fixed to make debugging work right.
