Core Server / SERVER-4334

_DEBUG Windows version of mongod.exe crashes on ctrl-C on Windows XP and Server 2003

    • Type: Bug
    • Resolution: Done
    • Priority: Minor - P4
    • Fix Version/s: 2.1.1
    • Affects Version/s: None
    • Component/s: Stability
    • Labels:
    • Environment:
      32-bit debug version of mongod on 32-bit Windows XP
    • Backwards Compatibility: Fully Compatible
    • Operating System: Windows

      If I build a 32-bit debug version of mongod.exe for Windows, run it on 32-bit Windows XP, and then press ctrl-C, the ctrl-C is reported and immediately followed by an "unhandled windows exception" with reported value "ec=0xc00000fd". That code is STATUS_STACK_OVERFLOW, i.e. a stack overflow. I put breakpoints in the code and stepped through it in assembly language (required in this case), and I think I now have a decent handle on the problem.

      We start out on track to do our normal cleanup-and-exit on ctrl-C, but we blow up instantly when the db/db.cpp routine ctrlCTerminate() calls Client::initThread( "ctrlCTerminate" ) in db/client.cpp. It really is a stack overflow, but what is killing us is debug-only diagnostic code that isn't even used in this build.

      On Windows, the ctrl-C handler function we register by calling SetConsoleCtrlHandler() is called on a new thread. Before I hit ctrl-C, I can see 6 threads running; when the breakpoint I set fires after ctrl-C, we are running on a new, 7th thread. For exitCleanly( EXIT_KILL ) to work correctly, this new thread has to be set up as a "client" thread, hence the call to Client::initThread().

      But we never get to execute the first source line of initThread(): the 256K buffer we allocate for StackChecker is too big, and the _chkstk() (aka _alloca_probe()) routine that tests for stack overflow triggers. Apparently the designers of Windows XP decided that the default Windows thread stack size of 1MB was excessive for a simple ctrl-C handler thread, so they picked 256K as its reserved stack size. _alloca_probe() touches memory every 4K (one page) until it either reaches the end of the requested allocation (you win) or gets an "unable to expand the stack" exception (you lose).
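The page-by-page probing described above can be modelled without Windows at all. The sketch below is a simplified, portable simulation, not the real CRT _chkstk() code. simulate_probe and ProbeResult are hypothetical names, and the 4K page size comes from the report. The model treats the reserved size as a hard limit; a real Windows stack also keeps a guard page, so the usable space is slightly smaller. It advances one page at a time through the requested allocation and reports an overflow as soon as a probe would land outside the reserved region.

```cpp
#include <cstddef>

// Simplified model of a _chkstk()-style stack probe. This is NOT the real CRT
// code; it ignores the guard page and treats `reserved` as a hard limit.
struct ProbeResult {
    std::size_t pages_touched;  // probes that landed inside the reserved stack
    bool overflow;              // true if a probe would fall past the reserved size
};

// reserved: total reserved stack size of the thread, in bytes
// in_use:   stack already used by frames above the new allocation, in bytes
// request:  size of the new frame/allocation being probed, in bytes
// page:     probe stride; 4K is the page size mentioned in the report
constexpr ProbeResult simulate_probe(std::size_t reserved, std::size_t in_use,
                                     std::size_t request, std::size_t page = 4096) {
    ProbeResult r{0, false};
    std::size_t probed = 0;
    while (probed < request) {
        // Move down one page, or to the end of the request if that is closer.
        probed = (request - probed > page) ? probed + page : request;
        if (in_use + probed > reserved) {
            // Corresponds roughly to the point where the real process would
            // raise STATUS_STACK_OVERFLOW (0xC00000FD).
            r.overflow = true;
            return r;
        }
        ++r.pages_touched;
    }
    return r;
}
```

For example, with the 256K reserved stack from the report and an assumed 8K already in use by frames above initThread(), simulate_probe(256 * 1024, 8 * 1024, 256 * 1024) probes 62 pages successfully and then overflows, while a 192K request probes all 48 of its pages and succeeds. The function is constexpr so these figures can be checked at compile time.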

      The somewhat ironic part is that the 256K allocated by the StackChecker object is only used if ( sizeof(void*) == 8 ), so no stack checking takes place in this 32-bit build, yet we overflow the stack anyway.
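The reason the buffer costs stack even when it is never used follows from where it lives. Assuming, as the report suggests, that the buffer is a fixed-size member of a StackChecker object created as a local in Client::initThread(), the full SZ bytes become part of that function's stack frame. MSVC emits a _chkstk() call in the prologue of a function whose locals exceed a page, so the probe runs before the first source line, whether or not the sizeof(void*) == 8 branch is ever taken. The struct below is a hypothetical stand-in, not the actual class from db/client.cpp; it only demonstrates that the object's size does not depend on whether checking is enabled.

```cpp
#include <cstddef>

// Hypothetical stand-in for the StackChecker described in the report; the
// real class may differ. The buffer is an ordinary member, so any local
// StackCheckerSketch adds SZ bytes to the enclosing function's frame.
struct StackCheckerSketch {
    enum { SZ = 256 * 1024 };
    char buf[SZ];

    // Mirrors the report's condition: checking is only done on 64-bit builds.
    static constexpr bool checking_enabled() { return sizeof(void*) == 8; }
};

// The stack cost is paid in every build, 32-bit or 64-bit, whether or not
// checking_enabled() is true.
static_assert(sizeof(StackCheckerSketch) >=
                  static_cast<std::size_t>(StackCheckerSketch::SZ),
              "the buffer is part of every StackCheckerSketch object");
```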

      I don't know how much value this stack checking (when used) provides, but I can stop the crash by changing 'enum { SZ = 256 * 1024 };' to 'enum { SZ = 192 * 1024 };'. I am not sure why we only test and report on stack usage in 64-bit builds but consume the stack space (forcing the memory to be committed) in both 32-bit and 64-bit builds. If checking for up to 192K of usage instead of 256K is acceptable, this is a one-line fix. No actual customer runs a 32-bit debug build on Windows XP; only developers (like me) will hit this bug.
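The arithmetic behind the one-line fix can be stated as compile-time checks. Using the report's figure of a 256K reserved stack for the ctrl-C thread, a 256K buffer leaves no room at all for the other frames on that thread, while a 192K buffer leaves 64K. The names below (KB, kCtrlCThreadStack, headroom) are illustrative only, and the real headroom is somewhat smaller than computed because of the guard page and the frames already on the stack when initThread() runs.

```cpp
#include <cstddef>

// Illustrative constants taken from the report's figures.
constexpr std::size_t KB = 1024;
constexpr std::size_t kCtrlCThreadStack = 256 * KB;  // reserved stack of the XP ctrl-C thread

// Bytes of reserved stack left for everything else once a buffer of
// `buffer_bytes` is placed on it (ignoring the guard page and other frames).
constexpr std::size_t headroom(std::size_t buffer_bytes) {
    return buffer_bytes >= kCtrlCThreadStack ? 0 : kCtrlCThreadStack - buffer_bytes;
}

static_assert(headroom(256 * KB) == 0, "SZ = 256K leaves nothing for other frames");
static_assert(headroom(192 * KB) == 64 * KB, "SZ = 192K leaves 64K");
```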

            Assignee:
            Tad Marshall
            Reporter:
            Tad Marshall
            Votes:
            0
            Watchers:
            0
