Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-22142

resmoke.py's FlushThread attempts to reference imported members during Python interpreter shutdown

    • Fully Compatible
    • ALL
    • Hide

      This seemed to fail most often on Windows (Python 2.7.3) when the Python interpreter was relatively short-lived (~30 seconds). To achieve that, the failpoints and audit tasks were run repeatedly using the following patch:

      Unable to find source-code formatter for language: diff. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml
      diff --git a/etc/evergreen.yml b/etc/evergreen.yml
      index 1631460..d08cae8 100644
      --- a/etc/evergreen.yml
      +++ b/etc/evergreen.yml
      @@ -286,7 +286,15 @@ functions:
                 extra_args="$extra_args --continueOnFailure"
               fi
       
      -        ${path_prefix} ${asan_symbolizer} ${enable_lsan} ${rlp_environment} ${python|python} buildscripts/resmoke.py ${resmoke_args} $extra_args ${test_flags} --log=buildlogger --reportFile=report.json
      +        ret=0
      +        for i in {0..100}; do
      +          ${path_prefix} ${asan_symbolizer} ${enable_lsan} ${rlp_environment} ${python|python} buildscripts/resmoke.py ${resmoke_args} $extra_args ${test_flags} --log=buildlogger --reportFile=report.json
      +          ret=$?
      +          if [ $ret -ne 0 ]; then
      +            break
      +          fi
      +        done
      +        exit $ret
       
         "run jstestfuzz":
           - command: shell.exec
      

      The following patch appears to be sufficient to reproduce on a Windows spawn host.

      Unable to find source-code formatter for language: diff. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml
      diff --git a/buildscripts/resmokelib/logging/flush.py b/buildscripts/resmokelib/logging/flush.py
      index e49d3d9..0661f60 100644
      --- a/buildscripts/resmokelib/logging/flush.py
      +++ b/buildscripts/resmokelib/logging/flush.py
      @@ -62,6 +62,7 @@ class _FlushThread(threading.Thread):
               Flushes and closes all handlers of 'logger'.
               """
       
      -        for handler in logger.handlers:
      -            handler.flush()
      -            handler.close()
      +        # Continually import a module and instantiate a member of it.
      +        while True:
      +            import collections
      +            assert collections.defaultdict() == collections.defaultdict()
      
      Show
      This seemed to fail most often on Windows (Python 2.7.3) when the Python interpreter was relatively short-lived (~30 seconds). To achieve that, the failpoints and audit tasks were run repeatedly using the following patch: Unable to find source-code formatter for language: diff. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml diff --git a/etc/evergreen.yml b/etc/evergreen.yml index 1631460..d08cae8 100644 --- a/etc/evergreen.yml +++ b/etc/evergreen.yml @@ -286,7 +286,15 @@ functions: extra_args= "$extra_args --continueOnFailure" fi - ${path_prefix} ${asan_symbolizer} ${enable_lsan} ${rlp_environment} ${python|python} buildscripts/resmoke.py ${resmoke_args} $extra_args ${test_flags} --log=buildlogger --reportFile=report.json + ret=0 + for i in {0..100}; do + ${path_prefix} ${asan_symbolizer} ${enable_lsan} ${rlp_environment} ${python|python} buildscripts/resmoke.py ${resmoke_args} $extra_args ${test_flags} --log=buildlogger --reportFile=report.json + ret=$? + if [ $ret -ne 0 ]; then + break + fi + done + exit $ret "run jstestfuzz" : - command: shell.exec The following patch appears to be sufficient to reproduce on a Windows spawn host. Unable to find source-code formatter for language: diff. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml diff --git a/buildscripts/resmokelib/logging/flush.py b/buildscripts/resmokelib/logging/flush.py index e49d3d9..0661f60 100644 --- a/buildscripts/resmokelib/logging/flush.py +++ b/buildscripts/resmokelib/logging/flush.py @@ -62,6 +62,7 @@ class _FlushThread(threading. Thread ): Flushes and closes all handlers of 'logger' . """ - for handler in logger.handlers: - handler.flush() - handler.close() + # Continually import a module and instantiate a member of it. + while True: + import collections + assert collections.defaultdict() == collections.defaultdict()
    • Query F (02/01/16)
    • 0

      The resmokelib.logging.handlers.BufferedHandler uses a timer thread to periodically send a POST request with logs to logkeeper. A separate thread to flush() and close() the logging handlers (thereby joining the timer thread) was created in order to avoid exhausting the number of threads that can be created on a 32-bit system.

      Due to a bug in the processing of its _LOGGER_QUEUE, the flush thread may try to call flush() on a logging handler after the interpreter has been marked as uninitialized and the module dictionary has been destroyed. This tends to manifest as either

      • "Fatal Python error: PyImport_GetModuleDict: no module dictionary!"
      • "Exception in thread FlushThread (most likely raised during interpreter shutdown)"

      Extracted the relevant bits from pythonrun.c below. The FlushThread is a daemon thread in order to avoid requiring a user to wait for all the output to be logged after triggering an interrupt. This means that the FlushThread is still running after wait_for_thread_shutdown() has been called and it will attempt to send POST requests to logkeeper with any remaining logs.

      /* Wait until threading._shutdown completes, provided
         the threading module was imported in the first place.
         The shutdown routine will wait until all non-daemon
         "threading" threads have completed. */
      static void
      wait_for_thread_shutdown(void)
      {
      #ifdef WITH_THREAD
          PyObject *result;
          PyThreadState *tstate = PyThreadState_GET();
          PyObject *threading = PyMapping_GetItemString(tstate->interp->modules,
                                                        "threading");
          if (threading == NULL) {
              /* threading not imported */
              PyErr_Clear();
              return;
          }
          result = PyObject_CallMethod(threading, "_shutdown", "");
          if (result == NULL)
              PyErr_WriteUnraisable(threading);
          else
              Py_DECREF(result);
          Py_DECREF(threading);
      #endif
      }
      
      void
      Py_Finalize(void)
      {
          PyInterpreterState *interp;
          PyThreadState *tstate;
      
          if (!initialized)
              return;
      
          wait_for_thread_shutdown();
      
          /* The interpreter is still entirely intact at this point, and the
           * exit funcs may be relying on that.  In particular, if some thread
           * or exit func is still waiting to do an import, the import machinery
           * expects Py_IsInitialized() to return true.  So don't say the
           * interpreter is uninitialized until after the exit funcs have run.
           * Note that Threading.py uses an exit func to do a join on all the
           * threads created thru it, so this also protects pending imports in
           * the threads created via Threading.
           */
          call_sys_exitfunc();
          initialized = 0;
      
          ...
      
          /* Destroy all modules */
          PyImport_Cleanup();
      
          ...
      }
      

            Assignee:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: