[SERVER-22142] resmoke.py's FlushThread attempts to reference imported members during Python interpreter shutdown Created: 12/Jan/16  Updated: 18/Nov/16  Resolved: 15/Jan/16

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.2.3, 3.3.1

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Max Hirschhorn
Resolution: Done Votes: 0
Labels: test-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Steps To Reproduce:

This seemed to fail most often on Windows (Python 2.7.3) when the Python interpreter was relatively short-lived (~30 seconds). To achieve that, the failpoints and audit tasks were run repeatedly using the following patch:

diff --git a/etc/evergreen.yml b/etc/evergreen.yml
index 1631460..d08cae8 100644
--- a/etc/evergreen.yml
+++ b/etc/evergreen.yml
@@ -286,7 +286,15 @@ functions:
           extra_args="$extra_args --continueOnFailure"
         fi
 
-        ${path_prefix} ${asan_symbolizer} ${enable_lsan} ${rlp_environment} ${python|python} buildscripts/resmoke.py ${resmoke_args} $extra_args ${test_flags} --log=buildlogger --reportFile=report.json
+        ret=0
+        for i in {0..100}; do
+          ${path_prefix} ${asan_symbolizer} ${enable_lsan} ${rlp_environment} ${python|python} buildscripts/resmoke.py ${resmoke_args} $extra_args ${test_flags} --log=buildlogger --reportFile=report.json
+          ret=$?
+          if [ $ret -ne 0 ]; then
+            break
+          fi
+        done
+        exit $ret
 
   "run jstestfuzz":
     - command: shell.exec

The following patch appears to be sufficient to reproduce on a Windows spawn host.

diff --git a/buildscripts/resmokelib/logging/flush.py b/buildscripts/resmokelib/logging/flush.py
index e49d3d9..0661f60 100644
--- a/buildscripts/resmokelib/logging/flush.py
+++ b/buildscripts/resmokelib/logging/flush.py
@@ -62,6 +62,7 @@ class _FlushThread(threading.Thread):
         Flushes and closes all handlers of 'logger'.
         """
 
-        for handler in logger.handlers:
-            handler.flush()
-            handler.close()
+        # Continually import a module and instantiate a member of it.
+        while True:
+            import collections
+            assert collections.defaultdict() == collections.defaultdict()

Sprint: Query F (02/01/16)
Participants:
Linked BF Score: 0

 Description   

The resmokelib.logging.handlers.BufferedHandler uses a timer thread to periodically send a POST request with logs to logkeeper. A separate thread to flush() and close() the logging handlers (thereby joining the timer thread) was created in order to avoid exhausting the number of threads that can be created on a 32-bit system.

Due to a bug in the processing of its _LOGGER_QUEUE, the flush thread may try to call flush() on a logging handler after the interpreter has been marked as uninitialized and the module dictionary has been destroyed. This tends to manifest as either

  • "Fatal Python error: PyImport_GetModuleDict: no module dictionary!"
  • "Exception in thread FlushThread (most likely raised during interpreter shutdown)"

Extracted the relevant bits from pythonrun.c below. The FlushThread is a daemon thread in order to avoid requiring a user to wait for all the output to be logged after triggering an interrupt. This means that the FlushThread is still running after wait_for_thread_shutdown() has been called and it will attempt to send POST requests to logkeeper with any remaining logs.

/* Wait until threading._shutdown completes, provided
   the threading module was imported in the first place.
   The shutdown routine will wait until all non-daemon
   "threading" threads have completed. */
static void
wait_for_thread_shutdown(void)
{
#ifdef WITH_THREAD
    PyObject *result;
    PyThreadState *tstate = PyThreadState_GET();
    PyObject *threading = PyMapping_GetItemString(tstate->interp->modules,
                                                  "threading");
    if (threading == NULL) {
        /* threading not imported */
        PyErr_Clear();
        return;
    }
    result = PyObject_CallMethod(threading, "_shutdown", "");
    if (result == NULL)
        PyErr_WriteUnraisable(threading);
    else
        Py_DECREF(result);
    Py_DECREF(threading);
#endif
}
 
void
Py_Finalize(void)
{
    PyInterpreterState *interp;
    PyThreadState *tstate;
 
    if (!initialized)
        return;
 
    wait_for_thread_shutdown();
 
    /* The interpreter is still entirely intact at this point, and the
     * exit funcs may be relying on that.  In particular, if some thread
     * or exit func is still waiting to do an import, the import machinery
     * expects Py_IsInitialized() to return true.  So don't say the
     * interpreter is uninitialized until after the exit funcs have run.
     * Note that Threading.py uses an exit func to do a join on all the
     * threads created thru it, so this also protects pending imports in
     * the threads created via Threading.
     */
    call_sys_exitfunc();
    initialized = 0;
 
    ...
 
    /* Destroy all modules */
    PyImport_Cleanup();
 
    ...
}



 Comments   
Comment by Githook User [ 27/Jan/16 ]

Author:

{u'username': u'visemet', u'name': u'Max Hirschhorn', u'email': u'max.hirschhorn@mongodb.com'}

Message: SERVER-22142 join() FlushThread in resmoke.py after running all tests.

This prevents the FlushThread from running while the interpreter is
shutting down and the import machinery is uninitialized.

Also fixed an issue where a KeyboardInterrupt while running multiple
suites with --continueOnFailure would skip to the next suite, rather
than exiting resmoke.py.

(cherry picked from commit 00c8c64f6cd9b0d236d50a8a1cf7152e0367a5cd)
Branch: v3.2
https://github.com/mongodb/mongo/commit/dd6350f7dea586a73d5dbba002deb51f6a5dbce3

Comment by Githook User [ 15/Jan/16 ]

Author:

{u'username': u'visemet', u'name': u'Max Hirschhorn', u'email': u'max.hirschhorn@mongodb.com'}

Message: SERVER-22142 join() FlushThread in resmoke.py after running all tests.

This prevents the FlushThread from running while the interpreter is
shutting down and the import machinery is uninitialized.

Also fixed an issue where a KeyboardInterrupt while running multiple
suites with --continueOnFailure would skip to the next suite, rather
than exiting resmoke.py.
Branch: master
https://github.com/mongodb/mongo/commit/00c8c64f6cd9b0d236d50a8a1cf7152e0367a5cd

Generated at Thu Feb 08 03:59:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.