Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-22219

Use the subprocess32 package on POSIX systems in resmoke.py if it's available

    • Fully Compatible
    • Hide

      The "invalid free()" issue has only been observed on the Amazon Linux and SUSE 12 build variants. The following script aggressively spawns subprocesses and start threads to asynchronously consume their output exactly as is done in resmoke.py (just without running any actual tests!)

      It fails ~10% of the time on the Amazon Linux build variant and ~20% of the time on SUSE 12.

      #!/usr/bin/env python
      
      from __future__ import absolute_import
      
      import logging
      import os.path
      import sys
      import threading
      
      # Get relative imports to work when the package is not installed on the PYTHONPATH.
      if __name__ == "__main__" and __package__ is None:
          sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
          from buildscripts import resmokelib
      
      
      # Copied from Python 2.7.
      class NullHandler(logging.Handler):
          """
          This handler does nothing. It's intended to be used to avoid the
          "No handlers could be found for logger XXX" one-off warning. This is
          important for library code, which may contain code to log events. If a user
          of the library does not configure logging, the one-off warning might be
          produced; to avoid this, the library developer simply needs to instantiate
          a NullHandler and add it to the top-level logger of the library module or
          package.
          """
          def handle(self, record):
              pass
      
          def emit(self, record):
              pass
      
          def createLock(self):
              self.lock = None
      
      
      class Worker(object):
      
          ITERATIONS = 10000
      
          def __init__(self, worker_num, failure_flag):
              self.name = "worker%d" % (worker_num)
              self.failure_flag = failure_flag
      
              exec_logger = resmokelib.logging.loggers.EXECUTOR
              self.logger = resmokelib.logging.loggers.new_logger(self.name, parent=exec_logger)
      
          def __call__(self):
              i = 0
              while not self.failure_flag.is_set():
                  if i >= Worker.ITERATIONS:
                      break
      
                  # Suppress output of the subprocess because it isn't that interesting; we only care if
                  # an invalid free() occurs.
                  process_logger = resmokelib.logging.loggers.new_logger(self.name)
                  process_logger.addHandler(NullHandler())
      
                  process = resmokelib.core.process.Process(process_logger, args=["echo", "iter%d" % (i)])
                  process.start()
                  exit_code = process.wait()
      
                  if exit_code != 0:
                      self.logger.error("Exited with code %d", exit_code)
                      self.failure_flag.set()
      
                  i += 1
      
      
      def main():
          logging_config = resmokelib.parser._get_logging_config("console")
          resmokelib.logging.config.apply_config(logging_config)
          resmokelib.logging.flush.start_thread()
      
          threads = []
          failure_flag = threading.Event()
      
          for i in xrange(16):
              t = threading.Thread(target=Worker(i, failure_flag))
              # Do not wait for tests to finish executing if interrupted by the user.
              t.daemon = True
              t.start()
              threads.append(t)
      
          for t in threads:
              t.join()
      
          if failure_flag.is_set():
              # Exit with a nonzero return code so it's easy to identify from a task failure in Evergreen
              # whether an invalid free() occurred.
              sys.exit(1)
      
      if __name__ == "__main__":
          main()
      
      Show
      The "invalid free() " issue has only been observed on the Amazon Linux and SUSE 12 build variants. The following script aggressively spawns subprocesses and start threads to asynchronously consume their output exactly as is done in resmoke.py (just without running any actual tests!) It fails ~10% of the time on the Amazon Linux build variant and ~20% of the time on SUSE 12. #!/usr/ bin /env python from __future__ import absolute_import import logging import os.path import sys import threading # Get relative imports to work when the package is not installed on the PYTHONPATH. if __name__ == "__main__" and __package__ is None : sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) from buildscripts import resmokelib # Copied from Python 2.7. class NullHandler(logging.Handler): """ This handler does nothing. It's intended to be used to avoid the "No handlers could be found for logger XXX" one-off warning. This is important for library code, which may contain code to log events. If a user of the library does not configure logging, the one-off warning might be produced; to avoid this, the library developer simply needs to instantiate a NullHandler and add it to the top-level logger of the library module or package. """ def handle( self , record): pass def emit( self , record): pass def createLock( self ): self .lock = None class Worker( object ): ITERATIONS = 10000 def __init__( self , worker_num, failure_flag): self .name = "worker%d" % (worker_num) self .failure_flag = failure_flag exec_logger = resmokelib.logging.loggers.EXECUTOR self .logger = resmokelib.logging.loggers.new_logger( self .name, parent=exec_logger) def __call__( self ): i = 0 while not self .failure_flag.is_set(): if i >= Worker.ITERATIONS: break # Suppress output of the subprocess because it isn't that interesting; we only care if # an invalid free() occurs. process_logger = resmokelib.logging.loggers.new_logger( self .name) process_logger.addHandler(NullHandler()) process = resmokelib.core.process.Process(process_logger, args=[ "echo" , " iter %d" % (i)]) process.start() exit_code = process.wait() if exit_code != 0: self .logger.error( "Exited with code %d" , exit_code) self .failure_flag. set () i += 1 def main(): logging_config = resmokelib.parser._get_logging_config( "console" ) resmokelib.logging.config.apply_config(logging_config) resmokelib.logging.flush.start_thread() threads = [] failure_flag = threading.Event() for i in xrange (16): t = threading.Thread(target=Worker(i, failure_flag)) # Do not wait for tests to finish executing if interrupted by the user. t.daemon = True t.start() threads.append(t) for t in threads: t.join() if failure_flag.is_set(): # Exit with a nonzero return code so it's easy to identify from a task failure in Evergreen # whether an invalid free() occurred. sys. exit (1) if __name__ == "__main__" : main()
    • Query F (02/01/16)
    • 0

      The _posixsubprocess C extension module was introduced in CPython 3.2.0 to resolve the issues (e.g. potential to hang) when using the subprocess module from multiple threads concurrently. resmoke.py attempted to work around these thread-safety issues with the subprocess module in Python 2.6 and Python 2.7 by guarding calls to subprocess.Popen() with a lock. However, it turns out that the lock is insufficient for avoiding corruption of Python's internal data structure for thread-local storage in the child process:

      [2016/01/13 06:24:33.816] *** Error in `python': free(): invalid size: 0x00007f0044000b20 ***
      [2016/01/13 06:24:33.818] ======= Backtrace: =========
      [2016/01/13 06:24:33.818] /lib64/libc.so.6(+0x730bf)[0x7f00a54f30bf]
      [2016/01/13 06:24:33.818] /lib64/libc.so.6(+0x7892e)[0x7f00a54f892e]
      [2016/01/13 06:24:33.818] /lib64/libc.so.6(+0x79636)[0x7f00a54f9636]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyThread_ReInitTLS+0x4a)[0x7f00a5b530ca]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyOS_AfterFork+0x10)[0x7f00a5ae5119]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0x10ef86)[0x7f00a5b53f86]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5d1)[0x7f00a5b0c6e1]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x4fe)[0x7f00a5b1329e]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0xc73)[0x7f00a5b0cd83]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x4fe)[0x7f00a5b1329e]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xb24ab)[0x7f00a5af74ab]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xae5aa)[0x7f00a5af35aa]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xbe7f9)[0x7f00a5b037f9]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xbde2a)[0x7f00a5b02e2a]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x144e)[0x7f00a5b0d55e]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x221)[0x7f00a5b12fc1]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0xc73)[0x7f00a5b0cd83]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x221)[0x7f00a5b12fc1]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xb24ab)[0x7f00a5af74ab]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xae5aa)[0x7f00a5af35aa]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0x6e792)[0x7f00a5ab3792]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x21f5)[0x7f00a5b0e305]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x29cc)[0x7f00a5b0eadc]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x29cc)[0x7f00a5b0eadc]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x221)[0x7f00a5b12fc1]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xb222f)[0x7f00a5af722f]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xae5aa)[0x7f00a5af35aa]
      [2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
      [2016/01/13 06:24:33.819] /usr/lib64/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x30)[0x7f00a5b0bc40]
      [2016/01/13 06:24:33.819] /usr/lib64/libpython2.7.so.1.0(+0x9f25e)[0x7f00a5ae425e]
      [2016/01/13 06:24:33.819] /lib64/libpthread.so.0(+0x80a4)[0x7f00a58300a4]
      [2016/01/13 06:24:33.819] /lib64/libc.so.6(clone+0x6d)[0x7f00a55657fd]
      

      The actual error message can vary, but the invalid free() triggers a SIGABRT and causes the child process to exit with a return code of -6:

      • "python: free(): invalid pointer"
      • "python: double free or corruption"
      • "python: munmap_chunk(): invalid pointer"

      The version of the subprocess module in Python 3.2 is available as a separate package called subprocess32. Using it will resolve the memory corruption issue because the call to PyOS_AfterFork() (and thus the call to PyThread_ReInitTLS()) is skipped when preexec_fn is None.

            Assignee:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: