[SERVER-22219] Use the subprocess32 package on POSIX systems in resmoke.py if it's available Created: 18/Jan/16  Updated: 18/Nov/16  Resolved: 22/Jan/16

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.2.3, 3.3.1

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Max Hirschhorn
Resolution: Done Votes: 0
Labels: test-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
Backwards Compatibility: Fully Compatible
Backport Completed:
Steps To Reproduce:

The "invalid free()" issue has only been observed on the Amazon Linux and SUSE 12 build variants. The following script aggressively spawns subprocesses and start threads to asynchronously consume their output exactly as is done in resmoke.py (just without running any actual tests!)

It fails ~10% of the time on the Amazon Linux build variant and ~20% of the time on SUSE 12.

#!/usr/bin/env python
 
from __future__ import absolute_import
 
import logging
import os.path
import sys
import threading
 
# Get relative imports to work when the package is not installed on the PYTHONPATH.
if __name__ == "__main__" and __package__ is None:
    sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    from buildscripts import resmokelib
 
 
# Copied from Python 2.7.
class NullHandler(logging.Handler):
    """
    This handler does nothing. It's intended to be used to avoid the
    "No handlers could be found for logger XXX" one-off warning. This is
    important for library code, which may contain code to log events. If a user
    of the library does not configure logging, the one-off warning might be
    produced; to avoid this, the library developer simply needs to instantiate
    a NullHandler and add it to the top-level logger of the library module or
    package.
    """
    def handle(self, record):
        pass
 
    def emit(self, record):
        pass
 
    def createLock(self):
        self.lock = None
 
 
class Worker(object):
 
    ITERATIONS = 10000
 
    def __init__(self, worker_num, failure_flag):
        self.name = "worker%d" % (worker_num)
        self.failure_flag = failure_flag
 
        exec_logger = resmokelib.logging.loggers.EXECUTOR
        self.logger = resmokelib.logging.loggers.new_logger(self.name, parent=exec_logger)
 
    def __call__(self):
        i = 0
        while not self.failure_flag.is_set():
            if i >= Worker.ITERATIONS:
                break
 
            # Suppress output of the subprocess because it isn't that interesting; we only care if
            # an invalid free() occurs.
            process_logger = resmokelib.logging.loggers.new_logger(self.name)
            process_logger.addHandler(NullHandler())
 
            process = resmokelib.core.process.Process(process_logger, args=["echo", "iter%d" % (i)])
            process.start()
            exit_code = process.wait()
 
            if exit_code != 0:
                self.logger.error("Exited with code %d", exit_code)
                self.failure_flag.set()
 
            i += 1
 
 
def main():
    logging_config = resmokelib.parser._get_logging_config("console")
    resmokelib.logging.config.apply_config(logging_config)
    resmokelib.logging.flush.start_thread()
 
    threads = []
    failure_flag = threading.Event()
 
    for i in xrange(16):
        t = threading.Thread(target=Worker(i, failure_flag))
        # Do not wait for tests to finish executing if interrupted by the user.
        t.daemon = True
        t.start()
        threads.append(t)
 
    for t in threads:
        t.join()
 
    if failure_flag.is_set():
        # Exit with a nonzero return code so it's easy to identify from a task failure in Evergreen
        # whether an invalid free() occurred.
        sys.exit(1)
 
if __name__ == "__main__":
    main()

Sprint: Query F (02/01/16)
Participants:
Linked BF Score: 0

 Description   

The _posixsubprocess C extension module was introduced in CPython 3.2.0 to resolve the issues (e.g. potential to hang) when using the subprocess module from multiple threads concurrently. resmoke.py attempted to work around these thread-safety issues with the subprocess module in Python 2.6 and Python 2.7 by guarding calls to subprocess.Popen() with a lock. However, it turns out that the lock is insufficient for avoiding corruption of Python's internal data structure for thread-local storage in the child process:

[2016/01/13 06:24:33.816] *** Error in `python': free(): invalid size: 0x00007f0044000b20 ***
[2016/01/13 06:24:33.818] ======= Backtrace: =========
[2016/01/13 06:24:33.818] /lib64/libc.so.6(+0x730bf)[0x7f00a54f30bf]
[2016/01/13 06:24:33.818] /lib64/libc.so.6(+0x7892e)[0x7f00a54f892e]
[2016/01/13 06:24:33.818] /lib64/libc.so.6(+0x79636)[0x7f00a54f9636]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyThread_ReInitTLS+0x4a)[0x7f00a5b530ca]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyOS_AfterFork+0x10)[0x7f00a5ae5119]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0x10ef86)[0x7f00a5b53f86]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5d1)[0x7f00a5b0c6e1]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x4fe)[0x7f00a5b1329e]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0xc73)[0x7f00a5b0cd83]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x4fe)[0x7f00a5b1329e]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xb24ab)[0x7f00a5af74ab]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xae5aa)[0x7f00a5af35aa]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xbe7f9)[0x7f00a5b037f9]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xbde2a)[0x7f00a5b02e2a]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x144e)[0x7f00a5b0d55e]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x221)[0x7f00a5b12fc1]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0xc73)[0x7f00a5b0cd83]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x221)[0x7f00a5b12fc1]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xb24ab)[0x7f00a5af74ab]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xae5aa)[0x7f00a5af35aa]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0x6e792)[0x7f00a5ab3792]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x21f5)[0x7f00a5b0e305]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x29cc)[0x7f00a5b0eadc]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x29cc)[0x7f00a5b0eadc]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x221)[0x7f00a5b12fc1]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xb222f)[0x7f00a5af722f]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(+0xae5aa)[0x7f00a5af35aa]
[2016/01/13 06:24:33.818] /usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x46)[0x7f00a5af26c6]
[2016/01/13 06:24:33.819] /usr/lib64/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x30)[0x7f00a5b0bc40]
[2016/01/13 06:24:33.819] /usr/lib64/libpython2.7.so.1.0(+0x9f25e)[0x7f00a5ae425e]
[2016/01/13 06:24:33.819] /lib64/libpthread.so.0(+0x80a4)[0x7f00a58300a4]
[2016/01/13 06:24:33.819] /lib64/libc.so.6(clone+0x6d)[0x7f00a55657fd]

The actual error message can vary, but the invalid free() triggers a SIGABRT and causes the child process to exit with a return code of -6:

  • "python: free(): invalid pointer"
  • "python: double free or corruption"
  • "python: munmap_chunk(): invalid pointer"

The version of the subprocess module in Python 3.2 is available as a separate package called subprocess32. Using it will resolve the memory corruption issue because the call to PyOS_AfterFork() (and thus the call to PyThread_ReInitTLS()) is skipped when preexec_fn is None.



 Comments   
Comment by Githook User [ 25/Jan/16 ]

Author:

{u'username': u'visemet', u'name': u'Max Hirschhorn', u'email': u'max.hirschhorn@mongodb.com'}

Message: SERVER-22219 Use subprocess32 on POSIX systems if it's available.

(cherry picked from commit 153bc4fd12610ba0c1f53e5b6cbdeaa9df574b32)
Branch: v3.2
https://github.com/mongodb/mongo/commit/fbab3da6d560dd5b5d8d18ef2f2e09224387c109

Comment by Githook User [ 22/Jan/16 ]

Author:

{u'username': u'visemet', u'name': u'Max Hirschhorn', u'email': u'max.hirschhorn@mongodb.com'}

Message: SERVER-22219 Use subprocess32 on POSIX systems if it's available.
Branch: master
https://github.com/mongodb/mongo/commit/153bc4fd12610ba0c1f53e5b6cbdeaa9df574b32

Generated at Thu Feb 08 03:59:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.