Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-22165

Deadlock in resmoke.py between logger pipe and timer thread

    • Fully Compatible
    • ALL
    • TIG F (01/29/16)
    • 0

      The logger pipe will acquire the lock for guarding the handler's buffer, and then will attempt to acquire the lock for guarding the timer's state in order to snooze the timer.

      The timer thread will call func while holding the lock on its state, which will attempt to acquire the lock for guarding the handler's buffer by calling flush() on the handler.

      This can lead to a deadlock between the two threads. When this situations occurs, resmoke.py will no longer be able to consume output from the stdout/stderr pipe of the subprocess (e.g. the mongo shell). The pipe will eventually fill up and subsequent writes to it will cause that thread to block. The test's operation will then stall and likely cause the task in Evergreen to time out.

      [2016/01/05 16:45:07.090] Thread 140486153570048:
      [2016/01/05 16:45:12.171]   File "/usr/lib64/python2.6/threading.py", line 504, in __bootstrap
      [2016/01/05 16:45:12.207]     self.__bootstrap_inner()
      [2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
      [2016/01/05 16:45:12.207]     self.run()
      [2016/01/05 16:45:12.207]   File "/data/mci/src/buildscripts/resmokelib/utils/timer.py", line 116, in run
      [2016/01/05 16:45:12.207]     self.func(*self.args, **self.kwargs)
      [2016/01/05 16:45:12.207]   File "/data/mci/src/buildscripts/resmokelib/logging/handlers.py", line 100, in flush
      [2016/01/05 16:45:12.207]     with self._lock:
      [2016/01/05 16:45:12.207] Thread 140486143080192:
      [2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/threading.py", line 504, in __bootstrap
      [2016/01/05 16:45:12.207]     self.__bootstrap_inner()
      [2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
      [2016/01/05 16:45:12.207]     self.run()
      [2016/01/05 16:45:12.207]   File "/data/mci/src/buildscripts/resmokelib/core/pipe.py", line 66, in run
      [2016/01/05 16:45:12.207]     self.__logger.log(self.__level, line.rstrip())
      [2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/logging/__init__.py", line 1127, in log
      [2016/01/05 16:45:12.207]     self._log(level, msg, args, **kwargs)
      [2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/logging/__init__.py", line 1181, in _log
      [2016/01/05 16:45:12.207]     self.handle(record)
      [2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/logging/__init__.py", line 1191, in handle
      [2016/01/05 16:45:12.207]     self.callHandlers(record)
      [2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/logging/__init__.py", line 1228, in callHandlers
      [2016/01/05 16:45:12.207]     hdlr.handle(record)
      [2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/logging/__init__.py", line 687, in handle
      [2016/01/05 16:45:12.207]     self.emit(record)
      [2016/01/05 16:45:12.207]   File "/data/mci/src/buildscripts/resmokelib/logging/handlers.py", line 86, in emit
      [2016/01/05 16:45:12.207]     self._timer.snooze()
      [2016/01/05 16:45:12.207]   File "/data/mci/src/buildscripts/resmokelib/utils/timer.py", line 60, in snooze
      [2016/01/05 16:45:12.207]     with self.lock:
      

      Proposal is to modify resmokelib.utils.timer.AlarmClock.run() to release the lock on the timer's state before calling func, and then reacquire it in order to set self.snoozed = False. This prevents the timer thread from causing an invalid lock ordering so long as func is safe to call on its own.

            Assignee:
            jonathan.abrahams Jonathan Abrahams
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: