[SERVER-22165] Deadlock in resmoke.py between logger pipe and timer thread Created: 13/Jan/16  Updated: 18/Nov/16  Resolved: 19/Jan/16

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.2.3, 3.3.1

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Jonathan Abrahams
Resolution: Done Votes: 0
Labels: test-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: TIG F (01/29/16)
Participants:
Linked BF Score: 0

 Description   

The logger pipe will acquire the lock for guarding the handler's buffer, and then will attempt to acquire the lock for guarding the timer's state in order to snooze the timer.

The timer thread will call func while holding the lock on its state, which will attempt to acquire the lock for guarding the handler's buffer by calling flush() on the handler.

This can lead to a deadlock between the two threads. When this situations occurs, resmoke.py will no longer be able to consume output from the stdout/stderr pipe of the subprocess (e.g. the mongo shell). The pipe will eventually fill up and subsequent writes to it will cause that thread to block. The test's operation will then stall and likely cause the task in Evergreen to time out.

[2016/01/05 16:45:07.090] Thread 140486153570048:
[2016/01/05 16:45:12.171]   File "/usr/lib64/python2.6/threading.py", line 504, in __bootstrap
[2016/01/05 16:45:12.207]     self.__bootstrap_inner()
[2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
[2016/01/05 16:45:12.207]     self.run()
[2016/01/05 16:45:12.207]   File "/data/mci/src/buildscripts/resmokelib/utils/timer.py", line 116, in run
[2016/01/05 16:45:12.207]     self.func(*self.args, **self.kwargs)
[2016/01/05 16:45:12.207]   File "/data/mci/src/buildscripts/resmokelib/logging/handlers.py", line 100, in flush
[2016/01/05 16:45:12.207]     with self._lock:
[2016/01/05 16:45:12.207] Thread 140486143080192:
[2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/threading.py", line 504, in __bootstrap
[2016/01/05 16:45:12.207]     self.__bootstrap_inner()
[2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
[2016/01/05 16:45:12.207]     self.run()
[2016/01/05 16:45:12.207]   File "/data/mci/src/buildscripts/resmokelib/core/pipe.py", line 66, in run
[2016/01/05 16:45:12.207]     self.__logger.log(self.__level, line.rstrip())
[2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/logging/__init__.py", line 1127, in log
[2016/01/05 16:45:12.207]     self._log(level, msg, args, **kwargs)
[2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/logging/__init__.py", line 1181, in _log
[2016/01/05 16:45:12.207]     self.handle(record)
[2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/logging/__init__.py", line 1191, in handle
[2016/01/05 16:45:12.207]     self.callHandlers(record)
[2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/logging/__init__.py", line 1228, in callHandlers
[2016/01/05 16:45:12.207]     hdlr.handle(record)
[2016/01/05 16:45:12.207]   File "/usr/lib64/python2.6/logging/__init__.py", line 687, in handle
[2016/01/05 16:45:12.207]     self.emit(record)
[2016/01/05 16:45:12.207]   File "/data/mci/src/buildscripts/resmokelib/logging/handlers.py", line 86, in emit
[2016/01/05 16:45:12.207]     self._timer.snooze()
[2016/01/05 16:45:12.207]   File "/data/mci/src/buildscripts/resmokelib/utils/timer.py", line 60, in snooze
[2016/01/05 16:45:12.207]     with self.lock:

Proposal is to modify resmokelib.utils.timer.AlarmClock.run() to release the lock on the timer's state before calling func, and then reacquire it in order to set self.snoozed = False. This prevents the timer thread from causing an invalid lock ordering so long as func is safe to call on its own.



 Comments   
Comment by Githook User [ 19/Jan/16 ]

Author:

{u'username': u'hptabster', u'name': u'Jonathan Abrahams', u'email': u'jonathan@mongodb.com'}

Message: SERVER-22165 Fix resmoke.py deadlock between logger pipe and timer thread

(cherry picked from commit 8aa1202a37900752c102ce7b324f41853ec5c4c9)
Branch: v3.2
https://github.com/mongodb/mongo/commit/04a340e44af0fdf7e0a66f347a06adf0b56f7e68

Comment by Githook User [ 19/Jan/16 ]

Author:

{u'username': u'hptabster', u'name': u'Jonathan Abrahams', u'email': u'jonathan@mongodb.com'}

Message: SERVER-22165 Fix resmoke.py deadlock between logger pipe and timer thread
Branch: master
https://github.com/mongodb/mongo/commit/8aa1202a37900752c102ce7b324f41853ec5c4c9

Generated at Thu Feb 08 03:59:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.