Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-26445

resmoke.py's LoggerPipe threads can stop consuming output from their pipes for long periods



    • Fully Compatible
    • TIG 2016-12-12
    • 0


      The LoggerPipe thread calls BufferedHandler.flush_with_lock() when the buffer's size exceeds its capacity. In the case of the buildlogger handlers (i.e. the _BaseBuildloggerHandler class), this is a long-running, blocking network operation that prevents the LoggerPipe thread from continuing to consume output from its pipe. While this is consistent with smoke.py's behavior, it is undesirable to have log message queue up in the pipe's buffer because the timestamps used for ordering log messages from different processes are then skewed.

      Instead, we should offload the sending of all POST requests to logkeeper to the BufferedHandler's AlarmClock thread. This can be done by changing BufferedHandler.flush() to not require holding the lock during the network call and signaling the thread to wake up sooner than its scheduled sleep interval. Doing so will help ensure the LoggerPipe thread is available to read from its pipe and prevent it from becoming blocked when logging to its logging.Logger instance.

      Steps to reproduce

      Apply the following patch to simulate a delay when calling the flush_with_lock() method on the fixture's logger and run a test through resmoke.py.

      diff --git a/buildscripts/resmokeconfig/loggers/console.yml b/buildscripts/resmokeconfig/loggers/console.yml
      index b233de4..d4c4b6a 100644
      --- a/buildscripts/resmokeconfig/loggers/console.yml
      +++ b/buildscripts/resmokeconfig/loggers/console.yml
      @@ -4,9 +4,9 @@ logging:
             - class: logging.StreamHandler
      -    format: '[%(name)s] %(message)s'
      +    format: '[%(name)s] %(asctime)s %(message)s'
      -      - class: logging.StreamHandler
      +      - class: delayed
           format: '[%(name)s] %(asctime)s %(message)s'
      diff --git a/buildscripts/resmokelib/logging/config.py b/buildscripts/resmokelib/logging/config.py
      index c3960bb..9cf2f94 100644
      --- a/buildscripts/resmokelib/logging/config.py
      +++ b/buildscripts/resmokelib/logging/config.py
      @@ -6,9 +6,11 @@ from __future__ import absolute_import
       import logging
       import sys
      +import time
       from . import buildlogger
       from . import formatters
      +from . import handlers
       from . import loggers
      @@ -110,6 +112,26 @@ def apply_buildlogger_test_handler(logger,
      +class DelayedStreamHandler(handlers.BufferedHandler):
      +    def __init__(self, handler):
      +        super(DelayedStreamHandler, self).__init__(5, 1)
      +        self.handler = handler
      +    def setFormatter(self, formatter):
      +        super(DelayedStreamHandler, self).setFormatter(formatter)
      +        self.handler.setFormatter(formatter)
      +    def flush_with_lock(self, close_called):
      +        for record in self.buffer:
      +            self.handler.handle(record)
      +        # Simulate a long network delay.
      +        time.sleep(2)
      +        self.buffer = []
       def _configure_logger(logger, logger_info):
           Adds the handlers specified by the configuration to 'logger'.
      @@ -127,6 +149,8 @@ def _configure_logger(logger, logger_info):
                   handler = logging.NullHandler()
               elif handler_class == "logging.StreamHandler":
                   handler = logging.StreamHandler(sys.stdout)
      +        elif handler_class == "delayed":
      +            handler = DelayedStreamHandler(logging.StreamHandler(sys.stdout))
               elif handler_class == "buildlogger":
                   continue  # Buildlogger handlers are applied when running tests.


      Observe that the timestamp assigned by resmoke.py to the message from the output pipe drifts from the timestamp recorded by the mongod.

      [MongoDFixture:job0] 2016-10-04T01:10:13.799-0400 2016-10-04T01:10:13.798-0400 I CONTROL  [initandlisten] MongoDB starting : pid=69918 port=20000 dbpath=/data/db/job0/resmoke 64-bit host=gopher-blue
      [MongoDFixture:job0] 2016-10-04T01:10:13.799-0400 2016-10-04T01:10:13.799-0400 I CONTROL  [initandlisten] db version v3.3.15
      [MongoDFixture:job0] 2016-10-04T01:10:13.799-0400 2016-10-04T01:10:13.799-0400 I CONTROL  [initandlisten] git version: 520f5571d039b57cf9c319b49654909828971073
      [MongoDFixture:job0] 2016-10-04T01:10:14.261-0400 Waiting to connect to mongod on port 20000.
      [MongoDFixture:job0] 2016-10-04T01:10:15.805-0400 2016-10-04T01:10:13.799-0400 I CONTROL  [initandlisten] allocator: system
      [MongoDFixture:job0] 2016-10-04T01:10:15.805-0400 2016-10-04T01:10:13.799-0400 I CONTROL  [initandlisten] modules: none
      [MongoDFixture:job0] 2016-10-04T01:10:15.806-0400 2016-10-04T01:10:13.799-0400 I CONTROL  [initandlisten] build environment:
      [MongoDFixture:job0] 2016-10-04T01:10:15.806-0400 2016-10-04T01:10:13.799-0400 I CONTROL  [initandlisten]     distarch: x86_64
      [MongoDFixture:job0] 2016-10-04T01:10:15.915-0400 Successfully contacted the mongod on port 20000.
      [MongoDFixture:job0] 2016-10-04T01:10:17.812-0400 2016-10-04T01:10:13.799-0400 I CONTROL  [initandlisten]     target_arch: x86_64
      [MongoDFixture:job0] 2016-10-04T01:10:17.812-0400 2016-10-04T01:10:13.799-0400 I CONTROL  [initandlisten] options: { net: { port: 20000 }, nopreallocj: true, setParameter: { enableTestCommands: "1" }, storage: { dbPath: "/data/db/job0/resmoke" } }
      [MongoDFixture:job0] 2016-10-04T01:10:17.815-0400 2016-10-04T01:10:13.800-0400 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=7680M,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),


        Issue Links



              max.hirschhorn@mongodb.com Max Hirschhorn
              max.hirschhorn@mongodb.com Max Hirschhorn
              1 Vote for this issue
              5 Start watching this issue