Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-33504

Write performance tests for resmoke.py logging to console and to logkeeper

    XMLWordPrintable

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.7.3
    • Component/s: Testing Infrastructure
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Sprint:
      TIG 2018-1-29, TIG 2018-02-12, TIG 2018-02-26, TIG 2018-03-12

      Description

      SERVER-26445 was a bug in resmoke.py that caused arbitrarily long delays between the time when a process would write to its output pipe and the time when resmoke.py would read the log message. This would cause problems for programs such as mongod that rely on writing logs messages to be fast in order to avoid holding LockManager locks and/or mutexes for long periods of time, thereby blocking other threads from doing useful work. In particular, this would manifest as the primary stepping down due to the secondary being unable to respond to heartbeat messages while the ReplicationCoordinatorImpl::_mutex is held.

      Even after the changes from SERVER-26445 where committed, we've seen other cases of time drift between the timestamp that resmoke.py reports and the timestamp that mongod reports.

      BF-4604, BF-4970, BF-6290 are other Evergreen failures where there was an apparent time drift between the timestamp that the MongoDB process reports and the timestamp that resmoke.py reports despite having the changes from SERVER-26445.

      https://jira.mongodb.org/browse/BF-6557?focusedCommentId=1674918&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1674918

      Additionally, we've also see cases (namely those linked to MCI-2631) where resmoke.py itself gets blocked writing to its output pipe and becomes unable to consume log messages from its subprocesses. The GIL in CPython isn't held while doing I/O, so it is unexpected that the LoggerPipe threads wouldn't be able to continue consuming log messages.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              max.hirschhorn Max Hirschhorn
              Reporter:
              max.hirschhorn Max Hirschhorn
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: