  1. Core Server
  2. SERVER-72860

Python exceptions in create_fixture_table() cause resmoke to incorrectly mark Evergreen tasks as setup failures


Details

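    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 5.0.0, 6.0.0, 6.2.0-rc6
    • Fix Version/s: 6.3.0-rc0
    • Component/s: Testing Infrastructure
    • Labels: None
    • Assigned Teams: Server Development Platform
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Steps To Reproduce: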

      python buildscripts/resmoke.py run --suite=sharding_jscore_passthrough --log=buildlogger jstests/core/query/all/all.js
      

      diff --git a/buildscripts/resmokelib/logging/buildlogger.py b/buildscripts/resmokelib/logging/buildlogger.py
      index 1ff2689ea64..d1e89db4278 100644
      --- a/buildscripts/resmokelib/logging/buildlogger.py
      +++ b/buildscripts/resmokelib/logging/buildlogger.py
      @@ -266,10 +266,7 @@ class BuildloggerServer(object):
           def __init__(self):
               """Initialize BuildloggerServer."""
               tmp_globals = {}
      -        self.config = {}
      -        exec(
      -            compile(open(_BUILDLOGGER_CONFIG, "rb").read(), _BUILDLOGGER_CONFIG, 'exec'),
      -            tmp_globals, self.config)
      +        self.config = dict(username="u", password="p", builder="b", build_num="1")
       
               # Rename "slavename" to "username" if present.
               if "slavename" in self.config and "username" not in self.config:
      diff --git a/buildscripts/resmokelib/logging/flush.py b/buildscripts/resmokelib/logging/flush.py
      index 16335ef44ab..a2b32d5e9a2 100644
      --- a/buildscripts/resmokelib/logging/flush.py
      +++ b/buildscripts/resmokelib/logging/flush.py
      @@ -35,7 +35,7 @@ def stop_thread():
           _FLUSH_THREAD.signal_shutdown()
           # Wait for 1min instead of _FLUSH_THREAD.await_shutdown() because we can
           # sometimes wait indefinitely for a response, causing a task timeout.
      -    _FLUSH_THREAD.join(60)
      +    _FLUSH_THREAD.join(5)
       
           success = not _FLUSH_THREAD.is_alive()
           return success
      diff --git a/buildscripts/resmokelib/logging/handlers.py b/buildscripts/resmokelib/logging/handlers.py
      index 29292a3bdef..5ff2d068762 100644
      --- a/buildscripts/resmokelib/logging/handlers.py
      +++ b/buildscripts/resmokelib/logging/handlers.py
      @@ -192,6 +192,8 @@ class HTTPHandler(object):
               on the content type.
               """
       
      +        return dict(id="fake_id")
      +
               data = utils.default_if_none(data, [])
               data = json.dumps(data)
       
      diff --git a/buildscripts/resmokelib/testing/fixtures/shardedcluster.py b/buildscripts/resmokelib/testing/fixtures/shardedcluster.py
      index dddb01ca8d2..dbfdabf1018 100644
      --- a/buildscripts/resmokelib/testing/fixtures/shardedcluster.py
      +++ b/buildscripts/resmokelib/testing/fixtures/shardedcluster.py
      @@ -267,6 +267,7 @@ class ShardedClusterFixture(interface.Fixture):
               output = []
               for shard in self.shards:
                   output += shard.get_node_info()
      +        raise AttributeError("Intentionally raised")
               for mongos in self.mongos:
                   output += mongos.get_node_info()
               return output + self.configsvr.get_node_info()
      

    • 159

    Description

      Some of the commits impacted by BF-27442 had a large number of setup failures, for example:

      [2023/01/11 04:17:48.924] [executor:js_test:job0] 04:17:48.919Z The setup of ShardedClusterFixture (Job #0) failed.
      [2023/01/11 04:17:48.932] [executor:js_test:job0] 04:17:48.928Z Encountered an error when tearing down the fixture.
      [2023/01/11 04:17:48.932] Traceback (most recent call last):
      [2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/job.py", line 95, in __call__
      [2023/01/11 04:17:48.932]     teardown_succeeded = self.manager.teardown_fixture(self.logger)
      [2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/job.py", line 384, in teardown_fixture
      [2023/01/11 04:17:48.932]     self.report.logging_prefix = create_fixture_table(self.fixture)
      [2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/fixtures/interface.py", line 360, in create_fixture_table
      [2023/01/11 04:17:48.932]     info: List[NodeInfo] = fixture.get_node_info()
      [2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/fixtures/shardedcluster.py", line 271, in get_node_info
      [2023/01/11 04:17:48.932]     output += mongos.get_node_info()
      [2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/fixtures/shardedcluster.py", line 573, in get_node_info
      [2023/01/11 04:17:48.932]     port=self.port, pid=self.mongos.pid)
      [2023/01/11 04:17:48.932] AttributeError: 'NoneType' object has no attribute 'pid'
      ...
      [2023/01/11 04:19:00.678] [resmoke] 04:19:00.678Z Failed to flush all logs within a reasonable amount of time, treating logs as incomplete
      [2023/01/11 04:19:00.678] [resmoke] 04:19:00.678Z Exiting with code 75 rather than requested code 2 because we failed to flush all log output to logkeeper.
      

      https://parsley.mongodb.com/evergreen/mongodb_mongo_master_enterprise_rhel_80_64_bit_dynamic_all_feature_flags_required_sharding_jscore_passthrough_3a842713b25c2945fe1884abd8e60203f37f6258_23_01_11_03_08_29/0/task?bookmarks=0,1522,1597,1598&selectedLine=1522
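
      The last two log lines above show resmoke replacing the requested exit code 2 with 75 once the log flush did not finish in time. A minimal sketch of that decision follows, assuming a join-with-timeout check like the one in flush.py's stop_thread(). The names choose_exit_code and EXIT_LOGS_INCOMPLETE are hypothetical and are not resmoke's own; only the value 75 comes from the log.

```python
import threading

# Value taken from the log excerpt; the constant name is an assumption.
EXIT_LOGS_INCOMPLETE = 75


def stop_thread(flush_thread: threading.Thread, timeout: float) -> bool:
    """Wait up to `timeout` seconds and report whether the thread finished."""
    flush_thread.join(timeout)
    return not flush_thread.is_alive()


def choose_exit_code(requested: int, logs_flushed: bool) -> int:
    """Keep the requested code only when all log output was flushed."""
    return requested if logs_flushed else EXIT_LOGS_INCOMPLETE


if __name__ == "__main__":
    release = threading.Event()
    # A flush thread that never finishes until released, to mimic a hang.
    stuck = threading.Thread(target=release.wait, daemon=True)
    stuck.start()
    flushed = stop_thread(stuck, timeout=0.05)
    print(choose_exit_code(2, flushed))
    release.set()
```

      In this sketch a test failure (requested code 2) becomes indistinguishable from a log-flush problem whenever the flush thread is still alive at the timeout, which is the misclassification this issue describes.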

      The Build Barons intentionally ignore setup failures, so a real failure that is misreported as a setup failure can go unnoticed and delay the identification of true failures. (Setup failures are ignored because Logkeeper instability has been generally accepted and accommodated within the testing infrastructure; see SERVER-35472. The concept of setup failures may be worth revisiting now that Logkeeper has moved to S3, but I'm considering that outside the scope of this issue.)

      It looks like the changes to standalone.py in 3805148 as part of SERVER-66045 made it so get_node_info() wouldn't raise an exception when the fixture setup had failed for mongod. However, there is an equivalent case when the fixture setup fails for mongos, which is why the setup failures observed here all happen with the ShardedClusterFixture.
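
      The equivalent guard for mongos could look roughly like the sketch below. It is not the actual resmoke code: NodeInfo is a simplified stand-in, and MongosSketch and FakeProcess are hypothetical classes used only to make the example self-contained. The assumption, taken from the traceback, is that the process attribute is still None when setup has failed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeInfo:
    """Simplified stand-in for resmoke's NodeInfo record."""
    full_name: str
    name: str
    port: int
    pid: int


class FakeProcess:
    """Hypothetical stand-in for a started mongos process."""

    def __init__(self, pid: int) -> None:
        self.pid = pid


class MongosSketch:
    """Hypothetical, stripped-down mongos fixture."""

    def __init__(self, port: int) -> None:
        self.port = port
        # Stays None until setup has successfully started the process.
        self.mongos: Optional[FakeProcess] = None

    def get_node_info(self) -> List[NodeInfo]:
        # Guard: a fixture whose setup failed has no process and no pid.
        if self.mongos is None:
            return []
        return [NodeInfo("job0:mongos", "mongos", self.port, self.mongos.pid)]


if __name__ == "__main__":
    fixture = MongosSketch(port=20000)
    print(fixture.get_node_info())  # empty list, no AttributeError
```

      Returning an empty list lets the caller build a fixture table from whichever nodes did start, instead of failing on the first one that did not.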

      Note: The uncaught exception at fixture teardown also causes resmoke to leak processes upon exit. It may be worthwhile to revisit whether the calls to create_fixture_table() in job.py should have their own try/except block too.
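
      One way to picture the suggested try/except is the sketch below. It is illustrative only: BrokenFixture, build_table and this teardown_fixture are hypothetical stand-ins, not the functions in job.py, and the real teardown path does more than this.

```python
import logging
from typing import Callable, List, Optional


class BrokenFixture:
    """Hypothetical fixture whose setup failed, so get_node_info() raises."""

    def __init__(self) -> None:
        self.torn_down = False

    def get_node_info(self) -> List[str]:
        raise AttributeError("'NoneType' object has no attribute 'pid'")

    def teardown(self) -> bool:
        self.torn_down = True
        return True


def build_table(fixture) -> str:
    """Stand-in for create_fixture_table(); lets get_node_info() errors escape."""
    return ", ".join(fixture.get_node_info())


def teardown_fixture(fixture, logger: logging.Logger,
                     create_table: Callable = build_table) -> bool:
    """Tear the fixture down even when the table cannot be built."""
    prefix: Optional[str] = None
    try:
        prefix = create_table(fixture)
    except Exception:
        # Log the problem, but do not let it skip the teardown below.
        logger.exception("Could not build the fixture table; continuing teardown.")
    if prefix is not None:
        logger.info("Fixture table: %s", prefix)
    return fixture.teardown()


if __name__ == "__main__":
    fixture = BrokenFixture()
    teardown_fixture(fixture, logging.getLogger("sketch"))
    print(fixture.torn_down)
```

      The point of the design is that building the table is a logging convenience, so a failure there is reported and then ignored rather than allowed to abort process cleanup.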

            People

              tausif.rahman@mongodb.com Tausif Rahman
              max.hirschhorn@mongodb.com Max Hirschhorn