[SERVER-42228] LoggerRuntimeConfigError exceptions can lead to background dbhash thread running until Evergreen task times out Created: 13/Jul/19  Updated: 29/Oct/23  Resolved: 16/Jul/19

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.6.14, 4.0.11, 4.2.0-rc3, 4.3.1

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Max Hirschhorn
Resolution: Fixed Votes: 0
Labels: tig-resmoke
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-40514 Race between BufferedHandler.close() ... Closed
is related to SERVER-40592 Uncaught exception in resmoke.py job ... Closed
is related to SERVER-35472 resmoke.py shouldn't fall back to std... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2, v4.0, v3.6
Steps To Reproduce:

Apply the following patch and run a test suite which uses the CheckReplDBHashInBackground hook:

python buildscripts/resmoke.py --suite=replica_sets_jscore_passthrough jstests/core/apitest_dbcollection.js --log=buildlogger

diff --git a/buildscripts/resmoke.py b/buildscripts/resmoke.py
index db33d8b95e..82e7d0aa7c 100755
--- a/buildscripts/resmoke.py
+++ b/buildscripts/resmoke.py
@@ -64,6 +64,8 @@ class Resmoke(object):  # pylint: disable=too-many-instance-attributes
     def _exit_logging(self):
         if not self._interrupted:
             logging.flush.stop_thread()
+            # Sleep here to simulate logging being slow.
+            time.sleep(60)
             self._exit_on_incomplete_logging()
 
     def _exit_on_incomplete_logging(self):
diff --git a/buildscripts/resmokelib/testing/report.py b/buildscripts/resmokelib/testing/report.py
index 3365a17075..436554dfe1 100644
--- a/buildscripts/resmokelib/testing/report.py
+++ b/buildscripts/resmokelib/testing/report.py
@@ -9,6 +9,7 @@ import time
 import unittest
 
 from .. import config as _config
+from .. import errors
 from .. import logging
 
 
@@ -107,6 +108,12 @@ class TestReport(unittest.TestResult):  # pylint: disable=too-many-instance-attr
             if test.dynamic:
                 self.num_dynamic += 1
 
+        if not test.test_name.endswith(("fixture_setup", "CheckReplDBHashInBackground")):
+            # Raise an error.LoggerRuntimeConfigError exception to simulate logkeeper being
+            # overwhelmed when running a test after the fixture has been set up. The goal is to have
+            # the background dbhash thread run but not the actual test file.
+            raise errors.LoggerRuntimeConfigError("Intentionally triggered!")
+
         # Set up the test-specific logger.
         test_logger = self.job_logger.new_test_logger(test.short_name(), test.basename(), command,
                                                       test.logger)

Sprint: STM 2019-07-29
Participants:
Linked BF Score: 11
Story Points: 2

 Description   

If TestReport.startTest() raises an exception, then Job._run_hooks_after_tests() won't be called. For test suites which use the CheckReplDBHashInBackground hook, this leads to the background thread continuing to spawn mongo shell processes and running the run_check_repl_dbhash_background.js hook. If logkeeper is overwhelmed, then an errors.LoggerRuntimeConfigError exception can also occur when attempting to tear the fixture down. This leads the Evergreen task to time out instead of failing with code 75 because the background dbhash check will continue to run while the fixture is still running and resmoke.py's flush thread will therefore never exit.

We don't want to always run the after_test() method for a hook though. For example, if running a test crashes the server, then we shouldn't attempt to run any data consistency checks because they'll just fail to connect to the downed server.



 Comments   
Comment by Githook User [ 16/Jul/19 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-42228 Skip joining flush thread when logs are already incomplete.

(cherry picked from commit c8259b383f8e6f9ce9229774881a55c8dc41e222)
Branch: v3.6
https://github.com/mongodb/mongo/commit/4e4345f3e0668d425e22fbb3a6c1d064d8d8eab8

Comment by Githook User [ 16/Jul/19 ]

Author:

{'name': 'Max Hirschhorn', 'username': 'visemet', 'email': 'max.hirschhorn@mongodb.com'}

Message: SERVER-42228 Skip joining flush thread when logs are already incomplete.

(cherry picked from commit c8259b383f8e6f9ce9229774881a55c8dc41e222)
Branch: v4.0
https://github.com/mongodb/mongo/commit/f4ccbc1c8e75c4752a363d17adb947202fbfdfd9

Comment by Githook User [ 16/Jul/19 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-42228 Skip joining flush thread when logs are already incomplete.

(cherry picked from commit c8259b383f8e6f9ce9229774881a55c8dc41e222)
Branch: v4.2
https://github.com/mongodb/mongo/commit/d37eef2f8a1f3ef0f7d10d1cde016bb366df37a0

Comment by Githook User [ 16/Jul/19 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-42228 Skip joining flush thread when logs are already incomplete.
Branch: master
https://github.com/mongodb/mongo/commit/c8259b383f8e6f9ce9229774881a55c8dc41e222

Generated at Thu Feb 08 04:59:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.