  1. Core Server
  2. SERVER-90458

Timed-out tests are wrongly categorized as server crashes

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical - P2
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Services & Integrations
    • ALL

      When Evergreen hits the idle timeout for a test, it sends a SIGABRT signal to all the mongo processes (mongos/mongod).

      The mongo processes then print the received signal:

      [j1:s0:prim] | 2024-05-13T07:29:58.037+00:00 F  CONTROL  6384300 [S] [initandlisten] "Writing fatal message","attr":{"message":"Got signal: 3 (Quit).

      They also print all the current stack traces.

      In this scenario, Evergreen categorizes the task and tests as follows:

      • The task is marked as "Task timed out".
      • The test is marked as "Failed".
      • The associated BFGs are marked with "Server crash" severity. I believe this is because the log analyzer finds the quit stack traces.

      This is the same categorization we would apply to a real server crash. As a result, it is currently very difficult to distinguish a BFG that failed because the test reached the idle timeout from a BFG that failed due to a server crash.

      To differentiate the two, I suggest that when a test times out because it reached the idle timeout, we do the following:

      • The task should be marked as "Test timed out".
      • The test should be marked as "Test timed out" as well.
      • The associated BFGs should be marked as "Server hang", or at least not marked as "Server crash".
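
The proposed rule could be sketched roughly as follows. This is only an illustration, not the actual log analyzer: the function names `parse_signal` and `classify_failure`, the `hit_idle_timeout` flag, and the severity strings returned are all assumptions made for this example. The only thing taken from this ticket is the "Got signal: 3 (Quit)" message format.

```python
import re
from typing import List, Optional

# Matches the signal number and name in a fatal message such as
# 'Got signal: 3 (Quit).' (format taken from the log line in this ticket).
SIGNAL_RE = re.compile(r"Got signal: (\d+) \(([^)]+)\)")


def parse_signal(log_line: str) -> Optional[int]:
    """Return the signal number reported in a log line, or None if absent."""
    match = SIGNAL_RE.search(log_line)
    return int(match.group(1)) if match else None


def classify_failure(log_lines: List[str], hit_idle_timeout: bool) -> str:
    """Hypothetical severity rule for a failed test.

    Assumption: if the test hit the idle timeout, any signal and stack
    traces in the logs come from the timeout handling, so the failure
    is reported as a hang rather than a crash.
    """
    if hit_idle_timeout:
        return "Server hang"
    signals = [s for s in map(parse_signal, log_lines) if s is not None]
    if signals:
        return "Server crash"
    return "Failed"


# Example using the log line quoted in this ticket.
line = (
    '[j1:s0:prim] | 2024-05-13T07:29:58.037+00:00 F  CONTROL  6384300 [S] '
    '[initandlisten] "Writing fatal message","attr":{"message":"Got signal: 3 (Quit).'
)
print(parse_signal(line))                              # 3
print(classify_failure([line], hit_idle_timeout=True))   # Server hang
print(classify_failure([line], hit_idle_timeout=False))  # Server crash
```

The key design choice in this sketch is that the timeout status takes precedence over whatever the logs contain, since the signal and stack traces alone cannot tell the two cases apart.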
         

      This is an example of a BFG that timed out and was wrongly marked as "Server crash"

            Assignee: Unassigned
            Reporter: Tommaso Tocci (tommaso.tocci@mongodb.com)
            Votes: 0
            Watchers: 9
