Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-33427

improve detectability of test failing because ShardingTest/ReplSetTest not shut down

    XMLWordPrintable

    Details

      Description

      > Max Hirschhorn Kevin Albertson, I noticed that failing to shut down a ShardingTest/ReplSetTest doesn't cause the test to log a "failed to load" line or a javascript stack trace (which makes sense, since which line would you error on?).

      As an outcome of SERVER-25777, the mongo shell could already exit with a non-zero return code without printing a "failed to load" message.

      > The line that is logged ("a call to MongoRunner.stopMongod(), ReplSetTest#stopSet(), or ShardingTest#stop() may be missing from the test") also isn't/can't be logged at LogSeverity::Error, since it's not logged by a server process (and which makes the log line contain " E ", which is another thing I typically look for when a test fails without "failed to load").
      >
      > It took some confusion and additional scrolling through the logs for me to realize why my new test was reporting failure when it seemed like the test ran to completion successfully. Just a thought, in case there's something that can be done to make this failure easier to detect.

      Esha Maharishi, I think your confusion is understandable. The goal of the message was to make it more obvious to the user what the remediation ought to be. Since that message isn't being surfaced clearly enough, we should change the logic in the mongo shell so that it is.

      I don't see a reason that the mongo shell must use cout for logging the "exiting with a failure due to unterminated processes" message, so we could replace it with a call to severe() instead (and prefix the log message with 'F'). Do you think that would be sufficient for your purposes? Would you mind filing a new SERVER ticket for this improvement request?

      > For example, even just moving the "a call to MongoRunner.stopMongod(), ReplSetTest#stopSet(), or ShardingTest#stop() may be missing from the test" just before/after the "Summary: 1 test(s) ran in 35.86 seconds (0 succeeded, 0 were skipped, 1 failed, 0 errored)" could help.

      Those messages are logged by two different processes (the mongo shell with the former and resmoke.py with the latter) so that isn't really something we'd consider. A related feature in resmoke.py would be to have special handling around certain exit codes from known processes. This case in the mongo shell would be one, but a memory leak detected by ASan/LSan would be another.

      See comment thread on SERVER-25640; one good idea from that thread is to make the mongo shell log an error message at a more severe log level.

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: