[SERVER-33427] improve detectability of test failing because ShardingTest/ReplSetTest not shut down Created: 21/Feb/18  Updated: 29/Oct/23  Resolved: 04/Oct/18

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: 3.7.2
Fix Version/s: 4.1.4

Type: Improvement
Priority: Major - P3
Reporter: Esha Maharishi (Inactive)
Assignee: Yves Duhem
Resolution: Fixed
Votes: 0
Labels: tig-qwin-eligible
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-25640 Have ReplSetTest run checkDBHashes() ... Closed
Backwards Compatibility: Fully Compatible
Sprint: DAG 2018-10-08
Participants:
Story Points: 2

Description

> Max Hirschhorn, Kevin Albertson, I noticed that failing to shut down a ShardingTest/ReplSetTest doesn't cause the test to log a "failed to load" line or a JavaScript stack trace (which makes sense, since which line would you error on?).

As an outcome of SERVER-25777, the mongo shell could already exit with a non-zero return code without printing a "failed to load" message.

> The line that is logged ("a call to MongoRunner.stopMongod(), ReplSetTest#stopSet(), or ShardingTest#stop() may be missing from the test") also isn't, and can't be, logged at LogSeverity::Error, since it isn't logged by a server process. Logging at that severity would make the log line contain " E ", which is another thing I typically look for when a test fails without "failed to load".
>
> It took some confusion and additional scrolling through the logs for me to realize why my new test was reporting failure when it seemed like the test ran to completion successfully. Just a thought, in case there's something that can be done to make this failure easier to detect.

Esha Maharishi, I think your confusion is understandable. The goal of the message was to make it more obvious to the user what the remediation ought to be. Since that message isn't being surfaced clearly enough, we should change the logic in the mongo shell so that it is.

I don't see a reason that the mongo shell must use cout for logging the "exiting with a failure due to unterminated processes" message, so we could replace it with a call to severe() instead (and prefix the log message with 'F'). Do you think that would be sufficient for your purposes? Would you mind filing a new SERVER ticket for this improvement request?

> For example, even just moving the "a call to MongoRunner.stopMongod(), ReplSetTest#stopSet(), or ShardingTest#stop() may be missing from the test" just before/after the "Summary: 1 test(s) ran in 35.86 seconds (0 succeeded, 0 were skipped, 1 failed, 0 errored)" could help.

Those messages are logged by two different processes (the mongo shell with the former and resmoke.py with the latter) so that isn't really something we'd consider. A related feature in resmoke.py would be to have special handling around certain exit codes from known processes. This case in the mongo shell would be one, but a memory leak detected by ASan/LSan would be another.
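The exit-code handling suggested above could look roughly like the following Python sketch. It is only an illustration, not resmoke.py's actual code: the function name `explain_exit`, the table `KNOWN_EXIT_CODES`, and the numeric exit codes are all hypothetical, since this ticket does not state which codes the mongo shell or a sanitizer build actually uses.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: the process names and numeric codes below are
# placeholders. The real exit codes are not given in this ticket.
KNOWN_EXIT_CODES: Dict[Tuple[str, int], str] = {
    ("mongo", 250): (
        "a call to MongoRunner.stopMongod(), ReplSetTest#stopSet(), "
        "or ShardingTest#stop() may be missing from the test"
    ),
    ("mongo", 251): "a memory leak may have been detected by ASan/LSan",
}


def explain_exit(process_name: str, exit_code: int) -> Optional[str]:
    """Return a human-readable hint for a non-zero exit code, or None on success."""
    if exit_code == 0:
        return None
    hint = KNOWN_EXIT_CODES.get((process_name, exit_code))
    if hint is None:
        # Unknown code: still report it so the failure is visible in the summary.
        return f"{process_name} exited with code {exit_code}"
    return f"{process_name} exited with code {exit_code}: {hint}"
```

A test runner could print the returned string next to its own summary line, which would place the remediation hint close to the failure report without requiring the two processes to share a log.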

See comment thread on SERVER-25640; one good idea from that thread is to make the mongo shell log an error message at a more severe log level.



Comments
Comment by Githook User [ 04/Oct/18 ]

Author: Yves Duhem (username: syev, email: yves.duhem@mongodb.com)

Message: SERVER-33427 Update shell logs when it exits with error
Branch: master
https://github.com/mongodb/mongo/commit/b51003a59880eb8edb1abe5c5c01662e8c2eccde

Comment by Robert Guo (Inactive) [ 16/Aug/18 ]

We should also log a message with the exit code of the shell when it is non-zero and change all usages of cout to severe() or error() in dbshell.cpp when exiting with a non-zero code.

Comment by Yves Duhem [ 21/Feb/18 ]

max.hirschhorn, yes, the fatal logs could be added to the context causes displayed by the tool.

Comment by Max Hirschhorn [ 21/Feb/18 ]

robert.guo, yves.duhem, would having the Build Baron tool search for fatal log messages ('F' prefixed) from the mongo shell be a relatively generic way of surfacing this kind of information?
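A search for 'F' prefixed messages could be sketched in Python as below. This is not the Build Baron tool's implementation. It assumes each log line has the layout `<ISO timestamp> <severity> <component> [<context>] <message>`, possibly preceded by a test-runner prefix; the names `FATAL_LINE` and `find_fatal_messages` are hypothetical.

```python
import re
from typing import Iterable, List

# Assumed line layout (not confirmed by this ticket):
#   <ISO timestamp> <severity> <component> [<context>] <message>
# Only lines whose severity column is exactly "F" are matched.
FATAL_LINE = re.compile(
    r"\d{4}-\d{2}-\d{2}T\S+\s+F\s+\S+\s+\[[^\]]*\]\s+(?P<message>.*)"
)


def find_fatal_messages(lines: Iterable[str]) -> List[str]:
    """Collect the message text of every fatal-severity log line."""
    messages = []
    for line in lines:
        match = FATAL_LINE.search(line)
        if match:
            messages.append(match.group("message"))
    return messages
```

Because the pattern uses `search` rather than `match`, any prefix a test runner adds in front of the timestamp is ignored.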

Generated at Thu Feb 08 04:33:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.