Details
Improvement
Status: Needs Scheduling
Minor - P4
Resolution: Unresolved
Server Development Platform
Description
Can the test harness please log an explicit and informative message that it is about to kill the test processes before doing so? The current behavior is not easily recognizable as an intentional kill by the test harness and can easily be mistaken for a server crash.
Example from BF-27450:
Writing fatal message
message:
ExternalRecordStoreTest
NamedPipeMultiplePipes4
Writing fatal message
message: Got signal: 6 (Abort trap: 6).
mongo::stack_trace_detail::(anonymous namespace)::getStackTraceImpl(mongo::stack_trace_detail::(anonymous namespace)::Options const&)
mongo::printStackTrace()
abruptQuit
_sigtramp
__srefill1
__fread
fread
std::__1::basic_filebuf<char, std::__1::char_traits<char> >::underflow()
std::__1::basic_streambuf<char, std::__1::char_traits<char> >::uflow()
std::__1::basic_streambuf<char, std::__1::char_traits<char> >::xsgetn(char*, long)
std::__1::basic_istream<char, std::__1::char_traits<char> >::read(char*, long)
mongo::NamedPipeInput::doRead(char*, int)
mongo::InputStream<mongo::NamedPipeInput>::readBytes(int, char*)
mongo::MultiBsonStreamCursor::nextFromCurrentStream()
mongo::MultiBsonStreamCursor::next()
mongo::UnitTest_SuiteNameExternalRecordStoreTestTestNameNamedPipeMultiplePipes4::_doTest()
mongo::unittest::Test::run()
I am told that I am not the only person who has been fooled by this. To me it looked as though the test had timed out because the server had crashed, dumped its stack, and therefore stopped making progress. In reality, the server had stopped making progress an hour earlier, and the test harness then sent "kill -6" to abort the test.
A message like the following would help avoid time wasted investigating the wrong thing, and would also make it easier for people using Parsley to find the failure (currently, timed-out tests do not have the word "timeout" anywhere in their logs):
TEST TIMEOUT FAILURE ABORT: Aborting the test via "kill -6" because it has not made progress for one hour.
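To illustrate the request, here is a minimal Python sketch of a harness logging the notice before sending the abort signal. This is not the actual test harness code: the function names `format_abort_message` and `abort_with_notice`, the injectable `kill` parameter, and the exact message wording are all assumptions made for the example.

```python
import logging
import os
import signal


def format_abort_message(idle_seconds: float) -> str:
    """Build the notice the harness would log (wording is hypothetical)."""
    return (
        'TEST TIMEOUT FAILURE ABORT: Aborting the test via "kill -6" '
        f"because it has not made progress for {idle_seconds:.0f} seconds."
    )


def abort_with_notice(pid, idle_seconds, logger, kill=os.kill):
    """Log an explicit notice, then send SIGABRT (signal 6) to the process.

    `kill` is injectable so the ordering can be checked without actually
    signalling a process; it defaults to os.kill.
    """
    message = format_abort_message(idle_seconds)
    # Log first, so the notice precedes the stack trace the signal triggers.
    logger.error(message)
    kill(pid, signal.SIGABRT)
    return message
```

With something along these lines, the notice would appear in the log immediately before the "Got signal: 6" output, and a search for "TIMEOUT" would locate the failure.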