[SERVER-73006] consider disabling separate debug for a majority of evergreen builds Created: 18/Jan/23  Updated: 27/Oct/23  Resolved: 27/Oct/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Daniel Moody Assignee: [DO NOT ASSIGN] Backlog - Server Development Platform Team (SDP) (Inactive)
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-49679 Dump JS debug information when aborti... Backlog
related to SERVER-75070 increase hang analyzer self test time... Closed
is related to SERVER-71954 Evergreen tasks do not always depend ... Closed
Assigned Teams:
Server Development Platform
Participants:

 Description   

We use separate debug in most cases, and keeping the debug symbols together with the application binaries throughout our testing process adds complexity. It's questionable why we still use it, because in almost all cases we want the debug symbols with the binary; the exception is release packages.

 

There should be a discussion weighing the pros and cons of separate-debug and whether it still makes sense in the number of tasks it's used in.



 Comments   
Comment by Max Hirschhorn [ 20/Jan/23 ]

Another experiment I would consider trying out is whether (i) it is possible to call a function by its known address, even without access to the debug symbols, and (ii) we can add some testing-only logging to the mongo shell that reports the addresses needed for such a command to still be run within GDB on the threads with active JavaScript runtimes.

I got curious and tried this out. It appears to work, but validation would be nice and appreciated. Based on https://isocpp.org/wiki/faq/pointers-to-members#cant-cvt-fnptr-to-voidptr, I believe it isn't possible to get the address of the member function as a plain pointer, so I added a free function that accepts a MozJSImplScope* to achieve the same effect.

how SpiderMonkey stashes some of its state in thread-local variables

FWIW, in my experiment here I had not switched to the thread where the currentJSScope thread-local variable is defined. Perhaps it doesn't actually matter? It may be safer to also log std::this_thread::get_id() so the pthread_t address can be matched against the output of GDB's "info threads" command, and then switch to the correct thread.
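
A minimal sketch of that extra logging, assuming Linux with glibc, where pthread_t is an unsigned integer holding the thread's address and GDB's "info threads" typically lists each thread as "Thread 0x... (LWP n)". The helper name describeCurrentThread is hypothetical. It also logs the kernel thread ID via syscall(SYS_gettid), since that is the LWP number GDB prints next to each thread.

```cpp
#include <cassert>
#include <cstdint>
#include <pthread.h>
#include <sstream>
#include <string>
#include <sys/syscall.h>
#include <thread>
#include <unistd.h>

// Hypothetical helper: builds a log line that lets someone match the current
// thread against GDB's "info threads" output before switching to it.
// Assumes Linux/glibc, where pthread_t is an unsigned integer type.
std::string describeCurrentThread(const void* scope) {
    std::ostringstream os;
    os << "currentJSScope = " << scope
       << " std::thread::id = " << std::this_thread::get_id()
       // On glibc this is the value GDB shows as "Thread 0x...".
       << " pthread_t = 0x" << std::hex
       << static_cast<std::uintptr_t>(pthread_self())
       // The kernel thread ID, which GDB shows as "(LWP n)".
       << std::dec << " LWP = " << static_cast<long>(syscall(SYS_gettid));
    return os.str();
}
```

Printing this line next to the existing "Starting new scope" output would give both the addresses to call and the thread to run the call on.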

$ du -chs ./build/install/bin/mongo ./build/install/lib/lib* | tail -1
16G     total
$ /opt/mongodbtoolchain/v4/bin/strip ./build/install/bin/mongo ./build/install/lib/lib*
$ du -chs ./build/install/bin/mongo ./build/install/lib/lib* | tail -1
301M    total
$ /opt/mongodbtoolchain/v4/bin/gdb --args ./build/install/bin/mongo --nodb --eval '(function f() { (function g() { while (true) { sleep(1) } })() })()'
(gdb) run
Starting new scope: currentJSScope = 0x555555fe7000 buildStackString2 = 0x7ffff537e170
 
.. In another terminal session
-> $ kill -s SIGTRAP $(pidof mongo)
 
(gdb) call ((char* (*) (void*)) 0x7ffff537e170) (0x555555fe7000)
$1 = 0x555555f59b80 "g@(shell eval):1:53\nf@(shell eval):1:61\n@(shell eval):1:66\n"

diff --git a/src/mongo/scripting/mozjs/implscope.cpp b/src/mongo/scripting/mozjs/implscope.cpp
index 73c541e6161..1533a7fd173 100644
--- a/src/mongo/scripting/mozjs/implscope.cpp
+++ b/src/mongo/scripting/mozjs/implscope.cpp
@@ -117,6 +117,19 @@ bool gFirstRuntimeCreated = false;
 bool closeToMaxMemory() {
     return mongo::sm::get_total_bytes() > (kInterruptGCThreshold * mongo::sm::get_max_bytes());
 }
+
+char* buildStackString2(MozJSImplScope* scope) {
+    auto stackStr = scope->buildStackString();
+    // We intentionally leak memory here to allow the value to be returned to a debugger without
+    // being destroyed.
+    char* ret = new char[stackStr.size() + 1];
+
+    std::memcpy(ret, stackStr.c_str(), stackStr.size());
+    ret[stackStr.size()] = '\0';
+
+    return ret;
+}
+
 }  // namespace
 
 thread_local MozJSImplScope::ASANHandles* currentASANHandles = nullptr;
@@ -510,6 +523,11 @@ MozJSImplScope::MozJSImplScope(MozJSScriptEngine* engine, boost::optional<int> j
     }
 
     currentJSScope = this;
+
+    std::cout << "Starting new scope: currentJSScope = "
+              << static_cast<void*>(currentJSScope.load())
+              << " buildStackString2 = " << reinterpret_cast<void*>(&buildStackString2)
+              << std::endl;
 }
 
 MozJSImplScope::~MozJSImplScope() {
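
The GDB expression call ((char* (*) (void*)) ADDR) (SCOPE) reinterprets a raw address as a function pointer and calls it. The standalone sketch below does the same thing in-process. FakeScope, fakeBuildStackString, and callByAddress are hypothetical stand-ins for MozJSImplScope, buildStackString2, and the debugger respectively. Converting between a function pointer and an integer is implementation-defined but works with GCC and Clang on Linux. A pointer to member function cannot be converted this way, which is why the wrapper has to be a free function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for MozJSImplScope.
class FakeScope {
public:
    std::string buildStackString() const {
        return "g@(shell eval):1:53\n";
    }
};

// Free-function wrapper, mirroring buildStackString2. Its address is an
// ordinary function pointer, so it can be logged and later called by address.
// The buffer is intentionally leaked so a debugger can still read it after
// the call returns.
char* fakeBuildStackString(void* scope) {
    std::string s = static_cast<FakeScope*>(scope)->buildStackString();
    char* ret = new char[s.size() + 1];
    std::memcpy(ret, s.c_str(), s.size() + 1);  // also copies the trailing '\0'
    return ret;
}

// In-process equivalent of GDB's
//   call ((char* (*) (void*)) fnAddr) (scopeAddr)
// i.e. treat raw integers as a function pointer and an argument, then call.
char* callByAddress(std::uintptr_t fnAddr, std::uintptr_t scopeAddr) {
    using StackFn = char* (*)(void*);
    StackFn fn = reinterpret_cast<StackFn>(fnAddr);
    return fn(reinterpret_cast<void*>(scopeAddr));
}
```

In the experiment above, the two integers come from the logged "Starting new scope" line rather than from symbols, which is what makes the approach work on a stripped binary.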

Comment by Max Hirschhorn [ 20/Jan/23 ]

While musing over Dan G's idea of registering a signal handler to log the JavaScript backtrace of any thread in the mongo shell that is running JavaScript, I realized there may be a challenge: currentJSScope is a thread-local variable, and SpiderMonkey stashes some of its state in thread-local variables as well. I'm not sure whether it is possible (safe or valid) for the signal-handler thread to access another thread's runtime the way mongodb-javascript-stack does in the debugger, where it switches to each thread in the mongo shell process and calls MozJSImplScope::buildStackString(). (Safety matters less here because by the time the hang analyzer has run, we've given up on the cluster, and it's OK if the process crashes soon after as long as we get the diagnostics we want. But I'm not positive we would consistently get the diagnostics either.)
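
To make the thread-local concern concrete, here is a minimal sketch in which a hypothetical thread_local pointer named currentScope stands in for currentJSScope. A second thread that reads the variable, as a watchdog or signal-servicing thread would, sees its own still-null copy rather than the JavaScript thread's value. This is why a debugger first has to switch to the owning thread.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for currentJSScope: every thread has its own copy,
// starting out as nullptr.
thread_local void* currentScope = nullptr;

// Reads currentScope from a freshly started thread and reports whether that
// thread saw nullptr (i.e. its own copy, not the caller's).
bool otherThreadSeesNullScope() {
    bool sawNull = false;
    std::thread reader([&sawNull] { sawNull = (currentScope == nullptr); });
    reader.join();
    return sawNull;
}
```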

Another experiment I would consider trying out is whether (i) it is possible to call a function by its known address, even without access to the debug symbols, and (ii) we can add some testing-only logging to the mongo shell that reports the addresses needed for such a command to still be run within GDB on the threads with active JavaScript runtimes.

Comment by Daniel Moody [ 20/Jan/23 ]
  1. Do it after the fact: on a crash, get the core dump, and have a bot that crawls back afterwards to fetch the core dump together with the matching debug info
    1. This doesn't work with JavaScript backtraces

JavaScript backtraces should not need debug symbols to be produced. If the jstest shell implemented a feature where a particular signal sent to the jstest shell process is handled, the handler could call the appropriate functions internally to generate the JS backtrace on demand, without needing the debug symbols that doing this through GDB requires. (Credit to daniel.gottlieb@mongodb.com for the idea.)
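
One way such a handler could sidestep the thread-local problem is sketched below. The handler does nothing except bump a lock-free atomic counter, which is among the few things that are safe inside a signal handler. Each thread that owns a JavaScript runtime then polls the counter on its own thread and builds its backtrace with its own thread-local state. The polling step is an assumption added here, not part of the original idea. The names gDumpGeneration, onDumpSignal, and pollForDumpRequest are hypothetical, and SIGUSR1 is an arbitrary choice. In the real shell, the poll would hypothetically run from a SpiderMonkey interrupt callback.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <functional>
#include <string>
#include <vector>

// Bumped once per dump request. Lock-free atomic operations are
// async-signal-safe, which is what allows the handler below to touch it.
std::atomic<unsigned> gDumpGeneration{0};
static_assert(std::atomic<unsigned>::is_always_lock_free,
              "the signal handler relies on a lock-free counter");

// Signal handler: records that a dump was requested and does nothing else.
extern "C" void onDumpSignal(int) {
    gDumpGeneration.fetch_add(1, std::memory_order_relaxed);
}

// Polled by each thread that owns a JavaScript runtime. Because it runs on
// that thread, buildStack sees the right thread-local state. The stack is only
// built when a new request has arrived since this thread last reported.
bool pollForDumpRequest(const std::function<std::string()>& buildStack,
                        std::vector<std::string>& sink) {
    thread_local unsigned lastSeen = 0;
    unsigned current = gDumpGeneration.load(std::memory_order_relaxed);
    if (current == lastSeen) {
        return false;
    }
    lastSeen = current;
    sink.push_back(buildStack());
    return true;
}
```

Installing it would take a single std::signal(SIGUSR1, onDumpSignal) call at shell startup. A counter is used instead of a boolean flag so that one signal can be observed by every polling thread, not just the first one to check.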

Comment by Alex Neben [ 20/Jan/23 ]

Costs:

  1. Cost to switch initially
  2. Download time will increase if debug info is always included (compute costs will increase)
    1. The added latency is not that big of a deal

 

Benefits:

  1. For dev workflows, it will always be better not to have split debug info
  2. Will simplify build + test

 

Split DWARF (splitting out just the DWARF info): incremental rebuilds will be shorter because the debug info doesn't need to be linked (does not work with DWARF 5).

Split debug (splitting out all debug info, which is more than just DWARF): incremental rebuilds will take just as long (will not help build times).

 

Ideas:

  1. Continue trying to keep everything split, and run the hang analyzer + spawn host at the time of the program hang/death
    1. Could always try to download the debug symbols while the test is running
  2. Bring it all together (slower builds and downloads)
  3. Do it after the fact: on a crash, get the core dump, and have a bot that crawls back afterwards to fetch the core dump together with the matching debug info
    1. This doesn't work with JavaScript backtraces
Generated at Thu Feb 08 06:23:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.