[SERVER-72596] Stack trace symbolization fails for large stack traces Created: 06/Jan/23  Updated: 29/Oct/23  Resolved: 06/Jan/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.3.0-rc0

Type: Bug Priority: Major - P3
Reporter: Charlie Swanson Assignee: Charlie Swanson
Resolution: Fixed Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

On my machine it appears I can symbolize small stack traces, but not large ones. The largest I got working was 42 frames; the smallest non-working one was 50 frames.

It appears there is an error in resetting the LibUnwindStepIteration iterator in LibUnwindStepIteration::start(). max.hirschhorn@mongodb.com noticed that the raw iterator resets _i to 0 in start(), but this _i = 0 reset is missing from the LibUnwind version.

Adding that reset to LibUnwindStepIteration::start() resolved my issue. I think small traces work because the iterator is started twice during symbolization, and the _i counter can run through both passes and still stay under the maxFrames limit of 100. But if the stack trace is too large, it gets cut off. I'm not sure why 50 didn't work, though; there may be an off-by-one somewhere.

Thanks to daniel.gottlieb@mongodb.com and max.hirschhorn@mongodb.com for helping me debug and diagnose.



 Comments   
Comment by Githook User [ 06/Jan/23 ]

Author:

{'name': 'Charlie Swanson', 'email': 'charlie.swanson@mongodb.com', 'username': 'cswanson310'}

Message: SERVER-72596 Fix symbolization for large stacktraces with libunwind
Branch: master
https://github.com/mongodb/mongo/commit/9b4c21782463b34a01f437b94a1af891afbb35b2

Comment by Billy Donahue [ 06/Jan/23 ]

I agree with the assessment about the _i variable. Sorry about that!

Comment by Charlie Swanson [ 06/Jan/23 ]

Oh, it turns out my "50-frame" test actually appears to have >= 100 frames when I instrument the code with this patch:

diff --git a/src/mongo/util/stacktrace_posix.cpp b/src/mongo/util/stacktrace_posix.cpp
index 7d9fc0cd692..c989f2dae49 100644
--- a/src/mongo/util/stacktrace_posix.cpp
+++ b/src/mongo/util/stacktrace_posix.cpp
@@ -135,7 +135,9 @@ public:
  */
 std::vector<uintptr_t> uniqueBases(IterationIface& iter, size_t capacity) {
     std::vector<uintptr_t> bases;
+    int i = 0;
     for (iter.start(iter.kSymbolic); bases.size() < capacity && !iter.done(); iter.advance()) {
+        std::cout << "uniqueBases i " << i++ << "\n";
         const auto& f = iter.deref();
         if (!f.file())
             continue;
@@ -151,7 +153,9 @@ std::vector<uintptr_t> uniqueBases(IterationIface& iter, size_t capacity) {
 
 void appendBacktrace(BSONObjBuilder* obj, IterationIface& iter, const Options& options) {
     BSONArrayBuilder frames(obj->subarrayStart("backtrace"));
+    int i = 0;
     for (iter.start(iter.kSymbolic); !iter.done(); iter.advance()) {
+        std::cout << "appendBacktrace i " << i++ << "\n";
         const auto& meta = iter.deref();
         const uintptr_t addr = reinterpret_cast<uintptr_t>(meta.address());
         BSONObjBuilder frame(frames.subobjStart());
@@ -273,6 +277,7 @@ public:
 
 private:
     void start(Flags flags) override {
+        std::cout << "LibunwindStepIteration::start() " << static_cast<void*>(this) << "\n";
         _flags = flags;
         _end = false;

I had previously just been going off the number of lines in the demangled stack trace printed in the log.

Comment by Charlie Swanson [ 06/Jan/23 ]

Sending to Service arch, since I believe billy.donahue@mongodb.com has the code context and can probably quickly approve the change or tell us if we're missing something. I'm not sure whether we can test this effectively.

Generated at Thu Feb 08 06:22:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.