[SERVER-47775] LOGV2_FATAL failed to print stack trace Created: 24/Apr/20  Updated: 29/Oct/23  Resolved: 25/Jun/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.0-rc0
Fix Version/s: 4.4.1, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Amirsaman Memaripour
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-47944 Revert logging failure to collect sta... Closed
Related
related to SERVER-49110 Fix symbol collection errors for libu... Closed
related to SERVER-48202 Test printStackTrace on initialization Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Service arch 2020-05-18, Service arch 2020-06-01, Service arch 2020-06-15, Service arch 2020-06-29
Participants:
Linked BF Score: 16

 Description   

In my patch build, I ran into a LOG_FATAL and expected it to print out the stack trace, but it didn’t. Here's the log message:

[js_test:replset3] 2020-04-24T22:11:25.689+0000 d23770| 2020-04-24T22:11:25.688+00:00 I  CONTROL  31430   [OplogApplier-0] "Error collecting stack trace","attr":{"error":"unw_get_proc_name(7FCEF98B5DD3): no unwind info found\nunw_get_proc_name(7FCEF98B70C9): no unwind info found\nunw_get_proc_name(7FCEF98B40B6): no unwind info found\nunw_get_proc_name(7FCEF79CAEE9): no unwind info found\nunw_get_proc_name(7FCEF7B79D7D): no unwind info found\nunw_get_proc_name(7FCEF7B91E73): no unwind info found\nunw_get_proc_name(7FCEF7B92C9A): no unwind info found\nunw_get_proc_name(7FCEF7B88AC8): no unwind info found\nunw_get_proc_name(7FCEF7C1AFB4): no unwind info found\nunw_get_proc_name(7FCEF7C6BA3C): no unwind info found\nunw_get_proc_name(7FCEF922AF73): no unwind info found\nunw_get_proc_name(7FCEF922B382): no unwind info found\nunw_get_proc_name(7FCEF93D3462): no unwind info found\nunw_get_proc_name(7FCEF93D5CB8): no unwind info found\nunw_get_proc_name(7FCEF93D6AA1): no unwind info found\nunw_get_proc_name(7FCEF93D6DD0): no unwind info found\nunw_get_proc_name(7FCEF9A5ED9F): no unwind info found\nunw_get_proc_name(7FCEF98B5DD3): no unwind info found\nunw_get_proc_name(7FCEF98B70C9): no unwind info found\nunw_get_proc_name(7FCEF98B40B6): no unwind info found\nunw_get_proc_name(7FCEF79CAEE9): no unwind info found\nunw_get_proc_name(7FCEF7B79D7D): no unwind info found\nunw_get_proc_name(7FCEF7B91E73): no unwind info found\nunw_get_proc_name(7FCEF7B92C9A): no unwind info found\nunw_get_proc_name(7FCEF7B88AC8): no unwind info found\nunw_get_proc_name(7FCEF7C1AFB4): no unwind info found\nunw_get_proc_name(7FCEF7C6BA3C): no unwind info found\nunw_get_proc_name(7FCEF922AF73): no unwind info found\nunw_get_proc_name(7FCEF922B382): no unwind info found\nunw_get_proc_name(7FCEF93D3462): no unwind info found\nunw_get_proc_name(7FCEF93D5CB8): no unwind info found\nunw_get_proc_name(7FCEF93D6AA1): no unwind info found\nunw_get_proc_name(7FCEF93D6DD0): no unwind info found\nunw_get_proc_name(7FCEF9A5ED9F): no unwind info found\n"}



 Comments   
Comment by Githook User [ 18/Aug/20 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: SERVER-47775 Log failure to collect stacktrace at error level

(cherry picked from commit 5b877cc1bc76fffc70261808c13807bb46ddc05b)
Branch: v4.4
https://github.com/mongodb/mongo/commit/3c75b1c4a503cd56e6eaa3ff81463c0a1ea7002c

Comment by Githook User [ 18/Aug/20 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: SERVER-47775 Log failure to collect stacktrace at error level

(cherry picked from commit 5b877cc1bc76fffc70261808c13807bb46ddc05b)
Branch: v4.4
https://github.com/mongodb/mongo/commit/e9845000582db2f39c12a272b927e1b705a2382f

Comment by Amirsaman Memaripour [ 25/Jun/20 ]

Closing the ticket as Fixed and moving follow-on work to SERVER-49110. This ticket, SERVER-47775, increases the log level for "Error collecting stack trace" to Error and allows the stacktrace collector to continue pulling symbol information upon receiving libunwind errors (e.g., UNW_ENOINFO).

Comment by Amirsaman Memaripour [ 25/Jun/20 ]

milkie, I created SERVER-49110 to close this ticket, while continuing the investigation there.

bruce.lucas, based on the information in SERVER-49078, I believe this ticket addresses a different issue. The one you mentioned concerns allocation in signal handler, but this one concerns failure to collect stack trace information when using libunwind.

Comment by Eric Milkie [ 25/Jun/20 ]

I don't think so.  I actually managed to reproduce this reliably and was able to step through libunwind in the debugger.  It seems to just be having trouble finding certain frames in the Elf information for certain libraries, and my sense is that it's some conflict between the way some compilers write out the DWARF info for some code segments and the way that libunwind is parsing that info.

Comment by Bruce Lucas (Inactive) [ 25/Jun/20 ]

Any chance this is related to SERVER-49078?

Comment by Eric Milkie [ 25/Jun/20 ]

Also what is this ticket blocked on?  It already has a commit so it needs to have a real fixVersion and be closed.

Comment by Eric Milkie [ 25/Jun/20 ]

Can we backport the above commit into 4.4?

Comment by Githook User [ 05/Jun/20 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: SERVER-47775 Save partial stacktrace on collection error
Branch: master
https://github.com/mongodb/mongo/commit/274b30c62675eb3f04f22e90f76f7891041d9147

Comment by Eric Milkie [ 03/Jun/20 ]

I discovered that there can sometimes be missing symbols for certain frames, depending on elf format. However, our current code gives up printing anything if even one frame is missing. Luckily, there is a one-line code fix to avoid this, and thus to still print out a portion of the stack trace that was collected, even if resolving some frames produced errors:

    BSONObj obj = bob.done();
    if (!err.empty()) {
        static constexpr char fmtErr[] = "Error collecting stack trace: {err}";
        if (sink) {
            *sink << fmt::format(fmtErr, "err"_a = err);
        } else {
            LOGV2(31430, fmtErr, "err"_attr = err);
        }
        // return;
    }

Comment by Eric Milkie [ 03/Jun/20 ]

I just encountered this same problem in a local build. I suspect what happened is that my system ran out of free file handles, and the libunwind stack tracer needs to reopen files in order to search for symbols. Unfortunately, libunwind is written in a C style with only a limited number of error codes to return, so there are lots of places in its code that return this particular error status (UNW_ENOINFO).
We could try instrumenting the code, for every UNW_ENOINFO return, to add a Debug() message, and then attempt to reproduce the problem by running a system with an artificially low `ulimit -n` file handle limit.

Comment by Githook User [ 05/May/20 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: SERVER-47775 Log failure to collect stacktrace at error level
Branch: master
https://github.com/mongodb/mongo/commit/5b877cc1bc76fffc70261808c13807bb46ddc05b

Generated at Thu Feb 08 05:15:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.