[SERVER-68475] Find solution to relocation overflow in static builds Created: 01/Aug/22  Updated: 29/Oct/23  Resolved: 01/Sep/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.2, 6.1.0-rc1, 6.2.0-rc0

Type: Improvement Priority: Blocker - P1
Reporter: Daniel Moody Assignee: Daniel Moody
Resolution: Fixed Votes: 0
Labels: dp-qol
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2022-08-12-09-25-09-638.png    
Issue Links:
Backports
Problem/Incident
causes SERVER-69765 turn off default split dwarf on darwi... Closed
Related
related to SERVER-70555 disable unnecessary use of -fdebug-ty... Closed
is related to SERVER-68474 Add -gsplit-dwarf as temp fix for rel... Closed
is related to SERVER-70839 Spawning dynamically linked mongod pr... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v6.1
Participants:
Linked BF Score: 160

 Description   

https://jira.mongodb.org/browse/BF-25986 was a hot issue for which we implemented a temporary fix in SERVER-68474 while we work to find a better solution. This ticket is for investigating potential solutions and implementing the best one, or at least removing the TODO if the temporary solution from SERVER-68474 turns out to be the best one.

 

This issue should block the 6.1 release because the current temporary fix breaks the ability to easily debug the binaries.



 Comments   
Comment by Githook User [ 01/Sep/22 ]

Author:

{'name': 'Daniel Moody', 'email': 'daniel.moody@mongodb.com', 'username': 'dmoody256'}

Message: SERVER-68475 use debug-types-section to reduce debug info

(cherry picked from commit b41c85c2e798f11db81b6c0ba3ca400bbf01f063)
Branch: v6.0
https://github.com/mongodb/mongo/commit/2ca8e21c41223428b9dec1f713c29633441d3964

Comment by Githook User [ 30/Aug/22 ]

Author:

{'name': 'Daniel Moody', 'email': 'daniel.moody@mongodb.com', 'username': 'dmoody256'}

Message: Revert "SERVER-68475 use debug-types-section to reduce debug info"

This reverts commit 26054e10384bd0d46de6e047a36083181d872be1.
Branch: v6.0
https://github.com/mongodb/mongo/commit/dbe3923c3771d5f4670416c5c783c68eb03208f3

Comment by Githook User [ 29/Aug/22 ]

Author:

{'name': 'Daniel Moody', 'email': 'daniel.moody@mongodb.com', 'username': 'dmoody256'}

Message: SERVER-68475 use debug-types-section to reduce debug info

(cherry picked from commit b41c85c2e798f11db81b6c0ba3ca400bbf01f063)
Branch: v6.0
https://github.com/mongodb/mongo/commit/26054e10384bd0d46de6e047a36083181d872be1

Comment by Githook User [ 29/Aug/22 ]

Author:

{'name': 'Daniel Moody', 'email': 'daniel.moody@mongodb.com', 'username': 'dmoody256'}

Message: SERVER-68475 only use custom configure check with gcc and clang.

(cherry picked from commit 53aa9bec47e34bd010b5dc6b751474c74c3a1797)
Branch: v6.1
https://github.com/mongodb/mongo/commit/96cdc44b67a7f1aa848400888d8e29acf6c9bc3f

Comment by Githook User [ 29/Aug/22 ]

Author:

{'name': 'Daniel Moody', 'email': 'daniel.moody@mongodb.com', 'username': 'dmoody256'}

Message: SERVER-68475 use debug-types-section to reduce debug info

(cherry picked from commit b41c85c2e798f11db81b6c0ba3ca400bbf01f063)
Branch: v6.1
https://github.com/mongodb/mongo/commit/e810b3d11d7554fd71ac4e800c31757b033ab039

Comment by Githook User [ 29/Aug/22 ]

Author:

{'name': 'Daniel Moody', 'email': 'daniel.moody@mongodb.com', 'username': 'dmoody256'}

Message: SERVER-68475 only use custom configure check with gcc and clang.
Branch: master
https://github.com/mongodb/mongo/commit/53aa9bec47e34bd010b5dc6b751474c74c3a1797

Comment by Githook User [ 26/Aug/22 ]

Author:

{'name': 'Daniel Moody', 'email': 'daniel.moody@mongodb.com', 'username': 'dmoody256'}

Message: SERVER-68475 use debug-types-section to reduce debug info
Branch: master
https://github.com/mongodb/mongo/commit/b41c85c2e798f11db81b6c0ba3ca400bbf01f063

Comment by Daniel Moody [ 25/Aug/22 ]

I tested adding this option with v3 gcc:

-fdebug-types-section
When using DWARF Version 4 or higher, type DIEs can be put into their own .debug_types section instead of making them part of the .debug_info section. It is more efficient to put them in a separate comdat section since the linker can then remove duplicates. But not all DWARF consumers support .debug_types sections yet and on some objects .debug_types produces larger instead of smaller debugging information.

and it looks like it also solves the problem:

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  47.3%  2.05Gi   0.0%       0    .debug_info
  21.7%   962Mi   0.0%       0    .debug_loc
   8.6%   382Mi   0.0%       0    .debug_types
   8.2%   366Mi   0.0%       0    .debug_str
   5.3%   234Mi   0.0%       0    .debug_ranges
   4.4%   193Mi   0.0%       0    .debug_line
   1.3%  58.1Mi  53.0%  58.1Mi    .text
   ...
 100.0%  4.33Gi 100.0%   109Mi    TOTAL

This solution would be better than dwp because it would save time and disk space and not have the extra file to carry around.

EDIT: this really won't save any time. Split-dwarf with dwp actually takes the same time as no split-dwarf and no dwp, so only disk space and the extra baggage would potentially be saved.
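
A minimal sketch of how a build script could probe for the flag before turning it on. This is not the configure check from the commits on this ticket; the helper names (compiler_accepts_flag, debug_flags) and the probe source are hypothetical, and only the -g, -c, -o, -Werror and -fdebug-types-section compiler options are taken as given.

```python
import os
import subprocess
import tempfile


def compiler_accepts_flag(compiler, flag, run=subprocess.run):
    """Return True if `compiler` compiles a trivial C++ file with `flag`.

    `run` is injectable so the logic can be exercised without a compiler.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.cpp")
        obj = os.path.join(tmp, "probe.o")
        with open(src, "w") as f:
            f.write("struct Probe { int x; };\n"
                    "int main() { Probe p{0}; return p.x; }\n")
        # -Werror makes a warning about an unsupported option fail the probe.
        cmd = [compiler, "-Werror", "-g", flag, "-c", src, "-o", obj]
        result = run(cmd, capture_output=True)
        return result.returncode == 0


def debug_flags(compiler, run=subprocess.run):
    """Hypothetical helper: add -fdebug-types-section only when it is accepted."""
    flags = ["-g"]
    if compiler_accepts_flag(compiler, "-fdebug-types-section", run=run):
        flags.append("-fdebug-types-section")
    return flags
```

Calling debug_flags("g++") on a machine with that compiler installed would run the real probe. Note that the probe only shows the flag is accepted; it says nothing about whether the resulting .debug_types sections are smaller for a given object.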

Comment by Daniel Moody [ 17/Aug/22 ]

Yes, v4 gcc is much better about debug info and section size. Here is a bloaty report for mongod (recent master) built with v4:

  FILE SIZE        VM SIZE    
 --------------  -------------- 
  69.5%  2.89Gi   0.0%       0    .debug_info
  12.9%   549Mi   0.0%       0    .debug_str
   8.4%   359Mi   0.0%       0    .debug_loclists
   3.7%   158Mi   0.0%       0    .debug_line
   1.4%  61.4Mi   0.0%       0    .debug_rnglists
   1.3%  55.3Mi  52.4%  55.3Mi    .text
...
 100.0%  4.17Gi 100.0%   105Mi    TOTAL

Here is v3 right before we started getting relocation overflows:

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  68.2%  3.82Gi   0.0%       0    .debug_info
  15.7%   902Mi   0.0%       0    .debug_loc
   5.9%   340Mi   0.0%       0    .debug_str
   3.9%   221Mi   0.0%       0    .debug_ranges
   3.2%   180Mi   0.0%       0    .debug_line
   1.0%  55.2Mi  52.5%  55.2Mi    .text
...
 100.0%  5.60Gi 100.0%   105Mi    TOTAL

So switching to v4 frees up about a GB of available section size.
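
The "about a GB" figure can be checked against the two reports above. The sketch below only redoes that arithmetic; the helper names (to_mib, saved_mib) are made up for illustration, and the sizes are copied from the .debug_info and TOTAL rows.

```python
UNITS_IN_MIB = {"Ki": 1 / 1024, "Mi": 1.0, "Gi": 1024.0}


def to_mib(size):
    """Convert a bloaty-style size such as '3.82Gi' or '902Mi' to MiB."""
    number, unit = size[:-2], size[-2:]
    return float(number) * UNITS_IN_MIB[unit]


def saved_mib(before, after):
    """Per-section size reduction in MiB for sections present in both reports."""
    return {name: to_mib(before[name]) - to_mib(after[name])
            for name in before if name in after}


# Figures copied from the v3 and v4 bloaty reports in this comment.
v3 = {".debug_info": "3.82Gi", "TOTAL": "5.60Gi"}
v4 = {".debug_info": "2.89Gi", "TOTAL": "4.17Gi"}

for section, saved in saved_mib(v3, v4).items():
    print(f"{section}: {saved:.0f} MiB smaller with v4")
```

With these inputs .debug_info shrinks by roughly 952 MiB and the whole file by roughly 1464 MiB, which matches the estimate of about a GB for .debug_info alone.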

Comment by Daniel Moody [ 17/Aug/22 ]

Are the split-dwarf problems we're experiencing also problems on newer toolchains? I'm curious if we're stuck here partly because we couldn't complete the last toolchain upgrade.

Yeah, we don't really have any static builders for gcc v4, but I tested locally without split-dwarf and it looks like the newer gcc does not face the same relocation overflow issue, so maybe it has become more efficient with regard to section data. I'll see if I can verify whether it's close to the limit.

Comment by Andy Schwerin [ 17/Aug/22 ]

Indeed, I do. I think there's a good observation from Daniel, though, that we don't need to have "statically linked" debug information. As long as we had some easy-to-acquire-and-move bundle with the debug information, and it was easy to make that information available to GDB and the stack symbolizer, we could probably defer the "linking" of the debug information that happens today until it was actually required.

Comment by Andy Schwerin [ 17/Aug/22 ]

Are the split-dwarf problems we're experiencing also problems on newer toolchains? I'm curious if we're stuck here partly because we couldn't complete the last toolchain upgrade.

Comment by Daniel Moody [ 15/Aug/22 ]

Another possible solution: currently, we split the debug info out after the link step, which is where we end up running out of section space. What if we split the debug info out at the object file level, using objcopy after each compile step? This is essentially the same thing as -gsplit-dwarf, but it may get around some of the DWARF format issues we're seeing. I am not sure how to then combine the per-object split debug files into a single binary that matches the .debug file we make during our link step. I am pretty sure you can link the split debug files much like linking a binary, but I am unsure how the linker would determine which symbols it really needs to put in the final .debug if they were separated out earlier at the object level. Maybe it doesn't matter if the .debug has all the object debug symbols?
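
A rough sketch of the per-object idea, assuming the usual GNU objcopy sequence for separate debug files is simply applied to each .o file after compilation. Whether that sequence behaves well on relocatable objects is exactly the open question in this comment; the function name and the ".debug" suffix are hypothetical, and nothing here addresses how the split files would be recombined at link time.

```python
def split_debug_commands(obj_path, objcopy="objcopy"):
    """Return the objcopy invocations that would split debug info out of one object.

    The commands are returned rather than executed so a build system could
    attach them as a post-compile action.
    """
    debug_path = obj_path + ".debug"  # hypothetical naming convention
    return [
        # 1. Copy only the debug sections into a side file.
        [objcopy, "--only-keep-debug", obj_path, debug_path],
        # 2. Remove the debug sections from the object that goes to the linker.
        [objcopy, "--strip-debug", obj_path],
        # 3. Record a link from the stripped object to its side file.
        [objcopy, f"--add-gnu-debuglink={debug_path}", obj_path],
    ]


for cmd in split_debug_commands("build/db/mongod_main.o"):
    print(" ".join(cmd))
```

The object path in the usage lines is only an example. Because the linker would then see objects without debug sections, the final binary should stay small, but the side files would still need some later step to become a single .debug.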

Comment by Daniel Moody [ 12/Aug/22 ]

Definitely a fair amount of regression with the large code model:

Comment by Daniel Moody [ 12/Aug/22 ]

Here are some perf builds with the different -mcmodel options:

-mcmodel=medium (failed from internal compiler error):

-mcmodel=large:

Comment by Daniel Moody [ 11/Aug/22 ]

We should consider impacts and edge cases around mongosymb.py. This tool uses llvm-symbolizer under the hood, and llvm-symbolizer appears to automatically use the .dwp if the .dwp is next to the .debug file. There might be other cases where the mongosymb.py tool fetches symbols remotely and doesn't download the .dwp file.

Comment by Daniel Moody [ 09/Aug/22 ]

alexander.neben@mongodb.com it is not an issue for clang.

Comment by Alex Neben [ 09/Aug/22 ]

We should also confirm this affects both clang and gcc builds

Comment by Ryan Egesdahl (Inactive) [ 08/Aug/22 ]

I think that if we spend some time focusing on work like SERVER-68657 and DAG-1964, a dynamic release would be a much better long-term option for a wide variety of reasons. I think there would still be a few considerations like how we handle our libgcc redirection in the toolchain and some other oddities we've engaged in over the years, but just those two tickets alone would be a huge benefit to server engineers and end-users. Short-term, we apparently don't have a method for distributing debug symbols beyond a tarball we might ask customers to download, so we can probably get away with continuing to use split-dwarf.

Comment by Alex Neben [ 08/Aug/22 ]

I updated my above list of ideas. I see a lot of watchers on this ticket and would love to hear feedback from anyone about next steps here.

Comment by Daniel Moody [ 08/Aug/22 ]

ryan.egesdahl@mongodb.com Just want to clarify a few things in regard to the last comment. So we are still doing --separate-debug, that has not changed at all. Both the .debug and the .dwp are needed to debug with gdb.

Comment by Ryan Egesdahl (Inactive) [ 08/Aug/22 ]

daniel.moody@mongodb.com I think releasing with .dwp instead of the current --separate-debug mechanism using objcopy is going to have user-facing changes because we won't be able to install debug symbols into a distro-specific path anymore. I don't disagree with doing it that way, necessarily, but we will have to have some conversations with Product and Release teams about it and update our documentation accordingly. I'll do a little research to see if there's some way to set up a linking mechanism similar to what we do with --separate-debug.

Comment by Iryna Zhuravlova [ 08/Aug/22 ]

ryan.egesdahl@mongodb.com  feel free to add your comments here

Comment by Alex Neben [ 04/Aug/22 ]

Here are a few ideas that can act as strawmen in no particular order.

  • Stop building mongodb statically. Releases and perf testing will use shared objects.
    • Project size: M, I think we would want to bundle all our current shared objects into about 5-15 meaningful ones. We would also need to change a lot of release and testing code to work with shared objects. My guess is ~2 months of work.
    • Considerations:
      • We need to bundle shared objects with our packages which will come with performance hits (best case: only slower start up, expected case: 1-5% slower execution + slower startup).
      • This will scale very well, if our shared objects are getting too big we can split them.
      • This is a very standard approach for Linux, which means that we should see a good amount of support.
      • This will make our releases a little grosser since they now come with many shared objects
  • Continue using split-dwarf and combine the dwo files into a single dwp file that can be used with gdb.
    • Project size: S/M. We just need to include a dwp file along with our binary whenever we need backtraces or to debug
    • Considerations:
      • This is not a common solution meaning we can expect to see weird bugs (daniel.moody@mongodb.com has already seen some)
      • Should have no perf impact (or a positive one if any)
      • Will we include this dwp file with our executable?
  • Use mcmodel large/medium and eat the perf hit (if there is one).
    • Project size: S. This is just a new compiler flag
    • Considerations:
      • This is not a common solution meaning we can expect to see weird bugs.
      • This will not scale forever, at my last place this worked for a while before we hit the new limit imposed by mcmodel large and then we switched to shared objects.
      • It is documented that this will have a negative perf impact (my guess is between 1-5%).
      • This will change nothing for our downstream user.
  • Play with some linker scripts and see if there is some solution there.
    • Project size: L? This would be very exploratory
    • Considerations:
      • This would be exploratory so I have no idea if this would work or what the implications of this are.
  • Strip out some debug information from specific object files. For example, we would no longer have any debug information from example_file.cpp.
    • Project size: S.
    • Considerations:
      • Easy to do
      • No perf problems
      • Would lose debug information from files
      • Will not scale well, every month or so we will have to strip some debug info from new files.
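
For the split-dwarf strawman in the list above, the flow could look roughly like the sketch below: compile with -gsplit-dwarf so each object gets a .dwo file, link as usual, then pack the .dwo files into one .dwp. This is an illustration rather than the build's actual rules; the function name, the file names, and the choice to name the package after the executable are assumptions.

```python
def split_dwarf_commands(sources, executable, cxx="g++", dwp_tool="dwp"):
    """Return compile, link and package commands for a split-dwarf build."""
    commands = []
    objects = []
    for src in sources:
        obj = src.rsplit(".", 1)[0] + ".o"
        objects.append(obj)
        # -gsplit-dwarf writes most debug info to a .dwo file next to the object.
        commands.append([cxx, "-g", "-gsplit-dwarf", "-c", src, "-o", obj])
    # The link step only sees the small skeleton debug info left in the objects.
    commands.append([cxx] + objects + ["-o", executable])
    # dwp -e reads the list of .dwo files from the executable and packs them.
    commands.append([dwp_tool, "-e", executable, "-o", executable + ".dwp"])
    return commands


for cmd in split_dwarf_commands(["a.cpp", "b.cpp"], "mongod"):
    print(" ".join(cmd))
```

The resulting .dwp is the extra file that would have to travel with the binary whenever backtraces or debugging are needed.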

Let me know your thoughts if you have any.

I think the best solution is to go with shared objects since that is a very standard linux/windows/mac pattern.

Generated at Thu Feb 08 06:10:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.