[CXX-90] Crash due to static initialization order fiasco in BSON implementation Created: 09/Nov/11  Updated: 08/Jan/24  Resolved: 02/Apr/14

Status: Closed
Project: C++ Driver
Component/s: None
Affects Version/s: None
Fix Version/s: legacy-0.8.0

Type: Bug Priority: Critical - P2
Reporter: Balint Szente Assignee: Mira Carey
Resolution: Done Votes: 1
Labels: boost, bson, c++, client, crash, cxxmove, initialization, legacy-cxx, static, windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Microsoft Windows XP SP3 32 bit
Microsoft Visual C++ 2008
Boost 1.43


Attachments: PNG File 00_call_once.png     PNG File 01_once_flag.png     File static_fiasco.7z    
Issue Links:
Depends
depends on SERVER-5112 Need a better story for startup-time ... Closed
Related
Server Compat: 2.3

 Description   

The actual BSON implementation assumes the following static initialization order: (static) global variables of Boost Thread implementation are initialized before MongoDB's global (static) variables. However, as you know it, the C/C++ standard does not guarantee this across different translation units.

To give an example:
1. static_fiasco.7z attachment contains two directories:
a. mongodb-src-r2.0.1 - the debug variant of the static mongoclient library, against Boost 1.43. It was built with the following command:
> scons mongoclient --d
b. mongotest - a very basic application, that uses the static mongo library from the first directory.
2. running the debug variant of the mongotest.exe most of the time results in crash (just run the already built EXE file from the Debug directory).

What happens? I tracked down with OllyDbg the cause of the corruption:
1. Due to the fact the mongotest is linked statically with mongoclient and boost, on my computer the linker choosed to initialize mongoclient before boost.
2. This is executed from the dbclient.cpp:
const BSONObj getlasterrorcmdobj = fromjson("

{getlasterror:1}

");
It could be also other static initialization, for example from jsobjcpp:
BSONObj staticNull = fromjson( "

{'':null}

" );
It does not matter which one. Depends on the linker, how it constructs the array with the static initialization callbacks.
3. fromjson() indirectly calls the "set_current_thread_data()" from boost\libs\thread\src\win32\thread.cpp
Please check the 00_call_once.png screenshot with the complete callstack.
4. This sets the "current_thread_tls_init_flag" from the thread.cpp
5. When the static initialization from the mongoclient is finished the boost_thread one will be executed, which will destroy/reset the already set current_thread_tls_init_flag.
Please check the 01_once_flag.png screenshot with the complete callstack.
6. The next time a thread is created, like at connect, the "create_current_thread_tls_key()" will be called second time, because the flag was reset previously incorrectly.
7. A new TLS key (index) will be allocated, causing corruption.
8. It does not matter whether it is debug or release build. The release one usually crashes more subtle.

Conclusion:

  • fromjson() uses JsonGrammar, which uses boost spirit, which uses thread specific data. Due to the fact that fromjson() is called in static initialization phase, you cannot guarantee that the "set_current_thread_data()" function which uses a "call_once()" will be executed BEFORE the initialization of "current_thread_tls_init_flag". This depends on the linker: i.e. it is random.

Possible solutions:
1. Do not use fromjson() directly in the static/global variable initializations.
2. Do not use boost::spirit for grammar in the static/global variable initializations.
3. Do not use static linking of Mongo with Boost.

I think the first two options can be dropped. It is too complicated to change the code and the logic of the C++ clases.

Solution: ALWAYS use dynamic linking with Boost on Windows. This way it can be ensured that the DllMain of boost_thread, containing the initialization of the "current_thread_tls_init_flag", will be called BEFORE the initialization of mongoclient specific static data. Add to SConstruct:
env.Append( CPPDEFINES=[ "BOOST_ALL_DYN_LINK" ] )
and use /MD instead of /MT in release.

So you should drop/deny the mongoclient.lib to be linked statically with Boost.



 Comments   
Comment by Balint Szente [ 10/Apr/14 ]

Hello Jason!

Thank you for the update on this issue. I'm glad to hear this bug was been fixed.
Of course, it is difficult to reproduce such fiasco errors, because they depend on the linker how it generates the final executable file, process that you cannot control. So it depends also on "(bad) luck" if you encounter it.

Not relying on boost::spirit definitely solves this problem. So it is OK from my part, I consider this bug fixed.

On the other hand, it is good to avoid static initialization of complex objects in C++ when multiple translation units are involved, as Andy wrote in the second comment.

Thank you,
Balint

Comment by Mira Carey [ 02/Apr/14 ]

Given that the fiasco problem seems resolved to the best of my understanding, I'm closing this out now.

If you have trouble in the future, please feel free to open a new ticket.

Comment by Mira Carey [ 18/Mar/14 ]

The C++ driver no longer relies on boost::spirit for JSON parsing, so this particular initialization order fiasco should no longer be possible. Additionally, I've had a chance to run clang's address sanitizer with checks for additional fiasco's and haven't found any that aren't appropriately guarded.

If you'd like to retest, or have additional concerns, I'm happy to continue the conversation. If not, I'll close this out in a couple of days as fixed.

Comment by Balint Szente [ 20/Dec/11 ]

Indeed, this would be the best (and maybe the only) long term solution. Unfortunately, the static initializations in C++ are safe, if and only if they use functions/data from the same translation unit where they reside. Otherwise, you never know when you will encounter such a situation again.

Comment by Andy Schwerin [ 19/Dec/11 ]

I actually think that option 1, not using fromjson in static initializers, is achievable. We should be out of the business of complex static initializers to begin with. A plausible solution, which I have used elsewhere, is to move startup-time initialization of complex objects into functions. Those functions are marked using some macro magic with their dependencies, and then a InitMongo() function gets placed at the top of main(), which walks the graph and runs the init functions in a topsort order.

With a modicum of cleverness, systems like this aren't painful to work with. The only loss is that startup-initialized objects can no longer be declared as "const".

This more general solution will hopefully also save us some future pain.

Comment by Balint Szente [ 10/Nov/11 ]

How can I edit the description of this ticket to apply wiki formatting for better readability?

Generated at Wed Feb 07 21:58:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.