[CDRIVER-1233] Crash after Kerberos plugin cleanup runs twice Created: 16/May/16  Updated: 03/May/17  Resolved: 16/Jun/16

Status: Closed
Project: C Driver
Component/s: None
Affects Version/s: 1.3.5
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: David Golden Assignee: A. Jesse Jiryu Davis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Mac OSX 10.11.4, Apple LLVM version 7.3.0 (clang-703.0.31), Mac Ports cyrus-sasl2 @2.1.26_6+kerberos, Mac Ports kerberos5 @1.14.2 (libkrb5.3.3)


Issue Links:
Related
related to CDRIVER-592 Deprecate mongoc_init_ctor/dtor Closed
related to CDRIVER-1160 configure flag for MONGOC_NO_AUTOMATI... Closed
related to CDRIVER-783 Add support for SASL plugins Closed
related to CDRIVER-1965 Improve warning when automatic ctor/d... Backlog
is related to CDRIVER-530 SASL version check does not do what y... Closed

 Description   
Summary

In some configurations, the following sequence leads to the MIT Kerberos5 library being destroyed twice during process exit:

  1. process starts, loads libmongoc, which registers its auto-destructor
  2. libmongoc loads Cyrus, which loads its Kerberos plugin, which registers the Kerberos plugin auto-destructor
  3. main() exits
  4. the Kerberos plugin's auto-destructor runs
  5. libmongoc's auto-destructor calls sasl_client_done, which closes the Kerberos plugin shared lib
  6. closing the Kerberos plugin shared lib triggers its auto-destructor again

The solution for C programs is to configure the driver with ./configure --enable-automatic-init-and-cleanup=no and ensure they call mongoc_init and mongoc_cleanup explicitly. In version 2.0, the driver will remove the automatic init and cleanup feature.

Original report:

I found an assertion in kerberos originating in the C driver while testing the CXX driver master branch (commit 8dc0b5b). Backtrace follows:

$ lldb ./src/mongocxx/test/test_driver
 
 
(lldb) target create "./src/mongocxx/test/test_driver"
Current executable set to './src/mongocxx/test/test_driver' (x86_64).
(lldb) r
Process 47444 launched: './src/mongocxx/test/test_driver' (x86_64)
===============================================================================
All tests passed (903 assertions in 60 test cases)
 
Assertion failed: (r == 0), function k5_mutex_lock, file ../../include/k5-thread.h, line 384.
Process 47444 stopped
* thread #1: tid = 0x9686ee, 0x00007fff97fdaf06 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main
-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff97fdaf06 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff97fdaf06 <+10>: jae    0x7fff97fdaf10            ; <+20>
    0x7fff97fdaf08 <+12>: movq   %rax, %rdi
    0x7fff97fdaf0b <+15>: jmp    0x7fff97fd57cd            ; cerror_nocancel
    0x7fff97fdaf10 <+20>: retq
(lldb) bt
* thread #1: tid = 0x9686ee, 0x00007fff97fdaf06 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main
-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff97fdaf06 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff922b94ec libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fff8cb626e7 libsystem_c.dylib`abort + 129
    frame #3: 0x00007fff8cb29df8 libsystem_c.dylib`__assert_rtn + 321
    frame #4: 0x0000000100879cd9 libkrb5support.1.1.dylib`krb5int_key_delete + 153
    frame #5: 0x0000000100837c18 libgssapi_krb5.2.2.dylib`gss_krb5int_lib_fini + 26
    frame #6: 0x000000010082b037 libgssapi_krb5.2.2.dylib`gssint_mechglue_fini + 40
    frame #7: 0x00007fff5fc1329c dyld`ImageLoaderMachO::doTermination(ImageLoader::LinkContext const&) + 212
    frame #8: 0x00007fff5fc01fee dyld`dyld::runImageStaticTerminators(ImageLoader*) + 206
    frame #9: 0x00007fff5fc04d19 dyld`dyld::garbageCollectImages() + 720
    frame #10: 0x00007fff5fc0c3df dyld`dlclose + 134
    frame #11: 0x00007fff9b9d874d libdyld.dylib`dlclose + 61
    frame #12: 0x00000001004938a1 libsasl2.dylib`_sasl_done_with_plugins + 46
    frame #13: 0x000000010048b0cf libsasl2.dylib`sasl_common_done + 78
    frame #14: 0x00000001004889fd libsasl2.dylib`sasl_client_done + 92
    frame #15: 0x0000000100711062 libmongoc-1.0.0.dylib`_mongoc_do_cleanup + 18 at mongoc-init.c:130
    frame #16: 0x00007fff922b6bf6 libsystem_pthread.dylib`__pthread_once_handler + 65
    frame #17: 0x00007fff9d026fc4 libsystem_platform.dylib`_os_once + 41
    frame #18: 0x00007fff922b6b95 libsystem_pthread.dylib`pthread_once + 57
    frame #19: 0x000000010071103b libmongoc-1.0.0.dylib`mongoc_cleanup + 27 at mongoc-init.c:150
    frame #20: 0x000000010071108e libmongoc-1.0.0.dylib`_mongoc_init_dtor + 14 at mongoc-init.c:170
    frame #21: 0x00007fff5fc1329c dyld`ImageLoaderMachO::doTermination(ImageLoader::LinkContext const&) + 212
    frame #22: 0x00007fff5fc0218b dyld`dyld::runAllStaticTerminators(void*) + 67
    frame #23: 0x00007fff8cb6346b libsystem_c.dylib`__cxa_finalize_ranges + 345
    frame #24: 0x00007fff8cb6376f libsystem_c.dylib`exit + 55
    frame #25: 0x00007fff9b9d95b4 libdyld.dylib`start + 8
(lldb)

During libmongoc compilation, I did notice numerous warnings of the type: clang: warning: argument unused during compilation: '-pthread'

UPDATE: Moved libmongoc configuration output from "environment":

 
libbson 1.3.5 was configured with the following options:
 
Build configuration:
  Enable debugging (slow)                          : yes
  Enable extra alignment (required for 1.0 ABI)    : yes
  Compile with debug symbols (slow)                : yes
  Enable GCC build optimization                    : no
  Enable automatic binary hardening                : yes
  Code coverage support                            : no
  Cross Compiling                                  : no
  Big endian                                       : no
  Link Time Optimization (experimental)            : no
 
Documentation:
  man                                              : no
  HTML                                             : no
 
 
libmongoc 1.3.5 was configured with the following options:
 
Build configuration:
  Enable debugging (slow)                          : yes
  Compile with debug symbols (slow)                : yes
  Enable GCC build optimization                    : no
  Enable automatic binary hardening                : yes
  Enable automatic init and cleanup                : yes
  Code coverage support                            : no
  Cross Compiling                                  : no
  Fast counters                                    : no
  Shared memory performance counters               : yes
  SASL                                             : sasl2
  SSL                                              : yes
  Libbson                                          : bundled
 
Documentation:
  man                                              : no
  HTML                                             : no

There's this warning that libraries should not call sasl_done:

Libraries should not call sasl_done(). Applications should exercise caution when calling sasl_done() to avoid interference with any libraries that might be using libsasl.



 Comments   
Comment by A. Jesse Jiryu Davis [ 23/Dec/16 ]

Thanks, I've updated the top of this ticket description with the current workaround for the sasl_client_done crash, and fixed the spelling error on the master branch.

Comment by Gustaf Neumann [ 23/Dec/16 ]

Thanks a lot, works perfectly, configuring now with "--disable-automatic-init-and-cleanup".

Many thanks for the quick help, and have a nice weekend and holidays!
-g

PS: Small nitpicking: on http://mongoc.org/libmongoc/1.4.0/mongoc_init.html, the word "primatives" should probably read as "primitives".

Comment by A. Jesse Jiryu Davis [ 23/Dec/16 ]

Thanks Gustaf, we closed this issue, even though we still call sasl_client_done, because we can't fully fix it until C Driver 2.0.

In the current C Driver, you can avoid this crash by building the driver with:

./configure  --enable-automatic-init-and-cleanup=no

Make sure you explicitly call mongoc_init and mongoc_cleanup in your code.

When we release C Driver 2.0, semantic versioning will allow us to make incompatible changes to our API, and that includes removing the automatic init and cleanup feature entirely (CDRIVER-1330), and then this crash will really be fixed.

Comment by Gustaf Neumann [ 23/Dec/16 ]

I see still this issue with the 1.5.1 release of the c-driver (testing under Mac OS X 10.11.6 using the default C compiler). The default configuration of libbson 1.5.1 and libmongoc runs on my machine into exact the same crash as reported above during libsystem_c.dylib`exit . This happens with the sasl2/gssapi/krb5 libraries from mac ports installed (cyrus-sasl2 @2.1.26_6).

[ 48] B141260C-CA4D-3C7D-93FC-13DC1EC53378 0x000000010028a000 /usr/local/lib/libmongoc-1.0.0.dylib 
[ 49] 493BC585-B491-36A0-A9C2-88F9558C3EBB 0x00000001031b8000 /usr/local/lib/libbson-1.0.0.dylib 
[ 50] 50C754DC-1852-3798-BBC9-037FEBE5DB0E 0x00000001031e6000 /opt/local/lib/libbson-1.0.0.dylib 
[ 51] 0DCDD68E-17B7-3BE0-86E6-C50F5D7BFDD4 0x0000000103214000 /opt/local/lib/libssl.1.0.0.dylib 
[ 52] 4D17EF12-1B2F-3006-93E6-19352A5FF20D 0x0000000103274000 /opt/local/lib/libcrypto.1.0.0.dylib 
[ 53] E379E30C-13CF-3C0C-8985-1737A4415C19 0x00000001002e1000 /opt/local/lib/libsasl2.dylib 
[ 54] 5FCFF38B-5983-3590-B5E9-074CF0950676 0x000000010346e000 /opt/local/lib/libz.1.dylib 
[ 55] A650B5C8-1950-36A0-86D1-0B2465318BFA 0x00007fff8fb68000 /usr/lib/libresolv.9.dylib 
[ 56] 4E09A4DC-D395-3B82-9A62-4922398DEF0E 0x0000000100262000 /opt/local/lib/sasl2/libanonymous.plugin 
[ 57] D8C7D5C5-988D-334F-B111-BF56989316C4 0x0000000100267000 /opt/local/lib/sasl2/libcrammd5.plugin 
[ 58] 8D54D106-CEAC-3020-B9BA-49DAFDD93DF0 0x0000000103482000 /opt/local/lib/sasl2/libdigestmd5.plugin 
[ 59] 223D5C70-6E6C-39DF-993C-3CA7EFA615A5 0x000000010348f000 /opt/local/lib/sasl2/libgs2.plugin 
[ 60] A39761FE-29B4-3EA0-A21A-51B1CFDB0733 0x0000000103497000 /opt/local/lib/libgssapi_krb5.2.2.dylib 
[ 61] 84BEB4A8-9629-3860-BA8F-1F5A74CB2109 0x00000001034d4000 /opt/local/lib/libkrb5.3.3.dylib 
[ 62] 85B486C6-91E3-3C7F-8C3A-6AACA293E9BA 0x000000010358a000 /opt/local/lib/libk5crypto.3.1.dylib 
[ 63] A49E0C62-5335-3B04-B137-9674DF4C82B0 0x00000001002f9000 /opt/local/lib/libcom_err.1.1.dylib 
[ 64] C4F1AE68-79F1-37D9-A78D-52B8820A7383 0x00000001035a2000 /opt/local/lib/libkrb5support.1.1.dylib 
[ 65] 6C3DC2F0-96B8-37D8-81EF-D85A680D5245 0x00000001035ad000 /opt/local/lib/libintl.8.dylib 
[ 66] FE5F439D-596E-31C7-9AB8-154010F30914 0x00000001035bb000 /opt/local/lib/libiconv.2.dylib 

commenting out sasl_client_done() in src/mongoc/mongoc-init.c fixes the issue - as well as configuring without sasl via ./configure --disable-sasl.

The summary block ends with

the answer is, don't call sasl_done (nor sasl_client_done on newer Cyrus).

but the c-driver still contains this call, and the issue is closed. Did this problem somehow re-appear?

If there is interest, i can provide more detailed backtraces.

Comment by David Golden [ 11/Jul/16 ]

I did not, sorry.

Comment by Jeremy Mikola [ 11/Jul/16 ]

I've done some further research. sasl_client_init/sasl_client_done do reference counting (since this commit), so calling them as a matched set should not be a problem and we don't need to worry about screwing up other SASL users by calling sasl_client_done.

david.golden: Did you also research sasl_done(), which libmongoc falls back to using if sasl_client_done() is not available?

Comment by Jeremy Mikola [ 08/Jul/16 ]

But history has uncovered problems with the sasl init/cleanup routines when called multiple times. This is known to happen when, for example, two libraries (say, libmongoc and libsamba) both assume exclusive access to sasl, and use different memory managers.

bjori: I just came across this issue, but this may be related to some segfaults users of the legacy and new PHP drivers have been experiencing. See:

Comment by Andrew Morrow (Inactive) [ 08/Jun/16 ]

The C++11 driver will be removing instance::current

Comment by David Golden [ 07/Jun/16 ]

If the C++11 driver removes instance::current(), then the instance will clean up when it goes out of scope in main, which will happen before global destruction starts.

Recall that commenting out this line – lexical scope instance lifetime – triggers the crash with auto-destructors enabled. When commented out – i.e no mongocxx::instance ever created – libmongoc's auto-constructor initializes SASL, and then libmongoc's auto-destructor races with libkrb5's auto-destructor for the crash with mongocxx totally uninvolved.

However in the original – with the lexical instance created – instance destruction at the end of main makes libmongoc call its clean up first, avoiding the bad race outcome. The problem with instance::current() is that it delays destruction until too late (after libkrb5 destructs).

So elimination of instance::current should fix the current problem for mongocxx and shouldn't need a breaking change in 1.4.0. (I still recommend eliminating auto ctor/dtor in 2.0.0.)

Comment by A. Jesse Jiryu Davis [ 07/Jun/16 ]

So will it be enough for C++ users, for now, to compile the C Driver with auto-destroy turned off (MONGOC_NO_AUTOMATIC_GLOBALS) and to manually cleanup the C++ driver's "instance" before main() exits?

What's the recommended cleanup syntax for the C++ driver look like?

Darn, I wish we'd configured the C Driver's Debian package with auto-destroy turned off from the start. I believe that, starting in 1.4.0, we should turn it off in the Debian package build, even though that's a surprising change in a minor release.

Comment by Andrew Morrow (Inactive) [ 07/Jun/16 ]

Agreed, instance::current was a badly thought through idea and should be removed. The intention has always been (mod instance::current), that users were expected to explicitly manage the lifetime of the instance object which proxied for all global state in the driver.

Comment by David Golden [ 06/Jun/16 ]

I've done some further research. sasl_client_init/sasl_client_done do reference counting (since this commit), so calling them as a matched set should not be a problem and we don't need to worry about screwing up other SASL users by calling sasl_client_done.

The original problem is the race condition between auto-destructors. If krb5 destructs first, then things go awry. Not calling sasl_client_done is a hack that "solves" the problem at the cost of leaking resources.

I think a sufficient "solution" for 1.x could be to document that auto-destructors are not recommended and to explain why.

That leaves the C++11 driver in a bit of a pickle, as relying on instance::current() to initialize a static variable destroys it too late. See CXX-890 for related problems about instance lifetime. Possibly, the answer there is also to stop relying on instance destruction at process exit time and recommend that clients own the instance object and manage its lifetime.

Comment by Hannes Magnusson [ 06/Jun/16 ]

There is nothing to init or cleanup in (currently) other implemented tls or crypto libraries.
We already went down the wrong path of calling everything with implementation agnostic language, with tainted implementation -isms. I don't think we should repeat that at all.

I also disagree on the symmetry argument. After all, again, that got us into the trouble we are in and is what we are trying to fix.
There can easily be components of a system that require mongoc and cyrus sasl separately. I don't see the need of forcing init and cleanup in the "power user access hatch".

Comment by David Golden [ 06/Jun/16 ]

Assuming we only ever have one implementation at a time, I'm very tempted to keep it agnostic:

mongoc_init_with_opts (MONGOC_INIT_SASL | MONGOC_INIT_TLS);
mongoc_cleanup_with_opts (MONGOC_CLEANUP_SSL);

The fact that we have this at all is a design flaw forced on us by problematic libraries. Therefore this is the "power user access hatch" and all we're saying is "if our init/cleanup of <implementation> is a problem for you, here's how you can switch it off".

I also wonder whether we should design for and insist on symmetry. Either the users initializes and cleans up, or we do, but not only one side of it. I.e. only take flags for mongoc_init_with_opts, remember them, and at cleanup do only the matching operations.

Comment by Hannes Magnusson [ 06/Jun/16 ]

As long as we don't remove flags, just ignore unknown values, and by default init/cleanup everything we use.. I guess its best we can do.

Keep in mind we don't do OpenSSL init/cleanup routines on Windows/Darwin, but the MONGOC_INIT_OPENSSL still needs to be available, as noop.
Same if we ever switch cyrus sasl to something else, we still need to have MONGOC_INIT_CYRUSSASL

mongoc_init_with_opts (MONGOC_INIT_CYRUSSASL | MONGOC_INIT_OPENSSL);
mongoc_cleanup_with_opts (MONGOC_CLEANUP_OPENSSL);

Probably the best we can do

Comment by A. Jesse Jiryu Davis [ 06/Jun/16 ]

Yeah, you've convinced me we need to pass the struct into mongoc_init_with_opts(), too, since some libraries' initialization won't be idempotent, either.

I agree it's fragile. The other option is not to destroy any libraries at all and shut down with leaks. acm convinced me that this is a bad idea, because then no memory checkers will work.

Is there a third option that would be less fragile, bjori? I think that giving the application full control over setup / destruction is our only option, even though it requires the user to know a lot about our implementation.

Comment by Hannes Magnusson [ 06/Jun/16 ]

That doesn't resolve the problem of double constructing something.
That struct needs to be passed into mongoc_init to, to signal "don't init sasl, I did that elsewhere" and "don't dtor it either".

I think this is fragile approach though. What happens when we remove sasl? Why should the user care at all that we currently use cyrus-sasl?
What happens when we add new dependencies?

Comment by A. Jesse Jiryu Davis [ 06/Jun/16 ]

New approach. After discussing with David and Andrew, it seems we should:

  • Shout a warning if users leave automatic destruction turned on
  • Remove automatic destruction in version 2.0 as planned
  • Add a struct mongoc_cleanup_opts_t and function mongoc_cleanup_with_opts (const mongoc_cleanup_opts_t *opts). The driver will still destroy everything it initialized, by default, but allow applications to control destruction precisely.
Comment by Hannes Magnusson [ 27/May/16 ]

Could this be related to CDRIVER-1207 ?

Comment by David Golden [ 27/May/16 ]

I also found this warning that libraries should not call sasl_done:

Libraries should not call sasl_done(). Applications should exercise caution when calling sasl_done() to avoid interference with any libraries that might be using libsasl.

Comment by David Golden [ 27/May/16 ]

jesse, that hypothesis seems right. I ran a debugging session with a breakpoint on gss_krb5int_lib_fini and it was called twice. The krb5 auto-destructor is called during program termination. Then the mongoc auto-destructor runs and calls sasl_client_done, which unloads krb5, which calls the krb5 auto-destructor again.

First backtrace:

(lldb) bt
* thread #1: tid = 0x8b1774, 0x0000000100836bfe libgssapi_krb5.2.2.dylib`gss_krb5int_lib_fini, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100836bfe libgssapi_krb5.2.2.dylib`gss_krb5int_lib_fini
    frame #1: 0x000000010082a037 libgssapi_krb5.2.2.dylib`gssint_mechglue_fini + 40
    frame #2: 0x00007fff5fc133bc dyld`ImageLoaderMachO::doTermination(ImageLoader::LinkContext const&) + 212
    frame #3: 0x00007fff5fc0218b dyld`dyld::runAllStaticTerminators(void*) + 67
    frame #4: 0x00007fff9b6ea46b libsystem_c.dylib`__cxa_finalize_ranges + 345
    frame #5: 0x00007fff9b6ea76f libsystem_c.dylib`exit + 55
    frame #6: 0x00007fff983f15b4 libdyld.dylib`start + 8
    frame #7: 0x00007fff983f15ad libdyld.dylib`start + 1

Second backtrace (with assertion fail):

(lldb) bt
* thread #1: tid = 0x8b1774, 0x00007fff98c5bf06 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff98c5bf06 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff8f6e74ec libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fff9b6e96e7 libsystem_c.dylib`abort + 129
    frame #3: 0x00007fff9b6b0df8 libsystem_c.dylib`__assert_rtn + 321
    frame #4: 0x0000000100878cd9 libkrb5support.1.1.dylib`krb5int_key_delete + 153
    frame #5: 0x0000000100836c18 libgssapi_krb5.2.2.dylib`gss_krb5int_lib_fini + 26
    frame #6: 0x000000010082a037 libgssapi_krb5.2.2.dylib`gssint_mechglue_fini + 40
    frame #7: 0x00007fff5fc133bc dyld`ImageLoaderMachO::doTermination(ImageLoader::LinkContext const&) + 212
    frame #8: 0x00007fff5fc01fee dyld`dyld::runImageStaticTerminators(ImageLoader*) + 206
    frame #9: 0x00007fff5fc04d19 dyld`dyld::garbageCollectImages() + 720
    frame #10: 0x00007fff5fc0c4ff dyld`dlclose + 134
    frame #11: 0x00007fff983f074d libdyld.dylib`dlclose + 61
    frame #12: 0x00000001004928a1 libsasl2.dylib`_sasl_done_with_plugins + 46
    frame #13: 0x000000010048a0cf libsasl2.dylib`sasl_common_done + 78
    frame #14: 0x00000001004879fd libsasl2.dylib`sasl_client_done + 92
    frame #15: 0x000000010070fff3 libmongoc-1.0.0.dylib`_mongoc_do_cleanup + 35 at mongoc-init.c:132
    frame #16: 0x00007fff8f6e4bf6 libsystem_pthread.dylib`__pthread_once_handler + 65
    frame #17: 0x00007fff96088fc4 libsystem_platform.dylib`_os_once + 41
    frame #18: 0x00007fff8f6e4b95 libsystem_pthread.dylib`pthread_once + 57
    frame #19: 0x000000010070ffbc libmongoc-1.0.0.dylib`mongoc_cleanup + 44 at mongoc-init.c:153
    frame #20: 0x0000000100710043 libmongoc-1.0.0.dylib`_mongoc_init_dtor + 35 at mongoc-init.c:174
    frame #21: 0x00007fff5fc133bc dyld`ImageLoaderMachO::doTermination(ImageLoader::LinkContext const&) + 212
    frame #22: 0x00007fff5fc0218b dyld`dyld::runAllStaticTerminators(void*) + 67
    frame #23: 0x00007fff9b6ea46b libsystem_c.dylib`__cxa_finalize_ranges + 345
    frame #24: 0x00007fff9b6ea76f libsystem_c.dylib`exit + 55
    frame #25: 0x00007fff983f15b4 libdyld.dylib`start + 8
    frame #26: 0x00007fff983f15ad libdyld.dylib`start + 1

Is the libc auto destructor really necessary? If the program is terminating, what needs to be cleaned up?

Comment by A. Jesse Jiryu Davis [ 27/May/16 ]

I can repro with the same setup: MacPorts cyrus-sasl2 @2.1.26_6+kerberos and kerberos5 @1.14.2_0, running the example-client program succeeds, but if I remove the mongoc_cleanup at the end and allow auto-destruction, it dies with the assertion on line 384.

kerberos5 also auto-destructs. If you properly call mongoc_cleanup in a C program, then it runs and calls sasl_client_done, which explicitly destroys kerberos5 and disables its auto-destructor.

I hypothesize that if you don't call mongoc_cleanup, it auto-destructs after kerberos5, but incorrectly re-enters kerberos5's destructor gss_krb5int_lib_fini anyway.

Comment by David Golden [ 27/May/16 ]

More research:

  • The assertion failure is here, which is a pthread_mutex_lock failure (which supposedly should only happen if a deadlock would occur or the mutex is invalid)
  • Via printf debugging, when the crash happens, it doesn't look like the C++ driver ~impl() destructor has run; when the instance has automatic scope and the ~impl() destructor does run, the crash does not occur; i.e. the timing of the destruction matters (inside or outside program termination)
  • Looking again at the call stack, krb5 is in the middle of it's own destructor triggered by SASL unloading the krb5 shared library, which happens during libmongocs destructor

Presumably, the krb5 destructor runs at program termination even when libmongoc's destructors are disabled, so it seems likely that something about the krb5 destructor being called from within libmongoc's destructor is the issue.

Comment by Hannes Magnusson [ 27/May/16 ]

> I narrowed it down and found that commenting out this line is sufficient to get test_instance to fail the same assertion as test_driver (and the test fails, as expected).

as acm said, the mongoc_init()mongoc_cleanup() routines are called when constructing/destructing the instance object, so its not surprising that commenting out those calls feel like they change something – the fact is, they are theoretically NOOP when you compile with --enable-automatic-init-and-cleanup as they are called as soon as the module is dlopen()ed and dlclose()ed.
The problem is that they are only "theoretically NOOP"
They are supposed to be idempotent, but the SASL cleanup routines have certain unforeseen issues with this.

This correct fix for this is to compile with --enable-automatic-init-and-cleanup=no.

The workaround... idk. I can only think of ridiculous static variable "have_i_been_called" workarounds.

Comment by David Golden [ 27/May/16 ]

Or I'll test it tonight. With enable-automatic-init-and-cleanup=no, and calling mongoc_cleanup() twice here, the crash does not come back.

Comment by David Golden [ 27/May/16 ]

Other weird findings (with init/cleanup enabled):

  • test_driver fails the assertion (as discussed above) even if all test source code is ifdef'd out
  • test_instance does not fail the assertion unless you ifdef out the test code

I narrowed it down and found that commenting out this line is sufficient to get test_instance to fail the same assertion as test_driver (and the test fails, as expected).

jesse, I'll test your scenario tomorrow.

Comment by A. Jesse Jiryu Davis [ 27/May/16 ]

If you call mongoc_cleanup() twice now with --enable-automatic-init-and-cleanup=no does the crash come back?

Comment by David Golden [ 27/May/16 ]

With --enable-automatic-init-and-cleanup=no and the MacPorts cyrus-sasl2 and kerberos5, the assertion failure does not occur.

Comment by Hannes Magnusson [ 26/May/16 ]

Please try compiling mongoc with --enable-automatic-init-and-cleanup=no (and enabling cyrus/kerberos again)

Comment by David Golden [ 26/May/16 ]

BTW, when I disable cyrus-sasl2 and kerberos5 in MacPorts, the assertion failure goes away.

Comment by David Golden [ 26/May/16 ]

From the config output I posted:

Enable automatic init and cleanup : yes

Re replication...

This was on my MacBook Pro. Compiler is from Xcode:

$ clang --version
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Other build tools are installed via MacPorts.

PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

$ pkg-config --variable pc_path pkg-config
/opt/local/lib/pkgconfig:/opt/local/share/pkgconfig

I had previously installed the MacPorts cyrus-sasl2 library, which pulled in kerberos5:

cyrus-sasl2 @2.1.26_6+kerberos (active)
kerberos5 @1.14.2_0 (active)

My libmongoc configuration/installation incantation was:

$ git checkout 1.3.5
$ git clean -dxf
$ ./autogen.sh --with-libbson=bundled --enable-ssl --enable-debug
$ make
$ sudo make install

Then, from the C++ master branch:

$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
$ ninja
$ src/mongocxx/test/test_driver
(assertion here)

Comment by Hannes Magnusson [ 26/May/16 ]

It should be. But history has uncovered problems with the sasl init/cleanup routines when called multiple times. This is known to happen when, for example, two libraries (say, libmongoc and libsamba) both assume exclusive access to sasl, and use different memory managers.

Plus the whole magic ctor/dtor issue.
david.golden Are you building mongoc with MONGOC_NO_AUTOMATIC_GLOBALS / --enable-automatic-init-and-cleanup=no ?

Comment by A. Jesse Jiryu Davis [ 26/May/16 ]

It is, of course, using whatever idempotency feature the build system detects in your compiler:

https://github.com/mongodb/mongo-c-driver/blob/master/src/mongoc/mongoc-init.c#L127

Comment by Andrew Morrow (Inactive) [ 26/May/16 ]

As to exactly once... how can I know? The C driver may-or-may-not invoke its cleanup handler automatically, but there is no programatic way for me to know. So, I do call mongoc_init and mongoc_cleanup explicitly. I'd really hope that mongoc_cleanup is idempotent.

Comment by Andrew Morrow (Inactive) [ 26/May/16 ]

The C++ driver calls those when you construct/destroy an 'instance' object.

Comment by Hannes Magnusson [ 26/May/16 ]

Since acm mention it, you are calling mongoc_init() at the start of the application, and mongoc_cleanup() in the end, and exactly only once?

I find it weird this is being triggered from runAllStaticTerminators

Comment by Hannes Magnusson [ 26/May/16 ]

Do you have debug symbols for libkrb5support? I'd like to know what its asserting.
Also if you have a short concrete repro, it would remove lots of guesses

We see these sort of issues in sasl cleanup routines when the cleanup happens out of order, such as allocators changes or module dependencies

Comment by Andrew Morrow (Inactive) [ 26/May/16 ]

The C++ driver 100% does not call fork We also don't make reference to SASL. I wonder if this could be related to the "feature" of the automatic global termination of the C driver via _attribute_((destructor)) or whatever it is?

Comment by David Golden [ 26/May/16 ]

It happens every time. Note that tests pass and so I think the error is happening during global destruction (stack track shows _mongoc_init_dtor).

I'm not deeply familiar with the tests, but grepping the test source doesn't show 'SASL' or 'fork'. acm, can you confirm?

Comment by Andrew Morrow (Inactive) [ 26/May/16 ]

The pthread warnings are harmless (server build with clang has them too) and is a strange problem where clang (unlike gcc) only requires the -pthread argument at compile time on OS X and not at link time (because there is no libpthread on OS X), but seems to think that providing it is worth a diagnostic. The fix for this, should you choose to bother with it, is to only add the -pthread argument to your link flags if you aren't on OS X. Unfortunately, if your build system auto-propagates CFLAGS to the link stage as is more or less correct, then you will always get this warning.

Comment by A. Jesse Jiryu Davis [ 26/May/16 ]
  • Every time?
  • Are you doing anything with SASL in your test?
  • Are you forking?

Those pthread warnings have been around in clang builds for as long as I can remember, I don't think they're related.

Generated at Wed Feb 07 21:11:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.