Documentation / DOCS-12337

Docs for SERVER-33852: libldap is not threadsafe with NSS

      Description

      Description:

      With this change, MongoDB binaries that are linked against libldap synchronize their access to the library. Binaries that are instead linked against libldap_r do not add this synchronization and function normally. The synchronization is intended to avoid thread-unsafe behaviour in the libldap variant of the library, which manifests as crashes in libldap's TLS handling. Customers may still wish to force their process to link against libldap_r in order to avoid the global synchronization.

      Engineering Ticket Description:

      A while back on a RHEL6 machine, a segmentation fault at address 0x20 was detected, with this stack:

       
       libpthread.so.0(+0xF7E0) [0x7f61c59fa7e0]
       libldap-2.4.so.2(+0x3962C) [0x7f61c800f62c]
       libnspr4.so(PR_CallOnceWithArg+0xC5) [0x7f61c4434fc5]
       libldap-2.4.so.2(+0x36E1F) [0x7f61c800ce1f]
       libldap-2.4.so.2(+0x35434) [0x7f61c800b434]
       libldap-2.4.so.2(+0x355B5) [0x7f61c800b5b5]
       libldap-2.4.so.2(ldap_int_tls_start+0x6E) [0x7f61c800b7be]
       libldap-2.4.so.2(ldap_int_open_connection+0x27F) [0x7f61c7fe571f]
       libldap-2.4.so.2(ldap_new_connection+0x19F) [0x7f61c7ff90ff]
       libldap-2.4.so.2(ldap_open_defconn+0x2F) [0x7f61c7fe546f]
       libldap-2.4.so.2(ldap_send_initial_request+0x1A8) [0x7f61c7ffa208]
       libldap-2.4.so.2(ldap_sasl_bind+0x174) [0x7f61c7fef864]
       libldap-2.4.so.2(ldap_sasl_bind_s+0x8B) [0x7f61c7fefafb]
      

      Examination of the stack, and the offset where the fault occurred suggested that this was the code in question:

              /*
                MOZNSS_DIR will override everything else - you can
                always set MOZNSS_DIR to force the use of this
                directory
                If using MOZNSS, specify the location of the moznss db dir
                in the cacertdir directive of the OpenLDAP configuration.
                DEFAULT_MOZNSS_DIR will only be used if the code cannot
                find a security dir to use based on the current
                settings
              */
              nn = 0;
              securitydirs[nn++] = PR_GetEnv( "MOZNSS_DIR" );
              securitydirs[nn++] = lt->lt_cacertdir;
              securitydirs[nn++] = PR_GetEnv( "DEFAULT_MOZNSS_DIR" );
      

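      The relevant line seemed to be 'securitydirs[nn++] = lt->lt_cacertdir;'. I suspect that 'lt' was NULL: a fault at an address as small as 0x20 is consistent with reading a member at a small offset from a NULL pointer.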
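
      To illustrate why a NULL 'lt' would fault at a small address such as 0x20, here is a minimal sketch. The struct FakeTlsOptions and its field order are assumptions made for illustration only; they are not copied from the OpenLDAP headers. The point is simply that reading the fifth pointer-sized member through a NULL base touches address 4 * sizeof(char*), which is 0x20 on a typical 64-bit platform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for libldap's TLS options struct. The field names and
// their order are assumptions for illustration, not the real OpenLDAP layout.
struct FakeTlsOptions {
    char* certfile;
    char* keyfile;
    char* dhfile;
    char* cacertfile;
    char* cacertdir;  // fifth pointer-sized member
};

// Address that a read of `lt->cacertdir` would touch if `lt` were NULL:
// the NULL base (0) plus the member's offset within the struct.
constexpr std::size_t faultAddressForNullBase() {
    return offsetof(FakeTlsOptions, cacertdir);
}
```

      Under these assumptions, faultAddressForNullBase() equals four pointer widths, so a fault address of 0x20 on a 64-bit machine would be consistent with the NULL 'lt' hypothesis.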

      Initializing a TLS connection requires a call to ldap_int_tls_connect on the LDAP session handle. If TLS hasn't been set up on the handle before, alloc_handle is called on the handle's TLS context. alloc_handle will notice that this context is NULL, and will allocate the global TLS context, and set its refcount to 1. ldap_int_tls_connect then acquires the global TLS context, makes the LDAP handle's TLS context point at it, and bumps the refcount.

      libldap is intended to be thread safe as long as LDAP session handles are not passed between threads. There is a "more thread safe" library called libldap_r, which is not intended for external consumption. It is meant to be used inside the OpenLDAP server and in cases where LDAP session handles are passed between threads, and it defines mutexes to guard state in these situations.

      libldap's OpenSSL integration uses OpenSSL's own data structures as its TLS context. When it increments and decrements the refcounts, it uses OpenSSL methods, which use OpenSSL's own mutexes. Remember that manipulating the local session's TLS context refcount is actually manipulating the global refcount. Using OpenSSL's mutexes therefore provides thread safety even when not using libldap_r.

      Red Hat Enterprise Linux's copy of libldap appears to be compiled against NSS, Mozilla's TLS library, instead of OpenSSL. The NSS integration seems to use libldap's own mutexes (which don't exist in regular builds of the library!) to protect the refcounts. Because we're using libldap instead of libldap_r, this results in a data race.
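
      The refcount handling described above can be sketched as follows. This is a simplified model, not libldap code: SharedTlsContext, ctxRef, ctxUnref and runConnectDisconnectCycles are hypothetical names. It shows the guarded variant, in which a real mutex protects the shared count. If the lock were compiled out, as the libldap mutexes are in regular builds, the increments and decrements from different threads would race.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model of a process-wide TLS context shared by all sessions.
struct SharedTlsContext {
    int refcount = 1;  // the global context starts with one reference
    std::mutex mutex;  // guards refcount
};

// Models a session attaching to the shared context.
void ctxRef(SharedTlsContext& ctx) {
    std::lock_guard<std::mutex> lock(ctx.mutex);
    ++ctx.refcount;
}

// Models a session releasing the shared context.
// Returns true when the last reference was dropped.
bool ctxUnref(SharedTlsContext& ctx) {
    std::lock_guard<std::mutex> lock(ctx.mutex);
    assert(ctx.refcount > 0);
    return --ctx.refcount == 0;
}

// Runs `threads` workers that each attach and release `cyclesPerThread` times,
// then returns the final refcount once every worker has joined.
int runConnectDisconnectCycles(int threads, int cyclesPerThread) {
    SharedTlsContext ctx;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&ctx, cyclesPerThread] {
            for (int i = 0; i < cyclesPerThread; ++i) {
                ctxRef(ctx);
                ctxUnref(ctx);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return ctx.refcount;
}
```

      With the lock in place, the count returns to its initial value of 1 after all workers finish, so the shared context is never released while sessions still point at it.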

      I modified mongoldap so that, after running all the existing tests, it would spawn two threads and query an LDAP server from them concurrently. I then recompiled mongoldap and my Arch Linux system's copy of libldap (2.4.44-3) with ThreadSanitizer. I've attached three logs: test-libldap, which was run while linked against libldap; test-libldap_r, which was run while linked against libldap_r; and test-libldap+mutexes, which was run while linked against libldap with our own mutexes around each call to libldap.

      This is the relevant bit where it bumps the refcount:

      WARNING: ThreadSanitizer: data race (pid=12049)
      Write of size 4 at 0x7b2000002988 by thread T3:
      #0 tlsm_ctx_ref /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/tls_m.c:2033:14 (libldap-2.4.so.2+0x8bd27)
      #1 tls_ctx_ref /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/tls2.c:91:2 (libldap-2.4.so.2+0x8a2ee)
      #2 ldap_int_tls_connect /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/tls2.c:352:4 (libldap-2.4.so.2+0x8a9bc)
      #3 ldap_int_tls_start /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/tls2.c:860:8 (libldap-2.4.so.2+0x8a4b3)
      #4 ldap_int_open_connection /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/open.c:448:8 (libldap-2.4.so.2+0x11ad3)
      #5 ldap_new_connection /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/request.c:487:9 (libldap-2.4.so.2+0x4f98b)
      #6 ldap_open_defconn /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/open.c:41:19 (libldap-2.4.so.2+0xff03)
      #7 ldap_send_initial_request /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/request.c:130:8 (libldap-2.4.so.2+0x4ccde)
      #8 ldap_pvt_search /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/search.c:128:12 (libldap-2.4.so.2+0x1b534)
      #9 ldap_pvt_search_s /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/search.c:174:7 (libldap-2.4.so.2+0x1c16a)
      #10 ldap_search_ext_s /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/search.c:150:9 (libldap-2.4.so.2+0x1bfb7)
      #11 mongo::LDAPSessionHolder<mongo::(anonymous namespace)::OpenLDAPSessionParams>::query[abi:cxx11](mongo::LDAPQuery, timeval*) /home/sajack/mongo/src/mongo/db/modules/enterprise/src/ldap/connections/ldap_connection_helpers.h:144:41 (mongoldap+0x977392)
      ...
      Previous write of size 4 at 0x7b2000002988 by thread T2:
      #0 tlsm_ctx_free /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/tls_m.c:2046:13 (libldap-2.4.so.2+0x8bdc3)
      #1 ldap_pvt_tls_ctx_free /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/tls2.c:83:2 (libldap-2.4.so.2+0x85d0e)
      #2 ldap_int_tls_destroy /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/tls2.c:105:3 (libldap-2.4.so.2+0x85db6)
      #3 ldap_ld_free /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/unbind.c:210:2 (libldap-2.4.so.2+0x301ab)
      #4 ldap_unbind_ext /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/unbind.c:52:9 (libldap-2.4.so.2+0x2f1a0)
      #5 ldap_unbind_ext_s /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/unbind.c:61:9 (libldap-2.4.so.2+0x304c8)
      #6 mongo::OpenLDAPConnection::disconnect() /home/sajack/mongo/src/mongo/db/modules/enterprise/src/ldap/connections/openldap_connection.cpp:321:15 (mongoldap+0x974350)
      #7 mongo::OpenLDAPConnection::~OpenLDAPConnection() /home/sajack/mongo/src/mongo/db/modules/enterprise/src/ldap/connections/openldap_connection.cpp:193:21 (mongoldap+0x974143)
      #8 mongo::OpenLDAPConnection::~OpenLDAPConnection() /home/sajack/mongo/src/mongo/db/modules/enterprise/src/ldap/connections/openldap_connection.cpp:192:43 (mongoldap+0x97469c)
      #9 std::default_delete<mongo::LDAPConnection>::operator()(mongo::LDAPConnection*) const /usr/lib64/gcc/x86_64-pc-linux-gnu/6.3.1/../../../../include/c++/6.3.1/bits/unique_ptr.h:76:2 (mongoldap+0x9730be)
      #10 std::unique_ptr<mongo::LDAPConnection, std::default_delete<mongo::LDAPConnection> >::~unique_ptr() /usr/lib64/gcc/x86_64-pc-linux-gnu/6.3.1/../../../../include/c++/6.3.1/bits/unique_ptr.h:239:4 (mongoldap+0x9725c8)
      #11 boost::optional_detail::optional_base<std::unique_ptr<mongo::LDAPConnection, std::default_delete<mongo::LDAPConnection> > >::destroy_impl(mpl_::bool_<false>) /home/sajack/mongo/src/third_party/boost-1.60.0/boost/optional/optional.hpp:745:67 (mongoldap+0x97397e)
      #12 boost::optional_detail::optional_base<std::unique_ptr<mongo::LDAPConnection, std::default_delete<mongo::LDAPConnection> > >::destroy() /home/sajack/mongo/src/third_party/boost-1.60.0/boost/optional/optional.hpp:707:9 (mongoldap+0x9738f1)
      #13 boost::optional_detail::optional_base<std::unique_ptr<mongo::LDAPConnection, std::default_delete<mongo::LDAPConnection> > >::~optional_base() /home/sajack/mongo/src/third_party/boost-1.60.0/boost/optional/optional.hpp:327:24 (mongoldap+0x973868)
      #14 boost::optional<std::unique_ptr<mongo::LDAPConnection, std::default_delete<mongo::LDAPConnection> > >::~optional() /home/sajack/mongo/src/third_party/boost-1.60.0/boost/optional/optional.hpp:877:18 (mongoldap+0x9737e8)
      #15 mongo::StatusWith<std::unique_ptr<mongo::LDAPConnection, std::default_delete<mongo::LDAPConnection> > >::~StatusWith() /home/sajack/mongo/src/mongo/db/modules/enterprise/src/ldap/connections/ldap_connection_factory.h:12:7 (mongoldap+0x9ad956)
      #16 mongo::LDAPRunnerImpl::runQuery[abi:cxx11](mongo::LDAPQuery const&) /home/sajack/mongo/src/mongo/db/modules/enterprise/src/ldap/ldap_runner_impl.cpp:77:1 (mongoldap+0x9ac939)
      #17 mongo::LDAPManagerImpl::_getGroupDNsFromServer[abi:cxx11](mongo::LDAPQuery&) /home/sajack/mongo/src/mongo/db/modules/enterprise/src/ldap/ldap_manager_impl.cpp:144:67 (mongoldap+0x98da18)
      ...
      Location is heap block of size 120 at 0x7b2000002980 allocated by main thread:
      #0 malloc /home/sajack/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:586 (mongoldap+0x683c2d)
      #1 ber_memalloc_x /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/liblber/memory.c:228:9 (liblber-2.4.so.2+0x11327)
      #2 tlsm_ctx_new /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/tls_m.c:1998:8 (libldap-2.4.so.2+0x8ba02)
      #3 ldap_int_tls_init_ctx /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/tls2.c:236:20 (libldap-2.4.so.2+0x865f3)
      #4 ldap_pvt_tls_init_def_ctx /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/tls2.c:273:7 (libldap-2.4.so.2+0x864a3)
      #5 alloc_handle /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/tls2.c:288:8 (libldap-2.4.so.2+0x86c53)
      #6 ldap_int_tls_connect /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/tls2.c:337:9 (libldap-2.4.so.2+0x8a831)
      #7 ldap_int_tls_start /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/tls2.c:860:8 (libldap-2.4.so.2+0x8a4b3)
      #8 ldap_int_open_connection /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/open.c:448:8 (libldap-2.4.so.2+0x11ad3)
      #9 ldap_new_connection /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/request.c:487:9 (libldap-2.4.so.2+0x4f98b)
      #10 ldap_open_defconn /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/open.c:41:19 (libldap-2.4.so.2+0xff03)
      #11 ldap_send_initial_request /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/request.c:130:8 (libldap-2.4.so.2+0x4ccde)
      #12 ldap_pvt_search /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/search.c:128:12 (libldap-2.4.so.2+0x1b534)
      #13 ldap_pvt_search_s /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/search.c:174:7 (libldap-2.4.so.2+0x1c16a)
      #14 ldap_search_ext_s /home/sajack/abs/openldap/src/openldap-2.4.44/libraries/libldap/search.c:150:9 (libldap-2.4.so.2+0x1bfb7)
      #15 mongo::LDAPSessionHolder<mongo::(anonymous namespace)::OpenLDAPSessionParams>::query[abi:cxx11](mongo::LDAPQuery, timeval*) /home/sajack/mongo/src/mongo/db/modules/enterprise/src/ldap/connections/ldap_connection_helpers.h:144:41 (mongoldap+0x977392)
      

      I believe that if the refcount becomes incorrect, libldap may decide to deallocate the global TLS context while it is still in use. Everything trying to use TLS would then break.

      We have a few possible mitigations:
      1) Use libldap_r. It may not be available on all systems; that would need to be checked. I ran ThreadSanitizer against it, and the refcount data race went away, but there were still some others. Note that if we consume third-party libraries which link against standard libldap, we will wind up with two sets of symbols in our process space. This seems to have been noticed, but not fixed, in Fedora.
      2) Place mutexes around all of our LDAP code. This means that all external authorization will be serialized, and so could be slower if many users attempt to authZN at once.
      3) Get upstream to replace libldap with libldap_r. Debian, and by extension Ubuntu, seem to have done this already.
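
      Mitigation 2 could look roughly like the sketch below. It is not the MongoDB implementation: ldapGlobalMutex, withLdapLock and maxConcurrentCallers are hypothetical names, and the lambda stands in for a real libldap call. Every call into the library goes through one process-wide mutex, so at most one thread is inside the library at a time.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical process-wide mutex that serializes every call into libldap.
std::mutex& ldapGlobalMutex() {
    static std::mutex m;
    return m;
}

// Runs `fn` (standing in for a libldap call) while holding the global mutex.
template <typename Fn>
auto withLdapLock(Fn&& fn) -> decltype(fn()) {
    std::lock_guard<std::mutex> lock(ldapGlobalMutex());
    return std::forward<Fn>(fn)();
}

// Demo: several threads issue fake "LDAP calls" through the wrapper and we
// record the largest number of callers ever inside the library at once.
int maxConcurrentCallers(int threads, int callsPerThread) {
    int inside = 0;     // only touched while the global mutex is held
    int maxInside = 0;  // only touched while the global mutex is held
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&inside, &maxInside, callsPerThread] {
            for (int i = 0; i < callsPerThread; ++i) {
                withLdapLock([&inside, &maxInside] {
                    ++inside;
                    assert(inside >= 1);
                    if (inside > maxInside) {
                        maxInside = inside;
                    }
                    --inside;
                });
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return maxInside;
}
```

      The trade-off is the one noted in the list: the wrapper removes the race, but it also means concurrent requests wait on each other instead of proceeding in parallel.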

      Using LD_LIBRARY_PATH to override the symbols produced by libldap with symbols produced from libldap_r has proven to mitigate this issue.

      Scope of changes

      Impact to Other Docs

      MVP (Work and Date)

      Resources (Scope or Design Docs, Invision, etc.)

            Assignee:
            caleb.thompson@mongodb.com Caleb Thompson
            Reporter:
            kay.kim@mongodb.com Kay Kim (Inactive)

              Created:
              Updated:
              Resolved:
              4 years, 19 weeks, 1 day ago