[CDRIVER-4114] Reduce contention on topology description Created: 03/Aug/21  Updated: 17/Nov/21  Resolved: 10/Nov/21

Status: Closed
Project: C Driver
Component/s: None
Affects Version/s: None
Fix Version/s: 1.20.0

Type: Improvement Priority: Unknown
Reporter: Colby Pike Assignee: Colby Pike
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Epic Link: Improve multi-threaded perf

 Description   

The `mongoc_topology_t` contains a topology description that is updated to reflect the known status of the database topology. For each server, a thread will be spawned to monitor that server's status. These threads all contend to keep the topology description up-to-date. Additionally, the SRV monitor thread can also update the topology concurrently.

All operations that access the topology are guarded by a single mongoc_topology_t::mutex, so a single thread locking this mutex can bring every client thread to a halt while it performs necessary updates.



 Comments   
Comment by Colby Pike [ 10/Nov/21 ]

Initial changes have been merged and are promising. Some improvements to be made for baseline performance beyond just thread contention. Will submit more changes soon.

Comment by Githook User [ 10/Nov/21 ]

Author:

{'name': 'vector-of-bool', 'email': 'vectorofbool@gmail.com', 'username': 'vector-of-bool'}

Message: CDRIVER-4114: Reduce contention on the topology description (#854)

  • Additional atomic operations
  • 64bit and pointer atomics for MSVC
  • Revive some old exported symbols, clean up some unused code, and mark unreachable code.
  • No 64bit atomics for x86 GCC either. Fix several warnings
  • Token balance in case of empty macro expansion
  • Fix extra semicolon, plus macOS posix detection
  • Fix check on 32-bit, and missing 'default' case.

Closes: CDRIVER-4124

  • Missing ptr_fetch
  • CXX-4113: Thread-safe pool for server session objects
  • Remove unused node pointers on server sessions
  • Track pool size, and don't loop while holding the lock to clear
  • Wrong arg names in i64 compare_exchange emul
  • Missing dist listing for ts-pool files
  • Cleanup and some redesign of pool API
  • Declare special pool for server sessions
  • Fix pruning pool not leaking the 10,001st session
  • Use typed pool fns
  • Visiting pool items, and checking that items are returned to the pool
  • Fix unacknowledged sessions tests
  • Additional atomic operations
  • A simple shared pointer construct
  • More atomics
  • Fix atomic intrinsics on MSVC
  • Topology description is a shared ptr
  • Give sptr._aux a type to aid in debugging. And no spinlocks!
  • Fix race on updating server rtt
  • Assertion macro for asserting pointers inline
  • Refactoring many locations to copy the shared topology description
  • Undo accidental local dev changes
  • Give the topology scanner its own mutex for the handshake_cmd

This requires a refactor of how the handshake_cmd is passed around.
Instead of the scanner returning a pointer to contended data and expecting
the caller to bring their own synchronization, the scanner duplicates the
handshake into an output parameter and performs its own synchronization
internally.

  • Instead of using a mutex to guard monitor thread startup, use an atomic compare-exchange.
  • Pull from new-atomics
  • Fix: 'release' is not valid for GCC compare-exchange
  • Lots of "const" weeds out threading a mutation issues
  • Fix const-init from non-constant-expr
  • Use BSON_OS_UNIX
  • fetch() is `const`, and no `release` in cmpxch
  • Fix missing 'inline's
  • Mention the LIFO nature of the thread-pool, and the non-conformance of the 'push' operation
  • auditing the pool is compile-time optional
  • A shared-pointer abstraction
  • Make mongoc-shared a private API
  • typo
  • init the shared_ptr_mtx in a Windows-compatible way.
  • Update names to reflect C11 names, strong/weak cmpxchg
  • Fix const-to-mutable bind in MSVC
  • PR and impl updates for shared_ptr:
  • Rename to more closely match stdlib equivs
  • Delete macros that are unneeded
  • Tweak impl of atomic_store to be more efficient
  • Update doc comments
  • Address various PR comments
  • More tests for atomic operations
  • Restore bson_memory_barrier with a deprecation notice
  • Spelling is hard
  • No 'consume' memorder for atomic exchange
  • Fix detection of 64bit, plus unused intrin value
  • Missing i64_fetch and i64_fetch_sub on 32-bit playforms
  • Fix spinlock logic inversion for i64 emul
  • Remove config checks for atomics
  • Just use `int`
  • Thread-safe CSE startup without a mutex
  • Set march=i686 for 32bit builds
  • Update src/libmongoc/src/mongoc/mongoc-shared.c

Co-authored-by: Ezra Chung <88335979+eramongodb@users.noreply.github.com>

  • PR comments, use bson_assert_param, protect from self-assignment
  • CheckAtomics.cmake is gone
  • PR comments, and an assertion fix
  • Update doc comments from PR comments
  • Test aliasing to subobject, and fix C99 usage
  • Remove C99 usage, unneeded assertions, and warnings
  • Fix lost topology update with mongoc_topology_reconcile
  • Refresh td ptr after making changes in a test
  • Fix deadlocks in topology shutdown
  • Guard against a zero-allocated mongoc_set
  • Fix invalid test result JSON generation
  • Fix warnings from MSVC. Fix unexported symbols used on 32bit DLL
  • That is not 'const'
  • Remove C++ comments
  • Fix leak from misuse of bson_copy_to
  • Simplify scanner handshake setup
  • Fix leak of shared pointer when committing topo updates
  • Fix unsafe use-after free in test case
  • Missing cast-from-const in int64 atomic emul
  • Remove unneeded NULL checks. Fix doubled events when racing to invalidate servers
  • Use generation to validate server description
  • Fix attempt to double-reconnect
  • Update _mongoc_cluster_create_server_stream in conditional code segments
  • Conditionally set error in _cluster_fetch_sream_single
  • Refresh topology after requesting a scan
  • Fix race with the scanner to finish the initial handshake
  • Fix leak of topology descr reference
  • Fix leaked alloc of bson_t in _stream_run_hello
  • More leaks with dup_handshake_cmd
  • BSON initializer that is safe to destroy or to leak.
  • Undo incorrect changes for bson memcheck code
  • Less macro magic
  • Propagate the random seed when cloning a topology description
  • Prevent recursive topology modifications, which deadlock
  • Update comments and cleanup.
  • Remove mentions of topology->mutex.
  • Update comments that refer to locking.
  • Remove some redundant APIs.
  • Move some APIs to have internal linkage.
  • Guard against a deadlock with an assertion
  • Revert "Guard against a deadlock with an assertion"

This reverts commit 4b9892ed2196a2f8274885a9f5b844685dbeb814.

  • Update for PR comments. Cleanup, const, and typos
  • Fix use-after free of server description in test. Remove unneeded locking
  • Fix decl-after-stmt
  • Fix shadowed name
  • Pin docutils version when installing sphinx
  • Separate mutex for srv polling, document more usage of tpld_modification_mtx
  • Suppress unused-var warning
  • ThreadSanitizer calls me out for wrong atomics.
  • Fix race in on counters in counter test
  • Release the topology description outside of the mutex
  • Thread-safe direct topology modification for testing purposes
  • Fix unnecessary race on getting topology event counts
  • Thread-safety in tests around topology reconciliation
  • Fix regression in counters test
  • Switch sign in counters tests
  • More thread safety in tests around topology descriptions
  • Spell
  • Update to comments, docs, and fixes from PR
  • Score scanner_state as an int, so we can perform well-defined atomic ops on it
  • temporarily disable silence in 'run auth tests'
  • Fix missing close quote in evergreen test script
  • Temporarily enable logging for 'run auth tests'
  • Give generation map const-correctness

The generation map is owned by its server description, so the pointer
member should have deep-const semantics. This uncovers new potential
correctness issues, which are fixed now.

  • Fix TSan warning on server sets when running down
  • Clean up test usages of unsafe tpld access
  • Copy the maxElectionId for the topology description
  • Clear last_hello_response when invalidating server descriptions
  • Missed modification begin on topology
  • Remove redundant casts
  • Re-silence 'run auth tests'

Co-authored-by: Ezra Chung <88335979+eramongodb@users.noreply.github.com>
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/ded9ae5e9f2897a283305175aae8e1bbf4021c36

Generated at Wed Feb 07 21:19:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.