- Type: Epic
- Resolution: Fixed
- Priority: Critical - P2
- Affects Version/s: None
- Component/s: Performance
- None
- Done
- Improve multi-threaded perf
A simple benchmark that spawns multiple threads shows negative scaling of operation throughput as the thread count increases.
workload_find.c creates n threads. Each thread pops a client from a mongoc_client_pool_t and repeatedly executes a find with the filter `{_id: 0}`. I observed similar scaling behavior. I added a flag to instead create a separate single-threaded client per thread. These were the results on a 16 vCPU Ubuntu 18.04 host:
| threads | workload_find_pool CPU | workload_find_pool ops/s | workload_find_single CPU | workload_find_single ops/s |
| ------- | ---------------------- | ------------------------ | ------------------------ | -------------------------- |
| 1       | 52%                    | 7 k                      | 52%                      | 6.9 k                      |
| 10      | 540%                   | 21 k                     | 509%                     | 47.5 k                     |
| 100     | 600%                   | 20.1 k                   | 735%                     | 80 k                       |
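For reference, below is a minimal sketch of what the per-thread loop described above might look like. This is not the actual workload_find.c from the ticket: the database and collection names, iteration count, and struct layout are placeholders, and error handling is omitted. A harness that spawns the threads and reports throughput is sketched at the end of this section.

```c
#include <mongoc/mongoc.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical reconstruction of the per-thread loop described above.
 * In pooled mode the thread pops a client from a shared
 * mongoc_client_pool_t; in single mode it creates its own
 * single-threaded client from the URI string. */
typedef struct {
   mongoc_client_pool_t *pool; /* shared pool (pooled mode) */
   const char *uri_str;        /* connection string (single-client mode) */
   bool use_pool;              /* flag described in the ticket */
   int iterations;             /* finds to run per thread */
} worker_args_t;

static void *
worker (void *data)
{
   worker_args_t *args = data;
   mongoc_client_t *client = args->use_pool
                                ? mongoc_client_pool_pop (args->pool)
                                : mongoc_client_new (args->uri_str);
   mongoc_collection_t *coll =
      mongoc_client_get_collection (client, "test", "coll");
   bson_t *filter = BCON_NEW ("_id", BCON_INT32 (0));

   for (int i = 0; i < args->iterations; i++) {
      mongoc_cursor_t *cursor =
         mongoc_collection_find_with_opts (coll, filter, NULL, NULL);
      const bson_t *doc;
      while (mongoc_cursor_next (cursor, &doc)) {
         /* discard the result; only throughput matters here */
      }
      mongoc_cursor_destroy (cursor);
   }

   bson_destroy (filter);
   mongoc_collection_destroy (coll);
   if (args->use_pool) {
      mongoc_client_pool_push (args->pool, client);
   } else {
      mongoc_client_destroy (client);
   }
   return NULL;
}
```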
Taking five samples of GDB stack traces shows many threads waiting for the topology mutex:
    threads  reverse call tree
    65.000   ▽ LEAF
    23.000   ├▽ __lll_lock_wait:135
    23.000   │ ▽ __GI___pthread_mutex_lock:80
     5.000   │ ├▷ _mongoc_topology_push_server_session
     4.000   │ ├▷ mongoc_topology_select_server_id
     4.000   │ ├▷ _mongoc_topology_update_cluster_time
     3.000   │ ├▷ _mongoc_cluster_stream_for_server
     3.000   │ ├▷ mongoc_cluster_run_command_monitored
     2.000   │ ├▷ _mongoc_topology_pop_server_session
     2.000   │ └▷ _mongoc_cluster_create_server_stream
Some of these functions could be optimized to reduce how long they hold the topology mutex. A read/write lock may benefit the functions that only read the topology description.
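As a generic illustration of that idea (this is not libmongoc's internal API; the type and function names below are hypothetical stand-ins), readers of the shared topology description would take a shared lock and proceed concurrently, while writers would take the exclusive lock:

```c
#include <pthread.h>

/* Illustration only: a read/write lock guarding a shared, mostly-read
 * structure. The names below are placeholders, not libmongoc symbols. */
typedef struct {
   pthread_rwlock_t lock;
   /* ... server descriptions, topology type, etc. ... */
} topology_description_t;

/* Read-only path (e.g. server selection): many threads may hold the
 * read lock at once, so they no longer serialize on a single mutex. */
static void
select_server (topology_description_t *td)
{
   pthread_rwlock_rdlock (&td->lock);
   /* inspect server descriptions without mutating them */
   pthread_rwlock_unlock (&td->lock);
}

/* Write path (e.g. applying a monitoring response): takes the exclusive
 * lock, blocking readers only while the update is applied. */
static void
on_server_description_changed (topology_description_t *td)
{
   pthread_rwlock_wrlock (&td->lock);
   /* update the topology description */
   pthread_rwlock_unlock (&td->lock);
}
```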
To verify that performance improves, let's add a performance benchmark test that exercises concurrent operations on a mongoc_client_pool_t.
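Continuing the hypothetical worker() sketch above, such a benchmark could spawn a configurable number of threads against one shared mongoc_client_pool_t and report aggregate throughput, timed with bson_get_monotonic_time() (libbson's monotonic clock, in microseconds). The URI, thread-count default, and iteration count below are placeholders:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical harness continuing the worker()/worker_args_t sketch above:
 * spawn n threads against a shared pool and report aggregate ops/s. */
int
main (int argc, char *argv[])
{
   const char *uri_str = "mongodb://localhost:27017";
   int nthreads = (argc > 1) ? atoi (argv[1]) : 10;
   const int iterations = 10000; /* finds per thread (placeholder) */

   mongoc_init ();
   mongoc_uri_t *uri = mongoc_uri_new (uri_str);
   mongoc_client_pool_t *pool = mongoc_client_pool_new (uri);

   worker_args_t args = {pool, uri_str, true /* use_pool */, iterations};
   pthread_t *threads = malloc (nthreads * sizeof (pthread_t));

   int64_t start = bson_get_monotonic_time ();
   for (int i = 0; i < nthreads; i++) {
      pthread_create (&threads[i], NULL, worker, &args);
   }
   for (int i = 0; i < nthreads; i++) {
      pthread_join (threads[i], NULL);
   }
   int64_t elapsed_us = bson_get_monotonic_time () - start;

   double total_ops = (double) nthreads * iterations;
   printf ("%d threads: %.1f ops/s\n",
           nthreads,
           total_ops / ((double) elapsed_us / 1e6));

   free (threads);
   mongoc_client_pool_destroy (pool);
   mongoc_uri_destroy (uri);
   mongoc_cleanup ();
   return 0;
}
```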