Details
- Type: Epic
- Resolution: Fixed
- Priority: Critical - P2
Description
A simple benchmark that spawns multiple threads shows negative scaling of operation throughput as the thread count increases.
workload_find.c creates n threads. Each thread pops a client from a mongoc_client_pool_t and repeatedly executes a find with the filter {"_id": 0}. I observed similar scaling behavior. I added a flag to alternatively create a separate single-threaded client per thread. These were the results on a 16 vCPU Ubuntu 18.04 host (a sketch of the pooled workload follows the results table):
threads   workload_find_pool      workload_find_single
          cpu      ops/s          cpu      ops/s
1         52%      7 k            52%      6.9 k
10        540%     21 k           509%     47.5 k
100       600%     20.1 k         735%     80 k
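
A minimal sketch of what the pooled workload might look like. The actual workload_find.c is not attached to this ticket, so the connection string, database/collection names, iteration count, and thread cap below are placeholders, and error handling is omitted:

```c
#include <mongoc/mongoc.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define OPS_PER_THREAD 10000 /* placeholder iteration count */

static mongoc_client_pool_t *pool;

/* Each thread checks a client out of the shared pool, runs one find with
 * the filter {"_id": 0}, and checks the client back in, repeatedly. */
static void *
find_loop (void *arg)
{
   bson_t *filter = BCON_NEW ("_id", BCON_INT32 (0));
   (void) arg;

   for (int i = 0; i < OPS_PER_THREAD; i++) {
      mongoc_client_t *client = mongoc_client_pool_pop (pool);
      mongoc_collection_t *coll =
         mongoc_client_get_collection (client, "db", "coll");
      mongoc_cursor_t *cursor =
         mongoc_collection_find_with_opts (coll, filter, NULL, NULL);
      const bson_t *doc;

      while (mongoc_cursor_next (cursor, &doc)) {
         /* Drain the (at most one) matching document. */
      }

      mongoc_cursor_destroy (cursor);
      mongoc_collection_destroy (coll);
      mongoc_client_pool_push (pool, client);
   }

   bson_destroy (filter);
   return NULL;
}

int
main (int argc, char *argv[])
{
   int nthreads = argc > 1 ? atoi (argv[1]) : 1;
   pthread_t threads[100];

   if (nthreads < 1) nthreads = 1;
   if (nthreads > 100) nthreads = 100; /* matches the static thread array */

   mongoc_init ();
   mongoc_uri_t *uri = mongoc_uri_new ("mongodb://localhost:27017");
   pool = mongoc_client_pool_new (uri);

   int64_t start = bson_get_monotonic_time ();
   for (int i = 0; i < nthreads; i++) {
      pthread_create (&threads[i], NULL, find_loop, NULL);
   }
   for (int i = 0; i < nthreads; i++) {
      pthread_join (threads[i], NULL);
   }
   int64_t elapsed_us = bson_get_monotonic_time () - start;

   printf ("%.1f ops/s\n",
           (double) nthreads * OPS_PER_THREAD * 1e6 / (double) elapsed_us);

   mongoc_client_pool_destroy (pool);
   mongoc_uri_destroy (uri);
   mongoc_cleanup ();
   return 0;
}
```

The single-client variant replaces the pool pop/push with one mongoc_client_new per thread.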
Taking five samples of GDB stack traces shows many threads waiting for the topology mutex:
threads   reverse call tree
65.000    ▽ LEAF
23.000    ├▽ __lll_lock_wait:135
23.000    │ ▽ __GI___pthread_mutex_lock:80
 5.000    │ ├▷ _mongoc_topology_push_server_session
 4.000    │ ├▷ mongoc_topology_select_server_id
 4.000    │ ├▷ _mongoc_topology_update_cluster_time
 3.000    │ ├▷ _mongoc_cluster_stream_for_server
 3.000    │ ├▷ mongoc_cluster_run_command_monitored
 2.000    │ ├▷ _mongoc_topology_pop_server_session
 2.000    │ └▷ _mongoc_cluster_create_server_stream
Some of these functions could be optimized to reduce how long they hold the topology mutex. A read/write lock may also benefit the functions that only read the topology description, as sketched below.
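
To illustrate the read/write-lock idea only: the struct and function names below are hypothetical, not the driver's internals. The point is that read-only topology accesses (e.g. server selection) could proceed concurrently, while mutations still take an exclusive lock:

```c
#include <pthread.h>

typedef struct {
   pthread_rwlock_t lock;
   /* ... topology description, server list, session pool ... */
} topology_t;

/* Read-only path (e.g. selecting a server from the description):
 * many threads may hold the read lock at once. */
static void
topology_read_only_op (topology_t *t)
{
   pthread_rwlock_rdlock (&t->lock);
   /* inspect the topology description */
   pthread_rwlock_unlock (&t->lock);
}

/* Mutating path (e.g. applying a new server description):
 * requires exclusive access, blocking readers and writers. */
static void
topology_write_op (topology_t *t)
{
   pthread_rwlock_wrlock (&t->lock);
   /* modify the topology description */
   pthread_rwlock_unlock (&t->lock);
}
```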
To verify that performance improves, let's add a performance benchmark test that exercises concurrent operations on a mongoc_client_pool_t.