[CDRIVER-4002] Improve multi-threaded perf for mongoc_client_pool_t Created: 19/May/21  Updated: 28/Oct/23  Resolved: 06/Jan/22

Status: Closed
Project: C Driver
Component/s: Performance
Affects Version/s: None
Fix Version/s: 1.20.0

Type: Epic Priority: Critical - P2
Reporter: Kevin Albertson Assignee: Colby Pike
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Problem/Incident
Start date:
End date:
Calendar Time: 24 weeks, 3 days
Scope Cost Estimate: 5
Cost to Date: 11
Final Cost Estimate: 12
Cost Threshold %: 150
Detailed Project Statuses:

Engineer(s): Colby, Kevin

Summary: Improve multi-threaded perf for mongoc_client_pool_t

2021-12-27:

Status update: Updating end date to 1/7/2022

  • Adding Ubuntu 18.04 variant to benchmark tasks in review.
  • Next (and last) adding alerts for regression.

Rationale for delays:

  • Holidays.

Risks:

  • No risks. Performance improvements are released.

2021-12-13: Updating end date to 12/17/2021

Status update:

  • Performance tests merged, which show improvement.
  • Working on adding alerts.

Rationale for delays:

  • No good rationale.
  • PTO for three days.

Risks:

  • No risks. Performance improvements are released.

2021-11-30: Updating end date to 12/07/2021

Status update:

  • 1.20.0 released with perf improvements.
  • Kevin and Colby have independently validated changes.
  • Kevin working on adding perf tests to Evergreen.

Rationale for delays:

  • Kevin is seeing an unexpected difference between tests run on a spawn host and on an equivalent host in patch builds. Investigating.

Risks:

  • No risks. Performance changes are complete. Only tests and alerting remain.

2021-11-16: Updating end date to 11/19/2021

Status update:

  • Topology description contention change merged.
  • Performance test in review, and alerts to be set up after.
  • 1.20.0 will be released ASAP and will not wait on performance alerts.

Rationale for delays:

  • Review took longer than anticipated.
  • Performance tests were started later than anticipated.

Risks:

  • No additional risks.

2021-11-02: Updated target date to 2021-11-05

Status update:

  • Removing contention from topology description is approved by one reviewer; waiting on final changes.

Rationale for delays:

  • Reviewer PTO.

Risks:

  • None.

2021-10-19: No update to target date

Status update:

  • Next: fixing remaining test failures and putting topology description in review.

Rationale for delays:

  • No delays.

Risks:

  • None.

2021-10-05: Updating target date to 2021-10-22

Status update:

  • No significant changes.

Rationale for delays:

  • Colby was on PTO for a week to move.

Risks:

  • None.

2021-09-21: Updating target date to 2021-10-08

Status:

  • Dependencies of reducing contention on topology description merged.
  • Removing the session pool from the topology mutex merged.
  • Next: fixing remaining test failures and putting topology description in review.
  • Adding 2 weeks to the end date for reviews and responding to feedback.

Rationale for delays:

  • Unexpected difficulties supporting old platforms for atomics improvements.
  • Less availability while moving.

Risks:

  • None. Still on track for the C driver 1.20.0 target release date.

2021-09-07: No update to target date.

  • Colby split the review of topology description contention into smaller reviews for atomics improvements and a shared pointer abstraction.

2021-08-24: Updating target date to 2021-09-24

  • Bumping target date by one month because of unanticipated complexity and less availability from Colby in the upcoming month.
  • Colby has a review up for session pool improvements.

2021-08-10: Setting initial target end date to 2021-08-27

  • Colby started prototyping this in late July and, after a couple of weeks of investigation and prototyping, realized that it needs much more work than we originally thought.
  • He paused on it for about a week and is working on it again. We've updated the estimate based on his current understanding of the remaining work.

 


 Description   

A simple benchmark which spawns threads shows negative scaling of operation throughput as the thread count increases.

workload_find.c creates n threads. Each thread pops a client from a mongoc_client_pool_t and repeatedly executes a find with filter _id: 0. I observed similar scaling behavior. I added a flag to alternatively create a separate single-threaded client per thread. These were the results on a 16 vCPU Ubuntu 18.04 host:

threads     workload_find_pool        workload_find_single
            cpu   ops/s               cpu   ops/s
1           52%     7 k               52%   6.9 k
10         540%    21 k              509%  47.5 k
100        600%  20.1 k              735%    80 k
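
One way to read the table is to compare each result against ideal linear scaling from the single-thread figure. The helper below is a small sketch of that arithmetic; the function name scaling_efficiency is hypothetical and is not part of workload_find.c or libmongoc.

```c
#include <assert.h>

/* Fraction of ideal linear scaling achieved at n_threads.
 * 1.0 means throughput grew in proportion to the thread count;
 * values near 0 mean added threads contributed almost nothing.
 * Hypothetical helper for illustration only. */
double
scaling_efficiency (double ops_one_thread, double ops_n_threads, int n_threads)
{
   assert (ops_one_thread > 0.0);
   assert (n_threads > 0);
   return ops_n_threads / (ops_one_thread * (double) n_threads);
}
```

With the figures above, the pooled run at 10 threads gives 21 / (7 * 10) = 0.30, while the single-threaded-client run gives 47.5 / (6.9 * 10), roughly 0.69. The 100-thread rows exceed the 16 vCPUs of the host, so linear scaling is not attainable there regardless of locking.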

Taking five samples of GDB stack traces shows many threads waiting for the topology mutex:

threads  reverse call tree
 65.000  ▽ LEAF
 23.000  ├▽ __lll_lock_wait:135
 23.000  │ ▽ __GI___pthread_mutex_lock:80
  5.000  │ ├▷ _mongoc_topology_push_server_session
  4.000  │ ├▷ mongoc_topology_select_server_id
  4.000  │ ├▷ _mongoc_topology_update_cluster_time
  3.000  │ ├▷ _mongoc_cluster_stream_for_server
  3.000  │ ├▷ mongoc_cluster_run_command_monitored
  2.000  │ ├▷ _mongoc_topology_pop_server_session
  2.000  │ └▷ _mongoc_cluster_create_server_stream

Some of these functions could be optimized to reduce how long they hold the topology mutex. A read/write lock may benefit the functions that only read the topology description.
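
The read/write lock idea can be sketched with POSIX pthread_rwlock_t. This is a minimal illustration under assumptions, not the change that was eventually merged: the shared_state_t type and its functions are hypothetical stand-ins for shared topology state and do not exist in libmongoc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for state guarded by the topology mutex. */
typedef struct {
   pthread_rwlock_t lock;
   int64_t cluster_time;
} shared_state_t;

int
shared_state_init (shared_state_t *s)
{
   assert (s != NULL);
   s->cluster_time = 0;
   return pthread_rwlock_init (&s->lock, NULL);
}

/* Readers take the shared lock, so concurrent readers do not
 * serialize behind one another. */
int64_t
shared_state_read (shared_state_t *s)
{
   int64_t value;
   pthread_rwlock_rdlock (&s->lock);
   value = s->cluster_time;
   pthread_rwlock_unlock (&s->lock);
   return value;
}

/* Writers take the exclusive lock and only move the value forward. */
void
shared_state_advance (shared_state_t *s, int64_t candidate)
{
   pthread_rwlock_wrlock (&s->lock);
   if (candidate > s->cluster_time) {
      s->cluster_time = candidate;
   }
   pthread_rwlock_unlock (&s->lock);
}

void
shared_state_destroy (shared_state_t *s)
{
   pthread_rwlock_destroy (&s->lock);
}
```

A read/write lock only helps when reads dominate and the critical sections are short; functions that mutate shared state still serialize on the exclusive lock.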

To verify the performance is improved, let's add a performance benchmark test to exercise concurrent operations on a mongoc_client_pool_t.



 Comments   
Comment by Kevin Albertson [ 17/Nov/21 ]

C 1.20.0 has been released. Leaving this epic open until continuous tests (CDRIVER-4105) and alerting (CDRIVER-4228) are complete.

Comment by Kevin Albertson [ 17/Nov/21 ]

This is the result of running a slightly modified workload_find.c on an ubuntu1804-large host against 1.19.2 and the current development branch on commit 6f2102b. This includes CDRIVER-4114, CDRIVER-4113, CDRIVER-4124, CDRIVER-4147, CDRIVER-4227.

threads     1.19.2 ops/s      6f2102b ops/s
1              7k                7k
10            38k               48k
100           51k               84k

The following repository includes scripts to compare the 1.19.2 release against the current development branch:
https://github.com/kevinAlbs/c-bootstrap/tree/ab5f88a96ec9fad5c93d35c07feabc4810e3083c/investigations/cdriver4002

Comment by Githook User [ 17/Nov/21 ]

Author:

{'name': 'vector-of-bool', 'email': 'vectorofbool@gmail.com', 'username': 'vector-of-bool'}

Message: CDRIVER-4227 Add a shared_mutex, and use it when locking in shared_ptr atomic operations (#895)

  • Add a shared_mutex, and use it when locking in shared_ptr atomic operations

Tag CDRIVER-4002

Co-authored-by: Kevin Albertson <kevin.albertson@10gen.com>

  • Force enable more recent POSIX APIs, regardless of language mode

This change only applies to private code, and does not expose any
changes in the public API.

  • No reallocf in strict POSIX mode

Co-authored-by: Kevin Albertson <kevin.albertson@10gen.com>
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/8d1af19aa067818044d5f52039ae2c761dad3168

Generated at Wed Feb 07 21:19:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.