C Driver / CDRIVER-4002

Improve multi-threaded perf for mongoc_client_pool_t

    • Type: Epic
    • Resolution: Fixed
    • Priority: Critical - P2
    • Fix Version/s: 1.20.0
    • Affects Version/s: None
    • Component/s: Performance
    • Labels:
      None

      Engineer(s): Colby, Kevin

      Summary: Improve multi-threaded perf for mongoc_client_pool_t

      2021-12-27: Updating end date to 1/7/2022

      Status update:

      • Adding an Ubuntu 18.04 variant to the benchmark tasks is in review.
      • Next (and last): adding alerts for regressions.

      Rationale for delays:

      • Holidays.

      Risks:

      • No risks. Performance improvements are released.

      2021-12-13: Updating end date to 12/17/2021

      Status update:

      • Performance tests merged, which show improvement.
      • Working on adding alerts.

      Rationale for delays:

      • No good rationale.
      • PTO for three days.

      Risks:

      • No risks. Performance improvements are released.

      2021-11-30: Updating end date to 12/07/2021

      Status update:

      • 1.20.0 released with perf improvements.
      • Kevin and Colby have independently validated changes.
      • Kevin working on adding perf tests to Evergreen.

      Rationale for delays:

      • Kevin is seeing an unexpected difference between tests run on a spawn host and on an equivalent host in patch builds. Investigating.

      Risks:

      • No risks. Performance changes are complete. Only tests and alerting remain.

      2021-11-16: Updating end date to 11/19/2021

      Status update:

      • Topology description contention change merged.
      • Performance test in review, and alerts to be set up after.
      • 1.20.0 will be released ASAP and will not wait on performance alerts.

      Rationale for delays:

      • Review took longer than anticipated.
      • Performance tests were started later than anticipated.

      Risks:

      • No additional risks.

      2021-11-02: Updated target date to 2021-11-05

      Status update:

      • Removing contention from the topology description is approved by one reviewer; waiting on final changes.

      Rationale for delays:

      • Reviewer PTO.

      Risks:

      • None.

      2021-10-19: No update to target date

      Status update:

      • Next: fixing remaining test failures and putting topology description in review.

      Rationale for delays:

      • No delays.

      Risks:

      • None.

      2021-10-05: Updating target date to 2021-10-22

      Status update:

      • No significant changes.

      Rationale for delays:

      • Colby was on PTO for a week to move.

      Risks:

      • None.

      2021-09-21: Updating target date to 2021-10-08

      Status:

      • Dependencies of reducing contention on topology description merged.
      • Removing the session pool from the topology mutex merged.
      • Next: fixing remaining test failures and putting topology description in review.
      • Adding 2 weeks to the end date for reviews and responding to feedback.

      Rationale for delays:

      • Unexpected difficulties supporting old platforms for atomics improvements.
      • Less availability while moving.

      Risks:

      • None. Still on track for the C driver 1.20.0 target release date.

      2021-09-07: No update to target date.

      • Colby split the review of topology description contention into smaller reviews for atomics improvements and a shared pointer abstraction.

      2021-08-24: Updating target date to 2021-09-24

      • Bumping target date by one month because of unanticipated complexity and less availability from Colby in the upcoming month.
      • Colby has a review up for session pool improvements.

      2021-08-10: Setting initial target end date to 2021-08-27

      • Colby started prototyping this in late July and, after a couple of weeks of investigation and prototyping, realized that it needs much more work than we originally thought.
      • He paused on it for a week or so in between and is working on it again. We've updated the estimate based on his current understanding of the remaining work.

       


      A simple benchmark that spawns threads shows negative scaling of operation throughput as the thread count increases.

      workload_find.c creates n threads. Each thread pops a client from a mongoc_client_pool_t and repeatedly executes a find with the filter {_id: 0}. I observed similar scaling behavior. I added a flag to instead create a separate single-threaded client per thread. These were the results on a 16 vCPU Ubuntu 18.04 host:

      threads     workload_find_pool        workload_find_single
                  cpu   ops/s               cpu   ops/s
      1           52%     7 k               52%   6.9 k
      10         540%    21 k              509%  47.5 k
      100        600%  20.1 k              735%    80 k
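
      A throughput harness of the shape described above could be sketched roughly as follows. This is not workload_find.c itself: bench_run, bench_op_fn, and bench_count_op are hypothetical names invented for illustration, and the operation is a stand-in counter rather than a real find. In an actual workload, each thread would pop a client with mongoc_client_pool_pop, run the find in the loop, and return the client with mongoc_client_pool_push.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime and nanosleep */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical callback: one benchmark operation (e.g. a single find). */
typedef void (*bench_op_fn) (void *ctx);

typedef struct {
   uint64_t total_ops; /* operations completed across all threads */
   double seconds;     /* measured wall-clock duration */
} bench_result_t;

typedef struct {
   bench_op_fn op;
   void *ctx;
   atomic_bool *stop;
   uint64_t ops; /* written only by the owning thread, read after join */
} bench_worker_t;

static void *
bench_worker_main (void *arg)
{
   bench_worker_t *w = arg;
   while (!atomic_load (w->stop)) {
      w->op (w->ctx);
      w->ops++;
   }
   return NULL;
}

static double
bench_now (void)
{
   struct timespec ts;
   clock_gettime (CLOCK_MONOTONIC, &ts);
   return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/* Runs `op` on n_threads threads for roughly duration_ms milliseconds. */
static bench_result_t
bench_run (int n_threads, int duration_ms, bench_op_fn op, void *ctx)
{
   bench_result_t result = {0, 0.0};
   atomic_bool stop;
   int started = 0;

   assert (n_threads > 0 && duration_ms >= 0);
   pthread_t *threads = calloc ((size_t) n_threads, sizeof *threads);
   bench_worker_t *workers = calloc ((size_t) n_threads, sizeof *workers);
   atomic_init (&stop, false);
   if (!threads || !workers) {
      free (threads);
      free (workers);
      return result;
   }

   double start = bench_now ();
   for (int i = 0; i < n_threads; i++) {
      workers[i] = (bench_worker_t){op, ctx, &stop, 0};
      if (pthread_create (&threads[i], NULL, bench_worker_main, &workers[i]) != 0) {
         break; /* measure with the threads that did start */
      }
      started++;
   }

   struct timespec nap = {duration_ms / 1000, (long) (duration_ms % 1000) * 1000000L};
   nanosleep (&nap, NULL);
   atomic_store (&stop, true);

   for (int i = 0; i < started; i++) {
      pthread_join (threads[i], NULL);
      result.total_ops += workers[i].ops;
   }
   result.seconds = bench_now () - start;
   free (threads);
   free (workers);
   return result;
}

/* Stand-in operation: bumps a shared atomic counter instead of running a find. */
static void
bench_count_op (void *ctx)
{
   atomic_fetch_add ((atomic_ullong *) ctx, 1);
}
```

      Dividing total_ops by seconds gives an ops/s figure comparable in kind to the columns above. Running the same harness at 1, 10, and 100 threads, once with a pooled operation and once with a per-thread client, would show whether throughput scales with thread count.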
      

      Taking five samples of GDB stack traces shows many threads waiting for the topology mutex:

      threads  reverse call tree
       65.000  ▽ LEAF
       23.000  ├▽ __lll_lock_wait:135
       23.000  │ ▽ __GI___pthread_mutex_lock:80
        5.000  │ ├▷ _mongoc_topology_push_server_session
        4.000  │ ├▷ mongoc_topology_select_server_id
        4.000  │ ├▷ _mongoc_topology_update_cluster_time
        3.000  │ ├▷ _mongoc_cluster_stream_for_server
        3.000  │ ├▷ mongoc_cluster_run_command_monitored
        2.000  │ ├▷ _mongoc_topology_pop_server_session
        2.000  │ └▷ _mongoc_cluster_create_server_stream
      

      Some of these functions could be optimized to reduce how long they hold the topology mutex. A read/write lock may benefit the functions that only read the topology description.
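
      As a minimal sketch of the read/write lock idea, assuming a heavily simplified stand-in for the topology: topo_t, topo_description_t, and the functions below are hypothetical and are not libmongoc types or APIs. Readers take a shared lock only long enough to copy what they need, and writers take the exclusive lock. The status notes above mention atomics and a shared pointer abstraction, so the change that was eventually merged may well differ from this.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for pthread_rwlock_t */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a topology description. */
typedef struct {
   uint32_t server_id;
   int64_t cluster_time;
} topo_description_t;

typedef struct {
   pthread_rwlock_t lock;
   topo_description_t desc;
} topo_t;

static int
topo_init (topo_t *t)
{
   memset (&t->desc, 0, sizeof t->desc);
   return pthread_rwlock_init (&t->lock, NULL);
}

static void
topo_destroy (topo_t *t)
{
   pthread_rwlock_destroy (&t->lock);
}

/* Reader: copy the description under a shared lock, then release it, so
 * the caller works on a snapshot and many readers can proceed in parallel. */
static topo_description_t
topo_snapshot (topo_t *t)
{
   topo_description_t copy;
   pthread_rwlock_rdlock (&t->lock);
   copy = t->desc;
   pthread_rwlock_unlock (&t->lock);
   return copy;
}

/* Writer: take the exclusive lock and advance the cluster time only if the
 * new value is greater than the stored one. */
static void
topo_update_cluster_time (topo_t *t, int64_t cluster_time)
{
   pthread_rwlock_wrlock (&t->lock);
   if (cluster_time > t->desc.cluster_time) {
      t->desc.cluster_time = cluster_time;
   }
   pthread_rwlock_unlock (&t->lock);
}
```

      The copy-then-release pattern in topo_snapshot also illustrates the first suggestion: the lock is held only for the copy, not for whatever the caller does with the data afterwards.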

      To verify the performance is improved, let's add a performance benchmark test to exercise concurrent operations on a mongoc_client_pool_t.

            Assignee:
            colby.pike@mongodb.com Colby Pike
            Reporter:
            kevin.albertson@mongodb.com Kevin Albertson
            Votes:
            0
            Watchers:
            18

              Created:
              Updated:
              Resolved:
              24 weeks, 3 days