- Type: Epic
- Resolution: Fixed
- Priority: Critical - P2
- Affects Version/s: None
- Component/s: Performance
- None
- Done
- Improve multi-threaded perf
A simple benchmark that spawns multiple threads shows negative scaling of operation throughput as the thread count increases.
workload_find.c creates n threads. Each thread pops a client from a mongoc_client_pool_t and repeatedly executes a find with the filter `{_id: 0}`. I observed similar scaling behavior. I added a flag to instead create a separate single-threaded client per thread. These were the results on a 16 vCPU Ubuntu 18.04 host:
| threads | workload_find_pool CPU | workload_find_pool ops/s | workload_find_single CPU | workload_find_single ops/s |
| ------- | ---------------------- | ------------------------ | ------------------------ | -------------------------- |
| 1       | 52%                    | 7 k                      | 52%                      | 6.9 k                      |
| 10      | 540%                   | 21 k                     | 509%                     | 47.5 k                     |
| 100     | 600%                   | 20.1 k                   | 735%                     | 80 k                       |
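For reference, below is a minimal sketch of what the per-thread loop described above might look like. This is not the actual workload_find.c from the ticket: the database and collection names, iteration count, and struct layout are placeholders, and error handling is omitted. A harness that spawns the threads and reports throughput is sketched at the end of this section.

```c
#include <mongoc/mongoc.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical reconstruction of the per-thread loop described above.
 * In pooled mode the thread pops a client from a shared
 * mongoc_client_pool_t; in single mode it creates its own
 * single-threaded client from the URI string. */
typedef struct {
   mongoc_client_pool_t *pool; /* shared pool (pooled mode) */
   const char *uri_str;        /* connection string (single-client mode) */
   bool use_pool;              /* flag described in the ticket */
   int iterations;             /* finds to run per thread */
} worker_args_t;

static void *
worker (void *data)
{
   worker_args_t *args = data;
   mongoc_client_t *client = args->use_pool
                                ? mongoc_client_pool_pop (args->pool)
                                : mongoc_client_new (args->uri_str);
   mongoc_collection_t *coll =
      mongoc_client_get_collection (client, "test", "coll");
   bson_t *filter = BCON_NEW ("_id", BCON_INT32 (0));

   for (int i = 0; i < args->iterations; i++) {
      mongoc_cursor_t *cursor =
         mongoc_collection_find_with_opts (coll, filter, NULL, NULL);
      const bson_t *doc;
      while (mongoc_cursor_next (cursor, &doc)) {
         /* discard the result; only throughput matters here */
      }
      mongoc_cursor_destroy (cursor);
   }

   bson_destroy (filter);
   mongoc_collection_destroy (coll);
   if (args->use_pool) {
      mongoc_client_pool_push (args->pool, client);
   } else {
      mongoc_client_destroy (client);
   }
   return NULL;
}
```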
Taking five samples of GDB stack traces shows many threads waiting for the topology mutex:
    threads  reverse call tree
    65.000   ▽ LEAF
    23.000   ├▽ __lll_lock_wait:135
    23.000   │ ▽ __GI___pthread_mutex_lock:80
     5.000   │ ├▷ _mongoc_topology_push_server_session
     4.000   │ ├▷ mongoc_topology_select_server_id
     4.000   │ ├▷ _mongoc_topology_update_cluster_time
     3.000   │ ├▷ _mongoc_cluster_stream_for_server
     3.000   │ ├▷ mongoc_cluster_run_command_monitored
     2.000   │ ├▷ _mongoc_topology_pop_server_session
     2.000   │ └▷ _mongoc_cluster_create_server_stream
Some of these functions could be optimized to reduce how long they hold the topology mutex. A read/write lock may benefit the functions that only read the topology description.
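As a generic illustration of that idea (this is not libmongoc's internal API; the type and function names below are hypothetical stand-ins), readers of the shared topology description would take a shared lock and proceed concurrently, while writers would take the exclusive lock:

```c
#include <pthread.h>

/* Illustration only: a read/write lock guarding a shared, mostly-read
 * structure. The names below are placeholders, not libmongoc symbols. */
typedef struct {
   pthread_rwlock_t lock;
   /* ... server descriptions, topology type, etc. ... */
} topology_description_t;

/* Read-only path (e.g. server selection): many threads may hold the
 * read lock at once, so they no longer serialize on a single mutex. */
static void
select_server (topology_description_t *td)
{
   pthread_rwlock_rdlock (&td->lock);
   /* inspect server descriptions without mutating them */
   pthread_rwlock_unlock (&td->lock);
}

/* Write path (e.g. applying a monitoring response): takes the exclusive
 * lock, blocking readers only while the update is applied. */
static void
on_server_description_changed (topology_description_t *td)
{
   pthread_rwlock_wrlock (&td->lock);
   /* update the topology description */
   pthread_rwlock_unlock (&td->lock);
}
```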
To verify that performance improves, let's add a performance benchmark test that exercises concurrent operations on a mongoc_client_pool_t.
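Continuing the hypothetical worker() sketch above, such a benchmark could spawn a configurable number of threads against one shared mongoc_client_pool_t and report aggregate throughput, timed with bson_get_monotonic_time() (libbson's monotonic clock, in microseconds). The URI, thread-count default, and iteration count below are placeholders:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical harness continuing the worker()/worker_args_t sketch above:
 * spawn n threads against a shared pool and report aggregate ops/s. */
int
main (int argc, char *argv[])
{
   const char *uri_str = "mongodb://localhost:27017";
   int nthreads = (argc > 1) ? atoi (argv[1]) : 10;
   const int iterations = 10000; /* finds per thread (placeholder) */

   mongoc_init ();
   mongoc_uri_t *uri = mongoc_uri_new (uri_str);
   mongoc_client_pool_t *pool = mongoc_client_pool_new (uri);

   worker_args_t args = {pool, uri_str, true /* use_pool */, iterations};
   pthread_t *threads = malloc (nthreads * sizeof (pthread_t));

   int64_t start = bson_get_monotonic_time ();
   for (int i = 0; i < nthreads; i++) {
      pthread_create (&threads[i], NULL, worker, &args);
   }
   for (int i = 0; i < nthreads; i++) {
      pthread_join (threads[i], NULL);
   }
   int64_t elapsed_us = bson_get_monotonic_time () - start;

   double total_ops = (double) nthreads * iterations;
   printf ("%d threads: %.1f ops/s\n",
           nthreads,
           total_ops / ((double) elapsed_us / 1e6));

   free (threads);
   mongoc_client_pool_destroy (pool);
   mongoc_uri_destroy (uri);
   mongoc_cleanup ();
   return 0;
}
```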