[SERVER-79131] Improve performance of collecting operator usage metrics Created: 19/Jul/23  Updated: 28/Aug/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Charlie Swanson Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Screenshot 2023-07-19 at 2.48.22 PM.png, Screenshot 2023-07-19 at 2.50.42 PM.png, perf-test_phase-0004_flamegraph_connections_merged.svg
Issue Links:
Related
is related to SERVER-85107 Tracking: Performance is "good enough" Closed
Assigned Teams:
Query Execution
Participants:

 Description   

I noticed 'incrementMatchExprCounter' and 'stopExpressionCounters' in some flame graphs (attached). They accounted for only about 0.16% and 0.17% of the time for IDHACK workloads, but I was still surprised they showed up at all. When I looked into it, it seems that the code uses atomic counters and allocates some memory for each one. I imagine some of this is for convenience in hooking into FTDC counters, but there must be a more efficient way to do it. We could probably also give up some of the exactness and accept an approximate count to lower the overhead; we are only looking at these for usage metrics, after all.

Two other posts inspired this: the "Approximation Pattern" (instead of 100% of threads incrementing by 1, have 1% of threads increment by 100), and a paper from SIGMOD (poster) which suggests splitting hot counters into many discrete counters to help with concurrency.
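To make the approximation idea concrete, here is a minimal C++ sketch of a sampled counter. The class name `SampledCounter` and its methods are hypothetical and are not taken from the server code base; the sketch only assumes that an approximate total is acceptable for usage metrics. Each call touches the shared atomic with probability 1/N and adds N when it does, so the expected value matches the true count.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical illustration; not an existing MongoDB class.
class SampledCounter {
public:
    // sampleEvery == 1 degenerates to an exact counter.
    explicit SampledCounter(std::uint32_t sampleEvery) : _sampleEvery(sampleEvery) {
        assert(sampleEvery > 0);
    }

    // Records one event. On average only one call in `sampleEvery` writes to
    // the shared atomic, and that write adds `sampleEvery`, which keeps the
    // expected total equal to the number of calls.
    void increment() {
        // One cheap generator per thread, so the sampling decision itself
        // does not introduce shared state.
        thread_local std::minstd_rand rng{std::random_device{}()};
        std::uniform_int_distribution<std::uint32_t> pick(0, _sampleEvery - 1);
        if (pick(rng) == 0) {
            _value.fetch_add(_sampleEvery, std::memory_order_relaxed);
        }
    }

    // Approximate number of events recorded so far.
    std::uint64_t approximate() const {
        return _value.load(std::memory_order_relaxed);
    }

private:
    const std::uint32_t _sampleEvery;
    std::atomic<std::uint64_t> _value{0};
};
```

With N = 100, the relative error shrinks as the count grows, so this would suit high-volume operators better than rarely used ones, whose counts could read as zero for a long time.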

That last one is probably overkill here, but seemed topical.
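For reference, the split-counter idea could look roughly like the C++ sketch below. `ShardedCounter`, the shard count of 16, and the 64-byte alignment are all assumptions made for illustration; none of them come from the paper or from the server code. Each thread hashes to one shard, so concurrent writers mostly hit different cache lines, and a reader sums the shards.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical illustration; not an existing MongoDB class.
class ShardedCounter {
public:
    // Adds `n` to the shard owned by the calling thread.
    void increment(std::uint64_t n = 1) {
        _shards[shardIndex()].value.fetch_add(n, std::memory_order_relaxed);
    }

    // Sums all shards. The result is exact once writers are quiescent and
    // may lag slightly while increments are still in flight.
    std::uint64_t total() const {
        std::uint64_t sum = 0;
        for (const auto& shard : _shards) {
            sum += shard.value.load(std::memory_order_relaxed);
        }
        return sum;
    }

private:
    static constexpr std::size_t kShards = 16;  // assumed; tune per workload

    // Assumes 64-byte cache lines; the alignment keeps shards from sharing one.
    struct alignas(64) Shard {
        std::atomic<std::uint64_t> value{0};
    };

    // Computes the shard index once per thread and caches it.
    static std::size_t shardIndex() {
        thread_local const std::size_t idx =
            std::hash<std::thread::id>{}(std::this_thread::get_id()) % kShards;
        return idx;
    }

    std::array<Shard, kShards> _shards{};
};
```

The trade-off is memory: every counter costs kShards cache lines, which adds up quickly when there is one counter per operator.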

