[SERVER-63263] Add metric for connection establishment once MongoDB accepts() a new connection on a socket Created: 03/Feb/22 Updated: 29/Oct/23 Resolved: 20/Apr/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.1.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | George Wangensteen | Assignee: | Reo Kimura (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Service Arch 2022-2-21, Service Arch 2022-03-07, Service Arch 2022-03-21, Service Arch 2022-04-04, Service Arch 2022-04-18, Service Arch 2022-05-02 |
| Participants: | |
| Story Points: | 3 |
| Description |
|
We have often seen large delays in client-perceived connection establishment latency, and we don't have enough data to pin down exactly where the delay occurs. While we often suspect the delay is in the TCP stack below MongoDB, or perhaps in the network, we don't know how long MongoDB itself takes on average to accept a new connection. As a first step, we should add a histogram that shows how much time elapses from when a connection is accept()ed on a socket by MongoDB's listener thread until the connection is given its own dedicated thread and begins to run operations. |
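A minimal sketch of the two measurement points described above, written in Python for illustration only. The class and method names (`EstablishmentTimer`, `on_accept`, `on_thread_start`) and the bucket bounds are hypothetical and are not taken from the MongoDB codebase. The listener records a monotonic timestamp right after accept() returns, and the connection's dedicated thread records the elapsed time into a coarse histogram just before it starts running operations.

```python
import bisect
import threading
import time


class EstablishmentTimer:
    """Hypothetical helper; names and bucket bounds are illustrative only."""

    def __init__(self, bounds_micros=(100, 1_000, 10_000, 100_000)):
        self._bounds = list(bounds_micros)
        # One count per bounded bucket, plus a final overflow bucket.
        self.counts = [0] * (len(self._bounds) + 1)
        self._lock = threading.Lock()

    def on_accept(self):
        # Called by the listener thread right after accept() returns.
        return time.monotonic_ns()

    def on_thread_start(self, accepted_at_ns, now_ns=None):
        # Called by the connection's dedicated thread before it runs operations.
        if now_ns is None:
            now_ns = time.monotonic_ns()
        elapsed_micros = (now_ns - accepted_at_ns) // 1_000
        # Bucket i holds values up to and including bounds[i].
        index = bisect.bisect_left(self._bounds, elapsed_micros)
        with self._lock:
            self.counts[index] += 1
        return elapsed_micros
```

The timestamp returned by `on_accept` would travel with the connection to its dedicated thread, which passes it to `on_thread_start`. Keeping the bucket list short is a deliberate choice here, since every bucket becomes a separate recorded value.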
| Comments |
| Comment by Githook User [ 19/Apr/22 ] |
|
Author: {'name': 'Reo Kimura', 'email': 'reo.kimura@mongodb.com', 'username': 'rkimura21'} Message: |
| Comment by Reo Kimura (Inactive) [ 09/Mar/22 ] |
|
bruce.lucas Yes, that is correct. I'll update this ticket with the relevant information as the code review progresses. |
| Comment by Bruce Lucas (Inactive) [ 09/Mar/22 ] |
|
george.wangensteen, reo.kimura, can you please add a comment to this ticket summarizing the design that you are going with? From the code review, I think you are adding only a single metric, cumulativeConnectionEstablishmentLatency, and not a histogram. Is that correct? |
| Comment by Bruce Lucas (Inactive) [ 03/Feb/22 ] |
|
I assume the intent is to record this in FTDC. If so, ideally the histogram would have a small number of buckets so that it doesn't add a large FTDC burden. Alternatively, you could add a single metric that records the cumulative time for all connections between the two points you mention. This could then be used to compute the average time per connection over any interval you want (delta time divided by delta connections). We've generally found that histograms don't add much diagnostic value beyond what you get with averages. |
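The "delta time divided by delta connections" calculation can be sketched as follows. This is an illustrative Python example, not MongoDB or FTDC tooling: the `Sample` type, its field names, and the microsecond unit are assumptions made for the sketch. It takes two snapshots of a cumulative latency counter and a cumulative connection counter and returns the average latency per connection over the interval between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One periodic snapshot of two monotonically increasing counters."""
    cumulative_latency_micros: int  # hypothetical field name and unit
    total_connections: int          # hypothetical field name


def average_latency_micros(earlier, later):
    """Average establishment latency over the interval between two samples."""
    delta_connections = later.total_connections - earlier.total_connections
    if delta_connections <= 0:
        # No new connections in the interval, so the average is undefined.
        return None
    delta_latency = (
        later.cumulative_latency_micros - earlier.cumulative_latency_micros
    )
    return delta_latency / delta_connections
```

Because both counters only ever grow, any pair of samples can be compared, so the averaging window can be chosen after the fact rather than fixed at collection time.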