[SERVER-36099] FTDC for mongos is unworkably large for large installations Created: 12/Jul/18 Updated: 29/Oct/23 Resolved: 21/May/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.14, 4.1.12, 4.0.11 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL | ||||||||||||||||
| Backport Requested: | v4.0, v3.6 |
| Sprint: | Security 2018-09-10, Service Arch 2019-04-22, Service Arch 2019-05-06, Service Arch 2019-05-20, Service Arch 2019-06-03 | ||||||||||||||||
| Description |
|
For installations with hundreds of nodes, the number of metrics collected by mongos FTDC can be unworkably large. For example, one cluster with ~500 nodes collects ~20k metrics per sample, mostly due to connPoolStats, which records several metrics per (pool, host) pair. This creates problems for downstream consumers of this information and will also severely limit the retention period for FTDC data. Some possible solutions
|
| Comments |
| Comment by Githook User [ 26/Jun/19 ] |
|
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com> Message: |
| Comment by Githook User [ 26/Jun/19 ] |
|
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com> Message: |
| Comment by A. Jesse Jiryu Davis [ 21/May/19 ] |
|
kelsey.schubert, bruce.lucas, and linda.qin, I've closed this ticket because I hope that these changes have made connection pool stats small enough to use. If one of you would like to test that the changes are sufficient, please let me know, and feel free to reopen the ticket if it needs more work. |
| Comment by Githook User [ 21/May/19 ] |
|
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com> Message: |
| Comment by A. Jesse Jiryu Davis [ 14/May/19 ] |
|
The design is approved. I might have time to play with this before my Service Architecture rotation ends so I'll keep it assigned to me for the moment. |
| Comment by Mark Benvenuto [ 17/Sep/18 ] |
|
Moving to Service Architecture, since connection pool stats have to be reworked to either work better with FTDC or be dropped from FTDC. |
| Comment by Mark Benvenuto [ 01/Aug/18 ] |
|
My idea was to limit it to just 5 metrics, but I agree, it may not provide much value. |
| Comment by Bruce Lucas (Inactive) [ 01/Aug/18 ] |
|
I don't personally have a lot of experience diagnosing problems from the mongos end. dmitry.agranat, kelsey.schubert, maybe you have an opinion or know someone who might? If we are going to have per-host metrics, given the possibility of very large deployments, I think it would be best to limit them to one metric per host. This could simply be the current count of connections to that host at the time of the measurement, like we do for the mongod connection count. Or did you mean just a single min, max, mean, and median over the number of connections to each host? I'm not sure that's much more informative than just the total number of connections. For example, I don't know what I might infer diagnostically if max >> min or max == min. |
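To make the options discussed above concrete, here is a rough sketch in Python of the three shapes: every field per (pool, host) pair, one connection count per host, and a single min/max/mean/median summary. The `sample` data, its field names, and the helper functions are hypothetical and simplified for illustration; they are not the actual connPoolStats output or the server's FTDC code. Counting "current connections" as `inUse + available` is also an assumption.

```python
from statistics import mean, median

# Hypothetical, simplified per-(pool, host) stats; not real connPoolStats output.
sample = {
    "pool-a": {
        "host1:27017": {"inUse": 2, "available": 3, "created": 5},
        "host2:27017": {"inUse": 0, "available": 1, "created": 1},
    },
    "pool-b": {
        "host1:27017": {"inUse": 1, "available": 0, "created": 4},
        "host3:27017": {"inUse": 4, "available": 2, "created": 9},
    },
}


def flattened_metric_count(stats):
    """Number of metrics if every field of every (pool, host) pair is recorded."""
    return sum(len(fields) for hosts in stats.values() for fields in hosts.values())


def connections_per_host(stats):
    """One metric per host: current connections summed across all pools.

    Assumption: current connections = inUse + available.
    """
    totals = {}
    for hosts in stats.values():
        for host, fields in hosts.items():
            totals[host] = totals.get(host, 0) + fields["inUse"] + fields["available"]
    return totals


def summary(per_host):
    """A fixed-size summary regardless of how many hosts there are."""
    counts = list(per_host.values())
    return {
        "min": min(counts),
        "max": max(counts),
        "mean": mean(counts),
        "median": median(counts),
    }


per_host = connections_per_host(sample)
print(flattened_metric_count(sample))  # grows with pools x hosts x fields
print(len(per_host))                   # grows with hosts only
print(summary(per_host))               # constant size
```

The per-host form grows linearly with the number of hosts, while the summary stays at four values. The summary loses which host is the outlier, which is the diagnostic trade-off raised in the comment above.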