[SERVER-36099] FTDC for mongos is unworkably large for large installations
Created: 12/Jul/18  Updated: 29/Oct/23  Resolved: 21/May/19

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: None
Fix Version/s: 3.6.14, 4.1.12, 4.0.11

Type: Bug
Priority: Major - P3
Reporter: Bruce Lucas (Inactive)
Assignee: A. Jesse Jiryu Davis
Resolution: Fixed
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-44839 Frequent schema changes in mongos ftd... Closed
Operating System: ALL
Backport Requested:
v4.0, v3.6
Sprint: Security 2018-09-10, Service Arch 2019-04-22, Service Arch 2019-05-06, Service Arch 2019-05-20, Service Arch 2019-06-03
Participants:

 Description   

For installations with hundreds of nodes, the number of metrics collected by mongos FTDC can be unworkably large. For example, one cluster with ~500 nodes collects ~20k metrics per sample, mostly due to connPoolStats. This creates problems for downstream consumers of this information, and will also severely limit the retention period for FTDC data.

This is because connPoolStats records several metrics per (pool, host) pair. Some possible solutions:

  • aggregate the information in some way before recording it to avoid the combinatorial explosion
  • remove this information altogether
  • add an option to remove it (off by default to avoid catching customers unawares?)
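
As a rough illustration of the first option (not the change that was eventually committed), the sketch below counts the metrics in a nested pool/host/metric mapping and then sums each metric across pools so that only one entry per host remains. The nested layout, the pool count of 10, and the four metric names are assumptions chosen so the total lands near the ~20k figure above; they are not taken from real connPoolStats output.

```python
from collections import defaultdict


def count_metrics(stats):
    """Count leaf metrics in a {pool: {host: {metric: value}}} mapping."""
    return sum(len(metrics) for hosts in stats.values() for metrics in hosts.values())


def aggregate_by_host(stats):
    """Sum each metric across pools, leaving one set of metrics per host."""
    totals = defaultdict(lambda: defaultdict(int))
    for hosts in stats.values():
        for host, metrics in hosts.items():
            for name, value in metrics.items():
                totals[host][name] += value
    return {host: dict(metrics) for host, metrics in totals.items()}


# Hypothetical cluster shape: 10 pools x 500 hosts x 4 metrics per pair.
POOLS = [f"pool{i}" for i in range(10)]
HOSTS = [f"host{i}:27017" for i in range(500)]
METRICS = ["inUse", "available", "created", "refreshing"]  # illustrative names

stats = {
    pool: {host: {name: 1 for name in METRICS} for host in HOSTS}
    for pool in POOLS
}

per_host = aggregate_by_host(stats)

print(count_metrics(stats))              # metrics per sample before aggregation
print(count_metrics({"all": per_host}))  # metrics per sample after aggregation
```

Under these assumptions the per-sample count drops by a factor equal to the number of pools, but it still grows linearly with the number of hosts.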


 Comments   
Comment by Githook User [ 26/Jun/19 ]

Author: A. Jesse Jiryu Davis <jesse@mongodb.com> (ajdavis)

Message: SERVER-36099 Trim FTDC connection pool stats
Branch: v3.6
https://github.com/mongodb/mongo/commit/f9c258b9c420e8984173af69c41a0636b8d719ad

Comment by Githook User [ 26/Jun/19 ]

Author: A. Jesse Jiryu Davis <jesse@mongodb.com> (ajdavis)

Message: SERVER-36099 Trim FTDC connection pool stats
Branch: v4.0
https://github.com/mongodb/mongo/commit/0800a6bb4af2ec7e4439a70e58d23c46ffe2381d

Comment by A. Jesse Jiryu Davis [ 21/May/19 ]

kelsey.schubert, bruce.lucas, and linda.qin, I've closed this ticket because I hope that these changes have made connection pool stats small enough to use. If one of you would like to test that the changes are sufficient, please let me know, and feel free to reopen the ticket if it needs more work.

Comment by Githook User [ 21/May/19 ]

Author: A. Jesse Jiryu Davis <jesse@mongodb.com> (ajdavis)

Message: SERVER-36099 Trim FTDC connection pool stats
Branch: master
https://github.com/mongodb/mongo/commit/f30e70b5d1ebae87f373de108314993e58309739

Comment by A. Jesse Jiryu Davis [ 14/May/19 ]

The design is approved. I might have time to play with this before my Service Architecture rotation ends so I'll keep it assigned to me for the moment.

Comment by Mark Benvenuto [ 17/Sep/18 ]

Moving to Service Architecture, since connection pool stats has to be reworked to either work better with FTDC or be dropped from FTDC.

Comment by Mark Benvenuto [ 01/Aug/18 ]

My idea was to limit it to just 5 metrics, but I agree, it may not provide much value.

Comment by Bruce Lucas (Inactive) [ 01/Aug/18 ]

I don't personally have a lot of experience diagnosing problems from the mongos end. dmitry.agranat, kelsey.schubert, maybe you have an opinion or know someone who might?

If we are going to have per-host metrics, given the possibility of very large deployments, I think it would be best to limit it to one metric per host. This could simply be the current count of connections to that host at the time of the measurement, like we do for mongod connection count.

Or did you mean just a single min, max, mean, median over number of connections to each host? I'm not sure that that's much more informative than just the total number of connections - I don't know for example what I might infer diagnostically if max >> min or max == min.
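
To make the two alternatives in this comment concrete, here is a minimal sketch that takes a hypothetical mapping of host to current connection count (the one-metric-per-host form) and reduces it to the total plus min, max, mean, and median. The host names and counts are invented for illustration, and the function name is not part of any MongoDB API.

```python
import statistics


def summarize(conns_per_host):
    """Reduce per-host connection counts to a fixed set of summary metrics."""
    counts = list(conns_per_host.values())
    return {
        "total": sum(counts),
        "min": min(counts),
        "max": max(counts),
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
    }


# Hypothetical per-host connection counts at one sample time.
example = {"hostA:27017": 4, "hostB:27017": 4, "hostC:27017": 40}
summary = summarize(example)
print(summary)
```

The summary form stays at five metrics regardless of cluster size, whereas the per-host form grows with the number of hosts; whether the extra statistics add diagnostic value over the total alone is the open question raised above.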
