[SERVER-44839] Frequent schema changes in mongos ftdc metrics limits retention period Created: 26/Nov/19  Updated: 08/Jan/24  Resolved: 25/Jan/20

Status: Closed
Project: Core Server
Component/s: Diagnostics, Sharding
Affects Version/s: 4.2.0
Fix Version/s: 4.2.4, 4.3.3

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Benjamin Caimano (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File repro.sh    
Issue Links:
Backports
Related
is related to SERVER-36099 FTDC for mongos is unworkably large f... Closed
is related to SERVER-42125 Avoid increasing connection count to ... Closed
is related to SERVER-45546 Do not create HostPools for passive m... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Sprint: Service Arch 2020-01-13, Service Arch 2020-01-27
Participants:

 Description   

The schema for metrics under connPoolStats.connectionsInUsePerPool.NetworkInterfaceTL-TaskExecutorPool-0 changes frequently: hosts in that subtree appear to be emitted in an order that changes every couple of samples, and hosts frequently come and go from the metrics. These frequent schema changes greatly reduce compression efficiency, limiting the retention period to as short as 15 hours in one case. The schema for this subtree should be monotonic and consistent from one sample to the next.
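
To illustrate what "monotonic and consistent" could mean here, the following is a minimal Python sketch, not the server's implementation. It assumes a simplified model in which each sample is a mapping from host name to a connection count and the schema is the ordered list of keys. The host names, the `Sample` type, and the helpers `schema_of`, `count_schema_changes`, and `stabilize` are all hypothetical. Keeping absent hosts with a zero value is one possible way to get a schema that only ever grows; the actual fix may handle this differently.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical simplified sample: host name -> connections in use.
Sample = Dict[str, int]


def schema_of(sample: Sample) -> Tuple[str, ...]:
    """Treat the ordered sequence of keys as the schema, so order matters."""
    return tuple(sample.keys())


def count_schema_changes(samples: List[Sample]) -> int:
    """Count how often the schema differs from the previous sample."""
    changes = 0
    for prev, curr in zip(samples, samples[1:]):
        if schema_of(prev) != schema_of(curr):
            changes += 1
    return changes


def stabilize(samples: List[Sample]) -> List[Sample]:
    """Emit hosts in sorted order and keep every host seen so far.

    Hosts missing from a sample are reported as 0 (an assumption made
    for this sketch), so the schema can only grow, never shrink or reorder.
    """
    seen: Set[str] = set()
    out: List[Sample] = []
    for sample in samples:
        seen.update(sample)
        out.append({host: sample.get(host, 0) for host in sorted(seen)})
    return out


# Made-up samples showing both symptoms: reordering and hosts coming and going.
raw = [
    {"hostA:27017": 1, "hostB:27017": 2},
    {"hostB:27017": 2, "hostA:27017": 1},  # same hosts, different order
    {"hostA:27017": 1},                    # hostB disappears
    {"hostA:27017": 1, "hostB:27017": 3},  # hostB returns
]

print(count_schema_changes(raw))             # schema changes on every sample
print(count_schema_changes(stabilize(raw)))  # no changes once stabilized
```

In this model, a genuinely new host still changes the schema once, when it first appears, but the schema never changes again because of ordering or a host temporarily dropping out.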



 Comments   
Comment by Githook User [ 02/Mar/20 ]

Author:

{'name': 'Ben Caimano', 'email': 'ben.caimano@10gen.com'}

Message: SERVER-44839 Make ConnectionPoolStats ordered and filtered

(cherry picked from commit d86e7c464d276fbd40570a4a2a7144fe133bd780)
Branch: v4.2
https://github.com/mongodb/mongo/commit/a8d87c2516057864d9cc5032dd99fb16e9581702

Comment by Githook User [ 25/Jan/20 ]

Author:

{'email': 'ben.caimano@10gen.com', 'name': 'Ben Caimano'}

Message: SERVER-44839 Make ConnectionPoolStats ordered and filtered
Branch: master
https://github.com/mongodb/mongo/commit/d86e7c464d276fbd40570a4a2a7144fe133bd780

Comment by Benjamin Caimano (Inactive) [ 14/Jan/20 ]

I've filed a separate ticket (SERVER-45546) for the underlying bug that made this problem evident.

Comment by Bruce Lucas (Inactive) [ 13/Jan/20 ]

Repro script attached.

Comment by Benjamin Caimano (Inactive) [ 13/Jan/20 ]

Linking the last ticket that touched the RSM/ConnPool FTDC connection.

Comment by Bruce Lucas (Inactive) [ 07/Jan/20 ]

Git bisect identifies the following commit as when the problem started:

edcd0b9a2254cbac3d843be28f373a4f0f3024b4 is the first bad commit
SERVER-42125 Omit passive members from RSM connection strings

Since this isn't directly related to connection pool stats or ftdc data, I imagine the change in behavior of the connection pool stats is an unfortunate side effect of this change, and some compensating change in collecting connection pool stats for ftdc will be needed to make them stable and monotonic again.

Comment by Bruce Lucas (Inactive) [ 07/Jan/20 ]

Testing confirms that this is a 4.2 regression, introduced between 4.2.0-rc4 and 4.2.0-rc5.

Generated at Thu Feb 08 05:07:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.