[SERVER-47641] Limit size of serverStatus metrics for the range deleter Created: 17/Apr/20  Updated: 29/Oct/23  Resolved: 22/Apr/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.4.0-rc3, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Matthew Saltz (Inactive) Assignee: Gregory Noma
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-13606 Investigate changes in SERVER-47641: ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Sharding 2020-05-04
Linked BF Score: 95

 Description   

SERVER-47528 describes a case where a serverStatus metric with per-namespace information consumes too much space in FTDC. SERVER-14126 added a metric to track the number of range deletion tasks per namespace (rangeDeletionTasks, code is here). We should limit the size of this somehow.



 Comments   
Comment by Githook User [ 22/Apr/20 ]

Author: Gregory Noma (gregorynoma, gregory.noma@gmail.com)

Message: SERVER-47641 Use appendNumber when appending number of range deletion tasks to serverStatus

This reverts commit 62d9485657717bf61fbb870cb3d09b52b1a614dd.
Branch: master
https://github.com/mongodb/mongo/commit/47bdd45ded2b9c16a88877c161023c3364099196

Comment by Githook User [ 22/Apr/20 ]

Author: Gregory Noma (gregorynoma, gregory.noma@gmail.com)

Message: SERVER-47641 Report total number of range deletion tasks in serverStatus rather than per collection

(cherry picked from commit fa945325938ada67a088e7dbe951404d092e8771)
(cherry picked from commit 9717e231da81bddfeef636fd99b93725a2c2a1c2)
Branch: v4.4
https://github.com/mongodb/mongo/commit/d0343420e4aa7520e4e2007090ba7a4c499ddae0

Comment by Githook User [ 22/Apr/20 ]

Author: Gregory Noma (gregorynoma, gregory.noma@gmail.com)

Message: SERVER-47641 Use long long when counting number of range deletion tasks for serverStatus
Branch: master
https://github.com/mongodb/mongo/commit/9717e231da81bddfeef636fd99b93725a2c2a1c2

Comment by Githook User [ 22/Apr/20 ]

Author: Benety Goh (benety, benety@mongodb.com)

Message: SERVER-47641 fix mac os x compile
Branch: master
https://github.com/mongodb/mongo/commit/62d9485657717bf61fbb870cb3d09b52b1a614dd

Comment by Githook User [ 21/Apr/20 ]

Author: Gregory Noma (gregorynoma, gregory.noma@gmail.com)

Message: SERVER-47641 Report total number of range deletion tasks in serverStatus rather than per collection
Branch: master
https://github.com/mongodb/mongo/commit/fa945325938ada67a088e7dbe951404d092e8771

Comment by Bruce Lucas (Inactive) [ 21/Apr/20 ]

Thanks, that definitely sounds useful.

Comment by Gregory Noma [ 21/Apr/20 ]

bruce.lucas, per your insight, we've decided to report a single number, the total number of range deletion tasks, rather than one number per collection.
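A minimal sketch of what the change in reporting amounts to, assuming the per-collection counts are available as a mapping from namespace to pending task count. The helper name `range_deletion_section` is hypothetical and this is not the server's C++ code; only the `rangeDeletionTasks` field name comes from the ticket. Collapsing the mapping into one total keeps the reported section at a single numeric key no matter how many collections have pending range deletions.

```python
from typing import Dict


def range_deletion_section(per_namespace_counts: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical helper: report one total instead of one key per namespace.

    The returned document always has exactly one numeric key, so its size
    and shape do not depend on how many namespaces have pending tasks.
    """
    return {"rangeDeletionTasks": sum(per_namespace_counts.values())}


# Usage: three collections with pending range deletions.
counts = {"db.a": 2, "db.b": 0, "test.c": 5}
print(range_deletion_section(counts))  # {'rangeDeletionTasks': 7}
```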

Comment by Bruce Lucas (Inactive) [ 20/Apr/20 ]

Specifically, it means the addition or deletion of a key, where a key is a path from the root of the document to a numeric value. It's very expensive because it requires starting a new chunk: each chunk is a reference document (big, somewhat compressible) plus delta-coded arrays of values for each key in the reference document (very highly compressible). The number of keys is also a consideration; thousands or tens of thousands of them would inflate FTDC and also reduce retention, since the diagnostic data directory is capped in size.
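The chunking cost described above can be illustrated with a deliberately simplified model. This is not the actual FTDC encoder; the names `flatten` and `ChunkWriter` are hypothetical. Each sample is flattened into root-to-leaf paths ending in numbers; while the set of paths stays the same, a sample is stored only as per-key deltas, but any added or removed path starts a new chunk with a full reference document.

```python
from typing import Any, Dict, List, Tuple


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, float]:
    """Map each root-to-leaf path that ends in a number to its value."""
    out: Dict[str, float] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        elif isinstance(value, (int, float)):
            out[path] = value
    return out


class ChunkWriter:
    """Toy model (hypothetical): a chunk = reference sample + per-key deltas."""

    def __init__(self) -> None:
        self.chunks: List[Tuple[Dict[str, float], Dict[str, List[float]]]] = []
        self._schema: Tuple[str, ...] = ()
        self._last: Dict[str, float] = {}

    def add_sample(self, doc: Dict[str, Any]) -> None:
        sample = flatten(doc)
        schema = tuple(sample)  # ordered key paths of this sample
        if not self.chunks or schema != self._schema:
            # Schema change: start a new chunk with a full reference document.
            self.chunks.append((dict(sample), {k: [] for k in schema}))
            self._schema = schema
        else:
            # Same schema: store only the (highly compressible) deltas.
            deltas = self.chunks[-1][1]
            for key in schema:
                deltas[key].append(sample[key] - self._last[key])
        self._last = sample


writer = ChunkWriter()
writer.add_sample({"ops": 1, "rangeDeletionTasks": {"db.a": 1}})
writer.add_sample({"ops": 4, "rangeDeletionTasks": {"db.a": 1}})
writer.add_sample({"ops": 9, "rangeDeletionTasks": {"db.a": 1, "db.b": 2}})
print(len(writer.chunks))  # 2: the new namespace "db.b" forced a new chunk
```

In this model a per-namespace metric pays for a full reference document every time a namespace appears or disappears, whereas a metric with a fixed set of keys only ever appends deltas.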

Isn't this information also obtainable from the logs, with a little analysis?

From an FTDC perspective, I think it would be best to omit this information; that could be done with a parameter to serverStatus. I think it's also iffy to have this in serverStatus in general: how, for example, does a huge serverStatus impact Cloud monitoring?

Comment by Matthew Saltz (Inactive) [ 17/Apr/20 ]

Sure, it's a BSONArray of the form

[ "mynamespace" : <number of range deletion tasks pending for mynamespace>, ..., "mylastnamespace" : <number of range deletion tasks ...>]

Also, I'm not sure what you mean by "it's also a question of schema changes in serverStatus which are very expensive from an ftdc perspective". By schema change, do you mean the addition of a new field and/or a modification of the format of a given field? In what way is it expensive? (I'm not super familiar with the process for obtaining FTDC data, so forgive me if this is a basic question.)

Comment by Bruce Lucas (Inactive) [ 17/Apr/20 ]

It's not just a question of number of keys, it's also a question of schema changes in serverStatus which are very expensive from an ftdc perspective, depending on the rate at which they occur. Can you describe or point to a description of the content of this field? As a general rule, it's not a good idea to put per-namespace info in serverStatus.
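To make the schema-change concern concrete, the sketch below counts how often the set of numeric key paths differs between consecutive samples, under a per-namespace layout versus a single total. It is a hypothetical illustration: the helpers `key_paths` and `count_schema_changes` and the sample data are made up, and this is not FTDC code.

```python
from typing import Any, Dict, List


def key_paths(doc: Dict[str, Any], prefix: str = "") -> List[str]:
    """Root-to-leaf paths that end in a number, in document order."""
    paths: List[str] = []
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths.extend(key_paths(value, path))
        elif isinstance(value, (int, float)):
            paths.append(path)
    return paths


def count_schema_changes(samples: List[Dict[str, Any]]) -> int:
    """Number of consecutive sample pairs whose key paths differ."""
    return sum(
        key_paths(prev) != key_paths(cur)
        for prev, cur in zip(samples, samples[1:])
    )


# Pending tasks per namespace as collections come and go between samples.
pending = [{"db.a": 1}, {"db.a": 1, "db.b": 3}, {"db.b": 2}, {}]

per_namespace = [{"rangeDeletionTasks": p} for p in pending]
total_only = [{"rangeDeletionTasks": sum(p.values())} for p in pending]

print(count_schema_changes(per_namespace))  # 3
print(count_schema_changes(total_only))     # 0
```

Every sample in the per-namespace layout changes the schema, while the total-only layout never does, regardless of how the underlying counts move.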

Comment by Matthew Saltz (Inactive) [ 17/Apr/20 ]

Two possible options would be to limit it to a certain BSONObj size, or to limit it to a maximum number of namespaces. When the limit is exceeded, we could instead report the total number of range deletion tasks across all namespaces rather than reporting that number per namespace.
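A rough sketch of the second option (a cap on the number of namespaces with a fallback to the total), assuming counts arrive as a namespace-to-count mapping. The function `bounded_section`, the output keys `perNamespace` and `total`, and the default cap of 100 are all hypothetical. A size-based variant would compare the encoded BSON size of the section against a byte limit instead of counting namespaces.

```python
from typing import Dict, Union

Section = Dict[str, Union[int, Dict[str, int]]]


def bounded_section(per_namespace_counts: Dict[str, int],
                    max_namespaces: int = 100) -> Section:
    """Hypothetical: per-namespace counts, or only the total above a cap.

    The cap of 100 is an arbitrary illustrative default.
    """
    if len(per_namespace_counts) > max_namespaces:
        # Too many namespaces: fall back to a single aggregate number.
        return {"total": sum(per_namespace_counts.values())}
    return {"perNamespace": dict(per_namespace_counts)}


print(bounded_section({"db.a": 2, "db.b": 3}, max_namespaces=1))  # {'total': 5}
print(bounded_section({"db.a": 2}, max_namespaces=1))  # {'perNamespace': {'db.a': 2}}
```

Note that crossing the cap itself changes the shape of the section, so this approach still produces schema changes; that is one reason the thread ends up reporting only the total.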

bruce.lucas would you have a preference? Do you think there's some number of namespaces after which the information simply becomes unwieldy?

Note that the same (and in fact, more detailed) information will also be visible in the config.rangeDeletions collection on each shard.
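As a sketch of how the per-namespace view could be recovered from that collection instead of serverStatus: the code below counts task documents by namespace. It assumes each document stores its namespace in a field named `nss` (an assumption here), and the helper `tasks_per_namespace` and the sample documents are hypothetical; reading the documents from `config.rangeDeletions` on each shard is left out.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def tasks_per_namespace(task_docs: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count range deletion task documents per namespace.

    Assumes (hypothetically) that each document stores its namespace under "nss".
    """
    return dict(Counter(doc["nss"] for doc in task_docs))


# Hypothetical documents, reduced to the one field used here.
docs = [{"nss": "db.a"}, {"nss": "db.b"}, {"nss": "db.a"}]
print(tasks_per_namespace(docs))  # {'db.a': 2, 'db.b': 1}
```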

Generated at Thu Feb 08 05:14:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.