[SERVER-75277] TTL deleter attributes resource metrics across multiple databases Created: 24/Mar/23  Updated: 01/Feb/24  Resolved: 28/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 6.2.1
Fix Version/s: 7.0.0-rc0, 6.3.1

Type: Bug Priority: Critical - P2
Reporter: Eric Milkie Assignee: Louis Williams
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
Related
Assigned Teams:
Storage Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.3
Steps To Reproduce:

You can see the issue if you add one invariant to ScopedMetricsCollector constructor, like this:

...
    invariant(!metrics.hasCollectedMetrics());   // <- add this new line!
    metrics.beginScopedCollecting(opCtx, dbName);
}

The TTL deleter will crash on that invariant if you have multiple collections with TTL indexes that it needs to pass over.

Sprint: Execution Team 2023-04-03
Participants:
Linked BF Score: 0

 Description   

The ResourceConsumption::ScopedMetricsCollector constructor tells the MetricsCollector attached to the current OperationContext to begin recording metrics. The ScopedMetricsCollector destructor then tells the MetricsCollector to stop recording metrics and merges the MetricsCollector's metrics into the global map, keyed on database name.
This normally works fine, except that it doesn't clear the MetricsCollector after merging the metrics. The code assumes the MetricsCollector will destruct after this point.
This means if you happen to encounter another ScopedMetricsCollector before the OperationContext/MetricsCollector destructs, it records more metrics on top of the ones it already collected, and then merges those erroneous stats into a possibly different database in the global metrics map. See Steps to Reproduce.



 Comments   
Comment by Githook User [ 14/Apr/23 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-75277 Clear resource consumption metrics before each TTL index pass

(cherry picked from commit 8e744478f8569c1ead0ffe7b725c89983c886ac6)
Branch: v6.3
https://github.com/mongodb/mongo/commit/ff5e2fc2fde5edf9ccd7c9814734651996f45851

Comment by Haley Connelly [ 11/Apr/23 ]

I wonder if we could keep a scoped "state" struct for the MetricsCollector to ensure next time we add a new field to the scoped state, it gets reset and cleared accordingly.

Comment by Githook User [ 28/Mar/23 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-75277 Clear resource consumption metrics before each TTL index pass
Branch: master
https://github.com/mongodb/mongo/commit/8e744478f8569c1ead0ffe7b725c89983c886ac6

Comment by Eric Milkie [ 24/Mar/23 ]

I also recommend adding the following debug log to merge():

@@ -432,6 +432,11 @@ void ResourceConsumption::merge(OperationContext* opCtx,
     stdx::lock_guard<Mutex> lk(_mutex);
     _dbMetrics[dbName] += newMetrics;
     _cpuTime += newMetrics.cpuNanos;
+     LOGV2_DEBUG(6523908,
+                    1,
+                    "ResourceConsumption::merge",
+                 "dbName"_attr = dbName);
+
 }

Generated at Thu Feb 08 06:29:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.