[SERVER-27830] TTL Monitor creates performance degradation when there are > 100k indexes
Created: 27/Jan/17  Updated: 06/Dec/22  Resolved: 30/Jan/17

Status: Closed
Project: Core Server
Component/s: Performance, TTL
Affects Version/s: 3.2.11
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Akira Kurogane
Assignee: Backlog - Storage Execution Team
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 5min_ftdc_timeseries.1915_sample.png    
Issue Links:
Duplicate
duplicates SERVER-24631 TTL Monitor performance degradation o... Closed
Related
related to SERVER-24631 TTL Monitor performance degradation o... Closed
Assigned Teams:
Storage Execution
Operating System: ALL
Steps To Reproduce:

Create a DB with > 100k namespaces and indexes. Apply a write load of thousands of writes per second, scattered randomly over all, or at least a large fraction, of the collections.

Observe the impact on the mongod process directly after the ttl.passes metric increments. (The passes counter is incremented at the start of the TTL pass, not the end, so the work happens after the increment rather than before it.)
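To line up the observed stalls with TTL passes, the increments can be picked out of sampled counter values. The following is a minimal sketch, assuming the samples are (timestamp, passes) pairs collected in time order, for example from repeated serverStatus calls reading metrics.ttl.passes. The function name and the sample data are hypothetical and for illustration only.

```python
from typing import Iterable, List, Tuple


def ttl_pass_starts(samples: Iterable[Tuple[float, int]]) -> List[float]:
    """Return the timestamps of the samples at which the passes counter rose.

    samples: (timestamp_seconds, passes) pairs in time order.
    The first sample only establishes a baseline and is never reported.
    """
    starts: List[float] = []
    previous = None
    for timestamp, passes in samples:
        if previous is not None and passes > previous:
            starts.append(timestamp)
        previous = passes
    return starts


# Made-up samples: the counter rises at t=2.0 and again at t=62.0.
example = [(0.0, 41), (1.0, 41), (2.0, 42), (3.0, 42), (62.0, 43)]
print(ttl_pass_starts(example))  # [2.0, 62.0]
```

Each returned timestamp is only an upper bound on when the pass began, since the increment happened somewhere between that sample and the previous one. The impact to look for is in the interval that follows each timestamp.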

Participants:
Case:

 Description   

When a mongod instance has a very large number of namespaces, an impact on the overall performance of the mongod can be observed while the TTL monitor is iterating over them. There will of course be an impact when there are TTL-expired documents to delete, but this issue appears even when none of the indexes are TTL indexes. The size of the impact will vary with the capacity of the server and the other load happening concurrently, but in one case with ~190k indexes, delays of ~4 seconds were observed.

It takes on the order of 100k indexes, plus a concurrent high load, for the impact to become visible. In that situation, each time the TTLMonitor thread runs, it iterates through every index in the 'dbHolder' helper object to see whether it has the expireAfterSeconds property; only if that property is present is the TTL scan/delete performed.

In the case where there are few TTL indexes but many normal indexes this is unnecessary work. Can the 'dbHolder' class be improved to afford quick iteration of only the TTL indexes?
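The difference between the two approaches can be illustrated with a toy model. This is a sketch under stated assumptions, not mongod's actual code: the catalog is modelled as a plain dict from namespace to index specs, and TtlRegistry is a hypothetical side table kept up to date as indexes are created and dropped.

```python
from typing import Dict, List, Set

IndexSpec = Dict[str, object]


def ttl_indexes_by_full_scan(catalog: Dict[str, List[IndexSpec]]) -> List[str]:
    """Visit every index of every collection; cost grows with the total index count."""
    found = []
    for namespace, specs in catalog.items():
        for spec in specs:
            if "expireAfterSeconds" in spec:
                found.append(f"{namespace}.{spec['name']}")
    return sorted(found)


class TtlRegistry:
    """Hypothetical side table holding only TTL indexes."""

    def __init__(self) -> None:
        self._ttl: Set[str] = set()

    def on_index_created(self, namespace: str, spec: IndexSpec) -> None:
        # Only indexes carrying expireAfterSeconds are remembered.
        if "expireAfterSeconds" in spec:
            self._ttl.add(f"{namespace}.{spec['name']}")

    def on_index_dropped(self, namespace: str, index_name: str) -> None:
        self._ttl.discard(f"{namespace}.{index_name}")

    def ttl_indexes(self) -> List[str]:
        """Cost grows with the number of TTL indexes only."""
        return sorted(self._ttl)


# Made-up catalog: three collections, four indexes, one of them a TTL index.
catalog = {
    "app.users": [{"name": "_id_"}, {"name": "email_1"}],
    "app.sessions": [{"name": "lastSeen_1", "expireAfterSeconds": 3600}],
    "app.orders": [{"name": "_id_"}],
}

registry = TtlRegistry()
for ns, specs in catalog.items():
    for spec in specs:
        registry.on_index_created(ns, spec)

# Both approaches agree on the result; only the work per pass differs.
assert ttl_indexes_by_full_scan(catalog) == registry.ttl_indexes()
print(registry.ttl_indexes())  # ['app.sessions.lastSeen_1']
```

With ~190k ordinary indexes and a handful of TTL indexes, the full scan inspects every spec on each pass, while the registry lookup touches only the handful. The price is the bookkeeping on index creation and removal.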

Current workaround: A DBA can set the ttlMonitorEnabled parameter to false if they are not using TTL at all, but it would be better if even this were not needed.
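For reference, the workaround amounts to a single setParameter command against the admin database, using ttlMonitorEnabled, the server parameter name documented by MongoDB. The following is a minimal sketch, assuming a PyMongo-style client object whose admin database exposes command(); the helper name is hypothetical.

```python
from typing import Any, Mapping


def set_ttl_monitor_enabled(client: Any, enabled: bool) -> Mapping[str, Any]:
    """Send setParameter for ttlMonitorEnabled and return the server reply.

    client: assumed to behave like pymongo.MongoClient, i.e. client.admin
    has a command() method that accepts a command document.
    """
    # The command name must be the first key of the document.
    return client.admin.command({"setParameter": 1, "ttlMonitorEnabled": enabled})


# Usage against a real deployment (requires the pymongo package and a
# reachable mongod; the URI below is a placeholder):
#
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   set_ttl_monitor_enabled(client, False)
```

This only makes sense on deployments that use no TTL indexes at all, because expired documents are no longer deleted while the monitor is disabled.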



 Comments   
Comment by Akira Kurogane [ 27/Jan/17 ]

Ah, my mistake: the relevant code in 3.4 is not the same as in 3.2, contrary to what I was thinking when I opened this. I see SERVER-24631 was applied to 3.4 and is a solution to the issue.

3.2 is the version I've observed this issue in. (I would assume it affects 3.0 too).

Generated at Thu Feb 08 04:16:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.