[SERVER-56194] Make TTL deletes fair Created: 20/Apr/21  Updated: 16/Nov/23  Resolved: 30/May/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.1.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Louis Williams Assignee: Haley Connelly
Resolution: Fixed Votes: 0
Labels: PM-2227-M3
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-63040 Batch TTL deletions Closed
depends on SERVER-66210 BatchedDeleteStage doesn't indicate w... Closed
Documented
is documented by DOCS-15365 Investigate changes in SERVER-56194: ... Backlog
Related
related to SERVER-71290 configure ttlIndexDeleteTargetTimeMS ... Closed
related to SERVER-66898 Update permalinks in TTLMonitor Arch ... Closed
related to SERVER-56195 Make TTL monitor multi-threaded Backlog
is related to SERVER-66655 Update Storage Execution Arch Guide w... Closed
is related to SERVER-66537 Combine BatchedDeleteStage<Batch/Pass... Closed
Backwards Compatibility: Fully Compatible
Sprint: Execution Team 2021-08-23, Execution Team 2022-05-02, Execution Team 2022-05-16, Execution Team 2022-05-30, Execution Team 2022-06-13

 Description   

The single-threaded TTL Monitor can get "stuck" deleting large ranges of documents on specific collections or databases.

This prevents the TTL monitor from performing deletes on higher-priority collections such as config.system.sessions.

We should consider imposing configurable per-database and per-collection document deletion limits. In addition, we will need to consider significantly lowering the default TTL pass interval (from 60 seconds) to ensure we make progress.

Some hypothetical limits would be:

  • Maximum 10000 documents per database
  • Maximum 1000 documents per collection
  • Run TTL monitor every 5 seconds.
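The following is a minimal sketch of one TTL pass under the hypothetical limits listed above. It assumes each TTL namespace's expired documents can be modeled as a simple counter. The names `ttl_pass`, `expired`, `MAX_DOCS_PER_COLLECTION` and `MAX_DOCS_PER_DATABASE` are illustrative only. They are not the server's TTLMonitor code or real server parameters.

```python
# Hypothetical limits from the example list above; not real server parameters.
MAX_DOCS_PER_COLLECTION = 1000
MAX_DOCS_PER_DATABASE = 10000


def ttl_pass(expired, coll_limit=MAX_DOCS_PER_COLLECTION,
             db_limit=MAX_DOCS_PER_DATABASE):
    """Run one bounded TTL pass over all TTL namespaces.

    `expired` maps "db.coll" namespaces to the number of expired documents
    still waiting to be deleted; it is updated in place. Returns how many
    documents were deleted from each namespace during this pass.
    """
    deleted = {}
    deleted_per_db = {}
    for ns, remaining in expired.items():
        db = ns.split(".", 1)[0]
        # Whatever is left of this database's budget for the current pass.
        db_budget = db_limit - deleted_per_db.get(db, 0)
        n = min(remaining, coll_limit, db_budget)
        if n <= 0:
            continue
        expired[ns] = remaining - n
        deleted_per_db[db] = deleted_per_db.get(db, 0) + n
        deleted[ns] = n
    return deleted
```

With these caps, a large backlog in one collection (say `app.events` with 5000 expired documents) consumes at most 1000 deletes per pass. Later namespaces such as `config.system.sessions` are therefore still reached in the same pass.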

Also consider prioritizing important collections like config.system.sessions.
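One simple way to prioritize is to visit important namespaces first in every pass. The sketch below assumes a hypothetical fixed priority list (`PRIORITY_NAMESPACES`) and a hypothetical helper `pass_order`. It only illustrates the ordering idea and does not describe how the server selects collections.

```python
# Hypothetical list of namespaces to visit first in every pass.
PRIORITY_NAMESPACES = ("config.system.sessions",)


def pass_order(namespaces, priority=PRIORITY_NAMESPACES):
    """Order namespaces so priority ones come first.

    Priority namespaces appear in the order given by `priority`; all other
    namespaces keep their original relative order (sorted() is stable).
    """
    rank = {ns: i for i, ns in enumerate(priority)}
    return sorted(namespaces, key=lambda ns: rank.get(ns, len(rank)))
```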



 Comments   
Comment by Githook User [ 27/May/22 ]

Author: Haley Connelly (haley.connelly@mongodb.com, username: haleyConnelly)

Message: SERVER-56194 Make TTL deletes fair
Branch: master
https://github.com/mongodb/mongo/commit/1e84419513046d4f5755f06bfeebaf8ec442583e

Comment by Haley Connelly [ 26/Apr/22 ]

After chatting with louis.williams@mongodb.com, we decided, for simplicity, to bound deletes per collection and not to enforce fairness per database.

Per-database fairness gets complicated. The cache of collection UUIDs would have to become a per-database structure that enforces some ordering and remembers which collection the previous pass stopped on whenever a database limit was reached. Preventing collection starvation within each database, and accounting for collections created or dropped between passes, would add further complexity.
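The simpler design bounds work per collection only. The following is a minimal sketch of that design, in which no state needs to carry over between passes. It assumes the pass iterates over a snapshot of TTL namespaces taken when the pass starts. The function `collection_bounded_pass` and its arguments are hypothetical, and the sketch does not reflect the committed implementation.

```python
def collection_bounded_pass(ttl_namespaces, expired, coll_limit=1000):
    """One TTL pass that bounds deletes per collection only.

    `ttl_namespaces` is a snapshot of TTL namespaces taken when the pass
    starts. A namespace no longer present in `expired` is treated as dropped
    and skipped; collections created mid-pass are picked up next pass.
    No state survives between passes, so there is no per-database cursor.
    """
    deleted = {}
    for ns in ttl_namespaces:
        if ns not in expired:
            continue  # dropped since the snapshot was taken
        n = min(expired[ns], coll_limit)
        if n > 0:
            expired[ns] -= n
            deleted[ns] = n
    return deleted
```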

Comment by Haley Connelly [ 27/Aug/21 ]

Currently, there is no easy way to limit the number of documents, or the amount of work, done while deleting documents via an index scan. While this limit could be pushed down to the query layer, we determined that fair TTL deletions will likely benefit from the work being done to improve truncate for efficient range deletion.

Deferring this ticket until TTL deletions can utilize a more efficient truncate for range deletions.
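The sketch below illustrates what "limiting the work done during an index scan" means. It models a TTL index as a sorted Python list of `(expire_at, doc_id)` tuples. Because entries are sorted by expiry time, expired documents form a prefix, so a bounded deletion stops at whichever comes first: the limit or the first unexpired entry. This is a toy model with a hypothetical `bounded_index_delete` helper. It does not show how the server's query layer or storage-engine truncate actually work.

```python
def bounded_index_delete(index, now, limit):
    """Delete up to `limit` expired entries from a sorted toy TTL index.

    `index` is a list of (expire_at, doc_id) tuples sorted by expire_at;
    entries with expire_at <= now are expired. The deleted prefix is removed
    from `index` in place and the deleted doc ids are returned.
    """
    n = 0
    # Scan forward until the limit is reached or an unexpired entry appears.
    while n < len(index) and n < limit and index[n][0] <= now:
        n += 1
    deleted = [doc_id for _, doc_id in index[:n]]
    del index[:n]
    return deleted
```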

Comment by Bruce Lucas (Inactive) [ 20/Apr/21 ]

milkie: good idea, and I think it's crucial; otherwise we will be setting an unnecessarily low maximum deletion rate. Consider the case where we have one active TTL collection, TTL has been disabled for a while (e.g. during a live migration), and we have considerable catch-up to do. With the example parameters mentioned above (at most 1000 documents per collection, one pass every 5 seconds), we would be limited to 200 documents per second during the catch-up period, and would risk never catching up.
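The arithmetic behind the 200 documents/second cap can be made concrete with a small sketch. It also shows why catch-up can fail. The sketch assumes new documents expire at a steady rate and that every pass deletes the full per-collection limit. Both helper functions are hypothetical.

```python
def max_rate_per_collection(docs_per_pass=1000, pass_interval_s=5):
    """Upper bound on deletes/second for a single collection."""
    return docs_per_pass / pass_interval_s


def catchup_seconds(backlog, expiry_rate, docs_per_pass=1000,
                    pass_interval_s=5):
    """Seconds to drain `backlog` while new documents keep expiring.

    Assumes a steady `expiry_rate` (documents/second) and that every pass
    deletes the full per-collection limit. Returns None if the backlog never
    shrinks because expiry outpaces the cap.
    """
    net = max_rate_per_collection(docs_per_pass, pass_interval_s) - expiry_rate
    if net <= 0:
        return None
    return backlog / net
```

For example, take a backlog of 1,000,000 documents with 100 new documents expiring per second. Draining it takes 10,000 seconds (about 2.8 hours). Any expiry rate at or above 200 documents per second means the backlog never drains.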

Comment by Eric Milkie [ 20/Apr/21 ]

In addition to the limits, we might consider "starting over from the top of the list" after processing the last collection, if the TTL thread ever hits the limit for any collection or database, rather than stopping and waiting for the TTL period to expire.
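Here is a minimal sketch of the "start over from the top of the list" idea. It assumes expired documents are modeled as per-namespace counters and uses a hypothetical `ttl_run` helper. After each sub-pass, if any collection hit its limit, the monitor immediately goes around again instead of sleeping until the next TTL period.

```python
def ttl_run(expired, coll_limit=1000):
    """Repeat bounded sub-passes while any collection hits its limit.

    `expired` maps namespaces to remaining expired documents and is updated
    in place. Returns the number of sub-passes performed.
    """
    sub_passes = 0
    while True:
        sub_passes += 1
        hit_limit = False
        for ns, remaining in expired.items():
            n = min(remaining, coll_limit)
            expired[ns] = remaining - n
            # Hitting the limit means more work may remain: go around again.
            if n == coll_limit:
                hit_limit = True
        if not hit_limit:
            return sub_passes
```

This loop stops only once no collection reaches its limit. A real monitor would likely also want an overall time bound, which this sketch omits.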

Generated at Thu Feb 08 05:38:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.