[SERVER-56194] Make TTL deletes fair Created: 20/Apr/21 Updated: 16/Nov/23 Resolved: 30/May/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.1.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Haley Connelly |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | PM-2227-M3 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Execution Team 2021-08-23, Execution Team 2022-05-02, Execution Team 2022-05-16, Execution Team 2022-05-30, Execution Team 2022-06-13 |
| Participants: |
| Case: | (copied to CRM) |
| Description |
|
The single-threaded TTL monitor can get "stuck" deleting large ranges of documents on specific collections or databases, which prevents it from performing deletes on higher-priority collections such as config.system.sessions. We should consider imposing configurable per-database and per-collection document deletion limits. In addition, we will need to consider significantly lowering the default TTL pass interval (from 60 seconds) to ensure we make progress. Some hypothetical limits would be:
Also consider prioritizing important collections like config.system.sessions. |
| Comments |
| Comment by Githook User [ 27/May/22 ] |
|
Author: {'name': 'Haley Connelly', 'email': 'haley.connelly@mongodb.com', 'username': 'haleyConnelly'} Message: |
| Comment by Haley Connelly [ 26/Apr/22 ] |
|
After chatting with louis.williams@mongodb.com, we decided, for simplicity, to bound the deletes per collection and not enforce fairness per database. Things get complicated when a cache of collection UUIDs turns into a per-database structure that must enforce some sort of order or fairness and keep track of which collection the previous pass left off on if a dbLimit was reached. Preventing collection starvation, and accounting for new and dropped collections in each database, would increase complexity. |
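To make the per-collection bound concrete, here is a minimal sketch in Python. It is not the server implementation: the function name `bounded_ttl_pass`, the `per_collection_limit` parameter, and the in-memory queues standing in for expired documents are all assumptions made for illustration.

```python
from collections import deque


def bounded_ttl_pass(expired, per_collection_limit):
    """Run one hypothetical TTL pass with a per-collection delete bound.

    `expired` maps a collection name to a deque of expired document ids
    (a stand-in for what an index scan would find). At most
    `per_collection_limit` documents are removed from each collection.

    Returns (deleted, work_remaining): the number of documents deleted per
    collection, and whether any collection still has expired documents.
    """
    deleted = {}
    work_remaining = False
    for name, queue in expired.items():
        count = min(per_collection_limit, len(queue))
        for _ in range(count):
            queue.popleft()  # stand-in for deleting one expired document
        deleted[name] = count
        if queue:
            # The bound stopped us before this collection was fully cleaned.
            work_remaining = True
    return deleted, work_remaining
```

With a bound of 4, a collection holding 10 expired documents gives up only 4 of them in a pass, so a smaller collection such as config.system.sessions is still reached in that same pass instead of waiting behind the large one.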
| Comment by Haley Connelly [ 27/Aug/21 ] |
|
Currently, there is no easy way to limit the number of documents, or the amount of work done, while deleting documents via an index scan. While this could be pushed to the query level, we determined that fair TTL deletions will likely benefit from the work done to improve truncate for efficient range deletion. Deferring this ticket until TTL deletions can utilize a more efficient truncate for range deletions. |
| Comment by Bruce Lucas (Inactive) [ 20/Apr/21 ] |
|
milkie, good idea, and I think it's crucial; otherwise we will be setting an unnecessarily low maximum rate of deletions. Consider the case where we have one active TTL collection and TTL has been disabled for a while (e.g. during live migration), so we have considerable catch-up to do. Taking the example parameters mentioned above, we would be limited to 200 documents per second during the catch-up period, and risk never catching up. |
| Comment by Eric Milkie [ 20/Apr/21 ] |
|
In addition to the limits, we might consider "starting over from the top of the list" after processing the last collection, if the TTL thread ever hits the limit for any collection or database, rather than stopping and waiting for the TTL period to expire. |
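The "start over from the top of the list" idea can be sketched as a loop that only waits for the TTL interval when a pass finishes without hitting any limit. This is an illustrative Python sketch, not the server's code: `ttl_monitor_passes`, `limit`, and `max_passes` are hypothetical names, and in-memory queues stand in for expired documents.

```python
from collections import deque


def ttl_monitor_passes(expired, limit, max_passes=100):
    """Simulate back-to-back TTL passes with a per-collection delete limit.

    `expired` maps a collection name to a deque of expired document ids.
    After each pass, if any collection still has expired documents (the
    limit was hit), the next pass starts immediately from the first
    collection. Otherwise the loop stops, which is where a real monitor
    would wait for the TTL interval.

    Returns a list with one {collection: deleted_count} dict per pass.
    """
    passes = []
    for _pass_number in range(max_passes):
        hit_limit = False
        deleted = {}
        for name, queue in expired.items():
            count = min(limit, len(queue))
            for _ in range(count):
                queue.popleft()  # stand-in for deleting one expired document
            deleted[name] = count
            if queue:
                hit_limit = True
        passes.append(deleted)
        if not hit_limit:
            break  # nothing left over: wait for the next TTL interval
    return passes


backlog = {"a": deque(range(5)), "b": deque(range(2))}
print(ttl_monitor_passes(backlog, limit=2))
# [{'a': 2, 'b': 2}, {'a': 2, 'b': 0}, {'a': 1, 'b': 0}]
```

In this sketch a backlog is drained in consecutive passes rather than at one bounded pass per interval, which is the catch-up concern raised in the reply above.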