[DOCS-15365] Investigate changes in SERVER-56194: Make TTL deletes fair Created: 27/May/22  Updated: 22/Jan/24

Status: Backlog
Project: Documentation
Component/s: manual, Server
Affects Version/s: 6.0.0
Fix Version/s: 6.1.0-rc0

Type: Task Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: backlog, feature, replication, server-docs-bug-bash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-56194 Make TTL deletes fair Closed
Participants:
Days since reply: 1 year, 36 weeks, 2 days ago
Epic Link: DOCSP-19446

 Description   
Original Downstream Change Summary

Changes to serverStatus:
The TTL Monitor has a new ServerStatus field ttl.subPasses

A given TTL pass may consist of zero or more sub-passes. A pass has zero sub-passes only if the replica set is not in a readable state when the pass begins.

The ServerStatus ttl.passes field remains the same in that a single pass deletes all expired documents (unless externally interrupted).

Note: The behavior of the TTL Monitor changes only when the server parameter 'ttlMonitorBatchDeletes' is set to true. Otherwise, the TTL Monitor uses legacy behavior, and each TTL pass consists of a single sub-pass, provided the replica set is in a readable state.
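
As a rough illustration of how the two counters relate, the sketch below summarizes them from a serverStatus-shaped document. The helper name ttl_pass_summary and the sample values are hypothetical, and the lookup under "metrics" is an assumption about where the ttl section is nested; adjust it if your output differs.

```python
def ttl_pass_summary(server_status):
    """Summarize TTL pass counters from a serverStatus-shaped dict.

    Assumption: the ttl section is nested under "metrics".
    """
    ttl = server_status.get("metrics", {}).get("ttl", {})
    passes = ttl.get("passes", 0)
    sub_passes = ttl.get("subPasses", 0)
    # With legacy behavior each readable pass has exactly one sub-pass,
    # so a ratio above 1 suggests batched deletes needed extra sub-passes.
    ratio = sub_passes / passes if passes else None
    return {"passes": passes, "subPasses": sub_passes, "subPassesPerPass": ratio}


# Made-up sample values, for illustration only.
sample = {"metrics": {"ttl": {"passes": 4, "subPasses": 10}}}
summary = ttl_pass_summary(sample)

# With pymongo, a real document could be fetched along these lines:
#   MongoClient().admin.command("serverStatus")
```

A ratio of exactly 1 is what the legacy behavior would produce for passes that started while the replica set was readable.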

New Behavior:
If the TTL monitor batches deletes (that is, the ttlMonitorBatchDeletes server parameter is set to true), it provides fair TTL deletion as follows.

  • The TTL pass consists of zero, one or more subpasses.
  • Each subpass deletes all expired documents on each TTL index in a round-robin fashion.
  • The delete on each TTL index removes up to ttlIndexDeleteTargetDocs documents or runs for up to ttlIndexDeleteTargetTimeMS milliseconds, whichever happens first. The same TTL index can be queued to be revisited in the same subpass if it still has outstanding deletions.
  • A TTL index is no longer visited in a subpass once all of its expired documents are deleted.
  • The duration of a subpass is limited to ttlMonitorSubPassTargetSecs. If there are outstanding deletions by the end of the subpass, a new subpass starts within the same pass.
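
The steps above can be sketched as a small simulation. This is not the server's implementation: the function name simulate_ttl_pass is hypothetical, target_docs stands in for ttlIndexDeleteTargetDocs, and a simple per-subpass batch count (batches_per_sub_pass) stands in for both time limits, ttlIndexDeleteTargetTimeMS and ttlMonitorSubPassTargetSecs. It also assumes the replica set is readable, so the pass always has at least one subpass.

```python
from collections import deque


def simulate_ttl_pass(expired, target_docs, batches_per_sub_pass):
    """Simulate one TTL pass over `expired` (index name -> expired doc count).

    Returns (sub_passes, log); log holds one (sub_pass, index, deleted)
    tuple per delete batch.
    """
    if target_docs <= 0 or batches_per_sub_pass <= 0:
        raise ValueError("limits must be positive")

    remaining = dict(expired)
    log = []
    sub_passes = 0
    while True:
        sub_passes += 1
        # Queue every index that still has expired documents.
        queue = deque(name for name, count in remaining.items() if count > 0)
        budget = batches_per_sub_pass  # stand-in for the subpass time limit
        while queue and budget > 0:
            name = queue.popleft()
            deleted = min(target_docs, remaining[name])
            remaining[name] -= deleted
            log.append((sub_passes, name, deleted))
            budget -= 1
            if remaining[name] > 0:
                queue.append(name)  # revisit later in the same subpass
        if not any(remaining.values()):
            break  # nothing outstanding, so the pass is complete
    return sub_passes, log


sub_passes, log = simulate_ttl_pass({"a": 5, "b": 1}, target_docs=2,
                                    batches_per_sub_pass=3)
```

In this run, index "b" is served after the first batch on "a" rather than waiting for "a" to finish, which is the fairness property the new behavior is meant to provide.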

Motivation:
Legacy behavior involves a single iteration over each TTL index, with no bound on the number of documents removed or the time spent executing deletes on a TTL index. Thus, the TTL Monitor could spend unbounded time deleting expired documents on one TTL index while starving the others of deletes.

Description of Linked Ticket

The single-threaded TTL Monitor can get "stuck" deleting large ranges of documents on specific collections or databases.

This prevents the TTL monitor from performing deletes on higher-priority collections such as config.system.sessions.

We should consider imposing configurable per-database and per-collection document deletion limits. In addition, we will need to consider significantly lowering the default TTL pass interval (from 60 seconds) to ensure we make progress.

Some hypothetical limits would be:

  • Maximum 10000 documents per database
  • Maximum 1000 documents per collection
  • Run TTL monitor every 5 seconds.

Also consider prioritizing important collections like config.system.sessions.



 Comments   
Comment by Education Bot [ 30/May/22 ]

Fix Version updated for upstream SERVER-56194:
6.1.0-rc0

Generated at Thu Feb 08 08:12:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.