DOCS-15365

Investigate changes in SERVER-56194: Make TTL deletes fair

      Original Downstream Change Summary

      Changes to serverStatus:
      The TTL Monitor has a new ServerStatus field ttl.subPasses

      A given TTL pass may consist of zero or more sub-passes. A pass has zero sub-passes only when the replica set is not in a readable state when the pass begins.

      The ServerStatus ttl.passes field remains the same in that a single pass deletes all expired documents (unless externally interrupted).

      Note: The behavior of the TTL Monitor changes only when the server parameter 'ttlMonitorBatchDeletes' is set to true. Otherwise, the TTL Monitor uses the legacy behavior: each TTL pass consists of a single sub-pass, provided the replica set is in a readable state.
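
      One way to observe the relationship between the two counters is to compare two snapshots of them. The sketch below is illustrative only: the helper name and the snapshot shape (a plain dict with cumulative 'passes' and 'subPasses' values, copied by hand from the ttl section of serverStatus output) are assumptions, not part of the server or any driver API.

```python
def sub_passes_per_pass(before, after):
    """Average number of sub-passes per TTL pass between two snapshots.

    Each snapshot is a dict with the cumulative 'passes' and 'subPasses'
    counters (hypothetical shape, copied from serverStatus output).
    Returns None when no pass completed between the snapshots.
    """
    passes = after["passes"] - before["passes"]
    sub_passes = after["subPasses"] - before["subPasses"]
    if passes <= 0:
        return None
    return sub_passes / passes


if __name__ == "__main__":
    earlier = {"passes": 10, "subPasses": 10}
    later = {"passes": 12, "subPasses": 17}
    print(sub_passes_per_pass(earlier, later))
```

      Under the legacy behavior described above, the ratio would be expected to stay at 1 while the replica set is readable. A larger ratio suggests that passes are being split into several sub-passes.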

      New Behavior:
      When the TTL monitor batches deletes (server parameter ttlMonitorBatchDeletes set to true), it provides fair TTL deletion as follows.

      • The TTL pass consists of zero, one or more subpasses.
      • Each subpass deletes expired documents from every TTL index, visiting the indexes in a round-robin fashion.
      • Each visit to a TTL index removes up to ttlIndexDeleteTargetDocs documents or runs for up to ttlIndexDeleteTargetTimeMS milliseconds, whichever limit is reached first. If deletions are still outstanding, the same TTL index can be queued to be revisited in the same subpass.
      • A TTL index is no longer visited in a subpass once all of its expired documents are deleted.
      • The duration of a subpass is limited to ttlMonitorSubPassTargetSecs seconds. If deletions are still outstanding at the end of the subpass, a new subpass starts within the same pass.
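
      The rules above can be sketched as a small simulation. This is a simplified model, not the server implementation: the function name and arguments are hypothetical, time is approximated by counting deleted documents (one document = one time unit), and the per-visit time limit (ttlIndexDeleteTargetTimeMS) is not modeled.

```python
from collections import deque


def run_ttl_pass(expired, target_docs, subpass_budget, readable=True):
    """Simulate one TTL pass and return (sub_passes, visit_log).

    expired: dict mapping a TTL index name to its count of expired documents.
    target_docs: stand-in for ttlIndexDeleteTargetDocs.
    subpass_budget: stand-in for ttlMonitorSubPassTargetSecs, expressed in
        deleted documents instead of seconds (an assumption of this model).
    readable: False models a replica set that is not readable at pass start.
    """
    if target_docs < 1 or subpass_budget < 1:
        raise ValueError("limits must be positive")
    if not readable:
        return 0, []  # the only case with zero sub-passes

    remaining = dict(expired)
    sub_passes = 0
    visit_log = []  # entries are (sub_pass_number, index_name, docs_deleted)
    while True:
        sub_passes += 1
        # Round-robin queue of indexes that still have expired documents.
        queue = deque(name for name, n in remaining.items() if n > 0)
        spent = 0
        while queue and spent < subpass_budget:
            name = queue.popleft()
            batch = min(target_docs, remaining[name])
            remaining[name] -= batch
            spent += batch
            visit_log.append((sub_passes, name, batch))
            if remaining[name] > 0:
                queue.append(name)  # revisit later in the same sub-pass
        if not any(remaining.values()):
            break  # the pass ends once nothing is left to delete
    return sub_passes, visit_log


if __name__ == "__main__":
    count, log = run_ttl_pass({"a": 5, "b": 2}, target_docs=2, subpass_budget=4)
    print(count)
    for entry in log:
        print(entry)
```

      In this model, index "b" is served after the first batch from "a" rather than waiting for "a" to be fully drained, which is the fairness property the batched mode is meant to provide.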

      Motivation:
      The legacy behavior involves a single iteration over each TTL index, with no bound on the number of documents removed or the time spent executing deletes on a TTL index. Thus, the TTL Monitor could spend unbounded time deleting expired documents on one TTL index while starving the other indexes of deletes.

      Description of Linked Ticket

      The single-threaded TTL Monitor can get "stuck" deleting large ranges of documents on specific collections or databases.

      This prevents the TTL monitor from performing deletes on higher-priority collections such as config.system.sessions.

      We should consider imposing configurable per-database and per-collection document deletion limits. In addition, we will need to consider significantly lowering the default TTL pass interval (from 60 seconds) to ensure we make progress.

      Some hypothetical limits would be:

      • Maximum 10000 documents per database
      • Maximum 1000 documents per collection
      • Run TTL monitor every 5 seconds.

      Also consider prioritizing important collections like config.system.sessions.

            Assignee:
            Unassigned
            Reporter:
            backlog-server-pm Backlog - Core Eng Program Management Team

              Created:
              Updated:
              1 year, 47 weeks, 2 days ago