Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.3.0-rc0, 8.2.2, 8.0.18
Affects Version/s: None
Component/s: None
Labels:
None

Assigned Teams:

Query Execution
Backwards Compatibility:
Fully Compatible
Backport Requested:

v8.2, v8.0
Linked BF Score:
200
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Operations are expected to check for interrupts periodically. This task tracks operations that go a long time between two interrupt checks and reports metrics about such operations in server status.

Summary of diagnostics added

Under this ticket, a number of diagnostics were added:

Every operation now tracks how many interrupt checks it does.
- At the operation level, numInterruptChecks is reported in slow query logs and $currentOp (SERVER-104009)
- At the query shape level, we intend to report interrupt checks per second and possibly other information as well (SERVER-107647)
- At the process level, we track the total number of interrupt checks across all operations. This can be used for very coarse-grained analysis.
A small fraction (< 1%) of operations will track the maximum time between interrupt checks and the accumulated "overdue" time.
- We attempted to do this for all operations. It incurred a small but measurable performance penalty, so we only do it for a small sample.
- The sampling ratio is controlled by the setParameter overdueInterruptCheckSamplingRate. Code here.
- At the operation level, sampled operations will report:
  - The number of overdue interrupt checks (in addition to the total number of interrupt checks)
  - The maximum time between two interrupt checks
  - The accumulated overdue time between interrupt checks
- At the process level, we report:
  - The number of sampled operations
  - The number of sampled operations which had at least one overdue interrupt check
  - The total number of interrupt checks by sampled operations
  - The number of overdue interrupt checks by sampled operations
  - The accumulated "time overdue" by sampled operations
  - The maximum time between any two interrupt checks, across all sampled operations
  - From these values we can also derive things like:
    - Lower bound for average time between interrupt checks
      - Lower bound because we'd assume that non-overdue interrupt checks come exactly on time
    - Average time between overdue interrupt checks
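
The bookkeeping described above can be sketched roughly as follows. This is an illustrative Python model, not the server's C++ code. The class names, field names, the `should_sample` helper, and the `OVERDUE_THRESHOLD_MS` value are all hypothetical, and the sketch assumes that "overdue time" means the part of a gap that exceeds the threshold. The lower bound shown is one possible reading: it counts only accumulated overdue time and treats non-overdue checks as adding no delay.

```python
import random
from dataclasses import dataclass

# Hypothetical threshold: the ticket does not say when a check counts as overdue.
OVERDUE_THRESHOLD_MS = 10.0


def should_sample(sampling_rate: float, rng: random.Random) -> bool:
    """Decide once per operation whether it tracks timing (hypothetical helper)."""
    return rng.random() < sampling_rate


class OpTracker:
    """Per-operation counters. Timestamps are passed in to keep the sketch deterministic."""

    def __init__(self, sampled: bool, start_ms: float):
        self.sampled = sampled
        self.num_checks = 0
        self.overdue_checks = 0
        self.overdue_ms = 0.0
        self.max_gap_ms = 0.0
        self._last_ms = start_ms

    def check_for_interrupt(self, now_ms: float) -> None:
        self.num_checks += 1  # every operation counts its checks
        if not self.sampled:
            return  # only sampled operations pay for timing
        gap = now_ms - self._last_ms
        self._last_ms = now_ms
        self.max_gap_ms = max(self.max_gap_ms, gap)
        if gap > OVERDUE_THRESHOLD_MS:
            self.overdue_checks += 1
            self.overdue_ms += gap - OVERDUE_THRESHOLD_MS


@dataclass
class ProcessStats:
    """Process-level counters, aggregated over sampled operations only."""
    sampled_ops: int = 0
    sampled_ops_with_overdue: int = 0
    total_checks: int = 0
    overdue_checks: int = 0
    overdue_ms: float = 0.0
    max_gap_ms: float = 0.0

    def record(self, op: OpTracker) -> None:
        if not op.sampled:
            return
        self.sampled_ops += 1
        if op.overdue_checks > 0:
            self.sampled_ops_with_overdue += 1
        self.total_checks += op.num_checks
        self.overdue_checks += op.overdue_checks
        self.overdue_ms += op.overdue_ms
        self.max_gap_ms = max(self.max_gap_ms, op.max_gap_ms)

    def avg_gap_lower_bound_ms(self) -> float:
        # Non-overdue checks are assumed to contribute no delay at all.
        return self.overdue_ms / self.total_checks if self.total_checks else 0.0

    def avg_overdue_ms(self) -> float:
        # Average overdue time per overdue check.
        return self.overdue_ms / self.overdue_checks if self.overdue_checks else 0.0
```

For example, a sampled operation that checks at 5 ms, 30 ms, and 32 ms records one overdue check under the assumed 10 ms threshold (the 25 ms gap), which is 15 ms overdue.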

Other considerations

Interruptible waits
- Operations can wait on a condition variable; the wait completes when the condition becomes true or when the operation is killed.
  - We do not count time spent in an interruptible wait as time between interrupt checks, so interruptible waits should not produce false positives.
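
One way to exclude interruptible waits from the measured gap is to shift the "last check" baseline forward by the duration of the wait. The sketch below illustrates that idea only; the ticket does not describe how the server does it, and `GapTracker` and its methods are hypothetical names.

```python
class GapTracker:
    """Tracks the largest gap between interrupt checks, excluding interruptible waits."""

    def __init__(self, start_ms: float):
        self._last_ms = start_ms
        self.max_gap_ms = 0.0

    def check_for_interrupt(self, now_ms: float) -> None:
        self.max_gap_ms = max(self.max_gap_ms, now_ms - self._last_ms)
        self._last_ms = now_ms

    def interruptible_wait(self, wait_start_ms: float, wait_end_ms: float) -> None:
        # The operation could have been killed at any point during the wait,
        # so move the baseline forward by the time spent waiting.
        self._last_ms += wait_end_ms - wait_start_ms


def demo_max_gap_ms() -> float:
    tracker = GapTracker(start_ms=0.0)
    tracker.interruptible_wait(2.0, 502.0)  # 500 ms spent in an interruptible wait
    tracker.check_for_interrupt(505.0)      # 505 ms of wall time since the start
    return tracker.max_gap_ms
```

In this sketch the measured gap covers only the 5 ms the operation spent running outside the wait.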
Suboperations
- Operations can "spawn" other operations that run under the same OperationContext (DBDirectClient, bulk inserts, $out?)
- For simplicity, only the top-level operation reports statistics about overdue interrupt checks
- If an operation spawns a sub-operation which is delinquent, the parent operation will be considered delinquent and will include all of the sub-operation's metrics about overdue interrupt checks.
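
The roll-up to the top-level operation can be illustrated with a nesting counter on a shared context. This is a sketch under assumptions: the `OperationContext` class here is a simplified, hypothetical stand-in, not the server's class of the same name, and the operation names are made up.

```python
from contextlib import contextmanager


class OperationContext:
    """Hypothetical stand-in for the context shared by an operation and its sub-operations."""

    def __init__(self):
        self.depth = 0
        self.overdue_checks = 0
        self.reports = []

    @contextmanager
    def run(self, name: str):
        self.depth += 1
        try:
            yield self
        finally:
            self.depth -= 1
            if self.depth == 0:
                # Only the top-level operation reports, and it includes
                # whatever its sub-operations accumulated on the shared context.
                self.reports.append((name, self.overdue_checks))


def demo_reports():
    ctx = OperationContext()
    with ctx.run("topLevelOp"):
        with ctx.run("subOp"):
            ctx.overdue_checks += 2  # the sub-operation is delinquent twice
    return ctx.reports
```

Here the sub-operation emits no report of its own, and the top-level operation reports both overdue checks.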
Clock accuracy
- Sampled operations track time between interrupt checks using TickSource.

is depended on by

COMPASS-9770 Investigate changes in SERVER-104007: Track delinquent interrupt checks

Closed

is related to

SERVER-107293 Revert SERVER-104007

Closed

SERVER-104008 Record operation delinquency ticket information in QueryStats

Closed

SERVER-104009 Record delinquent checkForInterrupt() information in CurOp and slow logging

Closed

SERVER-104010 Track delinquent ticket releases

Closed

related to

SERVER-106769 A low overhead timer for x86 and AArch64

Investigating

SERVER-107407 Complete TODO listed in SERVER-104007

Open

SERVER-105801 Consider moving OperationContext overdue interrupt information to Interruptible

Backlog

SERVER-107647 Record operation delinquency checkForinterrupt information in QueryStats

Closed


Assignee: Ian Boros
Reporter: Ian Boros
Participants: Githook User, Ian Boros
Votes: 0
Watchers: 6

Created: Apr 18 2025 02:17:01 PM UTC
Updated: Dec 19 2025 11:26:20 AM UTC
Resolved: Aug 28 2025 06:56:39 PM UTC
