[SERVER-59700] Add programming support for tracepoints Created: 31/Aug/21  Updated: 29/Oct/23  Resolved: 07/Jan/22

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: 5.3.0

Type: New Feature Priority: Major - P3
Reporter: Amirsaman Memaripour Assignee: Amirsaman Memaripour
Resolution: Fixed Votes: 0
Labels: servicearch-q4-2021, servicearch-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-59599 Investigate large delays in connectio... Closed
is related to SERVER-71736 Move tracing support lint to clang-tidy Closed
Backwards Compatibility: Fully Compatible
Sprint: Service Arch 2021-11-15, Service Arch 2021-11-22, Service Arch 2021-12-13, Service Arch 2022-1-10, Service Arch 2022-1-24
Participants:
Story Points: 2

 Description   

The aim is to provide a library for adding tracepoints to mongo servers. The primary intent of these tracepoints is to collect timing information (e.g., using ServiceContext::getFastClockSource) for an ongoing operation. By default the tracepoints must compile to a no-op; they are enabled only when requested at compile time.

The following is a possible/suggested API design for the tracepoints.

void someFunction() {
    // TracePoint(std::string name, Duration<T> loggingThreshold);
    auto tp = makeTracepoint("MyTracepoint", Milliseconds(100));
    runOperationA();
    tp.checkpoint("Checkpoint 1");
    runOperationB();
}

When enabled, the tracepoint must collect the timestamp at construction and at each checkpoint, and log the timestamps when the lifetime of the tracepoint object exceeds a threshold.

For example, if running runOperationA and runOperationB takes more than 100 milliseconds, we expect to see a line similar to the following in mongo logs:

{ "msg": "Tracepoint exceeded its expected lifetime", "attr": {"name": "MyTracepoint", "constructedAt": "timestamp", "destroyedAt": "timestamp", "Checkpoint 1": "timestamp", "expectedLifetimeMS": 100, "observedLifetimeMS": 101} }

Acceptance criteria:

  • For the first pass, someone will do a POC timeboxed at 2 story points to answer some of the open questions


 Comments   
Comment by Githook User [ 07/Jan/22 ]

Author: Amirsaman Memaripour (samanca) <amirsaman.memaripour@mongodb.com>

Message: SERVER-59700 Add programming support for tracepoints
Branch: master
https://github.com/mongodb/mongo/commit/3f951c777d1002909c6e5cef1a58556e040ebac8

Comment by Matthew Saltz (Inactive) [ 20/Sep/21 ]

Questions:

  • Do we want this to compile out for sure?
  • How to enable/disable?
    • At compile time somehow - Maybe only enable in debug mode?
  • Do we want an Evergreen variant with these enabled?
  • Able to enable globally or per tracepoint?
    • We think just globally
  • Have to be thread safe?
  • How does this compare to Sam's Skunkworks?

API Ideas:

  • Version that takes a lambda
  • Another version more similar to OpenTelemetry
Comment by Bruce Lucas (Inactive) [ 02/Sep/21 ]

I think ideally the logged message would follow the established convention for reporting the duration by using an attr.durationMillis field. This would allow using tools that expect such a field, which we have found very useful in practice.
