[SERVER-58408] Improve diagnostics for networking reactors Created: 09/Jul/21  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Amirsaman Memaripour Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 0
Labels: sa-remove-fv-backlog-22
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-59858 Add observability for tasks scheduled... Closed
Assigned Teams:
Service Arch
Participants:
Linked BF Score: 37
Story Points: 4

Description

The transport layer uses ASIO reactors to perform networking operations. These operations include, but are not limited to, DNS resolution and connection establishment.

Below is an example of using the reactors to perform asynchronous DNS resolution. Here, _resolver (an ASIO resolver) calls async_resolve to schedule a request to resolve peer.host() on a networking reactor:

Future<EndpointVector> _asyncResolve(const HostAndPort& peer, Flags flags, bool enableIPv6) {
    auto port = std::to_string(peer.port());
    Future<Results> ret;
    // Schedule the resolution on the reactor; the UseFuture{} token yields the result as a Future.
    if (enableIPv6) {
        ret = _resolver.async_resolve(peer.host(), port, flags, UseFuture{});
    } else {
        // Restrict the lookup to IPv4 addresses.
        ret = _resolver.async_resolve(asio::ip::tcp::v4(), peer.host(), port, flags, UseFuture{});
    }
    // Continuations chained to the resolution result.
    return std::move(ret).onError([this, peer](Status status) {
        return _checkResults(status, peer);
    }).then([this, peer](Results results) {
        return _makeFuture(results, peer);
    });
}

The goal is to collect more information on tasks run by the reactor threads. At a minimum, we'd want to collect the duration of each task run by the reactor. It'd also be useful to measure the cost of running the continuations chained to tasks scheduled on reactor threads (e.g., see here).
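
As a rough illustration of the duration collection described above, a task could be wrapped so that its run time is recorded when a reactor thread executes it. This is only a sketch using the standard library: TaskTimingStats and makeTimedTask are hypothetical names rather than existing server APIs, and it assumes each stats object is updated by a single thread, so no synchronization is shown.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical accumulator for task run times (not an existing server type).
// Assumes a single writer thread; real reactor metrics would need atomics or locking.
struct TaskTimingStats {
    std::uint64_t tasksRun = 0;
    std::chrono::nanoseconds totalDuration{0};
    std::chrono::nanoseconds maxDuration{0};

    void record(std::chrono::nanoseconds d) {
        ++tasksRun;
        totalDuration += d;
        if (d > maxDuration) {
            maxDuration = d;
        }
    }
};

// Returns a callable that runs `task` and records how long it took.
// A task that throws is not recorded in this sketch.
template <typename Task>
auto makeTimedTask(TaskTimingStats& stats, Task task) {
    return [&stats, task = std::move(task)]() mutable {
        const auto start = std::chrono::steady_clock::now();
        task();
        const auto elapsed = std::chrono::steady_clock::now() - start;
        stats.record(std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
    };
}
```

The same wrapper could in principle be applied to a continuation before it is chained, which would account for the continuation's cost separately from the task that scheduled it.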

Reporting these metrics in serverStatus (and FTDC) can help with diagnosing slow DNS resolutions and large delays in connection establishment.

 

Acceptance criteria: Report a duration histogram for each stage of connection establishment, along with success and failure counts for each stage.
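
A minimal sketch of the kind of per-stage bookkeeping these criteria suggest is shown below. The stage names, bucket boundaries, and class names are all assumptions made for illustration; the ticket does not enumerate the stages or prescribe a histogram layout, and the sketch is not thread-safe.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical stages; the actual set of stages is not specified in this ticket.
enum class Stage : std::size_t { kDnsResolution = 0, kTcpConnect = 1, kNumStages = 2 };

// Assumed exclusive upper bounds of the histogram buckets, in milliseconds.
// Durations at or above the last bound fall into one extra overflow bucket.
constexpr std::array<std::int64_t, 4> kBoundsMillis{{1, 10, 100, 1000}};

class StageStats {
public:
    void record(std::chrono::milliseconds duration, bool succeeded) {
        std::size_t bucket = 0;
        while (bucket < kBoundsMillis.size() && duration.count() >= kBoundsMillis[bucket]) {
            ++bucket;
        }
        ++_buckets[bucket];
        if (succeeded) {
            ++_successes;
        } else {
            ++_failures;
        }
    }

    std::uint64_t bucketCount(std::size_t bucket) const { return _buckets[bucket]; }
    std::uint64_t successes() const { return _successes; }
    std::uint64_t failures() const { return _failures; }

private:
    std::array<std::uint64_t, kBoundsMillis.size() + 1> _buckets{};
    std::uint64_t _successes = 0;
    std::uint64_t _failures = 0;
};

// One StageStats per stage, indexed by the Stage enumerator.
class ConnectionEstablishmentStats {
public:
    void record(Stage stage, std::chrono::milliseconds duration, bool succeeded) {
        _stages[static_cast<std::size_t>(stage)].record(duration, succeeded);
    }

    const StageStats& stage(Stage stage) const {
        return _stages[static_cast<std::size_t>(stage)];
    }

private:
    std::array<StageStats, static_cast<std::size_t>(Stage::kNumStages)> _stages{};
};
```

With these assumed bounds, a 5 ms DNS resolution lands in the second bucket (1 ms up to, but not including, 10 ms), while a 2500 ms resolution lands in the overflow bucket.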


Generated at Thu Feb 08 05:44:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.