Core Server / SERVER-99896

Consider making opLatency metrics more comprehensive

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Networking & Observability
    • N&O Prioritized List
    • 3

      The average latency metric timer (aka opLatency) currently starts too late and ends too early.

      For example, on v6.0 the timers only start in SEP::_initiateCommand, which implicitly assumes that the latency of the command path between receiving a network request and this call is negligible. HELP-68909 (v6.0) shows that this is not true: there is lock contention in areas before this call (the vivify mutex during SEP::_initiateCommand, the ServiceContext mutex during opCtx creation). Other parts of the code that fall outside our timers may add meaningful latency as well.

      I think it's worth investigating how these metrics can be extended to include more of the command processing path. This issue is loosely outlined in the "Addressing Server Networking Problems" document (observability section).
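
      To make the gap concrete, here is a minimal sketch of how a narrow timer under-reports latency compared with one that spans the whole request. It is not server code: RequestTimeline, LatencyBreakdown, breakdown() and all field names are hypothetical, and the timestamps are assumed to come from a single monotonic clock.

```cpp
#include <chrono>

using Micros = std::chrono::microseconds;

// Hypothetical per-request timestamps (assumption: all taken from one
// monotonic clock). None of these names exist in the server code.
struct RequestTimeline {
    Micros networkReceived{0};   // request read from the network
    Micros commandInitiated{0};  // point where a narrow timer would start
    Micros commandFinished{0};   // point where a narrow timer would stop
    Micros responseSent{0};      // response handed back to the network layer
};

struct LatencyBreakdown {
    Micros narrow;       // what a late-start, early-stop timer reports
    Micros wide;         // receive-to-send latency of the whole request
    Micros preCommand;   // time spent before the narrow timer starts
    Micros postCommand;  // time spent after the narrow timer stops
};

// Splits the full request latency into the part a narrow timer sees and
// the parts it misses on either side.
inline LatencyBreakdown breakdown(const RequestTimeline& t) {
    const Micros narrow = t.commandFinished - t.commandInitiated;
    const Micros wide = t.responseSent - t.networkReceived;
    const Micros pre = t.commandInitiated - t.networkReceived;
    const Micros post = t.responseSent - t.commandFinished;
    return {narrow, wide, pre, post};
}
```

      With this split, contention that occurs before the narrow timer starts (for example while waiting on a mutex) shows up in preCommand instead of disappearing from the reported latency.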

      Note: the ServiceContext contention no longer happens on v8.0+ because of optimizations, but the vivify contention can still happen on v8.0+.

            Assignee: Unassigned
            Reporter: Alex Li (alex.li@mongodb.com)
            Votes: 2
            Watchers: 10
