DOCS-13613

Investigate changes in SERVER-31368: Log time spent waiting for other shards in merge cursors aggregation stage

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 4.4.0-rc5, 4.7.0, 4.2.10
    • Component/s: manual, Server
    • Labels:
      None
    • Last comment by Customer:
      true
    • Story Points:
      3
    • Sprint:
      ServerDocs2021: Aug3 - Aug10

      Description

      Downstream Change Summary

      This change adds a new optional field, "remoteOpWaitMillis", to the profiler and slow query log lines. It reports how much time the node spent waiting for results from other nodes. Comparing it with the total "durationMillis" shows whether the merging node or a shard is responsible for a slow query.

      The field appears only on the merging node, and only when the command is an aggregate or a getMore on an aggregation cursor.
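
      The comparison described above can be sketched in Python. This is a minimal illustration, assuming the log or profiler entry has already been parsed into a dictionary that carries the two fields named in this ticket. The function name wait_breakdown and the sample values are hypothetical and are not taken from a real log.

```python
def wait_breakdown(attr):
    """Split an entry's total duration into remote-wait time and local time.

    `attr` is assumed to be a dict holding "durationMillis" and, optionally,
    "remoteOpWaitMillis". Returns None when the optional field is absent,
    for example on a node that is not the merging node.
    """
    total_ms = attr["durationMillis"]
    remote_ms = attr.get("remoteOpWaitMillis")
    if remote_ms is None:
        return None
    return {
        "remote_ms": remote_ms,
        "local_ms": total_ms - remote_ms,
        # Guard against a zero-duration entry to avoid dividing by zero.
        "remote_fraction": remote_ms / total_ms if total_ms else 0.0,
    }


# Hypothetical values chosen only to illustrate the arithmetic.
sample = {"durationMillis": 1000, "remoteOpWaitMillis": 900}
breakdown = wait_breakdown(sample)
print(breakdown)
```

      A remote fraction close to 1 suggests that most of the time was spent waiting on other nodes, while a value close to 0 points at the merging node itself.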

      Description of Linked Ticket

      Aggregation queries that use $mergeCursors communicate with other shards and can therefore be affected by communication issues or by problems on other nodes. In the following example, the query ran for a very long time but spent no time acquiring locks and did not yield. It may have been waiting for a response from one or more of the other hosts, but nothing in the log message indicates that. For diagnosis, it would help if the slow query report indicated how much time was spent waiting for each host.

      2017-06-23T06:53:15.357+0200 I COMMAND  [conn44] command ... command: aggregate { aggregate: "...", pipeline: [ { $mergeCursors: [ { host: "...", ns: "...", id: 253733414549 }, ...]}, { $group: ... } keysExamined:0 docsExamined:0 numYields:0 nreturned:0 reslen:17527 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_command 2909221ms
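
      To make the duration in the line above easier to read, the trailing millisecond count and the numYields counter can be pulled out with a short Python sketch. The excerpt below copies only tokens that appear in the log line, with the elided parts left elided. The function name summarize and the regular expressions are illustrative assumptions, not a general parser for this log format.

```python
import re

# Excerpt of the log line above; the "..." parts are elided in the ticket too.
LINE = ("... keysExamined:0 docsExamined:0 numYields:0 nreturned:0 "
        "reslen:17527 ... protocol:op_command 2909221ms")


def summarize(line):
    """Extract the total duration and yield count from a legacy-format line."""
    duration = re.search(r"(\d+)ms\s*$", line)      # trailing "<n>ms"
    yields = re.search(r"\bnumYields:(\d+)", line)  # "numYields:<n>"
    if duration is None or yields is None:
        raise ValueError("line does not look like a slow query report")
    total_ms = int(duration.group(1))
    minutes, rest_ms = divmod(total_ms, 60_000)
    return {
        "duration_ms": total_ms,
        "minutes": minutes,
        "seconds": rest_ms / 1000,
        "num_yields": int(yields.group(1)),
    }


summary = summarize(LINE)
print(summary)
```

      For this line the duration works out to roughly 48 minutes and 29 seconds with zero yields, which is the gap the ticket asks the log to explain.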
      

      Scope of changes

      Impact to Other Docs

      MVP (Work and Date)

      Resources (Scope or Design Docs, Invision, etc.)

              People

              Assignee:
              jason.price Jason Price
              Reporter:
              backlog-server-pm Backlog - Core Eng Program Management Team
              Participants:
              Last commenter:
              Jason Price
              Docs Reviewer:
              Joseph Dougherty
              External Reviewer:
              David Percy
              Votes:
              1
              Watchers:
              5

                Dates

                Created:
                Updated:
                Resolved:
                Days since reply:
                10 weeks, 6 days ago
                Date of 1st Reply: