Details
Description
Description
This change adds a new optional field, "remoteOpWaitMillis" to the profiler / slow query log lines. It tells you how much time the node spent waiting for results from other nodes. By comparing this with the total "durationMillis" you can figure out whether the merger or a shard is to blame for a slow query.
It only appears when the command is an aggregate or a getMore on an aggregation cursor. It only appears on the merging node.
Description of Linked Ticket
Aggregation queries using $mergeCursors communicate with other shards and therefore can be affected by communication issues or issues on other nodes. In the following example the query ran for a very long time but did not spend any time acquiring locks and did not yield. Possibly it was waiting for a reponse from one or more of the other hosts, but there is no indication of that in the log message. It would be helpful for diagnosis if the slow query report indicated how much time was spent waiting for each host.
2017-06-23T06:53:15.357+0200 I COMMAND [conn44] command ... command: aggregate { aggregate: "...", pipeline: [ { $mergeCursors: [ { host: "...", ns: "...", id: 253733414549 }, ...]}, { $group: ... } keysExamined:0 docsExamined:0 numYields:0 nreturned:0 reslen:17527 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_command 2909221ms
|
Scope of changes
Impact to Other Docs
MVP (Work and Date)
Resources (Scope or Design Docs, Invision, etc.)
Attachments
Issue Links
- documents
-
SERVER-31368 Log time spent waiting for other shards in merge cursors aggregation stage
-
- Closed
-
1.
|
Backport to 4.4 | DOCS-14726 |
|
Closed | Jason Price | 4.4.0 |
2.
|
Backport to 4.2.10 | DOCS-14727 |
|
Closed | Jason Price | 4.2.10 |