[SERVER-37782] Would like server stats in mongos on scatter-gather queries Created: 26/Oct/18 Updated: 06/Dec/22 Resolved: 10/Dec/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Mike Zraly | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | SWDI, qopt-team | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Query
|
||||||||
| Participants: | |||||||||
| Description |
|
While monitoring a sharded cluster it would be helpful to know how seriously we are being affected by scatter-gather queries - how many operations are being targeted to multiple shards, how many require sorting and merging results on a shardsvr, and so forth. While some of this data can be gathered by analyzing queries as they go in, it would be nice to be able to monitor a simple counter or two on grafana, CloudWatch, or Ops Manager to tell if some new code release or some other event has triggered a spate of scatter-gather queries that are negatively affecting performance. |
| Comments |
| Comment by Asya Kamsky [ 09/Dec/19 ] | |
|
This appears to be a duplicate of | |
| Comment by Mike Zraly [ 14/Nov/18 ] | |
|
I am looking for an external monitoring method for sharding overhead that does not require a lot of the monitor. Parsing log files is out of course, since that ignores queries that execute faster than the slow query threshold. Parsing db.currentOp() is better. But when I run db.currentOp() via mongos 3.4.17 I get output for each shard - at least for the primary replica set member. That's showing current operations from the perspective of the individual shards. Even with an nShards entry in the output I would have to work backwards to avoid over-counting, because a query addressing N shards can appear up to N times in the output. Besides, this still requires the monitor to parse a variable-length structure that gets larger precisely when we are most in need of timely reporting. And it only samples the queries being processed by mongos, it won't capture queries that execute and complete in between calls from the monitor. I do think that a few counters, one for queries processed, another for shards targeted, perhaps a third counting queries that require sorting, would provide a quickly processed unambiguous measure of sharding fan-out and overhead. Perhaps these counters could be located in a new subsection of the db.serverStatus() shardingStatistics object, as a peer to catalogCache.
| |
| Comment by Asya Kamsky [ 14/Nov/18 ] | |
|
mzraly there was a project for 4.0 which added slow query logging for mongos which might have what you are looking for. It was Here is an example of logging of query which was scatter gather (my cluster has two shards):
You can see nShards:2 in the logs (as well as in profiler output and currentOp. When the logged query is sent to one shard, it's logged with nShards:1. Would this be what you are looking for? Technically it doesn't say if query uses shard key or not, just how many shards it was sent to. |