[SERVER-37782] Would like server stats in mongos on scatter-gather queries Created: 26/Oct/18  Updated: 06/Dec/22  Resolved: 10/Dec/19

Status: Closed
Project: Core Server
Component/s: Diagnostics, Sharding
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Mike Zraly Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: SWDI, qopt-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-43617 Add metrics on the mongos to indicate... Closed
Assigned Teams:
Query
Participants:

 Description   

While monitoring a sharded cluster it would be helpful to know how seriously we are being affected by scatter-gather queries - how many operations are being targeted to multiple shards, how many require sorting and merging results on a shardsvr, and so forth.

While some of this data can be gathered by analyzing queries as they go in, it would be nice to be able to monitor a simple counter or two on Grafana, CloudWatch, or Ops Manager to tell whether a new code release or some other event has triggered a spate of scatter-gather queries that are negatively affecting performance.



 Comments   
Comment by Asya Kamsky [ 09/Dec/19 ]

This appears to be a duplicate of SERVER-43617

Comment by Mike Zraly [ 14/Nov/18 ]

I am looking for an external monitoring method for sharding overhead that does not demand much of the monitor.

Parsing log files is out, of course, since that misses queries that execute faster than the slow query threshold.

Parsing db.currentOp() is better. But when I run db.currentOp() via mongos 3.4.17 I get output for each shard (at least for the primary replica set member). That shows current operations from the perspective of the individual shards. Even with an nShards entry in the output I would have to work backwards to avoid over-counting, because a query addressing N shards can appear up to N times in the output. Besides, this still requires the monitor to parse a variable-length structure that gets larger precisely when we are most in need of timely reporting. And it only samples the queries being processed by mongos at that moment; it won't capture queries that execute and complete in between calls from the monitor.

I do think that a few counters, one for queries processed, another for shards targeted, perhaps a third counting queries that require sorting, would provide a quickly processed unambiguous measure of sharding fan-out and overhead.  Perhaps these counters could be located in a new subsection of the db.serverStatus() shardingStatistics object, as a peer to catalogCache.
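To make the proposal concrete, here is a minimal sketch of how a monitor might use three such counters. Nothing here is an existing serverStatus field: the class FanoutCounters, its field names, and the helper average_fanout are all hypothetical, chosen only to illustrate how cumulative counters can be turned into an average fan-out between two polls.

```python
from dataclasses import dataclass


@dataclass
class FanoutCounters:
    """Hypothetical cumulative counters; not real serverStatus fields."""
    queries: int = 0          # queries processed
    shards_targeted: int = 0  # total shards targeted across all queries
    sorted_merges: int = 0    # queries that required sorting/merging

    def record(self, n_shards, needs_sort=False):
        """Account for one query that was sent to n_shards shards."""
        self.queries += 1
        self.shards_targeted += n_shards
        if needs_sort:
            self.sorted_merges += 1


def average_fanout(before, after):
    """Average shards per query between two snapshots, or None if idle."""
    delta_queries = after.queries - before.queries
    if delta_queries <= 0:
        return None
    return (after.shards_targeted - before.shards_targeted) / delta_queries
```

An average near 1 would suggest mostly targeted queries, while a value approaching the shard count would suggest mostly scatter-gather. Because the counters only ever increase, the monitor reads a fixed-size value on each poll and does not miss queries that complete between polls.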

Comment by Asya Kamsky [ 14/Nov/18 ]

mzraly, there was a project for 4.0 that added slow query logging for mongos, which might have what you are looking for. It was SERVER-14900.

Here is an example log line for a scatter-gather query (my cluster has two shards):

2018-11-14T12:32:05.723-0500 I COMMAND  [conn12] command logtest.coll appName: "MongoDB Shell" command: find { find: "coll", filter: { _id: { $in: [ 3.0, 13.0, 23.0, 33.0 ] } }, $clusterTime: { clusterTime: Timestamp(1542216711, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $db: "logtest" } nShards:2 cursorExhausted:1 numYields:0 nreturned:4 reslen:354 protocol:op_msg 2ms

You can see nShards:2 in the logs (as well as in profiler output and currentOp). When the logged query is sent to one shard, it's logged with nShards:1.

Would this be what you are looking for? Technically it doesn't say whether the query uses the shard key, just how many shards it was sent to.
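As a rough illustration of using that field, the sketch below pulls the nShards value out of mongos log lines in the format shown above and tallies single-shard versus multi-shard operations. The function names and the tally keys are made up for this example, and it only sees operations that were actually logged, so anything faster than the slow query threshold is not counted.

```python
import re
from collections import Counter

# Matches the nShards:<n> token as it appears in the example log line.
NSHARDS_RE = re.compile(r"\bnShards:(\d+)\b")


def shards_targeted(log_line):
    """Return the nShards value from a log line, or None if absent."""
    match = NSHARDS_RE.search(log_line)
    return int(match.group(1)) if match else None


def summarize(lines):
    """Tally logged operations, shards targeted, and multi-shard operations."""
    tally = Counter()
    for line in lines:
        n = shards_targeted(line)
        if n is None:
            continue  # not a command line, or no nShards field
        tally["operations"] += 1
        tally["shards_targeted"] += n
        if n > 1:
            tally["multi_shard"] += 1
    return tally
```

Feeding the mongos log through summarize gives a coarse scatter-gather ratio (multi_shard divided by operations) for the logged subset of queries.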

Generated at Thu Feb 08 04:47:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.