[DRIVERS-1688] Investigate changes in PM-1858: Add and expose metrics to make shard key selection easier Created: 20/Apr/21  Updated: 01/Dec/22  Resolved: 10/May/21

Status: Closed
Project: Drivers
Component/s: None
Fix Version/s: None

Type: Epic
Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team
Assignee: Unassigned
Resolution: Won't Do
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Driver Changes: Needed
Server Compat: 7.0

 Description   
Downstream Change Summary

A new command option for DBAs may be created.

Description of Linked Ticket

Epic Summary

Summary

Decide on an initial set of metrics that the server can expose to evaluate the efficiency of a shard key or data distribution. For any of these metrics that are not already collected and exposed, collect and expose them.

We should also decide how to expose the metrics so that a shard key recommender service can best consume them (e.g. via serverStatus or some other mechanism).

We want to expose more information to help users with dedicated clusters pick a good shard key (for example, recent query access patterns) and evaluate their current shard key (for example, what percentage of queries use scatter/gather, or what percentage of updateOne operations are converted to transactions, because they don't include the shard key).
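
As a rough illustration of the kind of shard key evaluation metric described above, the sketch below computes the percentage of queries that are scatter/gather from two counters. The `QueryCounters` type and its field names are hypothetical and do not correspond to any existing server output; this is a sketch under those assumptions, not the mechanism this project would use.

```python
from dataclasses import dataclass


@dataclass
class QueryCounters:
    """Hypothetical per-collection counters; field names are illustrative only."""
    targeted: int        # queries routed to a single shard (included the shard key)
    scatter_gather: int  # queries broadcast to all shards (did not include the shard key)


def scatter_gather_percent(counters: QueryCounters) -> float:
    """Return the share of queries that were scatter/gather, as a percentage."""
    total = counters.targeted + counters.scatter_gather
    if total == 0:
        # No queries observed yet, so report 0 rather than dividing by zero.
        return 0.0
    return 100.0 * counters.scatter_gather / total
```

A high percentage would suggest that many queries omit the shard key, which is one signal that the current key may be a poor fit for the workload.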

To come up with the set of metrics, it may be useful to study:

  • The strategies currently used by CEs and TSEs to recommend shard keys for MongoDB.
  • The information currently used by customers to select shard keys. 
  • The inputs and cost models used by d4, an open-source research project that automatically recommends shard keys for MongoDB workloads. Its cost model in particular may be interesting.
  • The inputs and cost models used by other systems that automatically choose a partition key. 

 

Motivation

  • For serverless, it will be critical that machine resources are used efficiently while providing good performance for tenants. At a minimum, this will require selecting a good shard key for a tenant that is either too active to be supported by a single shard or has too much data to be stored on a single shard. The metrics may also be useful input to the balancer in deciding the optimal way to distribute multiple tenants' data across a cluster.
  • On-prem and Atlas customers who have a hard time judging what makes a good shard key and are afraid of sharding.
  • Customers who want to confirm that they've picked a good shard key.
  • Customers who want to reshard and need help picking a new shard key. 

Risks

  • The extra information may not be helpful, or may result in worse shard keys
  • We will sometimes recommend poor shard keys
  • We lose out on support $$

Cast of Characters

  • Product Owner: Garaudy Etienne
  • Project Lead: 
  • Program Manager: Ratika Gandhi
  • Drivers Contact: 

Documentation

Scope Document
Technical Design Document



 Comments   
Comment by Esha Bhargava [ 10/May/21 ]

No driver changes needed.

Generated at Thu Feb 08 08:23:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.