Drivers / DRIVERS-1688

Investigate changes in PM-1858: Add and expose metrics to make shard key selection easier

Details

    • Type: Epic
    • Resolution: Won't Do
    • Priority: Major - P3
    • Needed

    Description

      Downstream Change Summary

      A new command option for DBAs may be created.

      Description of Linked Ticket

      Epic Summary

      Summary

      Decide on an initial set of metrics that the server can expose to evaluate the efficiency of a shard key or data distribution. For any of these metrics that are not already collected and exposed, collect and expose them.

      We should also decide how the metrics should be exposed to best be consumed by a shard key recommender service (e.g. via serverStatus or some other mechanism).

      We want to expose more information to help users with dedicated clusters pick a good shard key (for example, recent query access patterns) and evaluate their current shard key (for example, what percentage of queries use scatter/gather, or what percentage of updateOne operations are converted to transactions because they don't include the shard key).
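
      As a rough illustration of the two evaluation percentages mentioned above, the sketch below derives them from a set of raw counters. The counter names and the RoutingCounters type are hypothetical, invented for this example; they are not actual serverStatus fields, and the sketch assumes every query is counted as either single-shard or scatter/gather.

```python
from dataclasses import dataclass


@dataclass
class RoutingCounters:
    """Hypothetical per-collection counters; NOT actual server field names."""
    single_shard_queries: int    # queries routed to exactly one shard
    scatter_gather_queries: int  # queries broadcast to every shard
    update_one_total: int        # all updateOne operations observed
    update_one_as_txn: int       # updateOne ops converted to transactions
                                 # because the filter lacked the shard key


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when nothing was observed."""
    return 100.0 * part / whole if whole else 0.0


def shard_key_report(c: RoutingCounters) -> dict:
    """Summarize how often the workload fails to target a single shard."""
    total_queries = c.single_shard_queries + c.scatter_gather_queries
    return {
        "scatter_gather_pct": percent(c.scatter_gather_queries, total_queries),
        "update_one_txn_pct": percent(c.update_one_as_txn, c.update_one_total),
    }


if __name__ == "__main__":
    # Made-up sample counts, only to show the call shape.
    print(shard_key_report(RoutingCounters(750, 250, 200, 50)))
```

      High values for either percentage would suggest that the workload often omits the shard key, which is the kind of signal a shard key recommender service could consume.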

      To come up with the set of metrics, it may be useful to study:

      • The strategies currently used by CEs and TSEs to recommend shard keys for MongoDB.
      • The information currently used by customers to select shard keys. 
      • The inputs and cost models used by d4, an open source research project that automatically recommends shard keys for MongoDB workloads. In particular, their cost model may be interesting.
      • The inputs and cost models used by other systems that automatically choose a partition key. 

       

      Motivation

      • For serverless, it will be critical that machine resources are used efficiently while providing good performance for tenants. At a minimum, this will require selecting a good shard key for a tenant which is either too active to be supported by a single shard or has too much data to be stored on a single shard. The metrics may also be useful input to the balancer in deciding the optimal way to distribute multiple tenants' data across a cluster.
      • On-prem and Atlas customers who have a hard time judging what makes a good shard key and are afraid of sharding.
      • Customers who want to confirm that they've picked a good shard key.
      • Customers who want to reshard and need help picking a new shard key. 

      Risks

      • The extra information will not be helpful, or will result in worse shard keys
      • We will sometimes recommend poor shard keys
      • We lose out on support $$

      Cast of Characters

      • Product Owner: Garaudy Etienne
      • Project Lead: 
      • Program Manager: Ratika Gandhi
      • Drivers Contact: 

      Documentation

      Scope Document
      Technical Design Document

          People

            Assignee: Unassigned
            Reporter: Backlog - Core Eng Program Management Team (backlog-server-pm)