[SERVER-69132] Additional metrics on chunk balancing performance Created: 25/Aug/22  Updated: 12/Dec/23

Status: Backlog
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Linda Qin Assignee: Backlog - Catalog and Routing
Resolution: Unresolved Votes: 0
Labels: 12/12, diagnostics
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Catalog and Routing
Participants:
Story Points: 2

 Description   

While diagnosing balancing performance, we sometimes have to look into config.actionlog for the balancing round information, or into config.changelog and the mongod logs for the time spent on each moveChunk step. It would be nice if we could also have some metrics around these in FTDC. For example:

  • On CSRS primary:
    • Balancing round currently running (something like "wt transaction transaction checkpoint currently running")
    • Number of candidate chunks found 
    • Number of chunks moved
    • Number of chunk migrations aborted
    • Number of chunk migrations in progress.
    • These may help us understand how much time is spent moving chunks compared with the other steps, such as finding chunks to move.
  • On shard primary:
    • moveChunk currently running
    • step X currently running.
    • These may help us identify the slowest step(s) of a chunk migration.
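
For context, the manual approach this ticket wants to replace can be sketched roughly as follows. This is only an illustration, not an existing tool: it assumes that "moveChunk.from" entries in config.changelog carry per-step durations in milliseconds under details keys of the form "step N of M", and the function names and the sample documents are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: per-step timings are stored under keys such as "step 3 of 6".
STEP_KEY = re.compile(r"^step (\d+) of (\d+)$")


def summarize_step_times(changelog_docs):
    """Sum the per-step milliseconds across all moveChunk.from entries."""
    totals = defaultdict(int)
    for doc in changelog_docs:
        if doc.get("what") != "moveChunk.from":
            continue  # skip other changelog events (commit, split, ...)
        for key, value in doc.get("details", {}).items():
            if STEP_KEY.match(key) and isinstance(value, (int, float)):
                totals[key] += value
    return dict(totals)


def slowest_step(totals):
    """Return the step label with the largest total, or None if empty."""
    return max(totals, key=totals.get) if totals else None


# Hypothetical sample documents, made up for illustration only.
SAMPLE = [
    {"what": "moveChunk.from",
     "details": {"step 1 of 6": 0, "step 2 of 6": 5, "step 3 of 6": 20,
                 "step 4 of 6": 1200, "step 5 of 6": 30, "step 6 of 6": 10,
                 "note": "success"}},
    {"what": "moveChunk.from",
     "details": {"step 1 of 6": 0, "step 2 of 6": 4, "step 3 of 6": 25,
                 "step 4 of 6": 800, "step 5 of 6": 40, "step 6 of 6": 12,
                 "note": "success"}},
    {"what": "moveChunk.commit", "details": {}},
]

if __name__ == "__main__":
    totals = summarize_step_times(SAMPLE)
    print(totals)
    print("slowest:", slowest_step(totals))
```

With a live cluster, the input could presumably come from a pymongo query such as client.config.changelog.find({"what": "moveChunk.from"}). Having the equivalent counters in FTDC would remove the need for this kind of after-the-fact scraping.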


 Comments   
Comment by Marcos José Grillo Ramirez [ 31/Mar/23 ]

pierlauro.sciarelli@mongodb.com, SERVER-72146 is more about Atlas feedback. We were planning to leave the t2 requirements for another ticket, and it looks like it could be this one.

Generated at Thu Feb 08 06:12:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.