[DOCS-11282] Add sharding metadata refresh metrics section to serverStatus (SERVER-28670) Created: 05/Feb/18  Updated: 29/Oct/23  Resolved: 06/Jun/18

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 3.7.2, 3.6.4, 3.4.15

Type: Task Priority: Major - P3
Reporter: Kay Kim (Inactive) Assignee: Kay Kim (Inactive)
Resolution: Fixed Votes: 0
Labels: add
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

https://docs.mongodb.com/manual/reference/command/serverStatus/


Issue Links:
Documented
documents SERVER-28670 Add sharding metadata refresh metrics... Closed
Participants:
Days since reply: 5 years, 36 weeks ago
Epic Link: DOCS: 4.0 Server
Story Points: 0.4

 Description   

Documentation Request Summary:

This change introduces a new shardingStatistics section in the serverStatus output, with the following contents:

  • Catalog cache statistics (under a catalogCache subsection), described in the engineering ticket description below
  • General shard statistics, described in the engineering ticket description below

Scope of changes:

Impact to other docs outside of this product:

MVP:

Resources:

Engineering Ticket Description:

Sharding metadata refreshes have occasionally been shown to cause throughput stalls. Currently, the only way to see when they happen is to read the server log and try to correlate its entries with FTDC data.

To improve diagnosability, we should add metadata refresh metrics to serverStatus so that they are also recorded in FTDC. All of the proposed metrics should live under a section called shardingStatistics and behave as follows:

  • shardingStatistics
    • countStaleConfigErrors - Counts how many times threads have hit a stale config exception (which is what triggers metadata refreshes)
    • countDonorMoveChunkStarted - Cumulative, always-increasing counter of how many chunk migrations this node has started as the donor (whether or not they succeeded)
    • totalDonorMoveChunkTimeMillis - Cumulative, always-increasing counter of the time the entire move chunk operation took (excluding range deletion)
    • totalDonorChunkCloneTimeMillis - Cumulative, always-increasing counter of the time the clone phase took on the donor node, before the node could enter the critical section
    • totalCriticalSectionCommitTimeMillis - Cumulative, always-increasing counter of the time the critical section's commit phase took. This is the period when all operations on the collection are blocked, including reads (from 3.6 onward).
    • totalCriticalSectionTimeMillis - Cumulative, always-increasing counter of the time the entire critical section took. This includes the time the recipient took to fetch the latest modifications from the donor and persist them, plus the critical section commit time. The value of totalCriticalSectionTimeMillis - totalCriticalSectionCommitTimeMillis gives the duration of the catch-up phase of the critical section (during which the last modifications are transferred from the donor to the recipient).
  • shardingStatistics.catalogCache
    • numDatabaseEntries - Tracks how many database entries in total are currently in the catalog cache
    • numCollectionEntries - Tracks how many collection entries (in total across all databases) are currently in the catalog cache
    • countStaleConfigErrors - Counts how many times threads have hit a stale config exception (which is what triggers metadata refreshes)
    • totalRefreshWaitTimeMicros - Cumulative, always-increasing counter of the combined time threads have spent waiting for refreshes
    • numActiveIncrementalRefreshes - Tracks how many incremental refreshes are currently waiting to complete
    • countIncrementalRefreshesStarted - Cumulative, always-increasing counter of how many incremental refreshes have been kicked off
    • numActiveFullRefreshes - Tracks how many full refreshes are currently waiting to complete
    • countFullRefreshesStarted - Cumulative, always-increasing counter of how many full refreshes have been kicked off
    • countFailedRefreshes - Cumulative, always-increasing counter of how many full or incremental refreshes have failed for any reason
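Because most of these metrics are cumulative counters, a monitoring script would typically look at the difference between two serverStatus samples rather than at the raw values. The following is a minimal Python sketch, not part of the ticket or of MongoDB itself. It assumes each sample is the dict returned by the serverStatus command (for example, via PyMongo's `client.admin.command("serverStatus")`), and the helper names `counter_deltas` and `catch_up_millis` are hypothetical. It also applies the ticket's formula totalCriticalSectionTimeMillis - totalCriticalSectionCommitTimeMillis to estimate catch-up time.

```python
from typing import Any, Dict, Mapping

# Cumulative counters taken from the list above (hypothetical selection).
SHARDING_COUNTERS = (
    "countStaleConfigErrors",
    "countDonorMoveChunkStarted",
    "totalDonorMoveChunkTimeMillis",
    "totalDonorChunkCloneTimeMillis",
    "totalCriticalSectionCommitTimeMillis",
    "totalCriticalSectionTimeMillis",
)

CATALOG_CACHE_COUNTERS = (
    "countStaleConfigErrors",
    "totalRefreshWaitTimeMicros",
    "countIncrementalRefreshesStarted",
    "countFullRefreshesStarted",
    "countFailedRefreshes",
)


def catch_up_millis(stats: Mapping[str, Any]) -> int:
    """Catch-up phase time: total critical section time minus commit time."""
    return (stats["totalCriticalSectionTimeMillis"]
            - stats["totalCriticalSectionCommitTimeMillis"])


def counter_deltas(before: Mapping[str, Any], after: Mapping[str, Any]) -> Dict[str, int]:
    """Difference of cumulative counters between two serverStatus samples.

    Counters missing from either sample are skipped. Gauges such as
    numDatabaseEntries are not differenced. Catalog cache counters are
    reported with a "catalogCache." prefix to keep the two
    countStaleConfigErrors fields apart.
    """
    old = before["shardingStatistics"]
    new = after["shardingStatistics"]
    deltas = {name: new[name] - old[name]
              for name in SHARDING_COUNTERS
              if name in old and name in new}

    old_cc = old.get("catalogCache", {})
    new_cc = new.get("catalogCache", {})
    for name in CATALOG_CACHE_COUNTERS:
        if name in old_cc and name in new_cc:
            deltas["catalogCache." + name] = new_cc[name] - old_cc[name]
    return deltas
```

Since the deltas dict keeps the original field names, `catch_up_millis` can be applied to it directly to get the catch-up time for the sampled interval. A negative delta would suggest the counters were reset between samples (for example, by a process restart); the sketch does not try to detect that.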


 Comments   
Comment by Githook User [ 06/Jun/18 ]

Author:

{'username': 'kay-kim', 'name': 'kay', 'email': 'kay.kim@10gen.com'}

Message: DOCS-11282: shardingStatus
Branch: v3.4
https://github.com/mongodb/docs/commit/c67c259793e5f8adaa2fc38f29bdf22cf1e0325e

Comment by Githook User [ 06/Jun/18 ]

Author:

{'username': 'kay-kim', 'name': 'kay', 'email': 'kay.kim@10gen.com'}

Message: DOCS-11282: shardingStatus
Branch: v3.6
https://github.com/mongodb/docs/commit/6b511cb548a709048df098654ca67d7e169e6e62

Comment by Githook User [ 06/Jun/18 ]

Author:

{'username': 'kay-kim', 'name': 'kay', 'email': 'kay.kim@10gen.com'}

Message: DOCS-11282: shardingStatus
Branch: master
https://github.com/mongodb/docs/commit/97ad9dc92733a87297ab887b08adb259a6e0e470

Generated at Thu Feb 08 08:02:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.