[COMPASS-6037] Investigate changes in PM-2664: FTDC metrics for global index builds Created: 18/Aug/22  Updated: 23/Aug/22  Resolved: 23/Aug/22

Status: Closed
Project: Compass
Component/s: None
Affects Version/s: None
Fix Version/s: No version

Type: Investigation Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Documentation Changes: Not Needed

Description
Original Downstream Change Summary

The Server documentation will need to be updated as part of these changes.

The serverStatus and $currentOp outputs are being modified as part of PM-2664. They are documented in pages such as:

https://www.mongodb.com/docs/manual/reference/command/serverStatus/#mongodb-serverstatus-serverstatus.shardingStatistics.resharding
https://www.mongodb.com/docs/manual/reference/operator/aggregation/currentOp/#mongodb-data--currentOp.totalOperationTimeElapsed
My team has summarized in prose, in the design document, the changes to resharding's serverStatus section and to resharding's $currentOp output. SERVER-57943 also includes sample output from running serverStatus and $currentOp during an active resharding operation.

Description of Linked Ticket

Epic Summary

Summary

Create common metrics classes for global index builds and resharding.

Motivation

The resharding project (PM-234) added FTDC metrics very late in its development, long after its data replication components had become fully functional. This significantly hindered the team's performance investigations because it left open the question of where the time in a resharding operation was being spent. Extending the set of FTDC metrics available during global index builds and resharding will aid all future investigations.

Additionally, the resharding project ended up with error-prone C++ lifetime management because both its FTDC metrics and its $currentOp metrics are decorations on the ServiceContext. This design led to multiple bugs in which the server crashed around stepdown and step-up. Making the C++ object for $currentOp metrics a member variable of the PrimaryOnlyService::Instance instead will avoid these issues for the global index builds project and simplify resharding's design.
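The ownership change described above can be illustrated with a minimal sketch. All class names here (OperationMetricsBase, ReshardingMetrics, GlobalIndexBuildMetrics, ReshardingInstance) are hypothetical stand-ins rather than the server's actual classes, and FTDC/serverStatus reporting is not modeled. The sketch shows two ideas: a common base class shared by resharding and global index builds, and a metrics object that is held as a plain member of the instance. Because it is a member, its lifetime is tied to the instance and cannot dangle across stepdown or step-up.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical common base for per-operation metrics shared by
// resharding and global index builds (illustrative only).
class OperationMetricsBase {
public:
    explicit OperationMetricsBase(std::string description)
        : _description(std::move(description)),
          _start(std::chrono::steady_clock::now()) {
        ++counter();  // track how many metrics objects are alive
    }
    virtual ~OperationMetricsBase() { --counter(); }

    // Non-copyable: each operation owns exactly one metrics object.
    OperationMetricsBase(const OperationMetricsBase&) = delete;
    OperationMetricsBase& operator=(const OperationMetricsBase&) = delete;

    const std::string& description() const { return _description; }

    // Whole seconds elapsed since the metrics object was created.
    std::int64_t elapsedSeconds() const {
        return std::chrono::duration_cast<std::chrono::seconds>(
                   std::chrono::steady_clock::now() - _start)
            .count();
    }

    void onDocumentsCopied(std::int64_t n) { _documentsCopied += n; }
    std::int64_t documentsCopied() const { return _documentsCopied; }

    // Number of metrics objects currently alive (for demonstrating lifetime).
    static int liveCount() { return counter(); }

private:
    static int& counter() {
        static int count = 0;
        return count;
    }

    std::string _description;
    std::chrono::steady_clock::time_point _start;
    std::int64_t _documentsCopied = 0;
};

// Hypothetical operation-specific subclasses built on the common base.
class ReshardingMetrics : public OperationMetricsBase {
public:
    ReshardingMetrics() : OperationMetricsBase("resharding") {}
};

class GlobalIndexBuildMetrics : public OperationMetricsBase {
public:
    GlobalIndexBuildMetrics() : OperationMetricsBase("globalIndexBuild") {}
};

// Hypothetical stand-in for a PrimaryOnlyService::Instance. The metrics are a
// member, so they are constructed and destroyed together with the instance
// instead of living in a process-wide decoration.
class ReshardingInstance {
public:
    ReshardingMetrics& metrics() { return _metrics; }
    const ReshardingMetrics& metrics() const { return _metrics; }

private:
    ReshardingMetrics _metrics;
};
```

In this sketch, destroying a ReshardingInstance (as would happen when the operation ends or the node steps down) also destroys its metrics. No other code can hold a reference into a shared ServiceContext-style slot that outlives the operation.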

Cast of Characters

Documentation

Product Description
Scope Document
Technical Design Document


Generated at Wed Feb 07 22:41:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.