  1. MongoDB Database Tools
  2. TOOLS-3181

Investigate changes in PM-2664: FTDC metrics for global index builds

    • Type: Investigation
    • Resolution: Won't Do
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      Original Downstream Change Summary

      The Server documentation will need to be updated as part of these changes.

      The serverStatus and $currentOp output are being modified as part of PM-2664 and are documented on pages such as:

      https://www.mongodb.com/docs/manual/reference/command/serverStatus/#mongodb-serverstatus-serverstatus.shardingStatistics.resharding
      https://www.mongodb.com/docs/manual/reference/operator/aggregation/currentOp/#mongodb-data--currentOp.totalOperationTimeElapsed
      My team has attempted to summarize in prose the changes to resharding's serverStatus section and the changes to resharding's $currentOp output in the design document. There is also some output from running serverStatus and $currentOp during an active resharding operation included in SERVER-57943.

      Description of Linked Ticket

      Epic Summary

      Summary

      Create common metrics classes for global index builds and resharding.

      Motivation

      The resharding project (PM-234) added FTDC metrics very late in its development, well after its data replication components became fully functional. This significantly hindered the team during performance investigations because it left open questions about where the time in the resharding operation was being spent. Extending the set of FTDC metrics available during the global index build and resharding processes will aid future investigations.

      Additionally, the resharding project ended up with error-prone C++ lifetime management because both its FTDC metrics and $currentOp metrics are decorations on the ServiceContext. This design led to multiple bugs in which the server crashed around stepdown and step-up. Making the C++ object for $currentOp metrics a member variable on the PrimaryOnlyService::Instance instead will avoid these issues for the global index builds project and simplify resharding’s design.
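
      The lifetime concern described above can be illustrated with a minimal, self-contained sketch. It does not use the server's real classes: OperationMetrics, GlobalMetricsRegistry, and OperationInstance are hypothetical stand-ins, assumed here only to contrast metrics held in a process-wide structure (loosely analogous to a decoration) with metrics owned as a member of the operation instance.

      ```cpp
      #include <memory>
      #include <string>
      #include <unordered_map>

      // Hypothetical stand-in for a per-operation metrics object (not a server class).
      struct OperationMetrics {
          long long documentsCopied = 0;
      };

      // Pattern A (decoration-like, assumed for illustration): metrics live in a
      // process-wide registry, so their lifetime is independent of the operation.
      class GlobalMetricsRegistry {
      public:
          OperationMetrics& getOrCreate(const std::string& opId) { return _metrics[opId]; }
          bool contains(const std::string& opId) const { return _metrics.count(opId) > 0; }
          // Every completion and stepdown path has to remember to call this.
          void erase(const std::string& opId) { _metrics.erase(opId); }

      private:
          std::unordered_map<std::string, OperationMetrics> _metrics;
      };

      // Pattern B (member variable): the operation instance owns its metrics, so
      // they are destroyed exactly when the instance is destroyed.
      class OperationInstance {
      public:
          OperationMetrics& metrics() { return _metrics; }

      private:
          OperationMetrics _metrics;
      };

      // Returns true if metrics remain in the registry after the operation has
      // ended without an explicit erase() call.
      bool registryLeavesStaleMetrics() {
          GlobalMetricsRegistry registry;
          {
              registry.getOrCreate("op-1").documentsCopied = 42;
              // The operation ends here (for example on stepdown); nobody calls erase().
          }
          return registry.contains("op-1");
      }

      // Returns true if the metrics are gone once the owning instance is released.
      bool memberMetricsDieWithInstance() {
          auto instance = std::make_shared<OperationInstance>();
          std::weak_ptr<OperationInstance> observer = instance;
          instance->metrics().documentsCopied = 42;
          instance.reset();  // Releasing the instance also destroys its metrics.
          return observer.expired();
      }
      ```

      In this sketch the registry entry outlives the operation unless some code path explicitly erases it, while the member-owned metrics need no separate cleanup. That is the general trade-off the ticket attributes to the two designs; the actual server code may differ in its details.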

      Cast of Characters

      Documentation

      Product Description
      Scope Document
      Technical Design Document

            Assignee:
            dave.rolsky@mongodb.com Dave Rolsky
            Reporter:
            backlog-server-pm Backlog - Core Eng Program Management Team
            Votes:
            0
            Watchers:
            2
