DRIVERS-2666

Standardize performance testing infrastructure

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Unknown
    • None
    • Labels: None
    • Needed - No Spec Changes
      PTAL at the description in the DRIVERS ticket. The Node impl. is also available for reference.

    • Key            Status/Resolution  FixVersion
      CDRIVER-4676   Fixed              1.25.0
      CXX-2710       Fixed              3.10.0
      CSHARP-4713    Done               2.24.0
      GODRIVER-2898  Backlog
      JAVA-5065      Fixed              4.11.0
      NODE-5440      Duplicate
      MOTOR-1149     Won't Do
      PYTHON-3823    Fixed              4.7
      PHPLIB-1187    Done
      RUBY-3290      Backlog
      RUST-1698      Fixed              2.8.0

      See the note on DRIVERS-2557.

      Summary

      Drivers should ensure that any performance testing, including but not limited to the driver spec performance benchmarks:

      • uses a dedicated distro in Evergreen (to avoid performance fluctuations caused by distro changes)
      • uses a patch-pinned server version for integration performance tests (to avoid performance fluctuations caused by changes in the server's performance profile)
      • uses the performance analytics backend for change point detection via the perf.send command (the tooling's change point detection algorithms are designed to find real changes in performance while minimizing noise)
      • sets up actionable alerting based on performance results

      Motivation

      Who is the affected end user?

      Driver teams. This should allow them to get ahead of performance regressions.

      How does this affect the end user?

      N/A

      How likely is it that this problem or use case will occur?

      N/A

      If the problem does occur, what are the consequences and how severe are they?

      N/A

      Is this issue urgent?

      The sooner the drivers implement the standardized architecture, the sooner they can start building up a history of reliable performance data.

      Is this ticket required by a downstream team?

      No

      Is this ticket only for tests?

      Yes

      Acceptance Criteria

      Teams can start implementing this ticket directly. Depending on their existing setup, teams may choose to create an epic to address the different aspects of the work outlined here.

      • The dedicated performance distro is rhel90-dbx-perf-large (others can be created if needed, in coordination with the build team)
      • Drivers Evergreen Tools exposes the patch-pinned v6 server version 6.0.6, which can be referenced via the `v6.0-perf` version alias (the analogous perf-stable v7 version will be added later)
      • To use the performance analytics backend, it is sufficient to invoke the perf.send command in the Evergreen run.
      • Notifications for change point detection can be set up in Kanopy Splunk and sent directly to Slack (many additional notification options are available)
        • NOTE: Change point detection works retrospectively: as new data flows in, it can detect statistically significant changes in the distribution that did not exist before and create change points in the past. A large, sustained change is usually detected within a few days of the commit date. For noisier time series or less prominent changes, it can take longer before a change point is detected on a commit. The sample queries later in this ticket do not limit the date range of commits considered for change point detection; a limit can be added if notifications for commits too far in the past become too noisy. Here's an example query limiting the search range to 60 days:
          message="New change point detected." index="server-tig-prod" | spath output=project path="change_point.time_series_info.project" | search project IN ("mongo-node-driver-next", "node-bson") | spath output=run_date path="change_point.evg_create_date" | eval days_since=(now()-strptime(run_date, "%Y-%m-%dT%H:%M:%S%:z"))/86400 | search days_since < 60
          
        • NOTE #2: All change points can be triaged, linked to Jira tickets, and marked as true or false positives in the build baron UI: https://performance-monitoring-and-analysis.server-tig.prod.corp.mongodb.com/baron (sample filter for the node project). However, this UI is somewhat clunky and, given the expected volume of change points for a typical driver project, may not be the most efficient way for drivers to act on true positives. Drivers may therefore choose to implement their own process for triaging change points without formally marking each one in the build baron system.
        • NOTE #3: Remember to set appropriate read/write permissions for your alert. Read permissions can safely be set to everyone. To restrict edit permissions on your custom alert to just your team, however, your team's mana group must be mapped to a Kanopy Splunk role; if your team does not appear in the role list, file an IT ticket to request that it be added.
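
The perf.send step named in the acceptance criteria uploads a results file produced by the benchmark run. As a rough sketch of what producing that file might look like in Python: the field layout (`info.test_name`, `info.args`, `metrics[].name/type/value`), the default file name `results.json`, the benchmark names, and the numbers below are all assumptions for illustration, not taken from this ticket. Check the Evergreen documentation for perf.send for the exact schema and parameters before relying on it.

```python
import json
from pathlib import Path


def build_result(test_name, value, measurement="megabytes_per_sec", threads=1):
    """Build one result entry (assumed layout, see the note above)."""
    return {
        "info": {"test_name": test_name, "args": {"threads": threads}},
        "metrics": [{"name": measurement, "type": "MEDIAN", "value": value}],
    }


def write_results(results, path="results.json"):
    """Serialize all entries as one JSON array for the upload step to read."""
    out = Path(path)
    out.write_text(json.dumps(results, indent=2))
    return out


if __name__ == "__main__":
    # Hypothetical benchmark names and made-up numbers, not real measurements.
    sample = [
        build_result("FlatBsonEncoding", 95.2),
        build_result("SmallDocInsertOne", 12.7),
    ]
    print(write_results(sample))
```

Keeping test names and measurement names stable across commits matters here, since change point detection operates on the time series identified by project, variant, task, test, and measurement.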

      Sample Splunk query for a single Evergreen project:

      message="New change point detected." index="server-tig-prod" | spath "change_point.time_series_info.project" | search "change_point.time_series_info.project"="mongo-node-driver-next"
      

      Sample Splunk query for multiple Evergreen projects:

      message="New change point detected." index="server-tig-prod" | spath "change_point.time_series_info.project" | search "change_point.time_series_info.project" IN ("mongo-node-driver-next", "node-bson")
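
To sanity-check the filter logic outside Splunk, the project filter from the queries above and the 60-day window from the earlier note can be mirrored in a few lines of Python. This is a minimal sketch: the event is assumed to be a nested dict shaped like the field paths used in the queries (`change_point.time_series_info.project`, `change_point.evg_create_date`), and the timestamp in the example is made up.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def days_since(run_date, now=None):
    """Days elapsed since an ISO 8601 timestamp that carries a UTC offset."""
    created = datetime.fromisoformat(run_date)
    now = now or datetime.now(timezone.utc)
    return (now - created).total_seconds() / SECONDS_PER_DAY


def matches(event, projects, max_age_days=60, now=None):
    """Mirror `search project IN (...)` followed by `search days_since < 60`."""
    change_point = event["change_point"]
    project = change_point["time_series_info"]["project"]
    if project not in projects:
        return False
    return days_since(change_point["evg_create_date"], now) < max_age_days


if __name__ == "__main__":
    # Made-up event; only the field paths come from the queries above.
    event = {
        "change_point": {
            "evg_create_date": "2023-08-15T10:30:00+00:00",
            "time_series_info": {"project": "node-bson"},
        }
    }
    fixed_now = datetime(2023, 9, 1, tzinfo=timezone.utc)
    print(matches(event, {"mongo-node-driver-next", "node-bson"}, now=fixed_now))
```

Passing `now` explicitly keeps the check deterministic; in the Splunk query the equivalent value comes from `now()` at search time.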

      Sample notification message:

      New change point from `$result.change_point.commit_date$`
      *$result.change_point.message$* (<https://spruce.mongodb.com/task/$result.change_point.task_id$/trend-charts|CI Link>)
      
      ```
      Project: $result.change_point.time_series_info.project$
      Variant: $result.change_point.time_series_info.variant$
      Task: $result.change_point.time_series_info.task$
      Test: $result.change_point.time_series_info.test$
      Measurement: $result.change_point.time_series_info.measurement$
      Percent change: $result.change_point.percent_change$
      ```
      
      Included fields: change_point.repo_full_name,change_point.branch
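
The `$result.<path>$` tokens in the message above are filled in by Splunk from the fields of the matching search result when the alert fires. To preview how a message would read, the substitution can be approximated locally. The sketch below is only an illustration: the helper names, the sample result, and the choice to render a missing field as an empty string are assumptions, not a description of Splunk's implementation.

```python
import re

# Matches tokens such as $result.change_point.percent_change$
TOKEN = re.compile(r"\$result\.([A-Za-z0-9_.]+)\$")


def lookup(result, dotted_path):
    """Walk a nested dict along a dotted path; return "" if a key is missing
    (this sketch's choice, not necessarily what Splunk does)."""
    value = result
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""
        value = value[key]
    return str(value)


def render(template, result):
    """Replace each $result.<path>$ token with the matching field value."""
    return TOKEN.sub(lambda m: lookup(result, m.group(1)), template)


if __name__ == "__main__":
    # Made-up values for illustration only.
    sample = {
        "change_point": {
            "percent_change": 12.5,
            "time_series_info": {"project": "node-bson", "variant": "perf"},
        }
    }
    template = (
        "Project: $result.change_point.time_series_info.project$\n"
        "Percent change: $result.change_point.percent_change$"
    )
    print(render(template, sample))
```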
      

       

        1. Node-splunk-perf-alert.png (47 kB)
        2. Node-splunk-perf-alert-slack-config.png (98 kB)
        3. Node-splunk-slack-alert-example.png (168 kB)
        4. plugins_menu.PNG (284 kB)

            Assignee: Unassigned
            Reporter: Daria Pardue (daria.pardue@mongodb.com)
            Votes: 0
            Watchers: 8
