  1. Evergreen
  2. EVG-15655

Some time series marked as analyzed in cedar but associated data not sent to SPS

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Backlog
    • Component/s: plt
    • Labels:
      None

      Description

      Running a Presto query to verify that time series marked as analyzed in Cedar exist in expanded_metrics (SPS DB) revealed a number of time series in expanded_metrics with missing data. All of the missing points are tail data points, i.e. the most recent point of the series. For example,

       

      project:      performance-4.4
      variant:      linux-wt-standalone
      task_name:    update
      test_name:    Update.MatchedElementWithinArray
      _id:          a3a1c63cbb2c828b88a01fa420d5c21d59625958
      info:         {project=performance-4.4, version=performance_4.4_3641c00be722f394a2b0987cfcac023930e5bbf1, variant=linux-wt-standalone, task_name=update, task_id=performance_4.4_linux_wt_standalone_update_3641c00be722f394a2b0987cfcac023930e5bbf1_21_10_20_15_02_57, execution=0, test_name=Update.MatchedElementWithinArray, trial=0, tags=null, mainline=true, args={thread_level=8}}
      created_at:   2021-10-20 18:40:20.604
      completed_at: 2021-10-20 18:40:20.604
      rollups:      {processed_at=2021-10-20 18:49:43.037, count=null, valid=null, stats={name=ops_per_sec, val=13402.079019098726}}
      analysis:     {processed_at=2021-10-20 18:49:43.757}


       

      This data point does not exist in expanded_metrics DB. 
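
      The cross-check described above can be sketched as a set difference between the points Cedar has marked as analyzed and the points present in expanded_metrics. This is only an illustration: the real Presto query and the expanded_metrics schema are not shown in this ticket, so the PointKey fields and the function name below are hypothetical, and the "older" version string is made up for the example.

```python
from typing import Iterable, List, NamedTuple


class PointKey(NamedTuple):
    """Hypothetical identity of one data point; not the real Cedar/SPS schema."""
    project: str
    variant: str
    task_name: str
    test_name: str
    thread_level: int
    version: str


def missing_from_expanded_metrics(
    analyzed_in_cedar: Iterable[PointKey],
    in_expanded_metrics: Iterable[PointKey],
) -> List[PointKey]:
    """Return points marked analyzed in Cedar that are absent from expanded_metrics."""
    stored = set(in_expanded_metrics)
    return [key for key in analyzed_in_cedar if key not in stored]


# Values copied from the example row in this ticket.
example = PointKey(
    project="performance-4.4",
    variant="linux-wt-standalone",
    task_name="update",
    test_name="Update.MatchedElementWithinArray",
    thread_level=8,
    version="performance_4.4_3641c00be722f394a2b0987cfcac023930e5bbf1",
)
# An earlier point of the same series; the version string is invented.
older = example._replace(version="hypothetical_older_version")

# Cedar knows both points, expanded_metrics only has the older one.
missing = missing_from_expanded_metrics([older, example], [older])
for key in missing:
    print(key.test_name, key.version)
```

      In this sketch the tail point from the example row is the only one reported as missing.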

      The Splunk log here indicates that SPS received an update for this specific time series at 10/20/21 2:49:43.617 PM Eastern time (18:49:43.617 UTC), which appears to be the time when Cedar marked it as analyzed, as per analysis {processed_at=2021-10-20 18:49:43.757} (UTC).
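
      To compare the Splunk timestamp with the Cedar timestamps on one clock, the Splunk time can be converted to UTC. The sketch below assumes the Splunk time is US Eastern daylight time (UTC-4), which was in effect on 2021-10-20, and that the Cedar timestamps are UTC. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US Eastern daylight time (UTC-4) applied on 2021-10-20.
EDT = timezone(timedelta(hours=-4), "EDT")


def splunk_to_utc(stamp: str) -> datetime:
    """Parse a Splunk-style local timestamp and convert it to UTC."""
    local = datetime.strptime(stamp, "%m/%d/%y %I:%M:%S.%f %p")
    return local.replace(tzinfo=EDT).astimezone(timezone.utc)


def cedar_utc(stamp: str) -> datetime:
    """Parse a Cedar processed_at value, assumed to be UTC."""
    parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


# Timestamps copied from this ticket.
rollups_processed = cedar_utc("2021-10-20 18:49:43.037")
sps_received = splunk_to_utc("10/20/21 2:49:43.617 PM")
analysis_processed = cedar_utc("2021-10-20 18:49:43.757")

print("rollups processed :", rollups_processed.isoformat())
print("SPS update logged :", sps_received.isoformat())
print("analysis processed:", analysis_processed.isoformat())
print("gap to analysis   :", analysis_processed - sps_received)
```

      Under these assumptions the SPS update falls between rollups.processed_at and analysis.processed_at, 140 ms before the latter.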

      With our recent updates to logging, SPS now logs the length of the data received. As the Splunk log above indicates, the length of the data received is 277, which is also the length of the time series in the expanded metrics DB (i.e., with the data point missing), meaning no new data point was sent in this update.
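
      The length comparison above can be expressed as a small check. This is a hypothetical helper, not SPS code; it only assumes that an update carrying a new tail point would be longer than the series already stored.

```python
def classify_update(received_length: int, stored_length: int) -> str:
    """Compare the length SPS logged for an update with the stored series length."""
    if received_length > stored_length:
        return "new_points"           # the update carried at least one new point
    if received_length == stored_length:
        return "no_new_points"        # same length: the tail point was not sent
    return "shorter_than_stored"      # unexpected: update is shorter than stored data


# Lengths from this ticket: 277 received, 277 stored (tail point missing).
print(classify_update(277, 277))
```

      For the update in this ticket the helper reports that no new points were sent, which matches the observation that the stored series still lacks its tail point.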


               
      Many such time series updates have been observed in which the most recent data point is missing. Here are some examples of missing time series from Oct 20 2021, 1:00 PM to 3:00 PM Eastern time. Although these seem to recover as new results become available for the time series, the root cause of the updates with a missing tail data point needs to be investigated to ensure zero downtime for the performance pipeline.

        Attachments

        1. image-2021-10-20-16-41-37-958.png (23 kB)
        2. image-2021-10-20-16-44-56-429.png (13 kB)
        3. image-2021-10-20-16-50-48-218.png (273 kB)

              People

              Assignee:
              julian.edwards Julian Edwards
              Reporter:
              anjani.bhat Anjani Bhat
              Votes:
              0
              Watchers:
              4

                Dates

                Created:
                Updated: