[SERVER-32351] The task-specific totalTimeSpentExecutingMetrics should not count time they are not actually running due to recursion Created: 14/Dec/17  Updated: 06/Dec/22  Resolved: 08/Jul/19

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Backlog - Service Architecture
Resolution: Won't Fix Votes: 0
Labels: SWDI
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File example.png    
Issue Links:
Depends
Related
is related to SERVER-30567 Break out serviceExecutorTaskStats by... Closed
Assigned Teams:
Service Arch
Operating System: ALL
Participants:

 Description   

The above chart presents some overall and some task-specific serviceExecutorTaskStats.metricsByTask.* execution time metrics introduced in SERVER-30567.

  • "tasks total currently executing" is the overall number of tasks currently executing, derived from serviceExecutorTaskStats.totalTimeExecutingMicros
  • "tasks source currently executing" is sourceMessage tasks currently executing, derived from serviceExecutorTaskStats.metricsByTask.sourceMessage.totalTimeExecutingMicros
  • "tasks process currently executing" is processMessage tasks currently executing, derived from serviceExecutorTaskStats.metricsByTask.processMessage.totalTimeExecutingMicros

Note that, surprisingly, the sum of the two task-specific execution time metrics is 2x the total execution time metrics, making it appear that there are ~500 total tasks executing at the same time (~250 processMessage tasks and ~250 sourceMessage tasks), even though there are only ~250 threads.

I believe this is because a sourceMessage task can be called recursively on the same stack as a processMessage task by dispatching the task rather than posting it, but when this happens the timer for the processMessage task continues running even though the task itself is not executing. So if this recursion occurs the time for the recursive sourceMessage task gets added to the time for the outer processMessage task, so that time is double-counted.

(On the other hand the overall serviceExecutorTaskStats.totalTimeExecutingMicros is correct and it does not double-count time in this way because the timer is only started and stopped based on posted, not dispatched, tasks.)



 Comments   
Comment by Benjamin Caimano (Inactive) [ 02/Apr/18 ]

Fyi, from a view inside the code, this looks like it shouldn't be an issue anymore. I'll be confirming tomorrow.

Generated at Thu Feb 08 04:29:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.