[SERVER-50374] Reduce jscore execution variance on the waterfall Created: 18/Aug/20  Updated: 05/Jun/23

Status: Backlog
Project: Core Server
Component/s: Build
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Andrew Morrow (Inactive) Assignee: Backlog - Server Tooling and Methods (STM) (Inactive)
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Server Tooling & Methods
Participants:
Story Points: 3

 Description   

We currently observe high variance in execution times for the jscore task on the CI loop. While jscore usually takes about 5 minutes to run, sometimes it takes as long as 15.

We would like to add jscore to the commit queue tasks, and we could do so at 5 minutes, but not at 15.

If we can reduce the variance and get the runtime stable at 5 minutes, we could greatly expand the coverage of the commit queue.



 Comments   
Comment by Githook User [ 13/Oct/20 ]

Author: Richard Samuels <richard.l.samuels@gmail.com> (richardsamuels)

Message: SERVER-50374 Partially reduce jsCore execution variance
Branch: master
https://github.com/mongodb/mongo/commit/0d941cb5c5a659ac526a9867342363f2f5020a6f

Comment by Richard Samuels (Inactive) [ 13/Oct/20 ]

This patch build off master shows that command -V takes a very long time: about 13.7 seconds pass between the start of the setup step and the failed cygpath lookup.

[2020/10/13 18:05:41.937] Running command 'shell.exec' in "do setup" (step 1.7 of 2)
[2020/10/13 18:05:55.616] bash: line 8: command: cygpath: not found

 

This patch build replaces command -V with which and spends far less time searching for cygpath; the failed lookup is reported 4 ms after the step starts:

[2020/10/13 13:59:31.941] Running command 'shell.exec' in "do setup" (step 1.7 of 2)
[2020/10/13 13:59:31.945] which: no cygpath in (/opt/go/bin:/opt/go/bin:/opt/go/bin:/usr/lib64/icecc/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin:/opt/node/bin:/root/bin:/opt/node/bin:/opt/node/bin) 
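The comparison can be reproduced outside the CI loop with something like the following. This is a minimal sketch, not the actual setup script: time_lookup is a hypothetical helper, and it assumes bash and GNU date (for the %N nanosecond format). The absolute numbers will depend on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the real setup script): report how long
# a lookup of an executable takes with either `command -V` or `which`.
# Assumes GNU date, which supports %N (nanoseconds).
time_lookup() {
  local method="$1" name="$2" start end
  start=$(date +%s%N)
  case "$method" in
    command) command -V "$name" >/dev/null 2>&1 ;;
    which)   which "$name" >/dev/null 2>&1 ;;
    *)       echo "unknown method: $method" >&2; return 2 ;;
  esac
  end=$(date +%s%N)
  # Convert the nanosecond difference to whole milliseconds.
  echo "$method $name: $(( (end - start) / 1000000 )) ms"
}

# cygpath is expected to be missing on non-Windows hosts, as in the logs above.
time_lookup command cygpath
time_lookup which cygpath
```

Running it several times on an affected host should show whether the gap between the two lookups is stable across reruns.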

 

This result is consistent across many reruns, so the above CR will commit this change. However, it accounts for only part of the variance in jsCore runtimes. This execution's task logs show the other source:

 [2020/10/09 19:49:16.413] [resmoke] 2020-10-09T19:49:16.412+0000 Exiting with code: 0
 [2020/10/09 19:49:16.413] [executor] 2020-10-09T19:49:16.413+0000 Total tar/gzip archive time is 0.00 seconds, for 0 file(s) 0 MB
 [2020/10/09 19:54:16.471] [resmoke] 2020-10-09T19:54:16.470+0000 Failed to flush all logs within a reasonable amount of time, treating logs as incomplete 

That is, logkeeper v2 performance: resmoke waits five minutes (19:49:16 to 19:54:16) before giving up on flushing the logs. Separately, although I don't have logs handy for it, we've seen resmoke take up to 1 minute per id when it fetches build/test ids for storing individual test logs. Any further progress on reducing the variance is blocked by PM-1439.

 
