[SERVER-47368] Request to investigate compile times taking ~40 minutes on selected_tests patch builds Created: 06/Apr/20  Updated: 27/Oct/23  Resolved: 17/Apr/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Lydia Stepanek (Inactive) Assignee: [DO NOT ASSIGN] Backlog - Server Development Platform Team (SDP) (Inactive)
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Server Development Platform
Participants:

 Description   

On the DAG team, we recently wrapped up the Targeted Test Selection project. While investigating whether selected_tests patch builds were running in under 1 hour (one of the goals of the project), I noticed that compile was consistently taking ~40 minutes on the patch builds I looked at (examples below).

My understanding is that compile on patch builds should run in about 20 minutes, so I thought it was worth filing a ticket. A few patch builds did compile in ~20 minutes, so I am wondering what causes the difference in compile time.

 

All selected_tests patch builds in the past week:



 Comments   
Comment by Andrew Morrow (Inactive) [ 15/Apr/20 ]

Thanks brian.mccarthy for noticing - that does indeed explain why the file was missing. I don't think we need to revisit the decision to tie the cache to the image. It has been working well and most builds are fast. Every caching scheme will have occasional unlucky outliers.

Comment by Andrew Morrow (Inactive) [ 10/Apr/20 ]

I see no reason why not.

Comment by Cristopher Stauffer [ 10/Apr/20 ]

After speaking to Sam: can we just increase the cache size?

Comment by Andrew Morrow (Inactive) [ 06/Apr/20 ]

I took a look at https://evergreen.mongodb.com/version/5e8761e057e85a1febcafdf4 and https://evergreen.mongodb.com/version/5e8676bb3627e001aa726295 to compare. The SCons cache log shows that the slow build ended up with a cache hit rate of 47%, while the fast build had a cache hit rate of close to 100%. I think the variation here is entirely due to whether or not the build was well cached. Note that the slow one ran a day later, but both had the same base commit. Perhaps the RHEL 6.2 image shared cache is getting thrashed due to the number of builds and isn't retaining enough days of data? brian.mccarthy, is there any way of knowing?

As an example, in the fast build, we found session_catalog.o in the cache:

CacheRetrieve(build/cached/mongo/db/session_catalog.o):  retrieving from 7320615c480ea8a0963450709d31626f
requests: 3028, hits: 3025, misses: 3, hit rate: 99.90%

But in the later and slower build it was gone:

CacheRetrieve(build/cached/mongo/db/session_catalog.o):  7320615c480ea8a0963450709d31626f not in cache
requests: 23, hits: 22, misses: 1, hit rate: 95.65%
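
For comparing builds like this without reading the log by eye, here is a minimal sketch that pulls the hit rate and the missed targets out of a cache log. It assumes the log lines look exactly like the ones quoted above; the function name `summarize_cache_log` and the regular expressions are illustrative and not part of SCons or Evergreen. The `requests: ...` lines appear to be running tallies (the slow build shows 95.65% after 23 requests but finished at 47%), so the sketch keeps the last one it sees.

```python
import re

# Assumed formats, taken from the log excerpts quoted in this ticket:
#   requests: 3028, hits: 3025, misses: 3, hit rate: 99.90%
#   CacheRetrieve(<target>):  <hash> not in cache
STATS_RE = re.compile(r"requests: (\d+), hits: (\d+), misses: (\d+)")
MISS_RE = re.compile(r"CacheRetrieve\((.+?)\):\s+\S+ not in cache")


def summarize_cache_log(lines):
    """Return (final_stats, missed_targets) for an iterable of log lines.

    final_stats is a dict built from the last tally line seen, or None
    if the log contains no tally line. missed_targets lists every target
    that was reported as not in the cache, in log order.
    """
    final_stats = None
    missed_targets = []
    for line in lines:
        stats = STATS_RE.search(line)
        if stats:
            requests, hits, misses = (int(g) for g in stats.groups())
            # Recompute the rate from the counts rather than trusting
            # the rounded percentage printed in the log.
            rate = 100.0 * hits / requests if requests else 0.0
            final_stats = {
                "requests": requests,
                "hits": hits,
                "misses": misses,
                "hit_rate": rate,
            }
            continue
        miss = MISS_RE.search(line)
        if miss:
            missed_targets.append(miss.group(1))
    return final_stats, missed_targets
```

Running this over the logs of a fast and a slow build and comparing the two `missed_targets` lists would show which objects fell out of the shared cache between the two runs.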

Generated at Thu Feb 08 05:13:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.