[SERVER-56623] Guard against large compiles without icecream enabled in the invocation Created: 05/Mar/21  Updated: 29/Oct/23  Resolved: 04/May/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Improvement Priority: Major - P3
Reporter: Maria van Keulen Assignee: Daniel Moody
Resolution: Fixed Votes: 0
Labels: workstations
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-49502 ninja and/or scons should prompt for ... Closed
is related to SERVER-49502 ninja and/or scons should prompt for ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Dev Platform 2021-05-17
Participants:

 Description   

Proposed Solution
This ticket specifically addresses the case of forgetting to include icecream in the scons/ninja invocation of a build and using a large -j value. See comment for more details.

Original Request
There have been various instances reported in #evergreen-workstations where users' workstations were unresponsive due to running a large -j value compile without icecream enabled. These issues can occur if a user forgets the icecream daemon isn't running, if there are issues with the icecream cluster, or if the user omits icecream from their invocation.

It would be great if the workstations could protect against such compiles. This comment describes a potential approach to address forgetting icecream from the invocation.



 Comments   
Comment by Githook User [ 04/May/21 ]

Author:

{'name': 'Daniel Moody', 'email': 'daniel.moody@mongodb.com', 'username': 'dmoody256'}

Message: SERVER-56623 make icecream guard from hanging system on large -j
Branch: master
https://github.com/mongodb/mongo/commit/0b620c24c55859f325ce23daa90b3c1c55bc76cb

Comment by Daniel Moody [ 26/Apr/21 ]

Code Review: https://mongodbcr.appspot.com/766980004/

Comment by Maria van Keulen [ 07/Apr/21 ]

Gotcha, thanks for the additional context. Splitting the icecream-side work into another ticket sounds good to me.

Comment by Daniel Moody [ 06/Apr/21 ]

maria.vankeulen That would be a bit more tricky, because the ninja file is static after generation and reflects the system state at the time it was generated, so the icecream daemon state may have changed since then.

I think that would more appropriately be worked into the icecream interface instead. Currently there are icerun and icecc, but we could instead have our own internal icecream script which tests that icecream is functioning and forwards the commands to the real icecream scripts and bins, and otherwise hard fails. Actually icecream itself should be doing something like this, it's just not graceful in its own failures.

I think something like that could spin off to another ticket, and this ticket is scoped just to prevent large jobs from hanging the system when icecream was not selected.

Comment by Maria van Keulen [ 06/Apr/21 ]

Sounds great, thank you daniel.moody! Is it feasible to also guard against the case of running a large build with the icecream daemon turned off?

Comment by Daniel Moody [ 06/Apr/21 ]

The planned approach for a pure scons build: if -j is not set, scons will check whether icecream is set, and if it's not, limit the jobs to the number of CPUs. An additional optional --force-jobs flag will be used to override this behavior. A message will always be printed saying scons is limiting the number of jobs if it takes such action.
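
A minimal sketch of the scons-side check described above, for illustration only. The function name `effective_jobs` and its parameters are hypothetical and not taken from the actual commit; it also assumes the cap applies whenever the requested job count exceeds the CPU count and icecream is not in use.

```python
import os


def effective_jobs(requested_jobs, icecream_enabled, force_jobs=False, cpu_count=None):
    """Return (jobs, message); message is None when no limiting happened.

    Hypothetical helper: names and signature are assumptions, not the real
    SConstruct code.
    """
    # os.cpu_count() may return None, so fall back to a single job.
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)

    # Leave the request alone when icecream distributes the work, when the
    # user explicitly overrides the guard, or when the request is already small.
    if icecream_enabled or force_jobs or requested_jobs <= cpus:
        return requested_jobs, None

    message = (
        f"icecream is not enabled: limiting jobs from {requested_jobs} to {cpus}. "
        "Pass --force-jobs to override."
    )
    return cpus, message


if __name__ == "__main__":
    jobs, message = effective_jobs(400, icecream_enabled=False, cpu_count=16)
    if message:
        print(message)
    print("building with", jobs, "jobs")
```

Returning the message instead of printing it inside the helper keeps the decision separate from the reporting, so the caller can always print the notice when a limit was applied.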

For the ninja side, if no icecream is in use, scons will generate all pools with a max depth of the max CPUs, and it will generate an extra build edge which will be built first and always rebuilt. This can be accomplished with a build edge that has no dependencies, takes a single input (the ninja file itself), never produces its output, and is listed as the first build edge in the ninja file. This build edge will echo that the number of jobs is being limited because icecream is not in use, and that the ninja file should be regenerated if this is not desired.
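
The ninja-side idea can be sketched as a small generator for the top of the ninja file. This is only an illustration: the function name, the pool names, the rule name `JOB_LIMIT_NOTICE`, and the output name `job_limit_notice` are all hypothetical, and the sketch does not by itself guarantee that ninja schedules the notice edge first.

```python
def render_no_icecream_preamble(pool_names, max_cpus, ninja_file="build.ninja"):
    """Render ninja text that caps every pool and adds a notice edge.

    Hypothetical sketch: pool, rule, and output names are assumptions.
    """
    lines = []

    # Cap each pool at the number of local CPUs.
    for name in pool_names:
        lines += [f"pool {name}", f"  depth = {max_cpus}", ""]

    # The rule only echoes a notice and never creates its output file, so
    # ninja considers the edge out of date on every invocation.
    notice = (
        f"icecream not in use: jobs limited to {max_cpus}; "
        f"regenerate {ninja_file} if this is not desired"
    )
    lines += [
        "rule JOB_LIMIT_NOTICE",
        f'  command = echo "{notice}"',
        "  description = Checking job limits",
        "",
        # Single input is the ninja file itself; listed as the first build edge.
        f"build job_limit_notice: JOB_LIMIT_NOTICE {ninja_file}",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_no_icecream_preamble(["compile_pool", "link_pool"], max_cpus=16))
```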

EDIT:
A couple of follow on notes:

  • I did not go with a hard and early fail, because while that would work well for scons, for ninja, a build edge can't really stop the build if a large -j was selected. Maybe the two should just work differently; hard early fails are generally better?
  • The above solution for ninja would mean that non-icecream ninja builds are ALWAYS printing something about the max number of jobs, even when you don't specify jobs. Ninja has no interface for that sort of thing from within a build edge. I imagine some people may not be able to tolerate such a message always being printed. It would not be printed only if you are using icecream, which I took as the default use case with ninja.

Comment by Brooke Miller [ 19/Mar/21 ]

robert.guo mentioned it'd probably be better to do this in the build system, so we've reassigned this to SDP. FYI april.schoffer
