[SERVER-49502] ninja and/or scons should prompt for confirmation when using a large "j" value without icecream Created: 14/Jul/20  Updated: 10/Jun/23  Resolved: 30/May/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Louis Williams Assignee: Tausif Rahman (Inactive)
Resolution: Won't Do Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-72132 Prevent me from accidentally overwhel... Closed
Gantt Dependency
has to be done after SERVER-72349 Create mongo ninja python package Closed
Related
is related to SERVER-56623 Guard against large compiles without ... Closed
is related to SERVER-72132 Prevent me from accidentally overwhel... Closed
Assigned Teams:
Server Development Platform
Sprint: Dev Platform 2020-08-24
Participants:

 Description   

Compiling with -j400 without icecream support is a recipe for disaster. It's an easy mistake to make, and it can happen due to either user misconfiguration or bugs in the ninja generator.

On virtual workstations, SSH connections will time out, and the only resolution is to reset the host.

It would be nice to add a prompt to say: "Are you sure you want to do this? You tried to compile with 400 jobs but your system only has 16 cores and icecream support isn't enabled. This will not end well".
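The suggested guard could be sketched in a few lines of Python (a hypothetical sketch; the threshold of "jobs > local cores" and the wording are illustrative, not an actual scons/ninja feature):

```python
# Hypothetical sketch of the requested prompt: compare the requested job
# count against the local core count when icecream support is off.
import os

def confirm_job_count(jobs: int, icecream_enabled: bool) -> bool:
    """Return True if the build should proceed with `jobs` parallel jobs."""
    cores = os.cpu_count() or 1
    if icecream_enabled or jobs <= cores:
        return True
    answer = input(
        f"You tried to compile with {jobs} jobs but your system only has "
        f"{cores} cores and icecream support isn't enabled. This will not "
        "end well. Continue anyway? [y/N] "
    )
    return answer.strip().lower() == "y"
```

A build wrapper would call this before launching ninja and abort unless the user explicitly answers `y`.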



 Comments   
Comment by Daniel Moody [ 30/May/23 ]

This was implemented in SERVER-56623.

Comment by Iryna Zhuravlova [ 30/May/23 ]

This has already been implemented in a different ticket. 

Comment by Louis Williams [ 30/Mar/23 ]

In recent history I have accidentally run a large -j compile with the icecream daemon turned off, and it ran slowly but did not pin my host, so anecdotally this may in fact be resolved. Out of fear, I have not confirmed that compiling without icecream in my ninja configuration avoids the original problem I described.

That said, I also can't think of a time that I've hit this recently. Because Ctrl-C'ing SCons no longer wipes out the build.ninja file, I don't have to manually run SCons as often. That, combined with the workstation setup script, makes it harder to hit this problem than when the virtual workstations were brand new.

Comment by Ryan Egesdahl (Inactive) [ 30/Mar/23 ]

daniel.gomezferro@mongodb.com If you are using an x86_64 virtual workstation created anytime within the past couple of years and haven’t built and installed your own Icecream or something, you have our patched version. Otherwise, it needs to be version 1.4 installed from our PPA.

Comment by Daniel Gomez Ferro [ 30/Mar/23 ]

"make sure you are using our patched Icecream"

ryan.egesdahl@mongodb.com how do we check that?

Comment by Ryan Egesdahl (Inactive) [ 30/Mar/23 ]

There is an edge case in our concurrency limiter where if the Icecream daemon wasn't working properly, it could still cause a large local concurrency. In addition, there was an upstream Icecream bug that would cause resource overuse in cases where a compile job was directed to the local host. We have fixed the resource overuse bug already in our patched Icecream binary that gets installed on virtual workstations, so this should not be happening for us anymore. If you are still seeing this happen, please make sure you are using our patched Icecream.

Comment by Daniel Gomez Ferro [ 13/Mar/23 ]

We should also check that iceccd is running.
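A wrapper could verify that with something like the following sketch (pgrep-based; assumes a Unix-like host with pgrep in PATH, and is not an actual MongoDB tool):

```python
# Hypothetical check that the icecream daemon (iceccd) is alive; a build
# wrapper could refuse a large -j whenever this returns False.
import subprocess

def iceccd_running() -> bool:
    """True if a process named exactly 'iceccd' is running."""
    try:
        result = subprocess.run(
            ["pgrep", "-x", "iceccd"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:  # pgrep itself is missing
        return False
    return result.returncode == 0
```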

Comment by Alex Neben [ 29/Dec/22 ]

We are making a ninja python wrapper for metrics collection. We should be able to add this check to that same wrapper. The ticket is linked.

Comment by Daniel Gottlieb (Inactive) [ 05/Aug/20 ]

louis.williams looking at this problem from a different angle: we could probably come up with a shell function that replaces ninja, examines the ninja file being used, and checks for "ICECC=icecc" (in hygienic builds) or "--icecream" being present in the scons invocation (technically a bit hard given that it gets split across lines; simple grepping is probably sufficient). The bash function can then scan for a large -j and error when a large -j is requested without icecream being used.

Obviously not the ideal user experience (non-workstation users would have to copy or write it themselves), but it's plausible it could be auto-installed on EC2 workstations.
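A rough Python equivalent of the wrapper idea above (the file name `build.ninja`, the threshold factor, and the argument parsing are illustrative assumptions, not an existing tool):

```python
# Sketch: grep the generated Ninja file for any icecc reference and refuse
# a large -j when none is found.
import os
import re
import sys

def ninja_uses_icecream(path: str = "build.ninja") -> bool:
    """Crude check: does the generated Ninja file mention icecc anywhere?"""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            return any("icecc" in line for line in f)
    except OSError:
        return False

def parse_jobs(argv: list) -> "int | None":
    """Extract the -jN (or -j N) value from a ninja command line, if any."""
    for i, arg in enumerate(argv):
        m = re.fullmatch(r"-j(\d+)?", arg)
        if m:
            if m.group(1):
                return int(m.group(1))
            if i + 1 < len(argv) and argv[i + 1].isdigit():
                return int(argv[i + 1])
    return None

def check(argv: list, threshold_factor: int = 2) -> bool:
    """Return False (and print an error) for a large -j without icecream."""
    jobs = parse_jobs(argv)
    limit = (os.cpu_count() or 1) * threshold_factor
    if jobs is not None and jobs > limit and not ninja_uses_icecream():
        print(f"error: -j{jobs} without icecream (local limit {limit})",
              file=sys.stderr)
        return False
    return True
```

The wrapper would run `check(sys.argv[1:])` and only exec the real ninja binary if it returns True.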

Comment by Andrew Morrow (Inactive) [ 05/Aug/20 ]

I don't believe it is possible for us to add this sort of conditional argument check to our generated Ninja file. I'm not aware of anything in the syntax that would let us express it, nor of anything that exposes the user provided number of jobs such that we could query it.

Implementing the check in SCons would have no effect for developers using Ninja, because the concurrency argument passed to SCons has no effect on the generated Ninja file. A user can invoke ninja on the generated file with any desired concurrency and we have no way to stop that.

We could, I suppose, offer something for direct SCons builds, but I'm not inclined to make SCons an interactive tool. It isn't really designed for that.

Another option would be to have the concurrency used when generating Ninja configure a Ninja pool of the same size and use that for all jobs. But that would mean that the value of `-j` used when running SCons to generate the Ninja file would be an upper bound on the available Ninja concurrency. It would be surprising, and I expect developers would frequently end up generating Ninja files for which real concurrency was not available, since they wouldn't ask for concurrency when running the Ninja Generator.
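For illustration, that pool approach would amount to the generator emitting something like this into build.ninja (a sketch; the depth value and rule name are hypothetical):

```ninja
# Cap all jobs at the concurrency used when SCons generated this file.
pool generated_concurrency
  depth = 16

rule CC
  command = $CC $CFLAGS -c $in -o $out
  pool = generated_concurrency
```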

A better answer is to ensure that virtual workstations provide an emergency sshd that is running in the realtime scheduling class. There are many ways to accidentally saturate a remote machine such that it becomes unavailable via ssh running in the normal user scheduling class.
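On systemd-based workstations, such an emergency sshd could be placed in the realtime scheduling class with a unit drop-in along these lines (a sketch; the unit name and priority value are illustrative):

```ini
# /etc/systemd/system/emergency-sshd.service.d/override.conf (hypothetical)
[Service]
CPUSchedulingPolicy=rr
CPUSchedulingPriority=50
```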

Generated at Thu Feb 08 05:20:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.