Builds in Evergreen frequently fail due to OOM. We have tried to manage this by limiting concurrency for certain tasks, but that is an imprecise solution: builds still fail, and the limits need ongoing maintenance.
We should create a SCons tool that wraps link and compile jobs so that, if the underlying command fails, we parse its stdout/stderr and check for the OOM error message specific to the compiler/platform. If an OOM is detected, we wait a random amount of time and try again, up to some maximum number of retries.
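The retry logic described above could look roughly like the sketch below. It is only an illustration of the idea, not the proposed tool: it runs a command directly through `subprocess` rather than hooking into SCons, and the names `OOM_PATTERNS`, `looks_like_oom`, and `run_with_oom_retry`, the default retry count, and the delay bound are all assumptions. The patterns are examples of messages that GCC, Clang/LLVM, and MSVC are known to emit; the real list would need to be tuned per compiler/platform.

```python
import random
import re
import subprocess
import sys
import time

# Example OOM signatures (assumed, not exhaustive); tune per compiler/platform.
OOM_PATTERNS = [
    re.compile(r"virtual memory exhausted"),   # GCC
    re.compile(r"out of memory allocating"),   # GCC (cc1/cc1plus)
    re.compile(r"LLVM ERROR: out of memory"),  # Clang/LLVM
    re.compile(r"\bC1060\b"),                  # MSVC: compiler is out of heap space
    re.compile(r"\bLNK1102\b"),                # MSVC linker: out of memory
]


def looks_like_oom(output):
    """Return True if the captured output matches any known OOM signature."""
    return any(p.search(output) for p in OOM_PATTERNS)


def run_with_oom_retry(cmd, max_retries=3, max_delay=30.0, sleep=time.sleep):
    """Run cmd; on an OOM-looking failure, wait a random time and retry.

    Failures that do not look like OOM are returned immediately, so genuine
    compile errors are not retried. `sleep` is injectable for testing.
    """
    attempt = 0
    while True:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout for scanning
            text=True,
        )
        if proc.returncode == 0:
            return proc
        if attempt >= max_retries or not looks_like_oom(proc.stdout):
            return proc
        attempt += 1
        # Random delay so concurrently failing jobs do not all restart at once.
        sleep(random.uniform(0, max_delay))


if __name__ == "__main__":
    # Usage: python oom_retry.py <compiler or linker command...>
    result = run_with_oom_retry(sys.argv[1:])
    sys.stdout.write(result.stdout)
    sys.exit(result.returncode)
```

The random delay is the key design choice: when several parallel jobs hit OOM together, staggering their restarts gives memory pressure a chance to subside instead of reproducing the same spike.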
- related to SERVER-82612 Use all available CPU cores for scons in compilation tasks (Backlog)