-
Type: Bug
-
Resolution: Done
-
Priority: Critical - P2
-
Affects Version/s: None
-
Component/s: Testing Infrastructure
-
None
-
Fully Compatible
-
ALL
Recently I saw an increased number of tasks marked as SYSTEM UNRESPONSIVE.
It seems that in some cases those tasks were executing tests that actually failed but the task is wrongly labeled and the real error is swallowed.
For instance recently the serverless suite keep failing consistently with SYSTEM UNRESPONSIVE.
I've analyzed this task and I noticed that job3 was executing change_collection_server_stats.js (jobs logs) actually failed with:
[js_test:change_collection_server_stats] assert.soon failed (timeout 600000ms): () => { [js_test:change_collection_server_stats] // All change collection entries are removed but one. [js_test:change_collection_server_stats] return changeCollection.count() === 1; [js_test:change_collection_server_stats] }
But this error is not reported in the task logs nor in the evergreen UI.
- is related to
-
SERVER-75862 find root cause of system unresponsive live process hang_analyzer
- Closed
- related to
-
SERVER-75860 temporarily disable live process hang_analyzer
- Closed