[SERVER-63055] gcov and clang cause failures in ValidateCollections Created: 27/Jan/22  Updated: 29/Oct/23  Resolved: 28/Mar/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.3.1

Type: Bug Priority: Major - P3
Reporter: Richard Samuels (Inactive) Assignee: Ryan Egesdahl (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-66134 Code Coverage build variants in Everg... Closed
is related to SERVER-60832 Code Coverage variant not generating ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Dev Platform 2022-02-21, Dev Platform 2022-03-07, Dev Platform 2022-04-04
Participants:

 Description   

In SERVER-60832, we attempted to add support for using gcov with clang. Doing so (with either our v3 or v4 toolchain) produces new failures in nestedarr1:ValidateCollections and nestedobj1:ValidateCollections in jsCore due to segfaults. More widespread failures appear in other tasks.

These failures occur only when clang is combined with gcov. They do not occur with gcc.

 Comments   
Comment by Githook User [ 28/Mar/22 ]

Author: Ryan Egesdahl <ryan.egesdahl@mongodb.com> (username: deriamis)

Message: SERVER-63055 Implement gcov with LLVM
Branch: master
https://github.com/mongodb/mongo/commit/05ea8339afabf09c07c43b374a060d8168fe873f

Comment by Ryan Egesdahl (Inactive) [ 28/Mar/22 ]

I've merged in from master, and this should be ready to go now. I have a patch build running, and once I know my commit applies correctly on top of the recent evergreen.yml changes, I'll continue with the merge.

Comment by Iryna Zhuravlova [ 21/Mar/22 ]

It is blocked on the Evergreen release EVG-16255.

Comment by Ryan Egesdahl (Inactive) [ 14/Mar/22 ]

My investigation has fairly conclusively demonstrated that the segfaults are caused by optimization being disabled while debug is enabled. We halve the amount of stack space available to the task executor whenever we have debugging enabled, and on Clang with optimization disabled this causes us to exceed the stack size limit, resulting in the observed segfault. We can work around this by simply not halving the stack space limit if optimization is disabled.
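
The workaround described above can be sketched as a small sizing rule. This is only an illustration in Python, not the server's actual C++ code: the function name, its parameters, and the 1 MiB base size are all hypothetical and chosen to show the decision, not the real values.

```python
def executor_stack_size(base_bytes: int, debug: bool, optimized: bool) -> int:
    """Pick a stack size for a task executor thread (hypothetical helper).

    Behaviour before the workaround: halve the stack whenever debug is on.
    Workaround: skip the halving when optimization is disabled, because
    unoptimized frames are larger and can overflow the reduced stack.
    """
    if base_bytes <= 0:
        raise ValueError("base_bytes must be positive")
    if debug and optimized:
        # Debug build with optimization: the halved stack is still sufficient.
        return base_bytes // 2
    # Non-debug build, or debug build without optimization: keep the full stack.
    return base_bytes


# Assumed 1 MiB base size, used only for illustration.
BASE = 1024 * 1024
for debug, optimized in [(False, True), (True, True), (True, False)]:
    print(debug, optimized, executor_stack_size(BASE, debug, optimized))
```

Under these assumptions, a coverage build (debug enabled, optimization disabled) keeps the full stack, while an optimized debug build still gets the halved one.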

There is still another problem evident in both the GCC and Clang builds (but more so Clang) in that disabling optimization seems to cause some tasks to fail consistently. The failure behavior makes me think there are some tight timeouts set for these tests, and the performance difference without optimization is causing them to be exceeded. I'm not going to do anything about these, but I will be removing the sole "unoptimized" builder we have for Linux in favor of these coverage builders.

Comment by Ryan Egesdahl (Inactive) [ 01/Mar/22 ]

richard.samuels I've tried this a few times now, and Evergreen builds fail with the same segfaults as before. However, local builds don't. I still don't know why that is, and I've been having trouble getting any information out of the core dumps to tell me where it's happening.

Generated at Thu Feb 08 05:56:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.