[SERVER-39789] v3 toolchain increased execution time for geo_array1.js Created: 25/Feb/19 Updated: 29/Oct/23 Resolved: 15/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.10 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Eric Milkie | Assignee: | Andrew Morrow (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Sprint: | Dev Tools 2019-03-11, Dev Tools 2019-03-25 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 43 | ||||||||
| Description |
|
Compare https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_coverage_jsCore_c065df4a1dae7fe0b73d7a85d13ff2beb47b459d_19_01_31_00_56_27 with the following commit that turned on v3, https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_coverage_jsCore_a285618a35742923e9e21fa2df4434406b121a4e_19_01_31_02_10_29 geo_array1.js took 42 seconds prior to the commit and 121 seconds after the commit. Most illustrative is the log line for the geo index build: (after v3 upgrade) |
| Comments |
| Comment by Andrew Morrow (Inactive) [ 15/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
schwerin - I filed SERVER-40149 to track the work to understand the long term consequences of this move and sent it to STM. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 15/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Andrew Morrow', 'email': 'acm@mongodb.com', 'username': 'acmorrow'}Message: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andy Schwerin [ 14/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
On the assumption that this change leaves us no worse than we were before the toolchain upgrade, I’m happy to commit this change to green up the tests. We should definitely either research this further under this ticket or under another. I’m afraid that some future toolchain will somehow make an even stronger assumption of single-threadedness and silently ruin our coverage data. I’ll leave it up to you and DAG or STM to decide whether to close this ticket or keep it open for further research after you commit your change. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrew Morrow (Inactive) [ 14/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
schwerin - Possibly. The argument for why we won't generate corrupt data in this mode is that for coverage (as opposed to PGO style profiling) we don't care about the actual number of times any given line was executed or a given branch taken: so as long as at least one counter increment persists then our data is still valid. If you are uncomfortable with that reasoning, I'm happy to hand this ticket over to DAG or STM while consider. But it seemed more important to get the tests green again than worry too much about it, since, as you point out, this is the situation we were already in. We could also commit the proposed change as-is, but leave the ticket open while we research alternatives (coverage sanitizer?). | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andy Schwerin [ 13/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
If we don’t use -feprofile-update=atomic, might we be corrupting the profiles we’re generating? I realize that if we are we were in the old toolchain, but would we know if they were corrupt? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrew Morrow (Inactive) [ 13/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
GCC 7 introduced a new flag -fprofile-update: https://gcc.gnu.org/gcc-7/changes.html. In environments like ours, it defaults to atomic. The documentation is somewhat misleading, as it indicates that it only applies to -fprofile-generate builds (the first leg of a PGO build, something we are interested in pursuing later). However, doing some builds of a toy program proves that it does have a real effect on -profile-arcs -ftest-coverage mode builds, which is what we use for our coverage builds:
Here is what the baseline assembly for main looks like with v2:
If we upgrade to v3, it looks pretty different:
Look at all those lock addq instructions! We can confirm that this change is under the control of the new flag:
And the locked instructions are gone. So adding -fprofile-update=single seems to get us back to the v2 style codegen even with the -pthread flag in play, as we want. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrew Morrow (Inactive) [ 11/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
As far as I know this only affects coverage builders. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 11/Mar/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
charlie.swanson, thanks for the heads up! I figured out my performance regression, and it was unrelated to the introduction of the v3 toolchain: see | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 28/Feb/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
david.storch this may or may not be relevant but since you're working on a perf regression I thought you might want to check whether the timing aligned with any toolchain changes. |