[SERVER-39789] v3 toolchain increased execution time for geo_array1.js Created: 25/Feb/19  Updated: 29/Oct/23  Resolved: 15/Mar/19

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: 4.1.10

Type: Bug Priority: Major - P3
Reporter: Eric Milkie Assignee: Andrew Morrow (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Dev Tools 2019-03-11, Dev Tools 2019-03-25
Participants:
Linked BF Score: 43

 Description   

Compare https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_coverage_jsCore_c065df4a1dae7fe0b73d7a85d13ff2beb47b459d_19_01_31_00_56_27 with the following commit that turned on v3, https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_coverage_jsCore_a285618a35742923e9e21fa2df4434406b121a4e_19_01_31_02_10_29

geo_array1.js took 42 seconds prior to the commit and 121 seconds after the commit. Most illustrative is the log line for the geo index build:
(prior to v3)
[MongoDFixture:job0] 2019-02-22T19:50:15.226+0000 I INDEX [conn55] index build: inserted 30000 keys from external sorter into index in 2 seconds

(after v3 upgrade)
[MongoDFixture:job4] 2019-02-22T19:55:54.009+0000 I INDEX [conn4] index build: inserted 30000 keys from external sorter into index in 11 seconds



 Comments   
Comment by Andrew Morrow (Inactive) [ 15/Mar/19 ]

schwerin - I filed SERVER-40149 to track the work to understand the long term consequences of this move and sent it to STM.

Comment by Githook User [ 15/Mar/19 ]

Author:

{'name': 'Andrew Morrow', 'email': 'acm@mongodb.com', 'username': 'acmorrow'}

Message: SERVER-39789 Revert to single-thread mode for profile data
Branch: master
https://github.com/mongodb/mongo/commit/6b706747bbc3774c98720da14ae48489cdf9593e

Comment by Andy Schwerin [ 14/Mar/19 ]

On the assumption that this change leaves us no worse than we were before the toolchain upgrade, I’m happy to commit this change to green up the tests. We should definitely either research this further under this ticket or under another. I’m afraid that some future toolchain will somehow make an even stronger assumption of single-threadedness and silently ruin our coverage data. I’ll leave it up to you and DAG or STM to decide whether to close this ticket or keep it open for further research after you commit your change.
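The worry about a future toolchain silently changing the codegen could be caught mechanically. A sketch of such a guard (hypothetical, not from the ticket; file name probe.cpp and plain g++/objdump are assumptions, with /opt/mongodbtoolchain/v3/bin/g++ substituted in a real check): build a tiny probe with the coverage flags and fail if main's disassembly contains locked increments.

```shell
# Hypothetical guard: fail if a coverage-flagged build emits atomic
# (lock-prefixed) counter updates despite -fprofile-update=single.
cat > probe.cpp <<'EOF'
int main() { return 0; }
EOF
g++ -fprofile-arcs -ftest-coverage -fprofile-update=single probe.cpp -o probe -pthread

# Restrict the scan to main so statically linked libgcov code cannot
# trigger false positives.
if objdump -d probe | sed -n '/<main>:/,/ret/p' | grep -q 'lock '; then
    echo "coverage counters are atomic again" >&2
    exit 1
fi
echo "single-threaded counter updates confirmed"
```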

Comment by Andrew Morrow (Inactive) [ 14/Mar/19 ]

schwerin - Possibly. The argument for why we won't generate corrupt data in this mode is that for coverage (as opposed to PGO-style profiling) we don't care about the actual number of times a given line was executed or a given branch taken: as long as at least one counter increment persists, our data is still valid. If you are uncomfortable with that reasoning, I'm happy to hand this ticket over to DAG or STM to consider it further. But it seemed more important to get the tests green again than to worry too much about it, since, as you point out, this is the situation we were already in. We could also commit the proposed change as-is but leave the ticket open while we research alternatives (coverage sanitizer?).

Comment by Andy Schwerin [ 13/Mar/19 ]

If we don’t use -fprofile-update=atomic, might we be corrupting the profiles we’re generating? I realize that if we are, we already were in the old toolchain, but would we know if they were corrupt?

Comment by Andrew Morrow (Inactive) [ 13/Mar/19 ]

GCC 7 introduced a new flag, -fprofile-update: https://gcc.gnu.org/gcc-7/changes.html. In multithreaded (-pthread) environments like ours, it defaults to atomic. The documentation is somewhat misleading, as it suggests the flag only applies to -fprofile-generate builds (the first leg of a PGO build, something we are interested in pursuing later). However, building a toy program shows that it also has a real effect on -fprofile-arcs -ftest-coverage builds, which is what we use for our coverage builds:

$ cat ./foo.cpp
#include <iostream>
 
int main(int argc, char* argv[]) {
    if (argc) {
        std::cout << "Hello, World!\n";
    }
    return 0;
}

Here is what the baseline assembly for main looks like with v2:

$ /opt/mongodbtoolchain/v2/bin/g++ -fprofile-arcs -ftest-coverage ./foo.cpp -o foo.v2 -pthread
 
$ objdump -d ./foo.v2
...
0000000000402f16 <main>:
  402f16:       55                      push   %rbp
  402f17:       48 89 e5                mov    %rsp,%rbp
  402f1a:       48 83 ec 10             sub    $0x10,%rsp
  402f1e:       89 7d fc                mov    %edi,-0x4(%rbp)
  402f21:       48 89 75 f0             mov    %rsi,-0x10(%rbp)
  402f25:       48 8b 05 24 58 2c 00    mov    0x2c5824(%rip),%rax        # 6c8750 <__gcov0.main>
  402f2c:       48 83 c0 01             add    $0x1,%rax
  402f30:       48 89 05 19 58 2c 00    mov    %rax,0x2c5819(%rip)        # 6c8750 <__gcov0.main>
  402f37:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)
  402f3b:       74 21                   je     402f5e <main+0x48>
  402f3d:       48 8b 05 14 58 2c 00    mov    0x2c5814(%rip),%rax        # 6c8758 <__gcov0.main+0x8>
  402f44:       48 83 c0 01             add    $0x1,%rax
  402f48:       48 89 05 09 58 2c 00    mov    %rax,0x2c5809(%rip)        # 6c8758 <__gcov0.main+0x8>
  402f4f:       be 28 2d 49 00          mov    $0x492d28,%esi
  402f54:       bf 80 96 6c 00          mov    $0x6c9680,%edi
  402f59:       e8 a2 e9 05 00          callq  461900 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc>
  402f5e:       ba 00 00 00 00          mov    $0x0,%edx
  402f63:       48 8b 05 f6 57 2c 00    mov    0x2c57f6(%rip),%rax        # 6c8760 <__gcov0.main+0x10>
  402f6a:       48 83 c0 01             add    $0x1,%rax
  402f6e:       48 89 05 eb 57 2c 00    mov    %rax,0x2c57eb(%rip)        # 6c8760 <__gcov0.main+0x10>
  402f75:       89 d0                   mov    %edx,%eax
  402f77:       c9                      leaveq
  402f78:       c3                      retq
...

If we upgrade to v3, it looks pretty different:

$ /opt/mongodbtoolchain/v3/bin/g++ -fprofile-arcs -ftest-coverage ./foo.cpp -o foo -pthread
 
$ objdump -d foo
...
0000000000404c12 <main>:
  404c12:       55                      push   %rbp
  404c13:       48 89 e5                mov    %rsp,%rbp
  404c16:       48 83 ec 10             sub    $0x10,%rsp
  404c1a:       89 7d fc                mov    %edi,-0x4(%rbp)
  404c1d:       48 89 75 f0             mov    %rsi,-0x10(%rbp)
  404c21:       f0 48 83 05 26 4b 2c    lock addq $0x1,0x2c4b26(%rip)        # 6c9750 <__gcov0.main>
  404c28:       00 01
  404c2a:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)
  404c2e:       74 18                   je     404c48 <main+0x36>
  404c30:       f0 48 83 05 1f 4b 2c    lock addq $0x1,0x2c4b1f(%rip)        # 6c9758 <__gcov0.main+0x8>
  404c37:       00 01
  404c39:       be 29 25 49 00          mov    $0x492529,%esi
  404c3e:       bf 80 a6 6c 00          mov    $0x6ca680,%edi
  404c43:       e8 a8 e8 05 00          callq  4634f0 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc>
  404c48:       b8 00 00 00 00          mov    $0x0,%eax
  404c4d:       f0 48 83 05 0a 4b 2c    lock addq $0x1,0x2c4b0a(%rip)        # 6c9760 <__gcov0.main+0x10>
  404c54:       00 01
  404c56:       c9                      leaveq
  404c57:       c3                      retq
...

Look at all those lock addq instructions!

We can confirm that this change is under the control of the new flag:

$ /opt/mongodbtoolchain/v3/bin/g++ -fprofile-arcs -ftest-coverage -fprofile-update=atomic ./foo.cpp -o foo -pthread
 
$ objdump -d foo
...
0000000000404c12 <main>:
  404c12:       55                      push   %rbp
  404c13:       48 89 e5                mov    %rsp,%rbp
  404c16:       48 83 ec 10             sub    $0x10,%rsp
  404c1a:       89 7d fc                mov    %edi,-0x4(%rbp)
  404c1d:       48 89 75 f0             mov    %rsi,-0x10(%rbp)
  404c21:       f0 48 83 05 26 4b 2c    lock addq $0x1,0x2c4b26(%rip)        # 6c9750 <__gcov0.main>
  404c28:       00 01
  404c2a:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)
  404c2e:       74 18                   je     404c48 <main+0x36>
  404c30:       f0 48 83 05 1f 4b 2c    lock addq $0x1,0x2c4b1f(%rip)        # 6c9758 <__gcov0.main+0x8>
  404c37:       00 01
  404c39:       be 29 25 49 00          mov    $0x492529,%esi
  404c3e:       bf 80 a6 6c 00          mov    $0x6ca680,%edi
  404c43:       e8 a8 e8 05 00          callq  4634f0 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc>
  404c48:       b8 00 00 00 00          mov    $0x0,%eax
  404c4d:       f0 48 83 05 0a 4b 2c    lock addq $0x1,0x2c4b0a(%rip)        # 6c9760 <__gcov0.main+0x10>
  404c54:       00 01
  404c56:       c9                      leaveq
  404c57:       c3                      retq
...

$ /opt/mongodbtoolchain/v3/bin/g++ -fprofile-arcs -ftest-coverage -fprofile-update=single ./foo.cpp -o foo -pthread
 
$ objdump -d ./foo
...
0000000000404c12 <main>:
  404c12:       55                      push   %rbp
  404c13:       48 89 e5                mov    %rsp,%rbp
  404c16:       48 83 ec 10             sub    $0x10,%rsp
  404c1a:       89 7d fc                mov    %edi,-0x4(%rbp)
  404c1d:       48 89 75 f0             mov    %rsi,-0x10(%rbp)
  404c21:       48 8b 05 28 4b 2c 00    mov    0x2c4b28(%rip),%rax        # 6c9750 <__gcov0.main>
  404c28:       48 83 c0 01             add    $0x1,%rax
  404c2c:       48 89 05 1d 4b 2c 00    mov    %rax,0x2c4b1d(%rip)        # 6c9750 <__gcov0.main>
  404c33:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)
  404c37:       74 21                   je     404c5a <main+0x48>
  404c39:       48 8b 05 18 4b 2c 00    mov    0x2c4b18(%rip),%rax        # 6c9758 <__gcov0.main+0x8>
  404c40:       48 83 c0 01             add    $0x1,%rax
  404c44:       48 89 05 0d 4b 2c 00    mov    %rax,0x2c4b0d(%rip)        # 6c9758 <__gcov0.main+0x8>
  404c4b:       be 89 25 49 00          mov    $0x492589,%esi
  404c50:       bf 80 a6 6c 00          mov    $0x6ca680,%edi
  404c55:       e8 f6 e8 05 00          callq  463550 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc>
  404c5a:       ba 00 00 00 00          mov    $0x0,%edx
  404c5f:       48 8b 05 fa 4a 2c 00    mov    0x2c4afa(%rip),%rax        # 6c9760 <__gcov0.main+0x10>
  404c66:       48 83 c0 01             add    $0x1,%rax
  404c6a:       48 89 05 ef 4a 2c 00    mov    %rax,0x2c4aef(%rip)        # 6c9760 <__gcov0.main+0x10>
  404c71:       89 d0                   mov    %edx,%eax
  404c73:       c9                      leaveq
  404c74:       c3                      retq
...

And the locked instructions are gone. So adding -fprofile-update=single seems to get us back to the v2-style codegen even with the -pthread flag in play, as we want.

Comment by Andrew Morrow (Inactive) [ 11/Mar/19 ]

As far as I know this only affects coverage builders.

Comment by David Storch [ 11/Mar/19 ]

charlie.swanson, thanks for the heads up! I figured out my performance regression, and it was unrelated to the introduction of the v3 toolchain: see SERVER-40058.

Comment by Charlie Swanson [ 28/Feb/19 ]

david.storch - this may or may not be relevant, but since you're working on a perf regression I thought you might want to check whether the timing aligned with any toolchain changes.

Generated at Thu Feb 08 04:53:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.