[SERVER-44806] Move DocumentSource::getNext() implementation to the header Created: 22/Nov/19  Updated: 29/Oct/23  Resolved: 09/Dec/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Performance, Querying
Affects Version/s: None
Fix Version/s: 4.3.3

Type: Improvement Priority: Major - P3
Reporter: Charlie Swanson Assignee: Eric Cox (Inactive)
Resolution: Fixed Votes: 0
Labels: neweng, qopt-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2019-12-04 at 1.36.05 PM.png     PNG File Screen Shot 2019-12-04 at 1.37.16 PM.png     PNG File Screen Shot 2019-12-05 at 3.34.06 PM.png     PNG File Screen Shot 2019-12-05 at 3.35.29 PM.png     PNG File Screen Shot 2019-12-05 at 3.37.51 PM.png    
Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Sprint: Query 2019-12-16
Participants:
Linked BF Score: 0

 Description   

While investigating a small performance regression we found as fallout from SERVER-42584, we noticed that doing this may have a minor but measurable performance improvement. As an experiment I ran a patch build and got [these numbers|performance_081a51065ac9b855a0d9e07d88701910b547ad96].



 Comments   
Comment by Githook User [ 09/Dec/19 ]

Author:

{'name': 'Eric Cox', 'username': 'ericox', 'email': 'eric.cox@mongodb.com'}

Message: SERVER-44806 Move DocumentSource::getNext() implementation to the header
Branch: master
https://github.com/mongodb/mongo/commit/34ce1c33da2ea46ea7cc12980e8a4e436ad276eb

Comment by Eric Cox (Inactive) [ 05/Dec/19 ]

acm Yes, the getNext function isn't always inlined. I pulled the binaries from evergreen for cases 1, 2, and 3 above.

For case 1 (baseline):

The obvious. getNext() isn't inlined so disassembling shows calls to getNext(). Here are baseline perf numbers relative to 

https://evergreen.mongodb.com/version/performance_58abcf6577982367232a6c76d1ee277a4031ed3c

 

 

For case 2:

There are 45/66 calls where getNext() was not inlined when simply moving getNext() to the header. But where the compiler can inline at various call-sites it is able to do so.  This can explain the perf number I posted above in the above comment.

 

$ objdump -d mongod-getNext-inline-2 | c+filt | egrep 
 
"DocumentSource::getNext|DocumentSource::_getNext_inlined_marker" 138206d: e8 8e 09 00 00        callq  1382a00 <mongo::DocumentSource::getNext()> 138226c: e8 8f 07 00 00        callq  1382a00 <mongo::DocumentSource::getNext()>0000000001382a00 <mongo::DocumentSource::getNext()>: 1382a3d: 74 7d                je     1382abc <mongo::DocumentSource::getNext()+0xbc> 1382a46: 74 74                je     1382abc <mongo::DocumentSource::getNext()+0xbc> 1382a52: 0f 84 81 00 00 00    je     1382ad9 <mongo::DocumentSource::getNext()+0xd9> 1382a70: e8 5b 39 6c 00        callq  1a463d0 <mongo::DocumentSource::_getNext_inlined_marker()> 1382a87: 74 27                je     1382ab0 <mongo::DocumentSource::getNext()+0xb0> 1382aa1: 75 14                jne    1382ab7 <mongo::DocumentSource::getNext()+0xb7> 1382ab5: eb d2                jmp    1382a89 <mongo::DocumentSource::getNext()+0x89> 1382ad7: eb 18                jmp    1382af1 <mongo::DocumentSource::getNext()+0xf1> 13a4a78: e8 53 19 6a 00        callq  1a463d0 <mongo::DocumentSource::_getNext_inlined_marker()>0000000001a463d0 <mongo::DocumentSource::_getNext_inlined_marker()>: 1a4f683: e8 78 33 93 ff        callq  1382a00 <mongo::DocumentSource::getNext()> 1a4f70c: e8 ef 32 93 ff        callq  1382a00 <mongo::DocumentSource::getNext()> 1a6203d: e8 8e 43 fe ff        callq  1a463d0 <mongo::DocumentSource::_getNext_inlined_marker()> 1a6795f: e8 6c ea fd ff        callq  1a463d0 <mongo::DocumentSource::_getNext_inlined_marker()> 1a69745: e8 86 cc fd ff        callq  1a463d0 <mongo::DocumentSource::_getNext_inlined_marker()> 1a6bda0: e8 5b 6c 91 ff        callq  1382a00 <mongo::DocumentSource::getNext()>

 

For case 3:
Forcing inlining with `attribute(always_inline)` shows that all getNext() calls are indeed inlined.

$ objdump -d mongod-getNext-always_inline-3 | c++filt | egrep "DocumentSource::getNext" | wc
 0 0 0
 
$ objdump -d mongod-getNext-inline-2 | c++filt | egrep "DocumentSource::getNext|DocumentSource::_getNext_inlined_marker" | wc
66 563 5725

Full inlining with `attribute(alwas_inline)` gives a few percent over moving getNext() to the header, it could be noise.

charlie.swanson let me know if you need further investigation.
 

Comment by Eric Cox (Inactive) [ 04/Dec/19 ]

acm As discussed on slack, Charlie and James are onboard with investigating further. I'll report back with performance results for diffs that have: 

1) A baseline (no changes)
2) getNext in the header that calls a function not used by anything else `getNext_was_inlined`
3) getNext in the header the same as 2 but annotated with `attribute(always_inline)`

Comment by Andrew Morrow (Inactive) [ 04/Dec/19 ]

eric.cox - Can't say that really convinces me. The function doesn't appear to actually get inlined, and I'm not surprised given its contents. Did you pull the mongod binaries for the two builds and compare the generated code and validate that getNext was actually inlined into important callsites when you moved the definition to the header? If not, I think it is unwise to just proceed without a good understanding of why moving the function to the header had this effect, despite the function not actually being inlined. You could find that your performance gains are ephemeral - a consequence of codegen peculiarities that are in play now, and that might cease to be in play in the future. At minimum, I'd suggest doing an additional experiment where you apply attribute(always_inline) to the definition in the header, verify that the function does really get inlined with that attribute, and that you observe the same performance deltas.

Comment by Eric Cox (Inactive) [ 04/Dec/19 ]

acm We took a closer look at the performance around inlining getNext() by moving the code to the header. Here are some attached screenshots of the performance results.

 

There seems to be a clear advantage to this approach to support the change so we are going to move forward with it.

Comment by Eric Cox (Inactive) [ 03/Dec/19 ]

Hey charlie.swanson, I started looking at this and was wondering if you did any further perf tests to verify Andrew's comment above. If so, can we chat briefly about the perf investigation?

Comment by Andrew Morrow (Inactive) [ 25/Nov/19 ]

charlie.swanson - I was curious about this. After taking a look at DocumentSource::getNext, I was skeptical that it would actually be inlined. So I built two mongod binaries, one where it was, and one where it wasn't.

In neither mongod binary did I find that DocumentSource::getNext was actually inlined. What I did observe was that in the case where DocumentSource::getNext was moved to the header and tagged inline, the emitted definition was not hot/cold partitioned, but in the case where it was defined out of line, it was.

$ nm --defined-only ./mongod.inline | c++filt | grep DocumentSource::getNext
000000000103f020 t mongo::DocumentSource::getNext()
 
$ nm --defined-only ./mongod | c++filt | grep DocumentSource::getNext
00000000016f85b0 T mongo::DocumentSource::getNext()
0000000000a1014a t mongo::DocumentSource::getNext() [clone .cold.629]

I think it is worth taking a closer look at this before simply moving it to the header. The performance consequences do not appear to be attributable to the function being inlined.

Generated at Thu Feb 08 05:07:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.