[SERVER-1272] mongo::BufBuilder::grow fails to be inlined, makes appending slow Created: 21/Jun/10 Updated: 12/Jul/16 Resolved: 20/Jul/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Client |
| Affects Version/s: | 1.5.3 |
| Fix Version/s: | 1.5.6 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Andrew Morrow (Inactive) | Assignee: | Alberto Lerner |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu 10.04 x86_64, gcc-4.4.3 |
||
| Participants: |
| Description |
|
When profiling a program making heavy use of the BSON library, I noticed that a large percentage of time was spent in mongo::BufBuilder::grow. That was odd since I had pre-configured BufBuilder objects with a generous chunk of memory, and I confirmed that realloc was not being called, so the buffer was never actually growing. It turns out that the compiler was not inlining mongo::BufBuilder::grow. The function was emitted out-of-line and was being called through the PLT to do so much as append an integer to an already allocated buffer. I was able to improve throughput a good bit by partitioning BufBuilder::grow into a hot inlined function do to the space check, and an explicitly non-inlined cold function to handle reallocating on overflow: /* returns the pre-grow write position */ return data + oldlen; void grow_reallocate() _attribute_((noinline)) { int a = size * 2; if ( a == 0 ) a = 512; if ( l > a ) a = l + 16 * 1024; if( a > 64 * 1024 * 1024 ) msgasserted(10000, "BufBuilder grow() > 64MB"); data = (char *) realloc(data, a); size= a; }After this change, the available space check was inlined during BufBuilder::append calls, and grow_reallocate was emitted out of line (and never called) This seemed to buy me about a 20% improvement in throughput while constructing complex BSON documents. The _attribute((noinline)) is necessary here, since I don't want grow_reallocate inlined into 'grow'. Ideally, grow_reallocate would be at .cc scope, not in the header at all, but perhaps you want to keep the BSON library 'header only'. If that is the case, then maybe think about macroizing __attribute_((noinline)), and then marking cold functions, or functions that make expensive library calls like 'realloc', as non-inline-able, as there is little or no benefit to inlining them. Similarly, splitting in-header functions that may be complicated enough that the compiler could reasonably choose not to inline into an inlined hot/fast-path and a non-inlined slow/cold-path could allow the compiler to more aggressively inline common cases. |
| Comments |
| Comment by auto [ 20/Jul/10 ] |
|
Author: {'login': 'alerner', 'name': 'Alberto Lerner', 'email': 'alerner@10gen.com'}Message: |
| Comment by auto [ 06/Jul/10 ] |
|
Author: {'login': 'alerner', 'name': 'Alberto Lerner', 'email': 'alerner@10gen.com'}Message: |
| Comment by Alberto Lerner [ 06/Jul/10 ] |
|
There's an important set of functionality that does not require linking libmongoclient and we would like to maintain the possibility of pulling all BSON out in an independent module. Let me pull the hot/cold portions apart and do some testing to see how consistently (or not) the inlining is happening. I'll be posting the commits to this item. By the way, excellent profiling work – not only no this item, but in the whole BSON series items. |
| Comment by Andrew Morrow (Inactive) [ 29/Jun/10 ] |
|
RE my comment about 'header-only', its clearly not the case that you can use all of the BSONObj methods without linking libmongoclient: BSONObj::jsonString, for instance, is not defined in the headers anywhere. So maybe rather than messing about with _attribute_((noinline)) it would be better to just do the basic hot/cold partitioning and move the cold functions out-of-line into libmongoclient. |