[SERVER-60512] Do not need to use atomically ref-counted buffer for SharedBufferFragment Created: 06/Oct/21 Updated: 03/Jan/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | perf-improvement, techdebt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Storage Execution
|
| Participants: |
| Description |
|
The SharedBufferFragmentBuilder is designed to be a per-thread memory pool for allocating temporary buffers. It does so using an atomically reference counted SharedBuffer. This shows up around 2% of the time spent building an index. Since this builder is designed to be used in a single-threaded context, we should consider redesigning this builder to not share ownership if possible. |
| Comments |
| Comment by Andrew Morrow (Inactive) [ 12/Oct/21 ] |
|
louis.williams - Thanks for that clarification; what you described makes sense and removes my concerns. |
| Comment by Louis Williams [ 12/Oct/21 ] |
|
acm the pool is per-OperationContext. You are correct that the memory is not bound to a thread with "thread_local". The SharedBufferFragmentBuilder just allows an operation to reuse the same buffer without having to free and realloc when it's doing a lot of buffer construction. Also I realize now that the motivation in |
| Comment by Andrew Morrow (Inactive) [ 12/Oct/21 ] |
|
louis.williams - Perhaps I read to much into your use of "per-thread", as, having looked through the commit for |
| Comment by Louis Williams [ 07/Oct/21 ] |
|
acm, yeah this sped up our index build workload by 25%. See |
| Comment by Andrew Morrow (Inactive) [ 07/Oct/21 ] |
|
louis.williams - I hadn't really focused on SharedBufferFragmentBuilder, but the description of it raises some questions for me: in general, we already have per-thread caching of temporary buffers by way of tcmalloc. What was the motivation for adding yet another layer of per-thread buffer caching above that? Was a clear performance benefit demonstrated? Delegating responsibility for caching available buffers to the allocator is generally a better strategy, as it allows reuse across different roles, and also abstracts away the actual scope of caching. For instance, future versions of tcmalloc may do per-cpu caching, but if the SharedBufferFragmentAllocator is still allocating per-thread, it will undermine the effectiveness of that strategy. |