[SERVER-60512] Do not need to use atomically ref-counted buffer for SharedBufferFragment Created: 06/Oct/21  Updated: 03/Jan/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Louis Williams Assignee: Backlog - Storage Execution Team
Resolution: Unresolved Votes: 0
Labels: perf-improvement, techdebt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Storage Execution
Participants:

 Description   

The SharedBufferFragmentBuilder is designed to be a per-thread memory pool for allocating temporary buffers.

It does so using an atomically reference counted SharedBuffer. This shows up around 2% of the time spent building an index.

Since this builder is designed to be used in a single-threaded context, we should consider redesigning this builder to not share ownership if possible.



 Comments   
Comment by Andrew Morrow (Inactive) [ 12/Oct/21 ]

louis.williams - Thanks for that clarification; what you described makes sense and removes my concerns.

Comment by Louis Williams [ 12/Oct/21 ]

acm the pool is per-OperationContext.

You are correct that the memory is not bound to a thread with "thread_local". The SharedBufferFragmentBuilder just allows an operation to reuse the same buffer without having to free and realloc when it's doing a lot of buffer construction. Also I realize now that the motivation in SERVER-47001 does not reflect the outcome of the ticket, either. We do not generate one massive buffer and sort all of the keys inside that buffer. We only use the pool for the short-lived per-key generation.

Comment by Andrew Morrow (Inactive) [ 12/Oct/21 ]

louis.williams - Perhaps I read to much into your use of "per-thread", as, having looked through the commit for SERVER-47001, I don't immediately see anywhere that memory or a memory pool is being bound to a thread (e.g. with a `thread_local` or similar). Can you confirm either way: are these pools actually "per-thread"?

Comment by Louis Williams [ 07/Oct/21 ]

acm, yeah this sped up our index build workload by 25%. See SERVER-47001. The problem is that even with tcmalloc, allocation is still not cheap. So we allocate one buffer and re-use it for key generation.

Comment by Andrew Morrow (Inactive) [ 07/Oct/21 ]

louis.williams - I hadn't really focused on SharedBufferFragmentBuilder, but the description of it raises some questions for me: in general, we already have per-thread caching of temporary buffers by way of tcmalloc. What was the motivation for adding yet another layer of per-thread buffer caching above that? Was a clear performance benefit demonstrated? Delegating responsibility for caching available buffers to the allocator is generally a better strategy, as it allows reuse across different roles, and also abstracts away the actual scope of caching. For instance, future versions of tcmalloc may do per-cpu caching, but if the SharedBufferFragmentAllocator is still allocating per-thread, it will undermine the effectiveness of that strategy.

Generated at Thu Feb 08 05:49:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.