Frame Allocators
This section explains how coroutine frames are allocated and how to customize allocation for performance.
Prerequisites
- Completed Concurrent Composition
- Understanding of coroutine frame allocation from C++20 Coroutines Tutorial
The Timing Constraint
Coroutine frame allocation has a unique constraint: memory must be allocated before the coroutine body begins executing. The standard C++ mechanism, the promise type’s operator new, is called before the promise is constructed.
This creates a challenge: how can a coroutine use a custom allocator when that allocator is passed as a parameter, and parameters are themselves stored in the frame that has not yet been allocated?
Thread-Local Propagation
Capy solves this with thread-local propagation:
- Before evaluating the task argument, run_async sets a thread-local allocator
- The task’s operator new reads this thread-local allocator
- The task stores the allocator in its promise for child propagation
This is why run_async uses two-call syntax:
run_async(executor)(my_task());
//        ↑         ↑
//        1. Sets   2. Task allocated
//           TLS       using TLS allocator
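The two-call form relies on evaluation order: since C++17, the expression naming the function to call is sequenced before the call's arguments. The sketch below demonstrates that ordering with hypothetical stand-ins (`first_call`, `make_task`, `launcher`); it does not use Capy itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

inline std::vector<std::string>& call_log()
{
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for the object returned by the first call.
struct launcher
{
    void operator()(int /*task*/) const { call_log().push_back("launch"); }
};

// Stand-in for run_async(executor): the TLS allocator would be set here.
inline launcher first_call()
{
    call_log().push_back("set TLS");
    return {};
}

// Stand-in for my_task(): the coroutine frame would be allocated here.
inline int make_task()
{
    call_log().push_back("allocate frame");
    return 0;
}

// Same shape as run_async(executor)(my_task()).
inline void two_call_demo()
{
    call_log().clear();
    first_call()(make_task());
}
```

A single call such as `f(executor, my_task())` gives no such guarantee, because the order in which a function's arguments are evaluated relative to each other is unspecified.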
The Window
The "window" is the interval between setting the thread-local allocator and the coroutine’s first suspension point. During this window:
- The task is allocated using the TLS allocator
- The task captures the TLS allocator in its promise
- Child tasks inherit the allocator
After the window closes (at the first suspension), the TLS allocator may be restored to a previous value. The task retains its captured allocator regardless.
The FrameAllocator Concept
Custom allocators must satisfy the FrameAllocator concept, which is compatible with C++ allocator requirements:
template<typename A>
concept FrameAllocator = requires {
    typename A::value_type;
} && requires(A& a, std::size_t n) {
    { a.allocate(n) } -> std::same_as<typename A::value_type*>;
    { a.deallocate(std::declval<typename A::value_type*>(), n) };
};
In practice, any standard allocator works.
Using Custom Allocators
With run_async
Pass an allocator to run_async:
std::pmr::monotonic_buffer_resource resource;
std::pmr::polymorphic_allocator<std::byte> alloc(&resource);
run_async(executor, alloc)(my_task());
Or pass a memory_resource* directly:
std::pmr::monotonic_buffer_resource resource;
run_async(executor, &resource)(my_task());
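Both forms refer to the same underlying storage, because a std::pmr::polymorphic_allocator is a thin handle around a memory_resource pointer. The sketch below illustrates this with the standard library only; `points_into` and `allocator_and_resource_share_storage` are hypothetical helper names, and how Capy treats the two forms internally is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory_resource>

// Returns true when p points into the range [first, first + size).
inline bool points_into(void const* p, std::byte const* first, std::size_t size)
{
    auto const* b = static_cast<std::byte const*>(p);
    return std::greater_equal<>{}(b, first) && std::less<>{}(b, first + size);
}

inline bool allocator_and_resource_share_storage()
{
    std::array<std::byte, 1024> buffer;
    // A null upstream makes the resource throw instead of leaving the buffer.
    std::pmr::monotonic_buffer_resource resource(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());
    std::pmr::polymorphic_allocator<std::byte> alloc(&resource);

    std::byte* a = alloc.allocate(64);   // through the allocator
    void* b = resource.allocate(64);     // through the resource directly

    return alloc.resource() == &resource
        && points_into(a, buffer.data(), buffer.size())
        && points_into(b, buffer.data(), buffer.size())
        && a != b;
}
```

In either form the resource must outlive every frame allocated from it.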
Recycling Allocator
Capy provides recycling_memory_resource, a memory resource optimized for coroutine frames:
- Maintains freelists by size class
- Reuses recently freed blocks (cache-friendly)
- Falls back to upstream allocator for new sizes
This allocator is used by default for thread_pool and other execution contexts.
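To make the freelist idea concrete, here is a minimal sketch of a size-class recycler built on std::pmr::memory_resource. It is not Capy's recycling_memory_resource: the class name, the 64-byte granularity, and the 2048-byte pooling limit are assumptions chosen for illustration, and the sketch is not thread-safe.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>

// Hypothetical sketch of a size-class recycler; not Capy's implementation.
class simple_recycling_resource : public std::pmr::memory_resource
{
    struct node { node* next; };

    static constexpr std::size_t granularity = 64;  // width of one size class
    static constexpr std::size_t class_count = 32;  // pools sizes up to 2048
    static constexpr std::size_t block_align = alignof(std::max_align_t);

    std::pmr::memory_resource* upstream_;
    std::array<node*, class_count> free_{};         // one LIFO list per class
    std::size_t reused_ = 0;

    static std::size_t class_of(std::size_t bytes) noexcept
    {
        return bytes == 0 ? 0 : (bytes - 1) / granularity;
    }
    static bool pooled(std::size_t bytes, std::size_t align) noexcept
    {
        return class_of(bytes) < class_count && align <= block_align;
    }

    void* do_allocate(std::size_t bytes, std::size_t align) override
    {
        if (!pooled(bytes, align))
            return upstream_->allocate(bytes, align);  // large or over-aligned
        std::size_t const c = class_of(bytes);
        if (node* n = free_[c])        // reuse the most recently freed block
        {
            free_[c] = n->next;
            ++reused_;
            return n;
        }
        // Nothing cached for this class yet: fall back to upstream.
        return upstream_->allocate((c + 1) * granularity, block_align);
    }

    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override
    {
        if (!pooled(bytes, align))
        {
            upstream_->deallocate(p, bytes, align);
            return;
        }
        std::size_t const c = class_of(bytes);
        free_[c] = ::new (p) node{free_[c]};           // push onto the list
    }

    bool do_is_equal(
        std::pmr::memory_resource const& other) const noexcept override
    {
        return this == &other;
    }

public:
    explicit simple_recycling_resource(
        std::pmr::memory_resource* upstream = std::pmr::get_default_resource())
        : upstream_(upstream) {}
    simple_recycling_resource(simple_recycling_resource const&) = delete;
    simple_recycling_resource& operator=(
        simple_recycling_resource const&) = delete;

    // Returns every cached block to the upstream resource.
    ~simple_recycling_resource()
    {
        for (std::size_t c = 0; c < class_count; ++c)
            while (node* n = free_[c])
            {
                free_[c] = n->next;
                upstream_->deallocate(n, (c + 1) * granularity, block_align);
            }
    }

    std::size_t reused() const noexcept { return reused_; }
};
```

Because each list is last-in, first-out, a frame freed by one task is handed to the next task of a similar size while its memory is likely still in cache.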
HALO Optimization
Heap Allocation eLision Optimization (HALO) allows the compiler to allocate coroutine frames on the stack instead of the heap when:
- The coroutine’s lifetime is provably contained in the caller’s
- The frame size is known at compile time
- Optimization is enabled
Capy’s task declaration carries the BOOST_CAPY_CORO_AWAIT_ELIDABLE annotation:
template<typename T = void>
struct [[nodiscard]] BOOST_CAPY_CORO_AWAIT_ELIDABLE
    task
{
    // ...
};
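The text does not show how BOOST_CAPY_CORO_AWAIT_ELIDABLE is defined. One plausible shape, offered purely as an assumption, is a macro that expands to Clang's `[[clang::coro_await_elidable]]` attribute where the compiler supports it and to nothing elsewhere. The names `DEMO_CORO_AWAIT_ELIDABLE`, `demo_elidable_task`, and `demo_has_elidable_attribute` below are hypothetical.

```cpp
#include <cassert>

// Assumption: a feature-detected wrapper around a Clang attribute.
// This is a guess at the general technique, not Capy's definition.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::coro_await_elidable)
#    define DEMO_CORO_AWAIT_ELIDABLE [[clang::coro_await_elidable]]
#    define DEMO_HAS_CORO_AWAIT_ELIDABLE 1
#  endif
#endif
#ifndef DEMO_CORO_AWAIT_ELIDABLE
#  define DEMO_CORO_AWAIT_ELIDABLE            // expands to nothing
#  define DEMO_HAS_CORO_AWAIT_ELIDABLE 0
#endif

// Reports whether the attribute was available at compile time.
inline constexpr bool demo_has_elidable_attribute =
    DEMO_HAS_CORO_AWAIT_ELIDABLE != 0;

// Hypothetical task type annotated in the same position as in the text.
template<typename T = void>
struct [[nodiscard]] DEMO_CORO_AWAIT_ELIDABLE
    demo_elidable_task
{
    // ...
};
```

On compilers without the attribute the annotation disappears and the type behaves as an ordinary class, so elision remains an optimization the compiler may or may not perform.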
Best Practices
Use Default Allocators
For most applications, the default recycling allocator provides good performance without configuration.
Consider Memory Resources for Batched Work
When launching many short-lived tasks together, a monotonic buffer resource can be efficient:
void process_batch(std::vector<item> const& items)
{
    // Stack buffer that backs the frames of the whole batch
    std::array<std::byte, 64 * 1024> buffer;
    std::pmr::monotonic_buffer_resource resource(
        buffer.data(), buffer.size());
    for (auto const& item : items)
    {
        run_async(executor, &resource)(process(item));
    }
    // The resource releases all frame memory at once when it goes out of
    // scope, so every task must have completed before this point
}
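Two properties of std::pmr::monotonic_buffer_resource matter for this pattern: deallocate is a no-op, so finished frames do not free space for later ones, and requests that no longer fit in the buffer go to the upstream resource. The sketch below simulates frame allocations with plain `allocate` calls; `counting_resource` and `simulate_batch` are hypothetical names, and the 4 KiB buffer is chosen only to keep the example small.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>

// Upstream resource that counts how often the buffer overflowed into it.
class counting_resource : public std::pmr::memory_resource
{
    std::size_t allocations_ = 0;

    void* do_allocate(std::size_t bytes, std::size_t align) override
    {
        ++allocations_;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override
    {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(
        std::pmr::memory_resource const& other) const noexcept override
    {
        return this == &other;
    }

public:
    std::size_t allocations() const noexcept { return allocations_; }
};

// Simulates `count` frames of `frame_size` bytes each and returns how many
// times the monotonic resource had to ask its upstream for more memory.
inline std::size_t simulate_batch(std::size_t count, std::size_t frame_size)
{
    counting_resource upstream;
    std::array<std::byte, 4 * 1024> buffer;
    std::pmr::monotonic_buffer_resource resource(
        buffer.data(), buffer.size(), &upstream);

    for (std::size_t i = 0; i < count; ++i)
    {
        void* frame = resource.allocate(frame_size);
        // No-op for a monotonic resource: the space is not reused.
        resource.deallocate(frame, frame_size);
    }
    return upstream.allocations();
}
```

A batch that fits in the buffer never touches the upstream resource, while a larger batch overflows even though each frame was "freed" immediately. Sizing the buffer for the whole batch therefore matters more than the number of tasks alive at once.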
Reference
| Header | Description |
|---|---|
| | Frame allocator concept and utilities |
| | Default recycling allocator implementation |
You have now learned how coroutine frame allocation works and how to customize it. This completes the Coroutines in Capy section. Continue to Buffer Sequences to learn about Capy’s buffer model.