Quicx
§ 02.02 Core Concepts

PMAD — Pool-based Memory Allocator

PMAD is a slab allocator written in C that delivers O(1) allocation and deallocation with zero external fragmentation and zero system calls at runtime. Every allocation the daemon makes (task envelopes, wire buffers, worker registration slots) comes out of PMAD. External fragmentation is 0 % by design: every block is pre-sized to a declared class, so there is no splitting and no coalescing, and a freed block is always reusable by the next request of its class. The only overhead is the round-up to the chosen class and the per-block header.

PMAD pre-allocates a contiguous pool of memory with a single mmap call at startup, then partitions it into user-defined size classes. Standard allocators (ptmalloc, jemalloc v5.3, tcmalloc v2026) optimise for average-case throughput — PMAD optimises for worst-case determinism and predictable latency budgets.

| Domain | Why PMAD fits |
| --- | --- |
| Real-time systems | Guaranteed O(1) response: no lock contention, no syscalls at runtime |
| Embedded / RTOS | Minimal footprint, no heap fragmentation, fully configurable memory layout |
| Game engines | Predictable frame-time budgets with zero allocation jitter |
| High-frequency trading | Nanosecond-class allocation latency under sustained throughput |

Architecture

Every allocation is a single lookup-table index followed by a free-list pop. Every deallocation is a free-list push keyed by the block’s own header. Both operations have no conditional branch paths — the fast path is the only path.
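The fast path described above can be sketched as follows. This is a minimal illustration, not PMAD's source: every `sk_`-prefixed name, the constants, and the block layout are assumptions made for the example, and the exhaustion check in `sk_alloc_block` is an addition of the sketch rather than something the text specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for illustration; PMAD's real values may differ. */
#define SK_ALIGNMENT   16u
#define SK_MAX_SIZE    256u
#define SK_SLOTS       (SK_MAX_SIZE / SK_ALIGNMENT)
#define SK_MAX_CLASSES 4u

/* Header stored in front of every payload (16 bytes on a typical 64-bit target). */
typedef struct sk_block {
    struct sk_block *next;   /* intrusive free-list link */
    uint32_t class_id;       /* lets free() find the owning class */
    uint32_t pad;
} sk_block;

typedef struct {
    size_t    block_size;    /* payload bytes per block */
    sk_block *free_head;     /* top of this class's free list */
} sk_class;

typedef struct {
    sk_class classes[SK_MAX_CLASSES];
    uint8_t  lookup[SK_SLOTS];   /* (size - 1) / SK_ALIGNMENT -> class index */
} sk_alloc;

/* sizes must be ascending multiples of SK_ALIGNMENT, the last >= SK_MAX_SIZE. */
static void sk_init(sk_alloc *a, const size_t *sizes, uint32_t n)
{
    assert(n > 0 && n <= SK_MAX_CLASSES);
    assert(sizes[n - 1] >= SK_MAX_SIZE);
    for (uint32_t i = 0; i < n; i++) {
        a->classes[i].block_size = sizes[i];
        a->classes[i].free_head = NULL;
    }
    /* Precompute, for each slot, the smallest class that fits its largest size. */
    uint32_t c = 0;
    for (size_t s = 0; s < SK_SLOTS; s++) {
        size_t largest = (s + 1) * SK_ALIGNMENT;
        while (c + 1 < n && sizes[c] < largest)
            c++;
        a->lookup[s] = (uint8_t)c;
    }
}

/* Thread `count` blocks of one class, laid out back to back in `mem`, onto its list. */
static void sk_seed(sk_alloc *a, uint32_t class_id, unsigned char *mem, size_t count)
{
    size_t stride = sizeof(sk_block) + a->classes[class_id].block_size;
    for (size_t i = 0; i < count; i++) {
        sk_block *b = (sk_block *)(mem + i * stride);
        b->class_id = class_id;
        b->pad = 0;
        b->next = a->classes[class_id].free_head;
        a->classes[class_id].free_head = b;
    }
}

/* Allocation: one table index, then a free-list pop. */
static void *sk_alloc_block(sk_alloc *a, size_t size)
{
    assert(size >= 1 && size <= SK_MAX_SIZE);
    sk_class *c = &a->classes[a->lookup[(size - 1) / SK_ALIGNMENT]];
    sk_block *b = c->free_head;
    if (b == NULL)               /* class exhausted (check added by this sketch) */
        return NULL;
    c->free_head = b->next;
    return (void *)(b + 1);      /* payload starts right after the header */
}

/* Deallocation: read the class from the block's own header, then push. */
static void sk_free_block(sk_alloc *a, void *payload)
{
    sk_block *b = (sk_block *)payload - 1;
    sk_class *c = &a->classes[b->class_id];
    b->next = c->free_head;
    c->free_head = b;
}
```

Because the list is LIFO, the block freed most recently is the one handed out next, which tends to keep hot blocks in cache. Neither operation loops or searches, so the cost is independent of the requested size and of how many blocks are live.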

Figure: PMAD architecture overview.
Public API
The thin facade in incPMAD.h: pmad_init, pmad_alloc, pmad_free, pmad_destroy. This is the entire contract the daemon consumes.
Size Class Table
A flat array [MAX_SIZE / ALIGNMENT] maps a requested byte-count directly to the correct size-class descriptor — an O(1) table lookup, no branches.
Free Lists
Each size class owns a singly-linked intrusive free list. A pop is a pointer dereference; a push is a pointer swap. No atomics on the fast path — the daemon serialises through its own router, so locks are structurally unnecessary.
Memory Pool
One mmap region split into contiguous runs of blocks, one run per class, sized by the user percentages. Each block carries a 16-byte BlockHeader (next pointer + class ID) so deallocations need no external metadata.
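To make the pool layout concrete, here is a hedged sketch of how a pool could be carved into one run per class from user percentages, with each block paying for a 16-byte header. The `pl_` names, the integer rounding policy, and the decision to leave the remainder unused are assumptions of the example, not PMAD's documented behaviour.

```c
#include <assert.h>
#include <stddef.h>

#define PL_HEADER_BYTES 16u   /* matches the 16-byte BlockHeader described above */

/* One contiguous run of equally sized blocks (hypothetical descriptor). */
typedef struct {
    size_t block_size;   /* payload bytes per block */
    size_t offset;       /* byte offset of the run inside the pool */
    size_t count;        /* number of blocks in the run */
} pl_run;

/*
 * Split pool_bytes across n classes according to percent[] (must sum to 100).
 * Each block occupies header + payload bytes. Runs are packed back to back.
 * Returns the number of pool bytes actually used; the rest is rounding slack.
 */
static size_t pl_partition(size_t pool_bytes, const size_t *sizes,
                           const unsigned *percent, size_t n, pl_run *runs)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += percent[i];
    assert(total == 100);

    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        size_t budget = pool_bytes / 100 * percent[i];   /* bytes for this class */
        size_t stride = PL_HEADER_BYTES + sizes[i];      /* header + payload */
        runs[i].block_size = sizes[i];
        runs[i].offset = offset;
        runs[i].count = budget / stride;
        offset += runs[i].count * stride;
    }
    return offset;
}
```

With a 100,000-byte pool and the Balanced profile ({64, 256, 1024} at 60 / 30 / 10), this policy yields 750, 110, and 9 blocks and uses 99,280 bytes. The header cost is proportionally larger for small classes: 16 of every 80 bytes in the 64-byte run, against 16 of every 1040 in the 1024-byte run.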

Benchmarks

Measured on Apple Silicon, compiled with -O3 -march=native. Full benchmark source and reproduction instructions are available at github.com/anastassow/PMAD.

| Metric | Value |
| --- | --- |
| P50 allocation latency | 2.59 ns |
| P99.9 allocation latency | 6.50 ns |
| Latency vs block size | Flat: 2.59 ns P50 from 16 B to 4096 B (O(1) guarantee, demonstrated) |
| Peak throughput | 748.9 Mops/s @ 16 B · 690.6 Mops/s @ 64 B |
| Worst-case under churn (1024 B) | ~40 µs (system allocator: 6.95 ms) |
| Fragmentation | 0 % |
| Runtime syscalls | Zero |
| Correctness | 19/19 tests pass |

Reference configurations

| Profile | Size classes (B) | Split (%) | Suitability |
| --- | --- | --- | --- |
| Max throughput | {16} | 100 | Small-object velocity |
| Min overhead | {4096} | 100 | Bulk data density |
| Balanced | {64, 256, 1024} | 60 / 30 / 10 | Mixed workloads |
| Latency-optimised | {32, 128} | 80 / 20 | Critical signalling |
| HFT / network | {32, 128, 512, …} | 60 / 20 / … | L3 packet processing |
| Embedded / RTOS | {8, 16, 32, …} | 30 / 30 / … | Deterministic control |
PERFORMANCE: What these numbers actually mean
Flat tail: PMAD moves only 2.5× from P50 to P99.9 (2.59 → 6.50 ns), the tightest spread of any allocator tested. jemalloc fans out 18.5× over the same range; the system allocator, 15.3×. The remaining variance is OS scheduling noise, not allocator behaviour. With zero runtime syscalls, an allocation never enters the kernel, so your millionth pmad_alloc is as fast as your first.

Tear-down

A single munmap returns the entire pool to the OS in O(1) — there are no individual blocks to walk, no fragmented regions to compact. Shutdown is symmetric with startup: one syscall in, one syscall out.