It has long been known that the LFH activates around the 17th allocation. However, those who have experimented with this may have noticed some odd behaviors: larger sizes can take many allocations, sometimes hundreds, before reaching the LFH, and the LFH can appear to activate one allocation sooner for a size if another size already has it active. This post explores these oddities and provides some answers.
This is a very technical post; if you want to know more about heap internals, check out my course.
Overview
The NT heap’s Low Fragmentation Heap (LFH) is a front-end allocator that activates automatically, per size class, based on allocation frequency. It doesn’t start active on a fresh heap. The heap maintains a per-size counter, and once a size reaches 17 backend allocations, the LFH activates for it. From that point on, allocations of that size go to the LFH.
The 17th allocation still goes to the backend. The 18th is the first to go to the LFH, assuming the LFH has already been initialized. On a fresh heap it hasn't been, so the 17th allocation fails to activate the LFH and instead flags the 18th to initialize it. The 18th creates the LFH, re-enters the allocation path, and still goes to the backend. The 19th is therefore the first LFH allocation for the first size on a given heap. Once the LFH exists, this overhead is gone, and for every subsequent size the 18th allocation is the first to go to the LFH.
Larger sizes face a different problem. The per-size counter array starts small, covering only the smaller allocation sizes. Anything above a certain size has no counter slot and can’t accumulate allocations toward the activation threshold. The array only grows once the heap has expanded enough to require a large segment, which on a default heap happens around the fourth segment. For a 0x1000-byte allocation, that’s several hundred allocations in. Once the array grows and the LFH is initialized, the standard 17-increment threshold applies again, and the LFH activates as expected.
Standard Activation
FrontEndHeapUsageData is a uint16_t array with one entry per block-unit index. FrontEndHeapStatusBitmap is a bitmap, also indexed by block-unit index, where a set bit means the LFH is already active for that size.
On every backend allocation, the allocator computes the block-unit index as (UserSize + 0x10) >> 4. It checks the bitmap first. If the bit is set, the allocation routes to the LFH. If the bit is clear, it increments the FrontEndHeapUsageData entry by 0x21 and runs a check. This check is something like (FrontEndHeapUsageData[entry] & 0x1F) > 0x10.
0x21 is 0010 0001 in binary, so each addition increments the low 5 bits of the entry by 1 and also adds 1 to the count held in the bits above them. After 17 allocations the entry is 17 * 0x21 = 0x231, and 0x231 & 0x1F = 0x11, which is greater than 0x10.
Once the check passes, the allocator calls RtlpGetLFHContext, which looks up the bucket index for the allocation size in RtlpLfhBucketIndexMap. On success it overwrites the entire uint16_t entry with the bucket index and sets the corresponding bit in the bitmap. For user size 0x1000 the bucket index is 96 (0x60), so the entry goes from 0x231 to 0x0060. Every subsequent allocation of that size sees the set bit and routes to the LFH.
Note that the 17th allocation still goes to the backend. The 18th is the first to hit the set bit and go to the LFH.
18 vs 19
As discussed, the 18th allocation for a size is expected to go to the LFH. On a fresh heap, however, the first size to activate sees its 19th allocation go to the LFH first. After that, every other size has its 18th allocation go to the LFH.
When the 17th allocation fires the threshold on a heap where the LFH has never been initialized, RtlpGetLFHContext fails. There is no LFH object to return a context for. The failure sets a flag that causes the next allocation to initialize the LFH. The 18th allocation creates the LFH, re-enters the counter path, crosses the threshold again (this time with a valid LFH), and goes to the backend. The 19th is the first LFH allocation.
Once the LFH exists, the extra step is gone. When a second size reaches its 17th allocation, RtlpGetLFHContext succeeds immediately. The 17th goes to the backend, and the 18th goes to the LFH.
Larger Sizes
Given a fresh heap, larger sizes may take hundreds of allocations to go to the LFH. For example, allocation 794 is the first one to go to the LFH for size 0x1000.
The reason larger sizes take so many more allocations comes down to the FrontEndHeapUsageData array starting too small to track them.
At heap creation, the array has 0x80 entries (indices 0 through 0x7F). Any block-unit index of 0x80 or higher has no slot, so the counting for LFH activation is skipped. A user size of 0x7F0 has internal size 0x800 and block-unit index 0x80, which is out of range. For that size and anything larger, every allocation bypasses the counter and goes to the backend. The activation count never increments until the array grows.
The array grows when RtlpExtendHeap sets bit 29 of Heap->CompatibilityFlags. That happens when the segment being created is large enough.
if ((FrontEndHeapType != 2 || FrontEndHeap == NULL) && MaxReserve >= 0x3f4000)
    Heap->CompatibilityFlags |= 0x20000000;
MaxReserve is max(AllocSize + 0x2000, Heap->SegmentReserve). For normal allocations this is just SegmentReserve. For a HeapCreate(0, 0, 0) heap, SegmentReserve starts at PEB->HeapSegmentReserve (0x100000) and roughly doubles with each extension (0x100000, 0x200000, 0x400000). The first two values are below 0x3f4000. The third (0x400000) goes beyond the limit, setting the bit when the fourth segment is created.
The bit stays set until the next allocation runs through RtlpAllocateHeap. Right after acquiring the heap lock, that function checks a maintenance flag before doing anything else with the request.
if ((Heap->CompatibilityFlags & 0x30000000) != 0)
    RtlpPerformHeapMaintenance(Heap, ...);
The mask 0x30000000 covers bit 29 (0x20000000) and bit 28 (0x10000000, used for UCR index initialization). RtlpPerformHeapMaintenance handles both; the following is the path for 0x20000000.
uint32_t CompatibilityFlags = Heap->CompatibilityFlags;
if (test_bit(CompatibilityFlags, 0x1d)) {
    Heap->CompatibilityFlags = CompatibilityFlags & 0xdfffffff; // Clears bit 29
    if ((RtlpDisableHeapLookaside & 1) == 0)
        RtlpActivateLowFragmentationHeap(Heap, ...);
}
Bit 29 is cleared before calling RtlpActivateLowFragmentationHeap, so this path only runs once per heap. RtlpDisableHeapLookaside is a global that can be set in the registry. By default this is 0, so RtlpActivateLowFragmentationHeap() is called, extending the array and creating the LFH.
RtlpExtendFrontEndUsageArray(Heap, (RtlpLargestLfhBlock >> 4) + 2, ...);
// ...
NewFrontEndHeap = RtlpCreateLowFragHeap(Heap, ...);
Heap->FrontEndHeap = NewFrontEndHeap;
Heap->FrontEndHeapType = 2;
Heap->RequestedFrontEndHeapType = 2;
Heap->DeCommitFreeBlockThreshold = RtlpLargestLfhBlock >> 4;
RtlpLargestLfhBlock is 0x4000, so RtlpExtendFrontEndUsageArray receives (0x4000 >> 4) + 2 = 0x402 as its second argument. The array grows from 0x80 entries to 0x402, FrontEndHeapMaximumIndex is updated to match, and any block-unit index up to 0x401 now has a valid slot and can start counting.
How many allocations it takes to reach that fourth segment depends entirely on the slot size. For user size 0x1000 (slot size 0x1010), the first three segments have 14, 253, and 508 pages totaling 775 pages. Allocation 776 creates the fourth segment and sets the flag. Allocation 777 extends the array and is the first to increment the counter for index 0x101. After that, the standard 17-increment threshold applies. The counter crosses it on allocation 793, and allocation 794 is the first LFH allocation.