Heap-based vulnerabilities are well know in the parlance of memory corruption bugs. When heap gets in a situation where the internal structures that manage the allocations, are not the way they should be, the program might end up writing bytes at the addresses that the programmer did not intent it to do. Not surprisingly, this is well in favor of an attacker who is searching for a read-write primitive in the program. After you find a working RW primitive you can eventually gain code execution by changing addresses that end up at the processor’s instruction pointer. So we know that heap corruption is dangerous. There have been different studies to either facilitate or prevent heap-based attacks. Some time ago I had come to a related study[1] about identifying problematic transactions that can create a vulnerable situation in heap. For example, a double-free (if not detected by the heap manager) can corrupt the linked list of the free chunks and cause an exploitable situation. The study used symbolic execution of angr framework to find such abnormal patterns. I was curious to come up with another solution to natively fuzz the heap manager (in my case glibc) and find these patterns much faster. After putting some effort to design and implement this experimental fuzzer (which I called heapfuzzer), I actually came up with very interesting results. Before talking about the results though, let’s see how heapfuzzer works.
The heapfuzzer Design
The main idea in heapfuzzer is generating all possible combinations of heap transactions and distribute them on multiple worker processes for evaluation. After each execution, the current state of the heap allocations is checked to find an anomaly. So let’s start with pattern generation.
The generation process happens in multiple stages. Each stage has its own function that receives a combination and builds new permutations for that specific combination, and then passes each permutation to the next stage. Eventually a complete pattern is formed in the last stage and is sent to a worker process for evaluation. Below, I have added a figure that shows the high level design of heapfuzzer.

The generator assigns the transaction patters to an evaluator board. The boards are stored in a shared memory and each evaluator reads its transaction quota from its own dedicated board. The watcher thread, is responsible to respawn workers after they exit upon detecting a heap anomaly. The workers then resume evaluating the patterns from the next unchecked pattern in their boards after they respawn. Each transaction pattern is defined in a C structure called ‘sequence_container’.
sequence_container:
--------------------------------------
|DATA*|LEN|DESCRIPTOR**|NON_ALLOC_LEN|
--------------------------------------
| |
-->"A...F" |
|
---------------
|D1|D2|....|Dn|
---------------
|
|
-----------------
|INT|INT|...|INT|
-----------------
The pattern itself is pointed to by ‘data’ member. It consists of the characters [A,F,O] meaning Allocate, Free, Overflow respectively. For example, a pattern like ‘AFOAF’ means executing an allocation, freeing the allocation, overflowing the previous allocation, again executing an allocation and executing one final free. The details of each of these three types of transactions is described by the ‘descriptor’ array pointer. It defines the size of each allocation, what allocation each ‘F’ should free and finally the size of overflow and what allocation the respective ‘O’ should overflow. As a side note, overflowing a memory chunk in heap can corrupt the linked list data that is stored after the current chunk and is used by the heap manager. The boards are referenced by an array. Below I’ve written the structure of each worker board.
Board structure: -------------------------------------------------------------------- |SHM|CURRENT_SEQ|RUNNING|PID|RUNNING_SEQ|LAST_START_INDEX|POC_INDEX| -------------------------------------------------------------------- | | ------------------------ | ASSIGNED PATTERNS | ------------------------
‘shm’ member points to a shared memory between the main and the worker processes. Patterns that are assigned to the worker are serialized into this memory. Each serialization includes the transactions and their descriptors. ‘current_seq’ shows where the generator has paused to store the last pattern and ‘running_seq’ denotes where the worker itself is executing the patterns in its board.
Pattern Generation
Each pattern has a number of allocations (allocs) and non-allocation transactions (non-allocs). For any constant length of n, we choose the length of non-allocs, and exhaustively place them in between allocs. For each permutation, we also exhaustively define all possible descriptors for free and overflow combinations.
The initial stage of the generator, iterates over all defined pattern lengths. For each length, another loop starts changing the length of non-allocs, and for each alloc/non-alloc ratio, we generate all permutations in numeric forms and then start the construction of the patterns using the numeric representation.
The first function that is called to create numeric places of non-allocs is ‘combinations()’. What it does is, it receives the potential indexes in the pattern and the length of the desired non-allocs transactions and returns all possible places that we can put non-allocs in each numeric pattern. The important thing is that, this functions does not differentiate between different orders of the same indices, because at this stage we only want to specify the places that non-allocs should be written not their actual values (F and O). For example, if the pattern length is 6 and the length of non-allocs is 2, these are some valid numeric results returned by our combinations() function.
[2,4] , [1,5] , [3,5] ...
Note that we don’t use index zero for non-allocation transactions because the first transaction is always an allocation.
I wrote another function build_permutations(), that is called for each choice returned by combinations(). The input choice of this function, specifies the constant locations in which the non-allocs should be placed. The function then, iterates over all possible non-allocs and put them in those locations. Each numeric representation is then converted to ASCII form and the ASCII pattern is stored in a new sequence_container and is sent to the first stage of compiling the descriptors of the allocation transactions of the pattern.
Example: [3,5] -> build_permutations() Init: ---*-*--- First Conversion: AAAFAFAAA Second Conversion: AAAFAOAAA ...
---> send to first stage descriptor compiling to describe the allocations.
The allocation description goes through a similar permutation process to describe all possible sizes that has been hardcoded for ‘malloc()’ calls. After each set of sizes is assigned to the allocations in the sequence_container structure, we send the sequence to the next stage which describes the references of frees and overflows on the transaction pattern. And again for each permutation of F/O descriptors, we send the resulting pattern to the final compiling stage which assigns overflow sizes to the overflow transactions of the input pattern. The finalized pattern is sent to eval_engine_finalize() for a sanity check, to exclude useless patterns, and eventually the patterns that pass the check are serialized into an idle board. If the board has enough patterns waiting for evaluation, a custom signal is sent to the owning worker process, asking it to start sweeping its board.
Pattern generation procedure:
Step1: Choose non-alloc locations
|
--*-*--
|
Step2: For each location generate all possible ASCII patterns
|
AAFAOA
|
Step3: Describe allocations
|
A[8]A[16]FA[16]OA[8]
|
Step4: Describe reference targets of non-allocs
|
A[8]A[16]F[0]A[16]O[1]A[8]
|
Step5: Describe overflow sizes
|
A[8]A[16]F[0]A[16]O[1,1]A[8]
The worker processes, execute each sequence independently but it’s the collective effect of all the executed transactions in a worker process that lead to an anomaly before it is respawned. The non-allocations are executed based on their reference indices and overflow sizes which are serialized into the sequence itself. Whenever we come across an allocation during the evaluation process, we store its local index in the sequence, the address that is returned by the heap manager and its size. This information is then used for non-allocation references and anomaly detection like allocation overlap.
When an anomaly is detected in a worker process (an evaluator), it automatically generates a PoC that repeats all the transactions that ended up to the anomaly. For each pattern, a separate code chunk is created, and all the chunks are compiled into a stand-alone C file to reproduce the detected anomaly.
Results
heapfuzzer can generate hundreds of vulnerable patterns in a few seconds. For each pattern, a C source file is generated to reproduce the results. Below is an example PoC, generated by heapfuzzer.
/*
Auto genrated Proof of Concept Code
Sirus Sh (~cyn)
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
int main(void){
void *var0;
var0=malloc(8);
void *var1;
var1=malloc(8);
free(var0);
void *var2;
var2=malloc(8);
void *var3;
var3=malloc(8);
free(var0);
memset((void *)((unsigned long)(var1) + 0x8),0x11,1);
free(var3);
void *var4;
var4=malloc(8);
void *var5;
var5=malloc(8);
free(var4);
void *var6;
var6=malloc(8);
void *var7;
var7=malloc(8);
free(var4);
memset((void *)((unsigned long)(var6) + 0x8),0x11,1);
free(var4);
void *var8;
var8=malloc(8);
void *var9;
var9=malloc(8);
free(var8);
void *var10;
var10=malloc(8);
void *var11;
var11=malloc(8);
free(var8);
memset((void *)((unsigned long)(var10) + 0x8),0x11,1);
free(var9);
printf("var8 = 0x%016lX - 0x%016lX\n",(unsigned long)var8,((unsigned long)var8)+8);
printf("var9 = 0x%016lX - 0x%016lX\n",(unsigned long)var9,((unsigned long)var9)+8);
printf("Freed var8\n");
printf("var10 = 0x%016lX - 0x%016lX\n",(unsigned long)var10,((unsigned long)var10)+8);
printf("var11 = 0x%016lX - 0x%016lX\n",(unsigned long)var11,((unsigned long)var11)+8);
printf("Freed var8\n");
printf("Freed var9\n");
return 0;
}
Conclusion
While symbolic execution has its own benefits in specific scenarios, as I discussed in this post, for the purpose of finding vulnerable heap transactions, symbolic execution is totally unnecessary. With a native solution like heapfuzzer, we can achieve similar results with a blazing fast evaluation speed. I released the source code of heapfuzzer[2] for interested developers. While I did not spend a long time on writing this code (hence looking a little messy), it does get the job done cleanly and smoothly.
[1] https://www.usenix.org/conference/usenixsecurity18/presentation/eckert
[2] https://github.com/0xsirus/heapfuzzer
Leave a comment