Linux Process Injection

Executing code in the context of another process can serve both legitimate and malicious purposes. This is sometimes called process injection. When injected code runs in the memory address space and under the PID of another process, it has the same access to the target process’s resources as the process’s own compiled code. Anything the injected code does also appears to have been done by the target process. There are already plenty of online sources that cover process injection on Windows. The same goal is achievable under Linux, although it is somewhat trickier. In this post, I briefly explain a couple of ways to execute external code in a process that is already running on Linux.

The general mechanism for writing into the address space of another process from userspace under Linux is ptrace(). It is also possible to write directly to a remote address through the /proc filesystem or the process_vm_writev() system call. ptrace() generally gives us more flexibility and power over the target process and is my favorite raw debugging solution. Irrespective of how you access the target process’s memory, there are important things to understand and perform once this access is secured.

The basic memory layout of a process in Linux is the combination of the program’s ELF file, laid out across a couple of segments, along with other segments that hold the stack and the heap. The content of the ELF file is segmented according to how the program header table describes the program to the kernel. This table specifies which sections of the ELF should reside in which segments. Note that a segment as defined in the ELF’s program header is different from a memory segment as listed in the /proc/../maps file of the process. Memory segments that are created at load time follow the program header’s instructions for LOAD segments, but more than one memory region (segment) might be used for a single LOAD program segment. Multiple ELF sections are usually mapped into one segment and share the same memory permissions. For example, the plt, plt.got and text sections are mapped into an executable read-only memory segment, while the got, got.plt and data sections reside in a non-executable writable memory segment.

    File                  Link                  Run
   
------------          ------------         -------------
|   ELF    |          |   ELF    |         |           |
|  HEADER  |          |  HEADER  |         |           |
|  + PAD   |          |  + PAD   |         |  MEMORY   |
------------          ------------         | SEGMENT 1 |
| SECTION1 |          |  PROGRAM |         |           |
------------  -->     | SEGMENT1 |  -->    |           |
| SECTION2 |          |          |         |           |
------------          ------------         -------------
| SECTION3 |          |  PROGRAM |         |  MEMORY   |
------------          | SEGMENT2 |         | SEGMENT 2 |
     .                     .                    .      
     .                ------------         -------------    
------------          |  PROGRAM |         |           |
| SECTION n|          | SEGMENT n|         |  MEMORY   |
------------          ------------         | SEGMENT n |
|  SECTION |          |  SECTION |         |           |
|  HEADERS |          |  HEADERS |         |           |
------------          ------------         -------------

Once write access to the target process’s memory is granted, you have to choose a reasonable address to write your code at. Here I only explain the case of ptrace. To use it, you call the ptrace() system call and pass the arguments appropriate to what you want to do with the tracee (the process that you want to trace). ptrace is a relatively complex mechanism, and I suggest you refer to the standard ptrace manual (manual section 2) and study how the different commands and flags work.

The task that we want to do here starts with attaching to the tracee using the PTRACE_ATTACH command (or request). If this request is fulfilled successfully, the tracee is paused (the tracer should wait for the stop with waitpid() before continuing), and from there you can modify its registers and memory. At this point, since we don’t know where in its code we have paused the tracee, the most straightforward way to quickly get code execution in the target process is to write our code right at the address stored in the instruction pointer. Of course, since we do not want to corrupt the tracee’s code or execution flow, we have to back up the target memory and the tracee’s registers before overwriting them.

With that said, I want to talk a little about the specifics of the code that is going to be injected into the target process. After we achieve read-write access to memory, we usually add a small assembly payload to the target process that transfers control to another address holding the compiled version of higher-level code, such as a function written in C, simply because it is neither efficient nor reasonable to write the entire functionality of the remote code in assembly. Let’s call this small piece of machine code the initial payload.

Two general options for injecting a larger chunk of code are library injection and installing hooks on some of the tracee’s internal or external functions that will run at some point during its execution.

For the case of library injection, the initial payload consists of a call to dlopen(), defined in libdl.so, or its more generic alternative __libc_dlopen_mode(), defined in the GNU C library itself. These functions can load the library that our remote code resides in. Just note that if you use dlopen(), the target ELF must have been linked with libdl; otherwise you will have to modify the GOT tables to force the tracee to load it for you when it tries to call one of its linked functions. I will talk more about modifying critical address tables in ELF shortly.

To run the initial payload, we can aim for the current location of the execution flow: the instruction pointer. First we back up as many bytes as we want to overwrite using PTRACE_PEEKDATA, and then we write our initial payload with PTRACE_POKEDATA. The initial payload must end with an instruction that forces the tracee to transfer control back to our tracer process. We can do this by, for example, placing a software interrupt (such as the int3 breakpoint instruction) right after the point where the payload completes its job. The important point is that, on x86_64, the kernel subtracts 2 bytes from the instruction pointer if the tracee is inside a blocking syscall in kernel space when we pause it. The reason is that once the tracee resumes, the syscall is expected to be issued again (imagine a process waiting for user input when it is paused), and the syscall instruction on x86_64 is 2 bytes long (0F 05). This means that we should add two NOP bytes at the beginning of the initial payload, so that execution does not resume from a location in the payload where we do not expect it to start. Note that we can write the initial payload at an address other than the one the IP currently points to, but the same adjustment for blocking syscalls still applies and must be handled correctly. In other words, the adjustment happens regardless of whether the interrupted syscall is actually going to be executed again.
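
The backup, overwrite and restore sequence can be sketched in Python as follows. This is only a simulation under stated assumptions: FakeTracee is a hypothetical stand-in whose peek() and poke() mimic the word-sized PTRACE_PEEKDATA and PTRACE_POKEDATA requests on an in-memory buffer, and the payload assumes that the tracer has already placed the address of the loader function in RAX and its arguments in RDI and RSI, following the System V AMD64 calling convention. The opcode bytes are standard x86_64 encodings (NOP = 90, CALL RAX = FF D0, INT3 = CC).

```python
import struct

NOP, INT3 = b"\x90", b"\xcc"
CALL_RAX = b"\xff\xd0"
WORD = 8  # PEEKDATA/POKEDATA transfer one machine word at a time on x86_64

def build_initial_payload() -> bytes:
    # Two leading NOPs, as recommended above, absorb a possible 2-byte RIP
    # rewind; INT3 hands control back to the tracer after the call returns.
    return NOP * 2 + CALL_RAX + INT3

class FakeTracee:
    """Hypothetical stand-in for tracee memory; a real tracer calls ptrace()."""
    def __init__(self, mem: bytes, base: int):
        self.mem, self.base = bytearray(mem), base
    def peek(self, addr: int) -> int:
        return struct.unpack_from("<Q", self.mem, addr - self.base)[0]
    def poke(self, addr: int, word: int) -> None:
        struct.pack_into("<Q", self.mem, addr - self.base, word)

def inject(tracee, addr: int, payload: bytes):
    """Back up the words covering the payload, then overwrite them."""
    nwords = -(-len(payload) // WORD)            # ceiling division
    backup = [tracee.peek(addr + i * WORD) for i in range(nwords)]
    original = b"".join(struct.pack("<Q", w) for w in backup)
    patched = payload + original[len(payload):]  # keep bytes past the payload
    for i in range(nwords):
        word = struct.unpack_from("<Q", patched, i * WORD)[0]
        tracee.poke(addr + i * WORD, word)
    return backup

def restore(tracee, addr: int, backup) -> None:
    """Write the saved words back once the payload has run."""
    for i, word in enumerate(backup):
        tracee.poke(addr + i * WORD, word)

tracee = FakeTracee(bytes(range(32)), base=0x400000)
saved = inject(tracee, 0x400008, build_initial_payload())
restore(tracee, 0x400008, saved)
```

In a real tracer the registers would be saved and restored alongside the memory words (for example with PTRACE_GETREGS and PTRACE_SETREGS), which this simulation leaves out.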

Assuming that the initial payload is installed correctly, the goal in this example is to transfer control to the correct libdl or libc function to load our external library. Whichever function you use, you need to know its correct offset in the hosting library file. For example, if you choose __libc_dlopen_mode(), you should know where it is located in the libc.so library installed on the target machine. Note that when libraries are loaded into the address space of a process, they are segmented in memory by a mechanism similar to the one explained earlier in the post. If the offset of your target function from the start of the library image is X (for a shared library, this is the symbol value that readelf reports), the virtual address of the function in the tracee will be base+X, where base is the address of the first segment of the library in the host address space. Normally you can easily get the high-level memory mapping of the tracee by reading its maps file in the proc filesystem. Further, you can find the offsets of the compiled functions by reading the symbol tables of the file (symtab, or dynsym for exported functions) using readelf or, if you need non-exported functions, by looking up the function address with a standard debugger like GDB.
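
The base+X calculation can be sketched in Python like this. The function names are hypothetical, and SAMPLE_MAPS and the offset 0x90000 are made-up illustrative values in the format of a maps file, not output from a real process. For a live target you would pass the text of the tracee’s maps file and the symbol value reported by readelf instead.

```python
def library_base(maps_text: str, name_fragment: str) -> int:
    """Lowest start address among mappings whose path contains name_fragment."""
    starts = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, file offset, device, inode, pathname
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and name_fragment in fields[5]:
            start_hex = fields[0].split("-")[0]
            starts.append(int(start_hex, 16))
    if not starts:
        raise LookupError(f"no mapping matches {name_fragment!r}")
    return min(starts)

def remote_address(maps_text: str, name_fragment: str, symbol_offset: int) -> int:
    """Virtual address of a library function in the tracee: base + X."""
    return library_base(maps_text, name_fragment) + symbol_offset

# Illustrative data only: addresses, inodes and paths are invented.
SAMPLE_MAPS = """\
55d0c0a00000-55d0c0a01000 r-xp 00000000 08:01 131 /usr/bin/target
7f3a1c000000-7f3a1c028000 r--p 00000000 08:01 274 /usr/lib/libc.so.6
7f3a1c028000-7f3a1c1bd000 r-xp 00028000 08:01 274 /usr/lib/libc.so.6
7ffd5b7e0000-7ffd5b801000 rw-p 00000000 00:00 0 [stack]
"""

print(hex(remote_address(SAMPLE_MAPS, "libc", 0x90000)))
```

Taking the lowest start address matches the idea that base is the address of the library’s first segment, even when the library is split over several mappings.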

At this point we have installed the initial payload, found the correct addresses and called dlopen to load our external code. The library can have a constructor, which is called right after the library is loaded. Note that the code inside the remote library can be standard C and can do anything, including starting a separate thread to run our injected code in parallel with the target process’s normal functionality. This strategy is used when the external code is complex and long enough to need its own execution thread. For example, malware might do this to hide its activity by not interfering with the normal tasks of the victim process. What matters is a clean recovery of the tracee after dlopen returns. Remember that both the overwritten memory content and the registers must be restored to their original values.

As I said, we can also opt for installing a hook on one of the functions used by the tracee instead of running the external code immediately. Hooking functions can also be used to modify or monitor the normal usage of library routines or system calls.

Installing hooks on the external functions of a running process can be tricky. The simplest form of hooking is overwriting the first bytes of the target function to redirect execution to another function. But here I want to talk about a cleaner and more subtle strategy: hooking an external function by modifying the GOT and PLT structures. For this, I first need to explain some background on how these structures relate to each other and how to find the place in the ELF sections where the address of our target external function is stored. The ELF specification has a strange way of working with the symbolic names of external functions. In the following I describe the overall strategy and how to find the correct entries in the correct tables to modify the function addresses that we want to hook.

External functions are those functions whose code is not inside the ELF file itself. A very simple example is calling printf() in a C program. The implementation of this function is in glibc. What happens here is that, at compile time, the printf() invocation is translated to a call into the procedure linkage table (PLT) [1]. This table has one entry per external function. Each entry is a small piece of machine code that reads the address of the requested function from another table called “got.plt”. GOT stands for Global Offset Table. The addresses in got.plt are populated on demand: each address is resolved the first time the external function is called.

There is another table that is similarly used for external functions, but only for those that are resolved ahead of time. The addresses of these functions are resolved before main() starts execution, because the address itself is used somewhere that needs to be accessible throughout the program. For example, when you store the address of printf() in a local variable, the address is resolved before printf() is ever called. This table is called “plt.got” and is associated with the read-only address table “got”. Note that “plt” and “plt.got” are in executable pages while “got.plt” and “got” are not executable (refer to my past post on PLT to get a better idea of how these work [1]).

That said, we need to find the correct entry for the target external function at runtime. This is the tricky part. We will need to sift through multiple structures in the ELF file to process the entries in the address tables. There is a specific section called the “Section Header String Table” which holds the ASCII names of the sections we are about to work with. The section header index of this string table is stored in the “e_shstrndx” field of the ELF header at the beginning of the file. The “sh_name” member of a section header entry is an offset into this string table. This means that we have to iterate through the sections, read their sh_name member and look it up in the string table to recover each section name.

1- Read the ELF header to find the section header table offset (Elf64_Ehdr.e_shoff).
2- Read the section header index of the string table (Elf64_Ehdr.e_shstrndx) and use that section header to locate the table.
3- Iterate through the section header table and recover the string name of each section.

At this point we need to find a couple of useful sections: got.plt, rela.plt and dynsym. We assume that we are aiming for a function whose address is dynamically resolved, and hence we expect the entry to be in got.plt. For pre-resolved functions the whole process is similar, except that we need to work with the “got” and “rela.dyn” tables instead of “got.plt” and “rela.plt”.

There is a section called “rela.plt” which is used for resolving the relocation entries of got.plt. The rela.plt table lists the external functions that are going to be referenced from the PLT. Each member of the list holds the virtual address of the got.plt slot to be filled in (r_offset), along with a member called r_info that encodes the relocation type. What it does not have is the name of the function. The r_info member carries one more useful piece of information: a symbol index that can be extracted with the ELF64_R_SYM macro. We need to follow this index, which points to an entry in another table called “dynsym”. This table describes the symbol that is associated with the corresponding member in rela.plt.

The string representation of the function name is still not in this table either. dynsym has an associated string table called “dynstr”, and this is where the symbol names that we want to read are stored. The section header index of this table is stored in the “sh_link” member of the dynsym section header. This means that for each entry in rela.plt we need to resolve the symbol name by following references two tables away.

Once the symbol name of the target external function is found, we know the location of its address in the got.plt table: it is the index of the entry in rela.plt plus three, since the first three entries in got.plt are reserved. We then overwrite that got.plt member, replacing the existing address with the address of our hooking function. Again, don’t forget to back up the existing value. The high-level summary of the above steps is as follows:

———————————

           rela.plt
---------------------------------
| r_offset | r_info | r_addend  |
---------------------------------
               |
               `-> SYM | TYPE
                    |                    dynsym
                    `-> --------------------------------------
                        | st_name | st_info | st_value | ... |
                        --------------------------------------
                             |
                             `----------
                                       |    dynstr
                                 ----------------------------
                                 | ... |t|a|r|g|e|t|\0| ... |
                                 ----------------------------

4- Iterate through rela.plt.
5- For each member, use the symbol index from r_info to load the string index (st_name) from dynsym.
6- Refer to dynstr and read the symbol name.
7- Refer to got.plt and update the correct entry address.
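
The lookup chain in steps 4 to 7 can be sketched in Python as follows. The struct layouts follow Elf64_Rela and Elf64_Sym, and the two helper functions mirror the standard ELF64_R_SYM and ELF64_R_TYPE macros. Everything else is an assumption for illustration: find_got_slot is a hypothetical name, and the three tables are small synthetic byte strings rather than data from a real binary. The sketch only locates the slot; writing the hook address into it would be done through ptrace.

```python
import struct

RELA = struct.Struct("<QQq")      # Elf64_Rela: r_offset, r_info, r_addend
SYM = struct.Struct("<IBBHQQ")    # Elf64_Sym: st_name, st_info, st_other,
                                  #            st_shndx, st_value, st_size
GOT_RESERVED = 3                  # first three got.plt entries are reserved
R_X86_64_JUMP_SLOT = 7

def elf64_r_sym(r_info: int) -> int:
    return r_info >> 32

def elf64_r_type(r_info: int) -> int:
    return r_info & 0xFFFFFFFF

def find_got_slot(rela_plt: bytes, dynsym: bytes, dynstr: bytes, target: str):
    """Return (got.plt index, r_offset) for target, or None if absent."""
    wanted = target.encode("ascii")
    for index in range(len(rela_plt) // RELA.size):            # step 4
        r_offset, r_info, _addend = RELA.unpack_from(rela_plt, index * RELA.size)
        sym_at = elf64_r_sym(r_info) * SYM.size
        st_name = SYM.unpack_from(dynsym, sym_at)[0]            # step 5
        end = dynstr.index(b"\0", st_name)
        if dynstr[st_name:end] == wanted:                       # step 6
            return index + GOT_RESERVED, r_offset               # step 7
    return None

# Synthetic tables: a null symbol, then puts and printf.
DYNSTR = b"\0puts\0printf\0"
DYNSYM = (SYM.pack(0, 0, 0, 0, 0, 0)
          + SYM.pack(1, 0x12, 0, 0, 0, 0)
          + SYM.pack(6, 0x12, 0, 0, 0, 0))
RELA_PLT = (RELA.pack(0x4018, (1 << 32) | R_X86_64_JUMP_SLOT, 0)
            + RELA.pack(0x4020, (2 << 32) | R_X86_64_JUMP_SLOT, 0))

print(find_got_slot(RELA_PLT, DYNSYM, DYNSTR, "printf"))
```

In the synthetic data got.plt is assumed to start at 0x4000 with 8-byte entries, so the computed index and the r_offset value point at the same slot, which gives a simple consistency check.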

If at any point you don’t need the hook anymore, restore the updated got.plt entry to its original value. As I said before, a similar strategy can be used for the “got” table. However, you will first need to change the protection of the containing page to read-write. Since the entries in this table are assumed to be constant during execution, the table lives in a page that is read-only while the program runs.
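
Changing page protection works on whole pages, so the address of a “got” entry has to be rounded down to a page boundary first. Here is a small Python sketch of that arithmetic. The function name page_range is hypothetical and the 4096-byte page size is an assumption (it is the common default on x86_64, but a real tool should query it).

```python
def page_range(addr: int, length: int, page_size: int = 4096):
    """Page-aligned (start, size) that covers [addr, addr + length)."""
    mask = ~(page_size - 1)
    start = addr & mask                                # round down
    end = (addr + length + page_size - 1) & mask       # round up
    return start, end - start

# An 8-byte GOT entry at a made-up address inside one page.
print([hex(v) for v in page_range(0x403fd8, 8)])
```

An entry that straddles a page boundary yields a range of two pages, so both would need to be made writable before the overwrite.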

What I described in this post was a couple of ways to implement process injection under Linux. Note that it doesn’t have to be done exactly the way I explained. By understanding how processes and executables work at a low level, you can achieve the same goal through other strategies as well. Also, there are security measures that can be applied to programs to stop other processes from injecting code into them, or at least make it more difficult. In conclusion, this is an interesting topic, and you can learn more by practicing and implementing the techniques discussed here and using your creativity.

[1] https://bitguard.wordpress.com/?p=246
