Assembly

Most of this content comes from my notes on the HTB Academy course. I really recommend it, because the exercises it provides are a great way to understand Assembly in detail. The GDB part is also very useful when you need to exploit Linux-based buffer overflows.

Architecture

From Interpreted to binary Image from Hackthebox Academy assembly

With registers we can:

Transfer data between memory and register, and vice versa
Perform arithmetic operations on registers and data
Transfer control to other parts of the program

Computer Architecture

Most computers today are built on the Von Neumann architecture.

This architecture executes machine code to perform specific algorithms. It mainly consists of the following elements:

Central Processing Unit (CPU)
Memory Unit
Input/Output Devices
- Mass Storage Unit
- Keyboard
- Display

Furthermore, the CPU itself consists of three main components:

Control Unit (CU)
Arithmetic/Logic Unit (ALU)
Registers

Though very old, this architecture is still the basis of most modern computers, servers, and even smartphones.

CPU Image from Hackthebox Academy

RAM

RAM Image from Hackthebox Academy

Segment | Description
Stack | Has a Last-in First-out (LIFO) design and is fixed in size. Data in it can only be accessed in a specific order by pushing and popping data.
Heap | Has a hierarchical design and is therefore much larger and more versatile in storing data, as data can be stored and retrieved in any order. However, this makes the heap slower than the Stack.
Data | Has two parts: Data, which is used to hold variables, and .bss, which is used to hold unassigned variables (i.e., buffer memory for later allocation).
Text | Main assembly instructions are loaded into this segment to be fetched and executed by the CPU.

Although this segmentation applies to the entire RAM, each application is allocated its Virtual Memory when it is run. This means that each application would have its own stack, heap, data, and text segments.

CPU Architecture

CPU Image from Hackthebox Academy

Stage | Description
1. Fetch | Takes the next instruction's address from the Instruction Address Register (IAR), which tells it where the next instruction is located.
2. Decode | Takes the instruction from the IAR, and decodes it from binary to see what is required to be executed.
3. Execute | Fetches the instruction operands from registers/memory, and processes the instruction in the ALU or CU.
4. Store | Stores the new value in the destination operand.

Image from Hackthebox Academy

If we want to know whether our Linux system supports x86_64 architecture, we can use the lscpu command:

user ~ $ lscpu
Architecture:                           x86_64
CPU op-mode(s):                         32-bit, 64-bit
Byte Order:                             Little Endian

user ~ $ uname -m
x86_64

Instruction Set Architectures

An Instruction Set Architecture (ISA) specifies the syntax and semantics of the assembly language on each architecture. It is not just a different syntax but is built in the core design of a processor, as it affects the way and order instructions are executed and their level of complexity. ISA mainly consists of the following components:

Instructions
Registers
Memory Addresses
Data Types

Component | Description | Example
Instructions | The instruction to be processed in the opcode operand_list format. There are usually 1, 2, or 3 comma-separated operands. | add rax, 1; mov rsp, rax; push rax
Registers | Used to store operands, addresses, or instructions temporarily. | rax, rsp, rip
Memory Addresses | The address in which data or instructions are stored. May point to memory or registers. | 0xffffffffaa8a25ff, 0x44d0, $rax
Data Types | The type of stored data. | byte, word, double word

CISC vs RISC

CISC vs RISC Image from Hackthebox Academy

Area | CISC | RISC
Complexity | Favors complex instructions | Favors simple instructions
Length of instructions | Longer instructions - Variable length 'multiples of 8-bits' | Shorter instructions - Fixed length '32-bit/64-bit'
Total instructions per program | Fewer total instructions - Shorter code | More total instructions - Longer code
Optimization | Relies on hardware optimization (in CPU) | Relies on software optimization (in Assembly)
Instruction Execution Time | Variable - Multiple clock cycles | Fixed - One clock cycle
Instructions supported by CPU | Many instructions (~1500) | Fewer instructions (~200)
Power Consumption | High | Very low
Examples | Intel, AMD | ARM, Apple

Registers, Addresses and Data Types

Registers

There are two main types of registers we will be focusing on: Data Registers and Pointer Registers.

Data Registers | Pointer Registers
rax | rbp
rbx | rsp
rcx | rip
rdx |
r8 |
r9 |
r10 |

Data Registers - are usually used for storing instructions/syscall arguments. The primary data registers are: rax, rbx, rcx, and rdx. The rdi and rsi registers also exist and are usually used for the instruction destination and source operands. Then, we have secondary data registers that can be used when all previous registers are in use, which are r8, r9, and r10.
Pointer Registers - are used to store specific important address pointers. The main pointer registers are the Base Stack Pointer rbp, which points to the beginning of the Stack (the base of the current stack frame), the Current Stack Pointer rsp, which points to the current location within the Stack (top of the Stack), and the Instruction Pointer rip, which holds the address of the next instruction.

Sub-Registers

Each 64-bit register can be further divided into smaller sub-registers containing the lower bits: one byte (8 bits), two bytes (16 bits), and four bytes (32 bits). Each sub-register can be used and accessed on its own, so we don't have to consume the full 64 bits if we have a smaller amount of data.

Registers Image from Hackthebox Academy

Sub-registers can be accessed as:

Size in bits | Size in bytes | Name | Example
16-bit | 2 bytes | the base name | ax
8-bit | 1 byte | base name and/or ends with l | al
32-bit | 4 bytes | base name + starts with the e prefix | eax
64-bit | 8 bytes | base name + starts with the r prefix | rax
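As a small sketch of how writes to sub-registers affect the full 64-bit register, the program below overwrites parts of rax step by step. The starting constant and the use of the exit status to report the result are illustrative assumptions, not taken from the course.

```nasm
global _start

section .text
_start:
    mov rax, 0x1122334455667788
    mov al, 0xff               ; rax = 0x11223344556677ff (only the low byte changes)
    mov ax, 0xaaaa             ; rax = 0x112233445566aaaa (only the low 2 bytes change)
    mov eax, 1                 ; rax = 0x0000000000000001 (a 32-bit write zeroes the upper 32 bits)

    mov edi, eax               ; 1st arg: exit status 1
    mov eax, 60                ; syscall number 60 = exit
    syscall
```

The detail worth remembering is the asymmetry: 8-bit and 16-bit writes preserve the rest of the register, while 32-bit writes clear the upper half.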

The following are the names of the sub-registers for all of the essential registers in an x86_64 architecture:

Description | 64-bit Register | 32-bit Register | 16-bit Register | 8-bit Register
Data/Arguments Registers
Syscall Number/Return value | rax | eax | ax | al
Callee Saved | rbx | ebx | bx | bl
1st arg - Destination operand | rdi | edi | di | dil
2nd arg - Source operand | rsi | esi | si | sil
3rd arg | rdx | edx | dx | dl
4th arg - Loop counter | rcx | ecx | cx | cl
5th arg | r8 | r8d | r8w | r8b
6th arg | r9 | r9d | r9w | r9b
Pointer Registers
Base Stack Pointer | rbp | ebp | bp | bpl
Current/Top Stack Pointer | rsp | esp | sp | spl
Instruction Pointer 'call only' | rip | eip | ip | -

Memory Addresses

Whenever an instruction goes through the Instruction Cycle to be executed, the first step is to fetch the instruction from the address it's located at, as previously discussed. There are several types of address fetching (i.e., addressing modes) in the x86 architecture:

Addressing Mode | Description | Example
Immediate | The value is given within the instruction | add 2
Register | The register name that holds the value is given in the instruction | add rax
Direct | The direct full address is given in the instruction | call 0xffffffffaa8a25ff
Indirect | A reference pointer is given in the instruction | call 0x44d000 or call [rax]
Stack | Address is on top of the stack | add rsp

In the above table, lower is slower. The less immediate the value is, the slower it is to fetch it.

Endianness

The following table demonstrates how endianness works:

Endianness Image from HTB Academy
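As a minimal sketch of what little-endian storage means in practice, the following program stores the dword 0x11223344 and reads back only the byte at its lowest address. The label value, the file name in the comment, and the use of the exit status to show the result are illustrative assumptions.

```nasm
; Assemble and link (the file name endian.s is hypothetical):
;   nasm -f elf64 endian.s && ld endian.o -o endian
global _start

section .data
    value dd 0x11223344        ; laid out in memory as the bytes 44 33 22 11

section .text
_start:
    movzx edi, byte [value]    ; the lowest address holds the least significant byte (0x44)
    mov eax, 60                ; syscall number 60 = exit
    syscall                    ; exit status = 0x44
```

Running the binary and then echo $? should print 68, the decimal value of 0x44, which shows that the least significant byte is stored first.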

Data Types

Component | Length | Example
byte | 8 bits | 0xab
word | 16 bits - 2 bytes | 0xabcd
double word (dword) | 32 bits - 4 bytes | 0xabcdef12
quad word (qword) | 64 bits - 8 bytes | 0xabcdef1234567890

The following table shows the appropriate data type for each sub-register:

Sub-register | Data Type
al | byte
ax | word
eax | dword
rax | qword

Assembling & Debugging

Assembly File Structure

Assembly File Structure Image from HTB Academy

Next, if we look at the code line-by-line, we see that it has three main parts:

Section | Description
global _start | This is a directive that directs the code to start executing at the _start label defined below.
section .data | This is the data section, which should contain all of the variables.
section .text | This is the text section containing all of the code to be executed.

Both the .data and .text sections refer to the data and text memory segments, in which these instructions will be stored.

Variables

We can define variables using db for a list of bytes, dw for a list of words, dd for a list of double words, and so on. We can also label any of our variables so we can call it or reference it later. The following are some examples of defining variables:

Instruction | Description
db 0x0a | Defines the byte 0x0a, which is a new line.
message db 0x41, 0x42, 0x43, 0x0a | Defines the label message => ABC\n.
message db "Hello World!", 0x0a | Defines the label message => Hello World!\n.

Assembling & Disassembling

Assembling

Hello world in assembly

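global _start

section .data
   message db "Hello World!"   ; the bytes to print
   length equ $-message        ; length = current address minus address of message

section .text
_start:
   mov rax, 1        ; syscall number 1: write
   mov rdi, 1        ; 1st arg: fd 1 (stdout)
   mov rsi, message  ; 2nd arg: pointer to the string
   mov rdx, length   ; 3rd arg: number of bytes to write
   syscall

   mov rax, 60       ; syscall number 60: exit
   mov rdi, 0        ; 1st arg: exit code 0
   syscall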

Bash script to assemble link and run nasm

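#!/bin/bash

# Bash script to assemble, link and run nasm
# Assembling: nasm -f elf64 helloWorld.s
# Linking: ld -o helloWorld helloWorld.o
# Run: ./helloWorld

fileName="${1%.*}" # remove the .s extension (shortest match, so ./helloWorld.s also works)

# Stop if assembling or linking fails
nasm -f elf64 "${fileName}.s" || exit 1
ld "${fileName}.o" -o "${fileName}" || exit 1

# With -g, open the binary in gdb; otherwise just run it
if [ "$2" == "-g" ]; then
    gdb -q "${fileName}"
else
    "./${fileName}"
fi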

Disassembling

objdump -M intel -d binFileToDisassemble

┌──(kali㉿kali)-[~/Documents]
└─$ objdump -M intel -d helloWorld

helloWorld:     file format elf64-x86-64


Disassembly of section .text:

0000000000401000 <_start>:
 401000:       b8 01 00 00 00          mov    eax,0x1
 401005:       bf 01 00 00 00          mov    edi,0x1
 40100a:       48 be 00 20 40 00 00    movabs rsi,0x402000
 401011:       00 00 00
 401014:       ba 12 00 00 00          mov    edx,0x12
 401019:       0f 05                   syscall
 40101b:       b8 3c 00 00 00          mov    eax,0x3c
 401020:       bf 00 00 00 00          mov    edi,0x0
 401025:       0f 05                   syscall

If we wanted to show only the assembly code, without machine code or addresses, we could add the --no-show-raw-insn --no-addresses flags.

The -d flag only disassembles the .text section of our code. To dump any strings, we can use the -s flag, and add -j .data to examine only the .data section. Since this prints raw section contents rather than instructions, we also do not need to add -M intel.

objdump -sj .data binFileToDisassemble

┌──(kali㉿kali)-[~/Documents]
└─$ objdump -sj .data helloWorld

helloWorld:     file format elf64-x86-64

Contents of section .data:
402000 48656c6c 6f204854 42204163 6164656d  Hello HTB Academ
402010 7921                                 y!

GNU Debugger (GDB)

One of the great features of GDB is its support for third-party plugins. An excellent plugin that is well maintained and has good documentation is GEF. GEF is a free and open-source GDB plugin that is built precisely for reverse engineering and binary exploitation. This fact makes it a great tool to learn.

$ wget -O ~/.gdbinit-gef.py -q https://github.com/hugsy/gef/raw/master/gef.py
$ echo source ~/.gdbinit-gef.py >> ~/.gdbinit

Going forward, we will frequently be assembling and linking our assembly code and then running it with gdb. To do so quickly, we can use the assembler.sh script we wrote in the previous section with the -g flag. It will assemble and link the code, and then run it with gdb, as follows:

./assembler.sh helloWorld.s -g

┌──(kali㉿kali)-[~/Documents/intro-to-assembly]
└─$ ./assembler.sh helloWorld.s -g
GEF for linux ready, type `gef' to start, `gef config' to configure
93 commands loaded for GDB 10.1.90.20210103-git using Python engine 3.9
[*] 3 commands could not be loaded, run `gef missing` to know why.
Reading symbols from helloWorld...
(No debugging symbols found in helloWorld)
gef➤

Info

gef➤  help info
info, inf, i
Generic command for showing things about the program being debugged.

Functions

gef➤  info functions
All defined functions:

Non-debugging symbols:
0x0000000000401000  _start

Variables

gef➤  info variables
All defined variables:

Non-debugging symbols:
0x0000000000402000  message
0x0000000000402012  __bss_start
0x0000000000402012  _edata
0x0000000000402018  _end

Disassemble

To view the instructions within a specific function, we can use the disassemble or disas command along with the function name

gef➤  disas _start
Dump of assembler code for function _start:
  0x0000000000401000 <+0>:     mov    eax,0x1
  0x0000000000401005 <+5>:     mov    edi,0x1
  0x000000000040100a <+10>:    movabs rsi,0x402000
  0x0000000000401014 <+20>:    mov    edx,0x12
  0x0000000000401019 <+25>:    syscall
  0x000000000040101b <+27>:    mov    eax,0x3c
  0x0000000000401020 <+32>:    mov    edi,0x0
  0x0000000000401025 <+37>:    syscall
End of assembler dump.

Debug with GDB

Step | Description
Break | Setting breakpoints at various points of interest
Examine | Running the program and examining the state of the program at these points
Step | Moving through the program to examine how it acts with each instruction and with user input
Modify | Modify values in specific registers or addresses at specific breakpoints, to study how it would affect the execution

Break

We can set a breakpoint at a specific address or for a particular function. To set a breakpoint, we can use the break or b command along with the address or function name we want to break at.

gef➤  b _start
Breakpoint 1 at 0x401000

Now, in order to start our program, we can use the run or r command.

gef➤  run
Starting program: /home/kali/Documents/intro-to-assembly/helloWorld
[STRIPPED]
→   0x401000 <_start+0>       mov    eax, 0x1
[STRIPPED]

The breakpoint is set where the arrow is. If we want to set a breakpoint at a certain address, like _start+10, we can either b _start+10 or b *0x40100a. The * tells GDB to break at the instruction stored in 0x40100a. If we want to see what breakpoints we have at any point of the execution, we can use the info breakpoint command. We can also disable, enable, or delete any breakpoint. Furthermore, GDB also supports setting conditional breaks that stop the execution when a specific condition is met.

Examine

To manually examine any address or register, we can use the x command in the format x/FMT ADDRESS, as help x would tell us. ADDRESS is the address or register we want to examine, while FMT is the examine format. FMT can have three parts: the count, the format, and the size.

Examine Image from Hackthebox Academy

Instructions

For example, if we wanted to examine the next four instructions in line, we will have to examine the $rip register (which holds the address of the next instruction), and use 4 for the count, i for the format, and g for the size (for 8-bytes or 64-bits). So, the final examine command would be x/4ig $rip

gef➤  x/4ig $rip
=> 0x401000 <_start>:   mov    eax,0x1
  0x401005 <_start+5>: mov    edi,0x1
  0x40100a <_start+10>:        movabs rsi,0x402000
  0x401014 <_start+20>:        mov    edx,0x12

Strings

We can also examine a variable stored at a specific memory address. We know that our message variable is stored at the .data section on address 0x402000 from our previous disassembly. We also see the upcoming command movabs rsi, 0x402000, so we may want to examine what is being moved from 0x402000.

In this case, we will not put anything for the Count, as we only want one address (1 is the default), and will use s as the format to get it in a string format rather than in hex

gef➤  x/s 0x402000
0x402000:       "Hello HTB Academy!"

Addresses

The most common format of examining is hex x. We often need to examine addresses and registers containing hex data, such as memory addresses, instructions, or binary data. Let us examine the same previous instruction, but in hex format, to see how it looks

gef➤  x/wx 0x401000
0x401000 <_start>:      0x000001b8

We see instead of mov eax,0x1, we get 0x000001b8, which is the hex representation of the mov eax,0x1 machine code in little-endian formatting.

This is read as: b8 01 00 00.

We can also use GEF features to examine certain addresses. For example, at any point we can use the registers command to print out the current value of all registers.

Step

To move through the program, there are two main commands we can use: stepi and step.

Step instruction

The stepi or si command will step through the assembly instructions one by one, which is the smallest level of steps possible while debugging.

Step Count

Similarly to examine, we can repeat the si command by adding a number after it. For example, if we wanted to move 3 steps to reach the syscall instruction, we can do si 3. We can also press return/enter on an empty line to repeat the last command.

Step

The step or s command, on the other hand, will continue until the following line of code is reached or until it exits from the current function.

If there's a call to another function within this function, it'll break at the beginning of that function. Otherwise, it'll break after we exit this function after the program's end.

There's also the next or n command, which will also continue until the next line, but will skip any functions called in the same line of code, instead of breaking at them like step. There's also the nexti or ni, which is similar to si, but skips functions calls.

Modify

Addresses

To modify values in GDB, we can use the set command. However, we will utilize the patch command in GEF to make this step much easier.

We have to provide the type/size of the new value, the location to be stored, and the value we want to use.

gef➤  patch string 0x402000 "Patched!\\x0a"
gef➤  c
Continuing.
Patched!
Academy![Inferior 1 (process 3824) exited normally]

We see that we successfully modified the string and got Patched!\n Academy! instead of the old string. Notice how we used \x0a for adding a new line after our string.

Basic Instructions

Data Movement

Instruction | Description | Example
mov | Move data or load immediate data | mov rax, 1 -> rax = 1
lea | Load an address pointing to the value | lea rax, [rsp+5] -> rax = rsp+5
xchg | Swap data between two registers or addresses | xchg rax, rbx -> rax = rbx, rbx = rax

Loading Data

We can load immediate data using the mov instruction. For example, we can load the value of 1 into the rax register using the mov rax, 1 instruction. We have to remember here that the size of the loaded data depends on the size of the destination register. For example, in the above mov rax, 1 instruction, since we used the 64-bit register rax, it will be moving a 64-bit representation of the number 1 (i.e. 0x0000000000000001), which is not very efficient.

This is why it is more efficient to use a register size that matches our data size. For example, if the rest of rax is already zero, we will get the same result as the above example with mov al, 1, since we are moving 1 byte (0x01) into a 1-byte register (al), which is much more efficient. Keep in mind that writing to al leaves the upper 56 bits of rax unchanged.

The xchg instruction will swap the data between the two registers.

Address Pointers

Another critical concept to understand is using pointers. In many cases, we would see that the register or address we are using does not immediately contain the final value but contains another address that points to the final value. This is always the case with pointer registers, like rsp, rbp, and rip, but is also used with any other register or memory address.

We can use square brackets to compute an address offset relative to a register or another address. For example, we can do mov rax, [rsp+10] to move the value stored 10 bytes away from rsp.

Moving pointer values

To move the actual value, we will have to use square brackets [], which in x86_64 assembly and Intel syntax means load value at address.

Note: When using [], we may need to set the data size before the square brackets, like byte or qword. However, in most cases, nasm will automatically do that for us. In a disassembly, mov rax, [rsp] is shown as mov rax, QWORD PTR [rsp]: the size has been filled in, and PTR marks that the value is being moved from a pointer.

Loading value pointers

Finally, we need to understand how to load a pointer address to a value, using the lea (or Load Effective Address) instruction, which loads a pointer to the specified value, as in lea rax, [rsp]. This is the opposite of what we just learned above (i.e., load pointer to a value vs. move value from pointer).
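The difference between loading a pointer and loading the value it points to can be sketched as follows. The variable name number and its value are hypothetical, and the exit status is only used as a convenient way to observe the result.

```nasm
global _start

section .data
    number dq 7                ; hypothetical 8-byte variable

section .text
_start:
    lea rbx, [number]          ; rbx = address of number (no memory access happens)
    mov rax, [rbx]             ; rax = value stored at that address (7)
    mov rcx, rbx               ; rcx = copy of the address itself, not the value

    mov rdi, rax               ; 1st arg: exit status 7
    mov eax, 60                ; syscall number 60 = exit
    syscall
```

Here nasm infers the qword size for [rbx] from the 64-bit destination register, so no explicit size keyword is needed.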

Arithmetic instructions

The second type of basic instructions is Arithmetic Instructions. With Arithmetic Instructions, we can perform various mathematical computations on data stored in registers and memory addresses. These instructions are usually processed by the ALU in the CPU, among other instructions. We will split arithmetic instructions into two types: instructions that take only one operand (Unary), instructions that take two operands (Binary).

Unary Instructions

Instruction | Description | Example (with rax = 1)
inc | Increment by 1 | inc rax -> rax++ or rax += 1 -> rax = 2
dec | Decrement by 1 | dec rax -> rax-- or rax -= 1 -> rax = 0

Binary Instructions

Instruction | Description | Example (with rax = 1, rbx = 1)
add | Add both operands | add rax, rbx -> rax = 1 + 1 -> 2
sub | Subtract Source from Destination (i.e. rax = rax - rbx) | sub rax, rbx -> rax = 1 - 1 -> 0
imul | Multiply both operands | imul rax, rbx -> rax = 1 * 1 -> 1

Note that in all of the above instructions, the result is always stored in the destination operand, while the source operand is not affected.
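To make the destination/source behaviour concrete, here is a short sketch that chains the unary and binary instructions. The starting values are arbitrary choices for illustration.

```nasm
global _start

section .text
_start:
    mov rax, 1
    mov rbx, 5
    inc rax                    ; rax = 2
    add rax, rbx               ; rax = 2 + 5 = 7
    imul rax, rbx              ; rax = 7 * 5 = 35
    sub rax, rbx               ; rax = 35 - 5 = 30
    dec rax                    ; rax = 29, rbx is still 5

    mov rdi, rax               ; 1st arg: exit status 29
    mov eax, 60                ; syscall number 60 = exit
    syscall
```

Throughout the sequence only rax changes, while the source operand rbx keeps its original value.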

Bitwise Instructions

Instruction | Description | Example
not | Bitwise NOT (invert all bits, 0->1 and 1->0) | not rax -> NOT 00000001 -> 11111110
and | Bitwise AND (if both bits are 1 -> 1, otherwise -> 0) | and rax, rbx -> 00000001 AND 00000010 -> 00000000
or | Bitwise OR (if either bit is 1 -> 1, if both are 0 -> 0) | or rax, rbx -> 00000001 OR 00000010 -> 00000011
xor | Bitwise XOR (if bits are the same -> 0, if bits are different -> 1) | xor rax, rbx -> 00000001 XOR 00000010 -> 00000011
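The same operations can be tried on slightly larger bit patterns. This is only a sketch with arbitrary values, using nasm's b suffix for binary literals.

```nasm
global _start

section .text
_start:
    mov rax, 1100b             ; binary literal (12)
    mov rbx, 1010b             ; binary literal (10)

    mov rcx, rax
    and rcx, rbx               ; rcx = 1000b (8): bits set in both
    mov rdx, rax
    or  rdx, rbx               ; rdx = 1110b (14): bits set in either
    xor rax, rbx               ; rax = 0110b (6): bits that differ
    xor rbx, rbx               ; xor of a register with itself always gives 0

    mov rdi, rax               ; 1st arg: exit status 6
    mov eax, 60                ; syscall number 60 = exit
    syscall
```

The xor rbx, rbx line is the usual idiom for zeroing a register.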

Control Instructions

Loops

This is where Control Instructions come in. Such instructions allow us to change the flow of the program and direct it to another line. Control Instructions include: Loops, Branching, Function Calls

Loop Structure

A loop in assembly is a set of instructions that repeat for rcx times.

Instruction | Description | Example
mov rcx, x | Sets loop (rcx) counter to x | mov rcx, 3
loop | Decrements rcx and jumps back to the start of the loop until the counter reaches 0 | loop exampleLoop
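Put together, a complete loop might look like the sketch below, which adds up the counter values. The running total in rax and the exit status are illustrative choices.

```nasm
global _start

section .text
_start:
    xor eax, eax               ; running total = 0
    mov rcx, 3                 ; loop counter
exampleLoop:
    add rax, rcx               ; adds 3, then 2, then 1
    loop exampleLoop           ; rcx = rcx - 1, jump back while rcx != 0

    mov rdi, rax               ; 1st arg: exit status 6
    mov eax, 60                ; syscall number 60 = exit
    syscall
```

The body runs exactly three times, because loop decrements rcx before testing it.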

Unconditional Branching

The second type of Control Instructions is Branching Instructions, which are general instructions that allow us to jump to any point in the program if a specific condition is met.

JMP

Instruction | Description | Example
jmp | Jumps to specified label, address, or location | jmp exampleLoop

Conditional Branching

Unlike Unconditional Branching Instructions, Conditional Branching instructions are only processed when a specific condition is met, based on the Destination and Source operands. A conditional jump instruction has multiple varieties as Jcc, where cc represents the Condition Code. The following are some of the main condition codes:

Instruction | Condition | Description
jz | D = 0 | Destination equal to Zero
jnz | D != 0 | Destination Not equal to Zero
js | D < 0 | Destination is Negative
jns | D >= 0 | Destination is Not Negative (i.e. 0 or positive)
jg | D > S | Destination Greater than Source
jge | D >= S | Destination Greater than or Equal Source
jl | D < S | Destination Less than Source
jle | D <= S | Destination Less than or Equal Source

There are many other similar conditions that we can utilize as well. For a complete list of conditions, we can refer to the latest Intel x86_64 manual, in the Jcc-Jump if Condition Is Met section. Conditional instructions are not restricted to jmp instructions only but are also used with other assembly instructions for conditional use as well, like the CMOVcc and SETcc instructions.

For example, if we wanted to perform a mov rax, rbx instruction, but only if the condition is = 0, then we can use the CMOVcc or conditional mov instruction, such as the cmovz rax, rbx instruction. Similarly, if we wanted to move if the condition is <, then we can use the cmovl rax, rbx instruction, and so on for other conditions. The same applies to the SETcc instructions, which set the operand's byte to 1 if the condition is met or 0 otherwise. An example of this is setz al (the operand must be an 8-bit register or memory location).

RFLAGS Registers

We have been talking about meeting certain conditions, but we have not yet discussed how these conditions are met or where they are stored. This is where we use the RFLAGS register, which we briefly mentioned in the Registers section.

The RFLAGS register consists of 64-bits like any other register. However, this register does not hold values but holds flag bits instead. Each bit 'or set of bits' turns to 1 or 0 depending on the result of the last instruction.

The Carry Flag CF: Indicates whether the last arithmetic operation carried or borrowed out of the most significant bit (unsigned overflow).
The Parity Flag PF: Indicates whether the lowest byte of the result has an even number of 1 bits.
The Zero Flag ZF: Indicates whether the result is zero.
The Sign Flag SF: Indicates whether the result is negative (its most significant bit is 1).

JNZ

CMP

The Compare instruction cmp simply compares the two operands, by subtracting the second operand from first operand (i.e. D1 - S2), and then sets the necessary flags in the RFLAGS register. For example, if we use cmp rbx, 10, then the compare instruction would do 'rbx - 10', and set the flags based on the result.

Instruction | Description | Example
cmp | Sets RFLAGS by subtracting second operand from first operand (i.e. first - second) | cmp rax, rbx -> rax - rbx
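As a sketch of how cmp pairs with a conditional jump, the following program keeps the larger of two values. The values, the label done, and the exit status are assumptions made for the example.

```nasm
global _start

section .text
_start:
    mov rax, 7
    mov rbx, 10
    cmp rax, rbx               ; computes rax - rbx, updates RFLAGS, discards the result
    jge done                   ; taken only if rax >= rbx (signed comparison)
    mov rax, rbx               ; otherwise copy the larger value into rax
done:
    mov rdi, rax               ; 1st arg: exit status 10
    mov eax, 60                ; syscall number 60 = exit
    syscall
```

Note that cmp leaves both rax and rbx untouched, and only the flags carry the outcome to jge.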

Functions

Using the stack

The Stack

The stack is a segment of memory allocated for the program to store data in, and it is usually used to store data and then retrieve them back temporarily. The top of the stack is referred to by the Top Stack Pointer rsp, while the bottom is referred to by the Base Stack Pointer rbp.

We can push data into the stack, and it will be at the top of the stack (i.e. rsp), and then we can pop data out of the stack into a register or a memory address, and it will be removed from the top of the stack.

PUSH/POP

Instruction | Description | Example
push | Copies the specified register/address to the top of the stack | push rax
pop | Moves the item at the top of the stack to the specified register/address | pop rax

The stack has a Last-in First-out (LIFO) design, which means we can only pop out the last element pushed into the stack.

Since the stack has a LIFO design, when we restore our registers, we have to do them in reverse order. For example, if we push rax and then push rbx, when we restore, we have to pop rbx and then pop rax.
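A minimal sketch of saving and restoring two registers in LIFO order is shown below. The overwritten values 100 and 200 stand in for whatever code would clobber the registers in between.

```nasm
global _start

section .text
_start:
    mov rax, 1
    mov rbx, 2
    push rax                   ; stack (top last): 1
    push rbx                   ; stack (top last): 1, 2

    mov rax, 100               ; clobber both registers
    mov rbx, 200

    pop rbx                    ; last in, first out: rbx = 2
    pop rax                    ; rax = 1

    mov rdi, rax
    add rdi, rbx               ; 1st arg: exit status 1 + 2 = 3
    mov eax, 60                ; syscall number 60 = exit
    syscall
```

Popping in the same order as the pushes (pop rax, then pop rbx) would swap the two values instead of restoring them.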

Syscalls

Linux Syscall

A syscall is like a globally available function written in C, provided by the Operating System Kernel. A syscall takes the required arguments in the registers and executes the function with the provided arguments. For example, if we wanted to write something to the screen, we can use the write syscall, provide the string to be printed and other required arguments, and then call the syscall to issue the print.

There are many available syscalls provided by the Linux Kernel, and we can find a list of them and the syscall number of each one by reading the unistd_64.h system file

Note: With 32-bit x86 processors, the syscall numbers are in the unistd_32.h file.

Syscall Function Arguments

To use the write syscall, we must first know what arguments it accepts. To find the arguments accepted by a syscall, we can use the man -s 2 command with the syscall name.

Syscall Calling Convention

Now that we understand how to locate various syscalls and their arguments, let's start learning how to call them. To call a syscall, we have to:

Save registers to stack
Set its syscall number in rax
Set its arguments in the registers
Use the syscall assembly instruction to call it

We usually should save any registers we use to the stack before any function call or syscall.
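The four steps can be sketched with the write syscall as follows. The message, the labels msg and len, and the value kept in rcx are illustrative. Saving rcx is shown because the syscall instruction itself overwrites rcx and r11.

```nasm
global _start

section .data
    msg db "hi", 0x0a
    len equ $ - msg

section .text
_start:
    mov rcx, 3                 ; a value we still need after the syscall
    push rcx                   ; 1. save it: the syscall instruction overwrites rcx and r11

    mov eax, 1                 ; 2. syscall number 1 = write
    mov edi, 1                 ; 3. 1st arg: fd 1 (stdout)
    mov rsi, msg               ;    2nd arg: pointer to the bytes
    mov edx, len               ;    3rd arg: number of bytes
    syscall                    ; 4. enter the kernel, the return value comes back in rax

    pop rcx                    ; restore the saved value
    mov edi, ecx               ; 1st arg: exit status 3
    mov eax, 60                ; syscall number 60 = exit
    syscall
```

After the write, rax holds the number of bytes written, which is why the syscall number has to be set again before the exit call.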

Syscall Arguments

Next, we should put each of the function's arguments in its corresponding register. The x86_64 architecture's calling convention specifies in which register each argument should be placed (e.g., first arg should be in rdi). All functions and syscalls should follow this standard and take their arguments from the corresponding registers. We have discussed the following table in the Registers section:

Description | 64-bit Register | 8-bit Register
Syscall Number/Return value | rax | al
Callee Saved | rbx | bl
1st arg | rdi | dil
2nd arg | rsi | sil
3rd arg | rdx | dl
4th arg | rcx (r10 for syscalls) | cl (r10b for syscalls)
5th arg | r8 | r8b
6th arg | r9 | r9b

Exit Syscall

Finally, since we have understood how syscalls work, let's go through another essential syscall used in programs: Exit syscall. We may have noticed that so far, whenever our program finishes executing, it exits with a segmentation fault. This is because we are ending our program abruptly, without going through the proper procedure of exiting programs in Linux, by calling the exit syscall and passing an exit code.

Procedures

Defining Procedures

To define a procedure, we add a label above each part of the code we want to turn into a procedure:

printMessage:            ; label
   mov rax, 1           ; rax: syscall number 1 (write)
   mov rdi, 1           ; rdi: fd 1 for stdout
   mov rsi, message     ; rsi: pointer to message
   mov rdx, 20          ; rdx: print length of 20 bytes
   syscall              ; call write syscall to print the message

CALL/RET

When we want to start executing a procedure, we can call it, and it will go through its instructions. The call instruction pushes (i.e., saves) the next instruction pointer rip to the stack and then jumps to the specified procedure.

Once the procedure is executed, we should end it with a ret instruction to return to the point we were at before jumping to the procedure. The ret instruction pops the address at the top of the stack into rip, so the program's next instruction is restored to what it was before jumping to the procedure.

The ret instruction plays an essential role in Return-Oriented Programming (ROP), an exploitation technique usually used with Binary Exploitation.

Note: It is important to understand the line-based execution flow of assembly. If we don't use a ret at the end of a procedure it will simply execute the next line. Likewise, had we returned at the end of our Exit function, we would simply go back and execute the next line, which would be the first line of printMessage.

Finally, we should also mention the enter and leave instructions, which are sometimes used with procedures to save and restore the addresses of rsp and rbp and allocate a specific stack space to be used by the procedure.
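A complete program built from two procedures might look like the following sketch. The message text and the length label are assumptions. printMessage ends with ret so that execution comes back to _start, while Exit never returns.

```nasm
global _start

section .data
    message db "Hello from a procedure", 0x0a
    length equ $ - message

section .text
_start:
    call printMessage          ; pushes the address of the next instruction, then jumps
    call Exit                  ; does not come back

printMessage:
    mov eax, 1                 ; syscall number 1 = write
    mov edi, 1                 ; 1st arg: fd 1 (stdout)
    mov rsi, message           ; 2nd arg: pointer to message
    mov edx, length            ; 3rd arg: number of bytes
    syscall
    ret                        ; pops the saved address back into rip

Exit:
    mov eax, 60                ; syscall number 60 = exit
    xor edi, edi               ; 1st arg: exit code 0
    syscall
```

Without the ret, execution would fall through from printMessage into Exit, which is the line-based flow described in the note above.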

Functions

Functions calling convention

Functions are a form of procedures. However, functions tend to be more complex and should be expected to use the stack and all registers fully. So, we can't simply call a function as we did with procedures. Instead, functions have a Calling Convention to properly set up before being called.

There are four main things we need to consider before calling a function:

Save Registers on the stack (Caller Saved)
Pass Function Arguments (like syscalls)
Fix Stack Alignment
Get Function's Return Value (in rax)

This is relatively similar to calling a syscall. The only difference is that with syscalls we have to store the syscall number in rax, while we can call functions directly with call function. Furthermore, with syscalls we don't have to worry about Stack Alignment.

Writing functions

All of the above points are from a caller point of view, as we call a function. When it comes to writing a function, there are different points to consider, which are:

Saving Callee Saved registers (rbx and rbp)
Get arguments from registers
Align the Stack
Return value in rax
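As a quick reminder of where arguments go, the sketch below maps a list of integer or pointer arguments to registers in the order defined by the System V AMD64 calling convention used on 64-bit Linux. The assign_arguments helper is a made-up name for illustration, and floating-point arguments (which use xmm registers) are left out.

```python
# Integer/pointer argument registers, in order (System V AMD64 calling convention).
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def assign_arguments(args):
    """Map arguments to registers in order; anything past the sixth goes on the stack."""
    in_registers = dict(zip(ARG_REGISTERS, args))
    on_stack = list(args[len(ARG_REGISTERS):])
    return in_registers, on_stack

# printf(outFormat, rbx) from the Fibonacci program
registers, stack = assign_arguments(["outFormat", "rbx"])
print(registers)  # {'rdi': 'outFormat', 'rsi': 'rbx'}
print(stack)      # []
```

This is the same layout a function sees from the callee side: its first argument is in rdi, its second in rsi, and so on.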

Using External Functions

The libc library of functions used for C programs provides many functionalities that we can utilize without rewriting everything from scratch.

Importing libc Functions

First, to import an external libc function, we can use the extern directive at the beginning of our code:

global  _start
extern  printf

Once this is done, we should be able to call the printf function. So, we can proceed with the Functions Calling Convention we discussed earlier.

Saving Registers

The very first step is to save to the stack any registers we are using, which are rax and rbx, as follows:

printFib:
    push rax        ; push registers to stack
    push rbx
    ; function call
    pop rbx         ; restore registers from stack
    pop rax
    ret

Function Arguments

First, we need to find out what arguments the printf function accepts. We can check its manual page with man -s 3 printf (section 3 holds the library functions manual, as we can see in man man).

Stack Alignment

Whenever we want to make a call to a function, we must ensure that the Top Stack Pointer (rsp) is aligned to a 16-byte boundary, counted from the _start function's stack. This means that the total number of bytes pushed to the stack since _start must be a multiple of 16 at the moment we make the call. This requirement is mainly there for processor performance efficiency. Some functions (like those in libc) may crash if the stack is not aligned to this boundary, because they rely on it.

This may be a bit confusing, but the critical thing to remember is that we should have 16 bytes (or a multiple of 16) on top of the stack before making a call. We can count the number of (unpopped) push instructions and (unreturned) call instructions to find out how many 8-byte values have been pushed to the stack. If that count is odd, we can fix the alignment with sub rsp, 8 before the call and add rsp, 8 after it.
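The counting rule can be written down as a small calculation. This is a sketch that assumes rsp is 16-byte aligned at _start and that every push and call moves it by 8 bytes; padding_before_call is a hypothetical helper name.

```python
def padding_before_call(pushes, calls):
    """Bytes to subtract from rsp so that it is 16-byte aligned at the next call.

    pushes: push instructions not yet popped
    calls:  call instructions not yet returned
    Both are counted from _start, where rsp is assumed to be 16-byte aligned.
    """
    used = 8 * (pushes + calls)      # each push/call puts 8 bytes on the stack
    return (16 - used % 16) % 16

# getInput: reached through one call, nothing pushed -> needs sub rsp, 8
print(padding_before_call(pushes=0, calls=1))  # 8

# printFib: _start -> loopFib -> printFib (2 calls) plus push rax and push rbx
print(padding_before_call(pushes=2, calls=2))  # 0
```

This matches the Fibonacci program in these notes: getInput adjusts rsp by 8 before calling scanf, while printFib is already aligned after its two pushes.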

Function Call

Dynamic linker

We can now assemble our code with nasm. When we link our code with ld, we should tell it to do dynamic linking with the libc library. Otherwise, it would not know how to fetch the imported printf function. We can do so with the -lc --dynamic-linker /lib64/ld-linux-x86-64.so.2 flags.

Libc Functions

Final Fibonacci program:

global  _start
extern  printf, scanf

section .data
   message db "Please input max Fn", 0x0a
   outFormat db  "%d", 0x0a, 0x00
   inFormat db  "%d", 0x00

section .bss
   userInput resb 8     ; 8 bytes, since cmp rbx,[userInput] reads a full qword

section .text
_start:
   call printMessage   ; print intro message
   call getInput       ; get max number
   call initFib        ; set initial Fib values
   call loopFib        ; calculate Fib numbers
   call Exit           ; Exit the program
   
printMessage:
   mov rax, 1          ; rax: syscall number 1 (write)
   mov rdi, 1          ; rdi: fd 1 for stdout
   mov rsi, message    ; rsi: pointer to message
   mov rdx, 20         ; rdx: print length of 20 bytes
   syscall             ; call write syscall to print the intro message
   ret

getInput:
   sub rsp, 8          ; align stack to 16-bytes
   mov rdi, inFormat   ; set 1st parameter (inFormat)
   mov rsi, userInput  ; set 2nd parameter (userInput)
   call scanf          ; scanf(inFormat, userInput)
   add rsp, 8          ; restore stack alignment
   ret

initFib:
   xor rax, rax        ; initialize rax to 0
   xor rbx, rbx        ; initialize rbx to 0
   inc rbx             ; increment rbx to 1
   ret

printFib:
   push rax            ; push registers to stack
   push rbx
   mov rdi, outFormat  ; set 1st argument (Print Format)
   mov rsi, rbx        ; set 2nd argument (Fib Number)
   call printf         ; printf(outFormat, rbx)
   pop rbx             ; restore registers from stack
   pop rax
   ret

loopFib:
   call printFib       ; print current Fib number
   add rax, rbx        ; get the next number
   xchg rax, rbx       ; swap values
   cmp rbx,[userInput] ; do rbx - userInput
   js loopFib          ; jump if result is <0
   ret

Exit:
   mov rax, 60         ; rax: syscall number 60 (exit)
   mov rdi, 0          ; rdi: exit code 0
   syscall             ; call exit syscall

To assemble, link, and run it, we can use the following command:

nasm -f elf64 fib.s &&  ld fib.o -o fib -lc --dynamic-linker /lib64/ld-linux-x86-64.so.2 && ./fib
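To check what the program should print, the loop in loopFib can be mirrored line by line in Python. This is only a model of the register logic: fib_outputs is a made-up name, and it ignores 64-bit overflow and the I/O done through printf and scanf.

```python
def fib_outputs(max_fn):
    """Mirror of initFib + loopFib: returns the numbers printFib would print."""
    rax, rbx = 0, 1                    # initFib
    printed = []
    while True:
        printed.append(rbx)            # call printFib
        rax = rax + rbx                # add rax, rbx
        rax, rbx = rbx, rax            # xchg rax, rbx
        if not (rbx - max_fn < 0):     # cmp rbx,[userInput] / js loopFib
            break
    return printed

print(fib_outputs(10))  # [1, 1, 2, 3, 5, 8]
```

With an input of 10, the loop stops once rbx reaches 13, so 8 is the last number printed.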

Shellcoding

Shellcodes

We know that each executable binary is made of machine instructions written in Assembly and then assembled into machine code. A shellcode is the hex representation of a binary's executable machine code.

To understand how shellcodes are generated, we must first understand how each instruction is converted into machine code. Each x86 instruction and each register has its own binary machine code (usually represented in hex), which represents the binary code passed directly to the processor to tell it what instruction to execute (through the Instruction Cycle).

Furthermore, common combinations of instructions and registers have their own machine code as well. For example, the push rax instruction has the machine code 50, while push rbx has the machine code 53, and so on. When we assemble our code with nasm, it converts our assembly instructions to their respective machine code so that the processor can understand them.

We can use pwn asm to assemble any assembly code into its shellcode:

┌──(kali㉿kali)-[~/Documents/intro-to-assembly]
└─$ pwn asm 'push rax'  -c 'amd64'
50

As we can see, we get 50, which is the same machine code for push rax. Likewise, we can convert hex machine code or shellcode into its corresponding assembly code, as follows:

┌──(kali㉿kali)-[~/Documents/intro-to-assembly]
└─$ pwn disasm '50' -c 'amd64'
  0:    50                       push   rax
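The 50 and 53 values are not arbitrary: for the eight legacy 64-bit registers, push is encoded as the single byte 0x50 plus the register's index. The sketch below reproduces this by hand without pwntools. The encode_push helper is a made-up name, and it deliberately leaves out r8 to r15, which need an extra prefix byte.

```python
# Register numbers used by the x86-64 instruction encoding (legacy registers only).
REGISTER_INDEX = {
    "rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
    "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7,
}

def encode_push(register):
    """Return the one-byte machine code for push <register>."""
    return bytes([0x50 + REGISTER_INDEX[register]])

print(encode_push("rax").hex())  # 50
print(encode_push("rbx").hex())  # 53
```

For anything beyond this simple case, pwn asm and pwn disasm remain the practical way to convert between assembly and machine code.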

We can read more about the pwntools assembly and disassembly features, and about the pwntools command-line tools, in the pwntools documentation.

Extract Shellcode

In Python with pwntools:

#!/usr/bin/python3

import sys
from pwn import *

context(os="linux", arch="amd64", log_level="error")

file = ELF(sys.argv[1])
shellcode = file.section(".text")
print(shellcode.hex())

Or in bash with objdump:

#!/bin/bash

for i in $(objdump -d "$1" | grep "^ " | cut -f2); do echo -n $i; done; echo;

Loading Shellcode

To run our shellcode with pwntools, we can use the run_shellcode function and pass it our shellcode.

#!/usr/bin/python3

import sys
from pwn import *

context(os="linux", arch="amd64", log_level="error")

run_shellcode(unhex(sys.argv[1])).interactive()

Debugging shellcode

Finally, let's see how we can debug our shellcode with gdb. If we are loading the machine code directly into memory, how would we run it with gdb? There are many ways to do so, and we'll go through some of them here.

We can always run our shellcode with loader.py, and then attach its process to gdb with gdb -p PID. However, this will only work if our process does not exit before we attach to it. So, we will instead build our shellcode to an elf binary and then use this binary with gdb like we've been doing throughout the module.

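We can use pwntools to build an elf binary from our shellcode using the ELF library, and then the save function to save it to a file. We can do so with a one-liner in the Python interpreter, or with a small script that takes the shellcode hex and the output filename as arguments: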

ELF.from_bytes(unhex('4831db66bb79215348bb422041636164656d5348bb48656c6c6f204854534889e64831c0b0014831ff40b7014831d2b2120f054831c0043c4030ff0f05')).save('helloworld')

#!/usr/bin/python3

import sys, os, stat
from pwn import *

context(os="linux", arch="amd64", log_level="error")

ELF.from_bytes(unhex(sys.argv[1])).save(sys.argv[2])
os.chmod(sys.argv[2], stat.S_IRWXU)  # rwx for the owner; S_IEXEC alone would drop read access

GCC

There are other methods to build our shellcode into an elf executable. We can add our shellcode to the following C code, write it to a helloworld.c, and then build it with gcc (hex bytes must be escaped with \x):

#include <stdio.h>

int main()
{
   int (*ret)() = (int (*)()) "\x48\x31\xdb\x66\xbb\...SNIP...\x3c\x40\x30\xff\x0f\x05";
   ret();
}

Then, we can compile our C code with gcc, and run it with gdb.

However, this method is not very reliable for a few reasons. First, it wraps our shellcode in C code, so the binary will not contain only our shellcode, but also various other C functions and libraries. This method may also not always compile or run, depending on the existing memory protections, so we may have to add flags to bypass memory protections, as follows:

gcc helloworld.c -o helloworld -fno-stack-protector -z execstack -Wl,--omagic -g --static
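The \x-escaped string used in helloworld.c can be produced from the hex shellcode with a few lines of standard-library Python. This is a small convenience sketch, and to_c_string is a made-up helper name.

```python
def to_c_string(shellcode_hex):
    """Turn hex shellcode such as '4831db' into the C escape form \\x48\\x31\\xdb."""
    raw = bytes.fromhex(shellcode_hex)
    return "".join(f"\\x{byte:02x}" for byte in raw)

# First five bytes of the hello world shellcode from these notes
print(to_c_string("4831db66bb"))  # \x48\x31\xdb\x66\xbb
```

The output can be pasted between the double quotes of the string literal in the C file.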

Shellcoding Techniques

Shellcoding Requirements

To be able to produce a working shellcode, there are three main Shellcoding Requirements our assembly code must meet:

Does not contain variables
Does not refer to direct memory addresses
Does not contain any NULL bytes 00
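The third requirement is easy to check automatically. The sketch below lists the offsets of any 00 bytes in a hex shellcode string, using only the standard library. The null_byte_offsets name is made up for illustration, and the two sample byte strings are the encodings of mov eax, 1 and of xor rax, rax followed by mov al, 1.

```python
def null_byte_offsets(shellcode_hex):
    """Return the offsets of all NULL (00) bytes in the shellcode."""
    raw = bytes.fromhex(shellcode_hex)
    return [offset for offset, byte in enumerate(raw) if byte == 0]

# mov eax, 1 -> b8 01 00 00 00: the 32-bit immediate is padded with NULL bytes
print(null_byte_offsets("b801000000"))  # [2, 3, 4]

# xor rax, rax ; mov al, 1 -> same result in rax, no NULL bytes
print(null_byte_offsets("4831c0b001"))  # []
```

The second form is the one used in the hello world shellcode earlier in these notes: zeroing the register and then writing to its 8-bit part avoids the padded immediate.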

Shellcoding Tools

We can craft a shell shellcode by hand, or generate one with tools like pwntools' shellcraft and msfvenom.

Finally, we can always search online resources like Shell-Storm or Exploit DB for existing shellcodes.

For example, if we search Shell-Storm for a /bin/sh shellcode on Linux/x86_64, we will find several examples of varying sizes, like this 27-bytes shellcode. We can search Exploit DB for the same, and we find a more optimized 22-bytes shellcode, which can be helpful if our Binary Exploitation only had around 22-bytes of overflow space. We can also search for encoded shellcodes, which are bound to be larger.

Assembly Syntax Cheat Sheet

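The following instructions use AT&T syntax, where the source operand comes first (the reverse of the Intel syntax used by nasm above):

movq source, destination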
addq source, destination
subq source, destination
imulq source, destination
salq source, destination
sarq source, destination
xorq source, destination
andq source, destination
orq source, destination

Resources
