DEV Community
6h ago
x64 Windows Assembly Fundamentals Part 2: Learning the Language
Hello everyone. Mirrai here. In Part 1 we covered registers, the Windows x64 calling convention, shadow space, and how RSP and RIP work. If you haven't read that, I recommend starting there.

Today we're going to cover the actual instructions of assembly. By the end you'll be able to read most of what a debugger shows you and understand what each instruction is doing and why. There are a lot of other instructions I won't cover, but these are the basic ones. With that said, let's get into it.

## Moving Data Around

The most common instruction you'll see is `mov`. It copies a value from a source to a destination. That's it.

```nasm
mov rax, 5        ; put the value 5 into RAX
mov rbx, rax      ; copy RAX into RBX
mov rax, [rbx]    ; load the value at the memory address in RBX into RAX
mov [rbx], rax    ; store RAX's value into memory at the address in RBX
```

The square brackets mean "the memory at this address". Without brackets you're working with the value in the register itself. With brackets you're treating that value as an address and dereferencing it, which basically means going to that address and reading or writing what's there. It's like pointers in C. What if RBX doesn't hold a valid address? Well, you crash, that's what.

One thing to keep in mind: you can't move memory directly to memory.

```nasm
mov [rax], [rbx]  ; INVALID: the assembler will reject this
```

You always need a register in between.

## LEA - Load Effective Address

`lea` looks similar to `mov` but it does something different. Instead of loading the value at an address, it loads the address itself.

```nasm
lea rdx, [text_1]  ; put the address of text_1 into RDX
mov rdx, [text_1]  ; put the value AT text_1 into RDX
```

You saw `lea` in Part 1 when we loaded the string pointers for MessageBoxA. We needed the address of the string, not whatever bytes happened to be at that address. That's when you use `lea`.

## Stack Operations

You already know RSP points to the top of the stack and that the stack grows downward. `push` and `pop` are how you interact with it directly.
```nasm
push rax  ; RSP -= 8, then stores RAX at the new RSP
pop rbx   ; loads the value at RSP into RBX, then RSP += 8
```

Every `push` decrements RSP by 8. Every `pop` increments it by 8. So when you're debugging and you see RSP change, you can divide the difference by 8 to count how many pushes or pops have happened.

## Arithmetic

```nasm
add rax, 5   ; RAX = RAX + 5
add rax, rbx ; RAX = RAX + RBX
sub rax, 3   ; RAX = RAX - 3
sub rsp, 40  ; the shadow space allocation from Part 1
```

Simple enough. What matters is that arithmetic instructions affect the FLAGS register (RFLAGS on x64). FLAGS is a special register that stores facts about the result of an operation as individual bits. The important ones are:

- ZF (Zero Flag): set to 1 if the result was zero
- SF (Sign Flag): set to 1 if the result was negative (its top bit is 1)
- CF (Carry Flag): set if there was an unsigned overflow (a carry or borrow out of the top bit)
- OF (Overflow Flag): set if there was a signed overflow

You don't normally set these manually. They get set automatically whenever arithmetic or comparison instructions run (`and`, `or`, and `xor` update them too, while `mov` and `lea` leave them untouched). Conditional jumps read them. That's how branching works in assembly. In shellcode and related areas you'll rarely work with signed values unless you find a specific use case for them.

## Logical Operations

```nasm
and rax, rbx ; RAX = RAX AND RBX (bitwise)
or rax, rbx  ; RAX = RAX OR RBX (bitwise)
xor rax, rbx ; RAX = RAX XOR RBX (bitwise)
not rax      ; flip every bit in RAX
```

XOR deserves special attention because of one idiom you'll see everywhere in shellcode and compiled code.

```nasm
xor rcx, rcx ; zero out RCX
```

Why use this instead of `mov rcx, 0`? Two reasons.

First, it's shorter. `xor rcx, rcx` encodes to 3 bytes (`48 31 C9`), while `mov rcx, 0` takes 5 or 7 bytes depending on how the assembler encodes it, and the immediate zero shows up in the encoding as null bytes (`0x00`). Shellcode often can't contain null bytes because many string functions like `strcpy` treat null as a terminator and will stop copying. `xor rcx, rcx` avoids this entirely.

Second, XOR of any value with itself is always zero regardless of what was in the register. It's guaranteed and the CPU handles it efficiently.

You'll see this pattern constantly. Any time you need to zero a register, look for `xor reg, reg`.
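Going back to the four flags from the Arithmetic section, here is a minimal sketch you can step through in x64dbg to watch each one flip. The operand values are my own picks for illustration, not something from Part 1, and I'm assuming it gets assembled and linked the same way as the Part 1 program. The comments show the flag values I'd expect after each `add` or `sub`.

```nasm
BITS 64
default rel

global main
extern ExitProcess

section .text
main:
    sub rsp, 40                  ; shadow space + alignment, as in Part 1

    mov rax, 5
    sub rax, 5                   ; RAX = 0                  -> ZF=1 SF=0 CF=0 OF=0

    mov rax, 3
    sub rax, 5                   ; RAX = 0xFFFFFFFFFFFFFFFE -> ZF=0 SF=1 CF=1 OF=0
                                 ; unsigned 3 - 5 needs a borrow, so CF is set

    mov rax, -1                  ; RAX = 0xFFFFFFFFFFFFFFFF (mov does not touch FLAGS)
    add rax, 1                   ; RAX = 0 (wrapped around) -> ZF=1 SF=0 CF=1 OF=0

    mov rax, 0x7FFFFFFFFFFFFFFF  ; largest positive signed 64-bit value
    add rax, 1                   ; RAX = 0x8000000000000000 -> ZF=0 SF=1 CF=0 OF=1
                                 ; positive + positive gave a negative: signed overflow

    xor rcx, rcx                 ; uExitCode = 0
    call ExitProcess
```

Notice that the third and fourth cases are the same instruction, `add rax, 1`, with different flag results. CF and OF answer two different questions about the same bits: CF treats the operands as unsigned, OF treats them as signed.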
## Comparisons and Conditional Jumps

This is where FLAGS becomes important. `cmp` subtracts one value from another but throws away the result. It only keeps the FLAGS side effects.

```nasm
cmp rax, 5 ; compute RAX - 5, discard result, update FLAGS
```

After `cmp` you use a conditional jump to act on the result. `je` and `jz` are two names for the same instruction, and so are `jne` and `jnz`.

```nasm
cmp rax, 5
je equal_label ; jump if ZF=1 (result was zero, meaning RAX == 5), same as jz
jne not_equal  ; jump if ZF=0 (RAX != 5), same as jnz
```

If RAX is 5 then the zero flag (ZF) is set to 1 because the result of 5 - 5 is, well, zero. If RAX were 4 the result would be -1, which isn't zero, so ZF would not be set.

`jmp` is the unconditional version: it always jumps.

```nasm
jmp some_label ; always go here
```

Here's a simple loop in assembly. It adds one to RCX until it reaches five, then returns.

```nasm
    xor rcx, rcx   ; counter = 0
loop_start:
    cmp rcx, 5     ; is RCX == 5?
    je loop_end    ; if true, exit the loop, else continue
    inc rcx        ; increment RCX by 1
    jmp loop_start ; jump back to loop_start until the condition is met
loop_end:
    ret
```

Keep in mind I don't have to use shadow space or alignment here because I'm not calling any Windows functions.

## Call and Ret

If you read Part 1 you would have noticed the code I shared used the `call` instruction, and I just used `ret` a minute ago. It's time to explain them in more depth.

```nasm
call SomeFunction
```

This is equivalent to the following (pseudocode, since you can't actually push RIP):

```
push rip + instruction_size ; push the return address
jmp SomeFunction            ; jump to the function
```

The return address is the address of the instruction immediately after the `call`. When the function finishes it uses `ret`, which pops that address off the stack and jumps to it. This is why RSP has to be correct when `ret` executes: if something corrupted the stack, the return address is wrong and execution goes somewhere unexpected. Classic stack buffer overflow exploitation works by overwriting that return address intentionally.

## Putting It Together

Here's an extended version of the Hello World from Part 1. This time with a loop that shows the message box twice.
```nasm
BITS 64
default rel

global main
extern ExitProcess
extern MessageBoxA

section .data
    text_1 db "Hello World", 0
    text_2 db "Hello from Mirrai", 0

section .text
main:
    sub rsp, 40        ; shadow space + alignment
    xor r12, r12       ; set R12 to zero, our counter register

loop_start:
    cmp r12, 2         ; check if R12 == 2
    je loop_end        ; if so, exit the loop
    xor rcx, rcx       ; hWnd = NULL
    lea rdx, [text_1]  ; lpText
    lea r8, [text_2]   ; lpCaption
    mov r9, 1          ; uType = MB_OKCANCEL
    call MessageBoxA
    inc r12            ; increment R12 by 1
    jmp loop_start

loop_end:
    xor rcx, rcx       ; uExitCode = 0
    call ExitProcess
```

Notice we used R12 for the counter instead of RCX. R12 is non-volatile so MessageBoxA won't trash it, while RCX is volatile and gets overwritten for the first argument on every iteration anyway. Normally a function that uses a non-volatile register has to save and restore it; we get away without that here because `main` never returns, it ends in ExitProcess.

Load this in x64dbg. Step through it and watch R12 increment. Watch RSP change at `sub rsp, 40` and around each `call`. Watch RIP move through the loop. This is how you internalize assembly.

## ASM Cheat-sheet

| Instruction | What it does |
|---|---|
| `mov dst, src` | copy src into dst |
| `lea dst, [addr]` | load the address into dst |
| `push reg` | RSP -= 8, then store reg at [RSP] |
| `pop reg` | load the value at [RSP] into reg, then RSP += 8 |
| `add dst, src` | dst = dst + src |
| `sub dst, src` | dst = dst - src |
| `xor dst, dst` | set dst to zero |
| `cmp a, b` | set FLAGS based on a - b |
| `jmp label` | unconditional jump |
| `je`/`jz`, `jne`/`jnz` | conditional jumps (taken if ZF=1 / ZF=0) |
| `call func` | push return address, jump to func |
| `ret` | pop return address, jump to it |
| `inc reg` | reg = reg + 1 |
| `dec reg` | reg = reg - 1 |

## What's Next

Practice. It might seem hard at first but trust me, it gets easier with time. All you need is persistence. As always, leave questions in the comments and see ya next time.
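If you want somewhere to start with that practice, here is a small sketch that uses `dec` from the cheat-sheet, which we haven't seen in a loop yet. It counts down from 5 and adds up the values along the way. The label name `countdown`, the use of R13 as an accumulator, and returning the sum as the exit code are all my own choices for illustration. I'm assuming it is built the same way as the program above.

```nasm
BITS 64
default rel

global main
extern ExitProcess

section .text
main:
    sub rsp, 40     ; shadow space + alignment, as in Part 1

    mov r12, 5      ; counter = 5
    xor r13, r13    ; sum = 0

countdown:
    add r13, r12    ; sum += counter
    dec r12         ; counter -= 1, sets ZF when the counter reaches 0
    jnz countdown   ; ZF=0 means the counter is not zero yet, so loop again

    ; at this point R12 = 0 and R13 = 5 + 4 + 3 + 2 + 1 = 15
    mov rcx, r13    ; uExitCode = 15
    call ExitProcess
```

There is no `cmp` in this loop. `dec` updates ZF on its own, so `jnz` can read the flag straight from it. Try stepping through it and predicting ZF before each `dec` executes, then rewrite it to count up with `inc` and `cmp` and compare the two versions.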