Reversing 6502 - Part 2: Control Flow, Data, and Tools
In Part 1 we covered the foundations: the C64 memory map, the PRG file format, linear disassembly, and how to recognise hardware interaction from memory addresses alone. That gets you surprisingly far with simple programs, but real C64 software mixes code and data freely, uses self-modifying code, and employs interrupt-driven routines that don't follow a linear execution path. To handle that, we need better techniques.
In this part, I want to dig into control flow analysis, the heuristics that help distinguish code from data, common C64 programming patterns you'll encounter, and the tools that make serious disassembly practical.
Control Flow Analysis
The core idea is simple: instead of decoding bytes sequentially, follow the actual execution paths. Start at the entry point and trace where the code goes. When you hit a branch, follow both paths. When you hit a JSR, mark the target as a new entry point and carry on at the instruction after the call. When you hit a JMP, continue at its target instead of the next instruction. When you hit an RTS, stop following that path.
The algorithm works like this:
- Start with a queue containing the program's entry point.
- Pull an address from the queue. If it is already marked as code, skip it; otherwise decode the instruction at that address.
- Mark those bytes as "code."
- If the instruction is a conditional branch (BEQ, BNE, BCC, etc.), add both the fall-through address and the branch target to the queue.
- If the instruction is JSR, add the subroutine target to the queue and continue at the next instruction (the return address).
- If the instruction is JMP (absolute), add the target to the queue and stop following this path.
- If the instruction is RTS, RTI, or BRK, stop following this path.
- Otherwise, advance to the next instruction and continue.
- Repeat until the queue is empty.
Any bytes not marked as "code" after this process are data, unreachable code, or code that is only reached indirectly (through an interrupt vector, a jump table, or a computed jump). This is a massive improvement over linear disassembly because it only decodes bytes that are actually reachable through execution.
Here's what the process looks like on a small example:
0820: A2 05 LDX #$05 ; [1] start here, queue: {0820}
0822: CA DEX ; [2] sequential
0823: D0 FD BNE $0822 ; [3] branch! queue: {0822, 0825}
0825: 20 31 08 JSR $0831 ; [4] call! queue: {0831}, continue at 0828
0828: 4C 28 08 JMP $0828 ; [5] jump to self, stop this path
082B: 48 45 4C ; "HEL" ← never reached, must be data
082E: 4C 4F 00 ; "LO\0"
0831: A0 00 LDY #$00 ; [6] subroutine entry from queue
0833: B1 FB LDA ($FB),Y ; [7] indirect indexed load
0835: F0 06 BEQ $083D ; [8] branch! queue: {083D, 0837}
0837: 20 D2 FF JSR $FFD2 ; [9] KERNAL CHROUT
083A: C8 INY ; [10] sequential
083B: D0 F6 BNE $0833 ; [11] branch back to loop
083D: 60 RTS ; [12] end of subroutine
Control flow analysis correctly identifies 0x082B-0x0830 as data because no execution path reaches those bytes. A linear disassembler would have tried to decode them as instructions and produced garbage.
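The queue-based algorithm above can be sketched in a few lines of Python. This is a minimal illustration, not a real disassembler: the opcode table only covers the instructions in the demo, and `trace_code`, `OPCODES`, and the demo byte layout (string at $082B-$0830, subroutine at $0831) are my own assumptions for the example.

```python
from collections import deque

# Opcode -> (mnemonic, length in bytes). Only the opcodes the demo needs;
# a real tool would cover the full 6502 instruction set.
OPCODES = {
    0xA2: ("LDX", 2), 0xA0: ("LDY", 2), 0xB1: ("LDA", 2),
    0xCA: ("DEX", 1), 0xC8: ("INY", 1), 0x60: ("RTS", 1),
    0xD0: ("BNE", 2), 0xF0: ("BEQ", 2),
    0x20: ("JSR", 3), 0x4C: ("JMP", 3),
}
BRANCHES = {"BNE", "BEQ"}


def trace_code(image, base, entry):
    """Return the set of addresses reached by following execution paths."""
    end = base + len(image)
    code = set()
    queue = deque([entry])
    while queue:
        pc = queue.popleft()
        # Follow one path until it ends, leaves the image, or rejoins known code.
        while base <= pc < end and pc not in code:
            info = OPCODES.get(image[pc - base])
            if info is None or pc + info[1] > end:
                break  # unknown opcode or truncated instruction
            name, length = info
            code.update(range(pc, pc + length))
            nxt = pc + length
            if name in BRANCHES:
                offset = image[pc + 1 - base]
                if offset >= 0x80:
                    offset -= 0x100  # signed 8-bit displacement
                queue.append(nxt + offset)  # taken path; fall-through continues below
            elif name in ("JSR", "JMP"):
                queue.append(image[pc + 1 - base] | (image[pc + 2 - base] << 8))
                if name == "JMP":
                    break  # no fall-through after an absolute jump
            elif name == "RTS":
                break
            pc = nxt
    return code


BASE = 0x0820
IMAGE = bytes.fromhex(
    "A205" "CA" "D0FD" "203108" "4C2808"            # $0820-$082A: main code
    "48454C4C4F00"                                  # $082B-$0830: "HELLO", 0
    "A000" "B1FB" "F006" "20D2FF" "C8" "D0F6" "60"  # $0831-$083D: subroutine
)
code = trace_code(IMAGE, BASE, BASE)
data = [a for a in range(BASE, BASE + len(IMAGE)) if a not in code]
print(" ".join(f"{a:04X}" for a in data))  # 082B 082C 082D 082E 082F 0830
```

The target $FFD2 lies outside the image, so the sketch queues it and then silently drops it; a fuller tool would record it as an external reference instead.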
Identifying Data Regions
Even with control flow analysis, some data regions are tricky to identify. Here are the heuristics I've found most useful:
Inline String Data
The C64 has a common pattern where a subroutine reads its arguments from the bytes immediately following the JSR call:
JSR print_string
.byte "HELLO WORLD", $00
; execution continues here after the subroutine returns
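The subroutine pulls the return address from the stack, uses it as a pointer to read the string, then pushes an adjusted return address that points past the string. Because JSR pushes the address of its own last byte rather than the address of the next instruction, the string starts at the pulled address plus one. If you see a JSR followed by bytes in the printable ASCII range (0x20-0x7E) terminated by a zero byte, it's almost certainly this pattern. Text stored as PETSCII or screen codes can fall outside that range, so treat the range as a starting point rather than a rule.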
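As a rough sketch, the heuristic can be automated like this. The function name and the `min_len` threshold are hypothetical choices of mine, and the check assumes plain ASCII text with a zero terminator, exactly as described above.

```python
def inline_string_after_jsr(image, offset, min_len=3):
    """If image[offset] starts a JSR followed by a zero-terminated printable
    run, return (text, offset_where_code_resumes); otherwise return None."""
    if offset + 3 > len(image) or image[offset] != 0x20:  # 0x20 = JSR absolute
        return None
    start = offset + 3  # first byte after the 3-byte JSR
    end = start
    while end < len(image) and 0x20 <= image[end] <= 0x7E:
        end += 1
    # Require a minimum length and a terminating zero inside the image.
    if end - start < min_len or end >= len(image) or image[end] != 0x00:
        return None
    return bytes(image[start:end]).decode("ascii"), end + 1


sample = bytes([0x20, 0x00, 0x10]) + b"HELLO WORLD" + bytes([0x00, 0x60])
print(inline_string_after_jsr(sample, 0))  # ('HELLO WORLD', 15)
```

The returned offset is where disassembly should resume, so it can be fed back into a control flow tracer as a new entry point.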
Lookup Tables
Sequences of bytes that don't decode to sensible instructions, or that decode to instructions that make no logical sense in context, are likely data tables. Common patterns:
- Sine/cosine tables: 256 bytes of values following a smooth curve, used for scrolling effects and sprite movement.
- Screen layout data: blocks of 1,000 bytes (40×25) that map to screen memory.
- Sprite data: blocks of 63 bytes (sprites are 24×21 pixels, 3 bytes per row), usually padded to 64 bytes because sprite pointers address memory in 64-byte units.
- Colour tables: sequences of values in the range 0x00-0x0F (the C64's 16 colours).
- Address tables: pairs of bytes that look like valid addresses (low byte, high byte), often used for jump tables.
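One quick way to confirm a suspected sprite block is to render it as text and see whether a recognisable shape appears. The sketch below assumes a single-colour (hi-res) sprite, where each bit is one pixel; multicolour sprites use bit pairs and would need different handling. `render_sprite` is a hypothetical helper name.

```python
def render_sprite(block, on="#", off="."):
    """Render the first 63 bytes of a block as 21 rows of 24 characters."""
    if len(block) < 63:
        raise ValueError("a sprite definition needs at least 63 bytes")
    rows = []
    for row in range(21):
        # Each row is 3 bytes = 24 pixels, most significant bit leftmost.
        bits = "".join(f"{b:08b}" for b in block[row * 3:row * 3 + 3])
        rows.append("".join(on if bit == "1" else off for bit in bits))
    return rows


candidate = bytes([0xFF, 0x00, 0x81]) + bytes(60)
for line in render_sprite(candidate)[:2]:
    print(line)
```

If the output looks like noise, the block is probably something else, such as a table or code.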
Jump Tables
A common dispatch pattern on the 6502 uses a table of addresses:
; X register contains the command index (0, 2, 4, ...)
LDA cmd_table+1,X ; load high byte of target address
PHA ; push to stack
LDA cmd_table,X ; load low byte of target address
PHA ; push to stack
RTS ; "return" to the pushed address
The RTS here isn't returning from a subroutine; it's an indirect jump through a table. The addresses in cmd_table are stored minus one (because RTS adds 1 to the pulled address before jumping). This pattern is the 6502 equivalent of a switch statement, and the table entries are data, not code. Each table entry, plus one, is also a new entry point for disassembly.
When I first encountered this pattern, it took me a while to understand why the addresses were off by one. Once I remembered that RTS increments the pulled address, it made perfect sense. It's a clever hack: the 6502 has no indexed form of JMP (indirect), so the alternative would be copying the target into a RAM vector before jumping through it.
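Turning such a table into disassembly entry points is a one-liner once the off-by-one is accounted for. This sketch assumes the interleaved low/high layout used in the snippet above; `rts_table_targets` is a name I made up for illustration.

```python
def rts_table_targets(table):
    """Decode an RTS-dispatch table of (low, high) pairs into real targets."""
    if len(table) % 2:
        raise ValueError("table must contain whole (low, high) pairs")
    targets = []
    for i in range(0, len(table), 2):
        stored = table[i] | (table[i + 1] << 8)
        targets.append((stored + 1) & 0xFFFF)  # RTS jumps to pulled address + 1
    return targets


cmd_table = bytes([0xFF, 0x0F, 0x33, 0x12])  # stored as $0FFF and $1233
print([f"${t:04X}" for t in rts_table_targets(cmd_table)])  # ['$1000', '$1234']
```

Some programs keep the low and high bytes in two separate tables instead; in that case the same arithmetic applies, only the indexing changes.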
Common C64 Programming Patterns
Recognising these patterns speeds up disassembly enormously because you can identify whole blocks of code by their structure rather than reading every instruction.
Raster Interrupts
The C64's VIC-II chip can trigger an interrupt when the raster beam reaches a specific screen line. Games and demos use this to change graphics settings mid-frame: different colours, scroll positions, or screen modes for different parts of the screen.
The setup pattern is distinctive:
SEI ; disable interrupts
LDA #<irq_handler ; low byte of handler address
STA $0314 ; IRQ vector low
LDA #>irq_handler ; high byte of handler address
STA $0315 ; IRQ vector high
LDA #$50 ; raster line 80
STA $D012 ; set raster compare register
LDA $D011 ; read screen control register
AND #$7F ; clear bit 7 (raster compare high bit)
STA $D011
LDA #$01
STA $D01A ; enable raster interrupt
CLI ; re-enable interrupts
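When you see writes to 0x0314/0x0315 (the IRQ vector) and 0xD012 (raster compare), you're looking at a raster interrupt setup. The address written to 0x0314/0x0315 is another entry point for disassembly: it's the interrupt handler that runs every frame. Programs that bank out the KERNAL ROM write the handler address to the hardware vector at 0xFFFE/0xFFFF instead, so look for that as well.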
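The handler address can often be recovered mechanically by scanning for the `LDA #imm` / `STA $0314` and `LDA #imm` / `STA $0315` pairs. The sketch below is a simple byte-pattern search under that assumption; it will miss setups that load the bytes from a table or use X/Y for the stores, and `find_irq_handler` is a hypothetical name.

```python
def find_irq_handler(image):
    """Return the address stored to $0314/$0315 via LDA #imm / STA abs, or None.

    If the vector is set more than once, the last match wins.
    """
    lo = hi = None
    for i in range(len(image) - 4):
        # A9 nn = LDA #nn, 8D ll hh = STA $hhll
        if image[i] == 0xA9 and image[i + 2] == 0x8D and image[i + 4] == 0x03:
            if image[i + 3] == 0x14:
                lo = image[i + 1]
            elif image[i + 3] == 0x15:
                hi = image[i + 1]
    if lo is None or hi is None:
        return None
    return lo | (hi << 8)


setup = bytes.fromhex("78" "A900" "8D1403" "A909" "8D1503" "58")
print(f"${find_irq_handler(setup):04X}")  # $0900
```

Feeding the result into the control flow trace as an extra entry point picks up code that is never reached by a JSR or JMP.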
Self-Modifying Code
The 6502's index registers are only 8 bits wide and its indirect addressing modes go through zero-page pointers, so programmers often modify instruction operands at runtime instead. This is extremely common on the C64:
LDA #$04
STA load_addr+2 ; modify the high byte of the operand below
load_addr:
LDA $0000 ; this address gets overwritten at runtime
The STA load_addr+2 writes to the high operand byte of the LDA instruction at load_addr (the opcode sits at load_addr, the low byte at load_addr+1, the high byte at load_addr+2). At runtime, the LDA $0000 here actually executes as LDA $0400, and it could be any other address depending on what was stored.
This is a nightmare for static disassembly because the instruction you see in the binary isn't the instruction that executes. When you spot a STA targeting an address that falls inside another instruction, you've found self-modifying code. The best you can do is annotate it and note which instructions are modified.
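Spotting these stores can be automated once you have a list of decoded instructions. The sketch below assumes each instruction is a tuple of (address, length, mnemonic, operand address or None), which is a format I've made up for the example; it flags any write whose target lands on the operand bytes of another decoded instruction.

```python
WRITERS = {"STA", "STX", "STY", "INC", "DEC"}


def find_self_modifying_stores(instructions):
    """Return (writer_addr, modified_instr_addr, byte_offset) for each hit."""
    owner = {}
    for addr, length, _mnemonic, _target in instructions:
        for offset in range(1, length):  # operand bytes only, not the opcode
            owner[addr + offset] = (addr, offset)
    hits = []
    for addr, _length, mnemonic, target in instructions:
        if mnemonic in WRITERS and target in owner:
            hits.append((addr, *owner[target]))
    return hits


listing = [
    (0x1000, 2, "LDA", None),    # LDA #$04
    (0x1002, 3, "STA", 0x1007),  # STA $1007 (high operand byte of next LDA)
    (0x1005, 3, "LDA", 0x0000),  # LDA $0000, patched at runtime
]
for writer, victim, offset in find_self_modifying_stores(listing):
    print(f"${writer:04X} modifies byte {offset} of the instruction at ${victim:04X}")
```

Writes that replace the opcode byte itself are rarer; extending the inner range to start at 0 would catch those too, at the cost of more false positives.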
Multiplexed Sprites
The C64 has 8 hardware sprites, but games often display more by reassigning sprite positions and data mid-frame using raster interrupts. The pattern involves:
- A raster interrupt handler that fires at specific screen lines.
- The handler changes sprite Y positions, X positions, and data pointers.
- A sort routine that orders sprites by Y position to determine which ones to display in each screen region.
The sprite sort routine is usually the most complex piece of code in a C64 game's display system. It typically uses zero-page variables for speed and involves a lot of indexed addressing into sprite tables.
Decrunching / Decompression
Many C64 programs are compressed to fit on disk. The program loads a small decompression routine plus the compressed data, then the routine unpacks the data to its final location in memory. Common packers include Exomizer, ByteBoozer, and PuCrunch.
If the first thing a program does after the SYS entry point is copy data around and then jump to a completely different address, it's likely a decruncher. The real program entry point is wherever execution goes after decompression finishes. You'll need to either run the decruncher (in an emulator) and dump the resulting memory, or identify the packer and use a standalone decompression tool.
Tools
Manual disassembly builds understanding, but for anything beyond a few hundred bytes you'll want proper tooling.
Ghidra
Ghidra (NSA's open-source reverse engineering framework) supports the 6502 processor. You can load a PRG file, set the base address from the header, and Ghidra will perform control flow analysis, identify subroutines, and let you rename labels and add comments. The decompiler doesn't produce useful output for 6502 code (the architecture is too different from what the decompiler targets), but the disassembly view and cross-reference features are excellent.
What I found particularly useful is Ghidra's ability to mark regions as code or data manually. When the automatic analysis gets confused by inline data or self-modifying code, you can correct it and re-analyse.
Radare2 / Rizin
Radare2 (and its fork Rizin) support 6502 disassembly and have a lighter footprint than Ghidra. The command-line interface takes some getting used to, but the analysis is solid:
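r2 -a 6502 -m 0x07ff program.prg # map 2 bytes early so the PRG header doesn't shift the code
[0x07ff]> s 0x0801 # seek to the first byte after the header
[0x0801]> aaa # analyse all
[0x0801]> pdf @ 0x080d # print disassembly of function at 0x080d
[0x0801]> VV @ 0x080d # visual graph mode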
The graph view is particularly helpful for understanding branch structures and loops.
Regenerator
Regenerator is a C64-specific disassembler that understands the C64 memory map natively. It automatically labels KERNAL calls, VIC-II registers, SID registers, and CIA registers. For C64-specific work, it's arguably the most productive tool because it eliminates the manual annotation step that takes so long with general-purpose disassemblers.
da65 (cc65 toolchain)
The cc65 cross-development suite includes da65, a configurable 6502 disassembler. You provide an info file that specifies code ranges, data ranges, label names, and address annotations, and da65 produces assembly output that can be reassembled with ca65. This is the tool of choice if your goal is to produce a fully reassemblable disassembly.
# da65 info file example
GLOBAL {
STARTADDR $0801;
INPUTOFFS 2; # skip PRG header
};
RANGE { START $080D; END $0830; TYPE Code; };
RANGE { START $0831; END $0850; TYPE ByteTable; };
LABEL { ADDR $FFD2; NAME "KERNAL_CHROUT"; };
LABEL { ADDR $D020; NAME "VIC_BORDER"; };
VICE Monitor
The VICE emulator includes a built-in monitor that lets you disassemble code in a running C64 session. This is invaluable for dealing with self-modifying code and decrunchers: you can set breakpoints, let the code run, and then disassemble the modified or decompressed result:
(C:$0000) d 080d 0850 # disassemble range
(C:$0000) break 0820 # set breakpoint
(C:$0000) g # run until breakpoint
(C:$0000) m 0400 07e7 # dump screen memory
Running the program in VICE and examining memory after key operations is often the fastest way to understand what a program does, especially when static disassembly hits a wall with self-modifying code or compressed data.
A Practical Workflow
Putting it all together, here's the workflow I'd recommend for disassembling a C64 program:
- Load the PRG in VICE. Run it to see what it does. Note any visual behaviour: screen effects, music, text output. This gives you context for what the code should be doing.
- Extract the entry point. Check the BASIC stub for the SYS address. If there's no BASIC stub, the entry point is usually the load address.
- Check for compression. If the program is small but does a lot, or if the first instructions copy memory around, it's likely packed. Run it in VICE, let it decompress, then use the monitor to dump the unpacked memory.
- Load into Ghidra or Regenerator. Set the base address, mark the entry point, and let the tool perform initial analysis.
- Annotate hardware addresses. Label VIC-II, SID, CIA, and KERNAL addresses. This is the single biggest readability improvement.
- Identify the interrupt handler. Look for writes to 0x0314/0x0315. The interrupt handler is often where the most interesting code lives: display routines, music players, input handling.
- Mark data regions. Sprite data, character sets, screen layouts, and lookup tables. Use the heuristics from earlier in this post.
- Follow subroutines depth-first. Start with the main loop, identify each JSR target, and work through them one at a time. Name them as you go: update_sprites, play_music, read_joystick.
- Handle self-modifying code. When you find it, annotate which instructions are modified and what the possible runtime values are. Cross-reference with the code that does the modification.
- Iterate. Disassembly is not a single pass. Each time you understand a new routine, it gives you context that helps you understand the routines that call it.
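The entry point step is easy to script. The sketch below reads the two-byte load address and, when the file loads at $0801 and the first BASIC line begins with a SYS token (0x9E), parses the decimal digits that follow. It assumes the common single-line "10 SYS 2061" style of stub; anything more elaborate falls back to the load address. `prg_entry_point` is a hypothetical helper name.

```python
def prg_entry_point(prg):
    """Return (load_address, entry_point) for a PRG image."""
    load = prg[0] | (prg[1] << 8)
    body = prg[2:]
    # BASIC line layout: next-line pointer (2), line number (2), tokens..., 0
    if load == 0x0801 and len(body) > 5 and body[4] == 0x9E:  # 0x9E = SYS
        i = 5
        while i < len(body) and body[i] == 0x20:  # skip spaces after SYS
            i += 1
        digits = ""
        while i < len(body) and 0x30 <= body[i] <= 0x39:
            digits += chr(body[i])
            i += 1
        if digits:
            return load, int(digits)
    return load, load


stub = bytes([0x01, 0x08, 0x0B, 0x08, 0x0A, 0x00, 0x9E]) + b"2061" + bytes(3)
load, entry = prg_entry_point(stub)
print(f"load=${load:04X} entry=${entry:04X}")  # load=$0801 entry=$080D
```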
Closing Thoughts
What I've enjoyed most about this process is how it inverts the normal relationship with code. Instead of writing instructions and watching them execute, you're reading the machine's memory and reconstructing the programmer's intent. Every labelled subroutine, every identified data table, every annotated hardware register brings you closer to understanding what someone built thirty or forty years ago with nothing but a 1 MHz processor and 64 kilobytes of RAM.
The 6502 is a fantastic architecture for learning reverse engineering because the simplicity means you can hold the entire system in your head. There are no layers of abstraction to peel back, just bytes in memory, a handful of registers, and a CPU that does exactly what those bytes tell it to. If you've been curious about reverse engineering but found modern architectures intimidating, the C64 is a brilliant place to start.