Vintage Adventures - MOS 6502 - Part 1 :: guy@secdev.uk , Guy Dixon

This next set of posts is a bit of a detour from the usual security-themed articles: we’ll explore some vintage computer hardware.

The MOS 6502 is a classic CPU that drove the home computer revolution in the late 1970s and early 1980s. Along with the Zilog Z80, it brought computing to the masses. The 6502 and its close variants powered some of the most iconic machines of the era: the Apple II, the Commodore 64, the Atari 2600, and the British-built BBC Micro, among others. It even found its way into the original Nintendo Entertainment System (as the Ricoh 2A03, a modified 6502).

What made the 6502 special wasn’t raw power; it was simplicity and cost. The chip was cheap enough that hobbyists and small companies could suddenly afford to build computers around it.

The Architecture

The 6502 is an 8-bit CPU with a 16-bit address bus, giving it the ability to directly address 64KB of memory. By modern standards that’s tiny, but in the late ’70s it was plenty of room to build games, word processors, and even early networking software.

Registers

The register set is minimal, just six registers in total:

Register	Name	Size	Purpose
A	Accumulator	8-bit	Main working register for arithmetic and logic operations
X	Index Register X	8-bit	General purpose, often used for loop counters and indexed addressing
Y	Index Register Y	8-bit	General purpose, similar to X but with different addressing mode support
SP	Stack Pointer	8-bit	Points to the current top of the stack (lives in page $01, addresses $0100–$01FF)
PC	Program Counter	16-bit	Holds the address of the next instruction to execute
P	Processor Status	8-bit	Flags register, each bit represents a condition flag

The processor status register (P) packs a lot of information into 8 bits:

  7 6 5 4 3 2 1 0
  N V - B D I Z C

N, Negative: set when the result of an operation has bit 7 set
V, Overflow: set when an arithmetic operation produces a signed overflow
B, Break: set only in the copy of P pushed by BRK and PHP, which lets an interrupt handler tell a BRK instruction from a hardware interrupt
D, Decimal: enables Binary Coded Decimal (BCD) mode for ADC and SBC
I, Interrupt Disable: when set, maskable interrupts (IRQ) are ignored
Z, Zero: set when the result of an operation is zero
C, Carry: used for multi-byte arithmetic and as a borrow flag for subtraction
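As a rough sketch of how an emulator might model this register, the layout above maps naturally onto bit masks. The constant names and the `update_nz` helper are our own inventions for illustration, not part of any official API:

```python
# Bit masks for the status register P, following the bit layout above.
FLAG_C = 0x01  # Carry
FLAG_Z = 0x02  # Zero
FLAG_I = 0x04  # Interrupt Disable
FLAG_D = 0x08  # Decimal
FLAG_B = 0x10  # Break
FLAG_U = 0x20  # Bit 5, unused
FLAG_V = 0x40  # Overflow
FLAG_N = 0x80  # Negative


def update_nz(p, value):
    """Return P with N and Z recomputed from an 8-bit result (hypothetical helper)."""
    p &= ~(FLAG_N | FLAG_Z) & 0xFF   # clear both flags first
    if (value & 0xFF) == 0:
        p |= FLAG_Z                  # result is zero
    if value & 0x80:
        p |= FLAG_N                  # bit 7 of the result is set
    return p
```

Most load, transfer, and arithmetic instructions finish with exactly this N/Z update, so it is worth having as a shared helper.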

If you’re coming from modern architectures, the thing that stands out is how few registers there are. There’s no general-purpose register file; almost everything flows through the accumulator. The X and Y registers help with addressing and loops, but the A register does the heavy lifting.

Addressing Modes

One area where the 6502 punches above its weight is addressing modes. Despite the simple register set, the CPU supports 13 addressing modes that give programmers a surprising amount of flexibility:

Mode	Syntax	Example	Description
Immediate	`#$nn`	`LDA #$42`	Operand is the literal value
Zero Page	`$nn`	`LDA $80`	Address in the first 256 bytes (page zero), fast access
Zero Page,X	`$nn,X`	`LDA $80,X`	Zero page address offset by X register
Zero Page,Y	`$nn,Y`	`LDX $80,Y`	Zero page address offset by Y register
Absolute	`$nnnn`	`LDA $1234`	Full 16-bit address
Absolute,X	`$nnnn,X`	`LDA $1234,X`	Absolute address offset by X
Absolute,Y	`$nnnn,Y`	`LDA $1234,Y`	Absolute address offset by Y
Indirect	`($nnnn)`	`JMP ($FFFC)`	Address points to a pointer (JMP only)
(Indirect,X)	`($nn,X)`	`LDA ($80,X)`	Indexed indirect, zero page pointer offset by X
(Indirect),Y	`($nn),Y`	`LDA ($80),Y`	Indirect indexed, zero page pointer, then offset by Y
Implied		`INX`	No operand, the instruction implies what it operates on
Accumulator	`A`	`ROL A`	Operates directly on the accumulator
Relative	`$nn`	`BEQ $05`	Signed offset from current PC (branch instructions only)

Zero page addressing is worth calling out: it’s one of the 6502’s clever tricks. Because zero page addresses only need one byte instead of two, these instructions are both smaller and faster. Experienced 6502 programmers treat the zero page like an extended register file, storing frequently-used variables there for speed.
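To make a few of these modes concrete, here is a minimal sketch of how an emulator might compute the effective address. The function names are hypothetical, and `memory` is assumed to be a 64KB `bytearray`:

```python
def zero_page_x(base, x):
    """Zero Page,X: the sum wraps within page zero rather than reaching $0100."""
    return (base + x) & 0xFF


def absolute_x(base, x):
    """Absolute,X: a full 16-bit address plus X, wrapping at 64KB."""
    return (base + x) & 0xFFFF


def indirect_y(memory, zp, y):
    """(Indirect),Y: read a little-endian pointer from zero page, then add Y."""
    lo = memory[zp]
    hi = memory[(zp + 1) & 0xFF]   # the pointer's high byte also wraps in page zero
    return (((hi << 8) | lo) + y) & 0xFFFF
```

For example, with the bytes `$00 $20` stored at `$80` and `$81`, `LDA ($80),Y` with Y = 5 reads from `$2005`.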

The Instruction Set

The 6502 has 56 official instructions. They break down into a few logical groups:

Load and Store

Opcode	Name	Description
LDA	Load Accumulator	Load a value into A
LDX	Load X	Load a value into X
LDY	Load Y	Load a value into Y
STA	Store Accumulator	Store A into memory
STX	Store X	Store X into memory
STY	Store Y	Store Y into memory

Arithmetic

Opcode	Name	Description
ADC	Add with Carry	A = A + operand + carry flag
SBC	Subtract with Carry	A = A - operand - (1 - carry flag)
INC	Increment Memory	Add 1 to a memory location
INX	Increment X	X = X + 1
INY	Increment Y	Y = Y + 1
DEC	Decrement Memory	Subtract 1 from a memory location
DEX	Decrement X	X = X - 1
DEY	Decrement Y	Y = Y - 1

Logic

Opcode	Name	Description
AND	Logical AND	A = A & operand
ORA	Logical OR	A = A | operand
EOR	Exclusive OR	A = A ^ operand
BIT	Bit Test	Test bits in memory against accumulator

Shift and Rotate

Opcode	Name	Description
ASL	Arithmetic Shift Left	Shift bits left, bit 0 becomes 0, bit 7 goes to carry
LSR	Logical Shift Right	Shift bits right, bit 7 becomes 0, bit 0 goes to carry
ROL	Rotate Left	Shift left through carry, carry goes to bit 0, bit 7 goes to carry
ROR	Rotate Right	Shift right through carry, carry goes to bit 7, bit 0 goes to carry
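The difference between a shift and a rotate is easiest to see in code. This is a small sketch with function names of our own choosing; each returns the 8-bit result together with the new carry:

```python
def asl(value):
    """Arithmetic shift left: bit 7 goes to carry, bit 0 becomes 0."""
    return (value << 1) & 0xFF, (value >> 7) & 1


def lsr(value):
    """Logical shift right: bit 0 goes to carry, bit 7 becomes 0."""
    return value >> 1, value & 1


def rol(value, carry_in):
    """Rotate left: the old carry enters bit 0, bit 7 goes to carry."""
    return ((value << 1) | carry_in) & 0xFF, (value >> 7) & 1


def ror(value, carry_in):
    """Rotate right: the old carry enters bit 7, bit 0 goes to carry."""
    return (value >> 1) | (carry_in << 7), value & 1
```

Because the rotates pass through the carry, chaining `ASL` on a low byte with `ROL` on a high byte shifts a 16-bit value one bit at a time.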

Compare

Opcode	Name	Description
CMP	Compare Accumulator	Compare A with operand (sets flags, doesn’t store result)
CPX	Compare X	Compare X with operand
CPY	Compare Y	Compare Y with operand
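A compare is effectively a subtraction whose result is thrown away. The sketch below (the `compare` function and its dictionary return value are illustrative choices, not a fixed design) shows which flags come out of it:

```python
def compare(register, operand):
    """Flags produced by CMP/CPX/CPY for 8-bit unsigned inputs."""
    result = (register - operand) & 0xFF
    return {
        "C": register >= operand,      # carry set means "no borrow"
        "Z": result == 0,              # equal
        "N": bool(result & 0x80),      # bit 7 of the discarded difference
    }
```

This is why `BCS` after a `CMP` acts as "branch if greater than or equal" for unsigned values, and `BEQ` as "branch if equal".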

Branch

Opcode	Name	Description
BCC	Branch on Carry Clear	Branch if C = 0
BCS	Branch on Carry Set	Branch if C = 1
BEQ	Branch on Equal	Branch if Z = 1
BNE	Branch on Not Equal	Branch if Z = 0
BMI	Branch on Minus	Branch if N = 1
BPL	Branch on Plus	Branch if N = 0
BVC	Branch on Overflow Clear	Branch if V = 0
BVS	Branch on Overflow Set	Branch if V = 1
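All of these branches use relative addressing, so an emulator has to sign-extend the operand byte. A minimal sketch, assuming `pc` holds the address of the branch opcode itself (the function name is hypothetical):

```python
def branch_target(pc, offset_byte):
    """Target of a taken branch; the offset counts from the next instruction."""
    # Bytes $80-$FF are negative offsets in two's complement.
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (pc + 2 + offset) & 0xFFFF
```

An offset of `$FE` is -2, which lands back on the branch itself: the classic one-instruction infinite loop.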

Jump and Subroutine

Opcode	Name	Description
JMP	Jump	Set PC to address
JSR	Jump to Subroutine	Push the return address (minus one) to the stack, then jump
RTS	Return from Subroutine	Pull the return address from the stack, add one, and jump to it
RTI	Return from Interrupt	Pull status and return address from stack
BRK	Break	Trigger a software interrupt
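One detail that catches emulator authors out is that JSR pushes the address of its own last byte, not the address of the next instruction, and RTS adds one after pulling. Here is a sketch under the assumption that `memory` is a 64KB `bytearray` and `pc` is the address of the JSR opcode; the helper names are ours:

```python
def push(memory, sp, value):
    """Store a byte at $0100+SP, then move SP down (the stack grows downward)."""
    memory[0x0100 + sp] = value & 0xFF
    return (sp - 1) & 0xFF


def pull(memory, sp):
    """Move SP up, then read the byte at $0100+SP."""
    sp = (sp + 1) & 0xFF
    return memory[0x0100 + sp], sp


def jsr(memory, pc, sp, target):
    """Returns (new_pc, new_sp) for a 3-byte JSR located at pc."""
    ret = (pc + 2) & 0xFFFF              # address of the last byte of the JSR
    sp = push(memory, sp, ret >> 8)      # high byte first
    sp = push(memory, sp, ret & 0xFF)    # then low byte
    return target, sp


def rts(memory, sp):
    """Returns (new_pc, new_sp): pulled address plus one."""
    lo, sp = pull(memory, sp)
    hi, sp = pull(memory, sp)
    return (((hi << 8) | lo) + 1) & 0xFFFF, sp
```

A JSR at `$0200` therefore leaves `$0202` on the stack, and the matching RTS resumes at `$0203`.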

Transfer

Opcode	Name	Description
TAX	Transfer A to X	X = A
TAY	Transfer A to Y	Y = A
TXA	Transfer X to A	A = X
TYA	Transfer Y to A	A = Y
TSX	Transfer SP to X	X = SP
TXS	Transfer X to SP	SP = X

Stack

Opcode	Name	Description
PHA	Push Accumulator	Push A onto the stack
PHP	Push Processor Status	Push the status flags onto the stack
PLA	Pull Accumulator	Pop the top of the stack into A
PLP	Pull Processor Status	Pop the top of the stack into the status flags

Flag Control

Opcode	Name	Description
CLC	Clear Carry	C = 0
CLD	Clear Decimal	D = 0
CLI	Clear Interrupt Disable	I = 0
CLV	Clear Overflow	V = 0
SEC	Set Carry	C = 1
SED	Set Decimal	D = 1
SEI	Set Interrupt Disable	I = 1

Miscellaneous

Opcode	Name	Description
NOP	No Operation	Does nothing, advances PC by 1

That’s the full set. No multiply, no divide, no floating point. If you need any of that, you build it yourself out of shifts, adds, and loops. That constraint is part of what makes 6502 programming interesting: you learn to think in terms of what the hardware actually gives you.
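As an illustration of building arithmetic from shifts and adds, here is the classic shift-and-add multiply written in Python rather than 6502 assembly. It is a sketch of the general technique, not code from any particular ROM, and the function name is our own:

```python
def multiply_8x8(a, b):
    """Shift-and-add multiply of two 8-bit values into a 16-bit product."""
    product = 0
    multiplicand = a & 0xFF
    multiplier = b & 0xFF
    for _ in range(8):                    # one pass per multiplier bit
        if multiplier & 1:                # on hardware, LSR moves this bit into carry
            product = (product + multiplicand) & 0xFFFF
        multiplicand = (multiplicand << 1) & 0xFFFF   # an ASL/ROL pair on hardware
        multiplier >>= 1
    return product
```

Eight iterations of test, add, and shift is all it takes, which is roughly how a hand-written 6502 routine would do it.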

Building an Emulator

The core loop of a 6502 emulator is surprisingly straightforward. The CPU does the same thing over and over:

Fetch: Read the byte at the current Program Counter (PC). This is the opcode.
Decode: Look up the opcode to determine which instruction it is, what addressing mode it uses, and how many bytes the full instruction occupies (1, 2, or 3).
Execute: Run the logic for that instruction. This might update registers, modify memory, change flags, or alter the PC itself (in the case of jumps and branches).
Advance: Move the PC forward past the instruction bytes (unless the instruction already changed the PC).

That’s it. Fetch, decode, execute, advance. Every CPU works this way at a fundamental level; the 6502 just makes it easy to see because there’s so little abstraction in the way.

In code, the emulator’s main loop looks something like this (pseudocode):

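while running:
    opcode = memory[PC]                                   # fetch
    instruction = decode(opcode)                          # decode
    operands = memory[PC + 1 : PC + instruction.length]   # 0, 1, or 2 operand bytes
    pc_changed = instruction.execute(operands)            # true for jumps and taken branches
    if not pc_changed:
        PC += instruction.length                          # advance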

The decode step is typically a lookup table: a 256-entry array (one for each possible byte value) that maps opcodes to their instruction handler, addressing mode, byte length, and cycle count. Of those 256 slots, 151 map to documented instructions; the rest are “illegal” opcodes that the original hardware handled in undocumented (and sometimes useful) ways.
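A minimal sketch of such a table might look like the following. The `OpInfo` tuple and `decode` function are hypothetical names, and only a handful of opcodes are filled in:

```python
from collections import namedtuple

# One entry per opcode: mnemonic, addressing mode, byte length, base cycle count.
OpInfo = namedtuple("OpInfo", "mnemonic mode length cycles")

OPCODES = [None] * 256                       # one slot per possible byte value
OPCODES[0xA9] = OpInfo("LDA", "immediate", 2, 2)
OPCODES[0xA5] = OpInfo("LDA", "zero page", 2, 3)
OPCODES[0xAD] = OpInfo("LDA", "absolute", 3, 4)
OPCODES[0xAA] = OpInfo("TAX", "implied", 1, 2)
OPCODES[0xEA] = OpInfo("NOP", "implied", 1, 2)


def decode(opcode):
    """Look up an opcode byte; empty slots are treated as unimplemented."""
    info = OPCODES[opcode]
    if info is None:
        raise ValueError(f"unimplemented opcode ${opcode:02X}")
    return info
```

Note that the same mnemonic appears several times: each addressing mode of LDA has its own opcode byte, length, and timing.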

Each instruction handler is a small function. For example, LDA in immediate mode just copies the operand byte into the A register and updates the Zero and Negative flags. ADC is more involved: it needs to handle the carry flag, check for overflow, and optionally deal with BCD mode. But none of them are individually complex.
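To give a feel for the more involved case, here is a sketch of ADC and SBC in binary mode only (decimal mode is deliberately left out). The function names and the tuple return value are our own choices for illustration:

```python
def adc(a, operand, carry_in):
    """Binary-mode ADC. Returns (result, carry_out, overflow)."""
    total = a + operand + carry_in
    result = total & 0xFF
    carry = total > 0xFF
    # Signed overflow: both inputs share a sign that differs from the result's sign.
    overflow = bool((a ^ result) & (operand ^ result) & 0x80)
    return result, carry, overflow


def sbc(a, operand, carry_in):
    """Binary-mode SBC: the same adder fed the one's complement of the operand."""
    return adc(a, operand ^ 0xFF, carry_in)
```

Expressing SBC as ADC with an inverted operand also explains the carry convention: carry set going in means "no borrow", which is why programs run `SEC` before a subtraction.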

The trickiest parts of getting an emulator right tend to be:

Flag behaviour: Getting the exact flag updates correct for every instruction. The N and Z flags are straightforward, but the V (overflow) flag for ADC/SBC trips people up. The carry flag’s role in subtraction (it acts as a “borrow” flag, inverted) is another common source of bugs.
Addressing mode edge cases: Indirect addressing has a famous hardware bug: JMP ($xxFF) wraps within the page instead of crossing a page boundary. If you don’t emulate this bug, some real 6502 programs won’t work correctly.
Cycle accuracy: If you just want to run programs, you can ignore cycle counts. But if you’re emulating a full system (like an NES or C64), you need accurate cycle timing because the CPU, video chip, and sound hardware are all synchronized.
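The indirect JMP bug is simple to reproduce once you see that only the low byte of the pointer address is incremented. A sketch, assuming `memory` is a 64KB `bytearray` (the function name is hypothetical):

```python
def jmp_indirect_target(memory, pointer):
    """Resolve JMP (pointer) the way the original 6502 does, page-wrap bug included."""
    lo = memory[pointer]
    # The high byte is fetched from the same page: $xxFF + 1 becomes $xx00.
    hi_addr = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    hi = memory[hi_addr]
    return (hi << 8) | lo
```

So `JMP ($02FF)` takes its low byte from `$02FF` and its high byte from `$0200`, not from `$0300`.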

For this series, we’ll start simple: get the instructions working correctly, then layer on features in later parts.

A Test Program

Here’s a simple 6502 assembly program that exercises a handful of instructions. Nothing fancy: it loads values into registers, shuffles them around with transfers, increments and decrements, and pushes the status flags to the stack.

; Simple 6502 test program
; Start at address $0200

.org $0200

start:
    LDA #$42    ; Load 0x42 into A
    LDX #$10    ; Load 0x10 into X
    LDY #$20    ; Load 0x20 into Y
    TAX         ; Transfer A to X (X = 0x42)
    TAY         ; Transfer A to Y (Y = 0x42)
    INX         ; Increment X (X = 0x43)
    INY         ; Increment Y (Y = 0x43)
    DEX         ; Decrement X (X = 0x42)
    DEY         ; Decrement Y (Y = 0x42)
    NOP         ; No operation
    PHP         ; Push status flags to stack

; Pad to 64KB
.org $FFFF
    .byte $00

When assembled into machine code, the program itself (ignoring the padding) becomes 14 bytes:

A9 42 A2 10 A0 20 AA A8 E8 C8 CA 88 EA 08

Each instruction is either 1 byte (implied addressing, the opcode is the whole instruction) or 2 bytes (immediate addressing, opcode plus one operand byte). There are no 3-byte instructions in this program since we’re not using any absolute addresses.
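Knowing each opcode’s length is enough to split the byte stream back into instructions. The sketch below uses a hypothetical `LENGTHS` table covering only the eleven opcodes in this program:

```python
# Instruction length in bytes for each opcode used by the test program.
LENGTHS = {
    0xA9: 2, 0xA2: 2, 0xA0: 2,              # LDA/LDX/LDY immediate
    0xAA: 1, 0xA8: 1,                       # TAX, TAY
    0xE8: 1, 0xC8: 1, 0xCA: 1, 0x88: 1,     # INX, INY, DEX, DEY
    0xEA: 1, 0x08: 1,                       # NOP, PHP
}


def split_instructions(code):
    """Split machine code into per-instruction hex strings."""
    out = []
    i = 0
    while i < len(code):
        n = LENGTHS[code[i]]                 # the opcode decides how many bytes follow
        out.append(code[i:i + n].hex(" ").upper())
        i += n
    return out


PROGRAM = bytes.fromhex("A9 42 A2 10 A0 20 AA A8 E8 C8 CA 88 EA 08")
```

Calling `split_instructions(PROGRAM)` yields eleven entries, starting with `"A9 42"`, `"A2 10"`, and `"A0 20"`, followed by eight single-byte instructions.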

Decoding the Hex

Let’s walk through the machine code byte by byte and see exactly what the CPU does. This is the process our emulator will automate.

A9 42, LDA #$42

A9 is the opcode for LDA in immediate mode. The next byte, 42, is the operand. The CPU loads the value $42 into the accumulator. Since $42 is non-zero and bit 7 is clear, the Zero flag is cleared and the Negative flag is cleared. PC advances by 2.

After: A = $42, X = $00, Y = $00

A2 10, LDX #$10

A2 is LDX immediate. The CPU loads $10 into the X register. Same flag logic, non-zero, bit 7 clear. PC advances by 2.

After: A = $42, X = $10, Y = $00

A0 20, LDY #$20

A0 is LDY immediate. Loads $20 into Y. PC advances by 2.

After: A = $42, X = $10, Y = $20

AA, TAX

AA is TAX (Transfer A to X). This is a single-byte implied instruction, no operand needed. The value in A ($42) is copied to X. The old value of X ($10) is gone. Flags update based on the new value of X. PC advances by 1.

After: A = $42, X = $42, Y = $20

A8, TAY

A8 is TAY (Transfer A to Y). Same idea, A is copied to Y. PC advances by 1.

After: A = $42, X = $42, Y = $42

E8, INX

E8 is INX (Increment X). X goes from $42 to $43. Flags update based on the new value. PC advances by 1.

After: A = $42, X = $43, Y = $42

C8, INY

C8 is INY (Increment Y). Y goes from $42 to $43. PC advances by 1.

After: A = $42, X = $43, Y = $43

CA, DEX

CA is DEX (Decrement X). X goes from $43 back to $42. PC advances by 1.

After: A = $42, X = $42, Y = $43

88, DEY

88 is DEY (Decrement Y). Y goes from $43 back to $42. PC advances by 1.

After: A = $42, X = $42, Y = $42

EA, NOP

EA is NOP. Does nothing. PC advances by 1. Registers and flags unchanged.

After: A = $42, X = $42, Y = $42

08, PHP

08 is PHP (Push Processor Status). The current value of the P register (status flags) is pushed onto the stack, with the B flag and the unused bit 5 set in the pushed copy. The stack pointer decrements by 1. PC advances by 1.

After: A = $42, X = $42, Y = $42, status flags on stack.

And that’s the whole program. Eleven instructions, fourteen bytes. The emulator just needs to repeat this fetch-decode-execute cycle for each one.
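The walkthrough above can be checked mechanically with a very small interpreter. This is only a sketch covering the eleven opcodes in the test program, not the emulator we’ll build in Part 2; the starting state (registers zeroed, SP at `$FF`, only bit 5 set in P) and all of the names are assumptions made for the example:

```python
FLAG_Z, FLAG_B, FLAG_U, FLAG_N = 0x02, 0x10, 0x20, 0x80
PROGRAM = bytes.fromhex("A9 42 A2 10 A0 20 AA A8 E8 C8 CA 88 EA 08")


def run(program, origin=0x0200):
    """Execute the program bytes from origin; return the final CPU state and memory."""
    mem = bytearray(0x10000)
    mem[origin:origin + len(program)] = program
    # Assumed starting state, chosen for the example rather than taken from hardware.
    cpu = {"A": 0, "X": 0, "Y": 0, "SP": 0xFF, "P": FLAG_U, "PC": origin}

    def set_nz(value):
        cpu["P"] &= ~(FLAG_N | FLAG_Z) & 0xFF
        if value == 0:
            cpu["P"] |= FLAG_Z
        cpu["P"] |= value & FLAG_N

    def load(reg):             # LDA/LDX/LDY immediate: 2 bytes
        cpu[reg] = mem[cpu["PC"] + 1]
        set_nz(cpu[reg])
        return 2

    def transfer(dst):         # TAX/TAY: 1 byte
        cpu[dst] = cpu["A"]
        set_nz(cpu[dst])
        return 1

    def step(reg, delta):      # INX/INY/DEX/DEY: 1 byte, wraps at 8 bits
        cpu[reg] = (cpu[reg] + delta) & 0xFF
        set_nz(cpu[reg])
        return 1

    def php():                 # push P with B and bit 5 set, then move SP down
        mem[0x0100 + cpu["SP"]] = cpu["P"] | FLAG_B | FLAG_U
        cpu["SP"] = (cpu["SP"] - 1) & 0xFF
        return 1

    handlers = {
        0xA9: lambda: load("A"), 0xA2: lambda: load("X"), 0xA0: lambda: load("Y"),
        0xAA: lambda: transfer("X"), 0xA8: lambda: transfer("Y"),
        0xE8: lambda: step("X", 1), 0xC8: lambda: step("Y", 1),
        0xCA: lambda: step("X", -1), 0x88: lambda: step("Y", -1),
        0xEA: lambda: 1, 0x08: php,
    }

    end = origin + len(program)
    while cpu["PC"] < end:
        opcode = mem[cpu["PC"]]          # fetch
        length = handlers[opcode]()      # decode and execute
        cpu["PC"] += length              # advance (no jumps in this program)
    return cpu, mem
```

Running `run(PROGRAM)` leaves A, X, and Y all at `$42`, SP at `$FE`, and the byte `$30` at `$01FF`: the status register with only the B flag and bit 5 set, matching the hand trace.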

What’s Next

In Part 2, we’ll start building the emulator for real: setting up the CPU state, implementing the instruction handlers, and getting this test program to run. We’ll also add a simple debugger so we can step through instructions and inspect registers, which is invaluable when things inevitably don’t work on the first try.