This next set of posts is a bit of a detour from the usual security-themed articles: we’ll explore some vintage computer hardware.

The MOS 6502 is a classic CPU that drove the home computer revolution in the late 1970s and early 1980s. Along with the Zilog Z80, it brought computing to the masses. The 6502 and its variants powered some of the most iconic machines of the era: the Apple II, the Commodore 64 (as the 6510), the Atari 2600 (as the cut-down 6507), and the British-built BBC Micro, among others. It even found its way into the original Nintendo Entertainment System (as the Ricoh 2A03, a modified 6502).

What made the 6502 special wasn’t raw power; it was simplicity and cost. Suddenly, hobbyists and small companies could afford to build computers around it.

The Architecture

The 6502 is an 8-bit CPU with a 16-bit address bus, giving it the ability to directly address 64KB of memory. By modern standards that’s tiny, but in the late ’70s it was plenty of room to build games, word processors, and even early networking software.

Registers

The register set is minimal, just six registers in total:

Register  Name              Size    Purpose
A         Accumulator       8-bit   Main working register for arithmetic and logic operations
X         Index Register X  8-bit   General purpose, often used for loop counters and indexed addressing
Y         Index Register Y  8-bit   General purpose, similar to X but with different addressing mode support
SP        Stack Pointer     8-bit   Low byte of the next free stack location (the stack lives in page $01, addresses $0100–$01FF)
PC        Program Counter   16-bit  Holds the address of the next instruction to execute
P         Processor Status  8-bit   Flags register, each bit represents a condition flag

The processor status register (P) packs a lot of information into 8 bits:

  7 6 5 4 3 2 1 0
  N V - B D I Z C
  • N, Negative: set when the result of an operation has bit 7 set
  • V, Overflow: set when an arithmetic operation produces a signed overflow
  • B, Break: set in the copy of P pushed by BRK and PHP, clear in the copy pushed by a hardware interrupt, so a handler can tell the two apart
  • D, Decimal: enables Binary Coded Decimal (BCD) mode for ADC and SBC
  • I, Interrupt Disable: when set, maskable interrupts (IRQ) are ignored
  • Z, Zero: set when the result of an operation is zero
  • C, Carry: used for multi-byte arithmetic; in subtraction it acts as an inverted borrow (clear means a borrow occurred)
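To make the bit layout concrete, here is a minimal Python sketch of the flag masks and the N/Z update that most instructions perform. The names (FLAG_C, set_nz, describe) are my own choices for illustration, not anything the hardware or a particular emulator defines.

```python
# Bit masks for the processor status register P (illustrative names).
FLAG_C = 0x01  # Carry
FLAG_Z = 0x02  # Zero
FLAG_I = 0x04  # Interrupt Disable
FLAG_D = 0x08  # Decimal
FLAG_B = 0x10  # Break (only meaningful in the copy pushed to the stack)
FLAG_U = 0x20  # Bit 5, unused
FLAG_V = 0x40  # Overflow
FLAG_N = 0x80  # Negative


def set_nz(p, value):
    """Return P with N and Z updated to reflect an 8-bit result."""
    p &= ~(FLAG_N | FLAG_Z) & 0xFF   # clear both flags first
    if value & 0xFF == 0:
        p |= FLAG_Z
    if value & 0x80:
        p |= FLAG_N
    return p


def describe(p):
    """Render P in the NV-BDIZC layout: upper case means the flag is set."""
    names = "NV-BDIZC"
    return "".join(
        ch if p & (0x80 >> i) else ch.lower()
        for i, ch in enumerate(names)
    )
```

For example, describe(0x24) renders a status byte with only bit 5 and the Interrupt Disable flag set.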

If you’re coming from modern architectures, the thing that stands out is how few registers there are. There’s no general-purpose register file; almost everything flows through the accumulator. The X and Y registers help with addressing and loops, but the A register does the heavy lifting.

Addressing Modes

One area where the 6502 punches above its weight is addressing modes. Despite the simple register set, the CPU supports 13 addressing modes that give programmers a surprising amount of flexibility:

Mode          Syntax    Example      Description
Immediate     #$nn      LDA #$42     Operand is the literal value
Zero Page     $nn       LDA $80      Address in the first 256 bytes (page zero), fast access
Zero Page,X   $nn,X     LDA $80,X    Zero page address offset by X register
Zero Page,Y   $nn,Y     LDX $80,Y    Zero page address offset by Y register
Absolute      $nnnn     LDA $1234    Full 16-bit address
Absolute,X    $nnnn,X   LDA $1234,X  Absolute address offset by X
Absolute,Y    $nnnn,Y   LDA $1234,Y  Absolute address offset by Y
Indirect      ($nnnn)   JMP ($FFFC)  Address points to a pointer (JMP only)
(Indirect,X)  ($nn,X)   LDA ($80,X)  Indexed indirect, zero page pointer offset by X
(Indirect),Y  ($nn),Y   LDA ($80),Y  Indirect indexed, zero page pointer, then offset by Y
Implied       (none)    INX          No operand, the instruction implies what it operates on
Accumulator   A         ROL A        Operates directly on the accumulator
Relative      $nn       BEQ $05      Signed 8-bit offset from the address of the next instruction (branch instructions only)

Zero page addressing is worth calling out: it’s one of the 6502’s clever tricks. Because zero page addresses only need one byte instead of two, these instructions are both smaller and faster. Experienced 6502 programmers treat the zero page like an extended register file, storing frequently-used variables there for speed.
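The indexed and indirect modes are easier to follow as address arithmetic. Below is a small Python sketch of how an emulator might compute the effective address for four of them. The function names are hypothetical, and mem is assumed to be a 64 KB bytearray.

```python
def zero_page_x(operand, x):
    """$nn,X: the sum wraps around inside page zero."""
    return (operand + x) & 0xFF


def absolute_x(operand, x):
    """$nnnn,X: the sum wraps around the 16-bit address space."""
    return (operand + x) & 0xFFFF


def indexed_indirect(mem, operand, x):
    """($nn,X): add X to the zero page address, then read a pointer there."""
    ptr = (operand + x) & 0xFF
    # Both pointer bytes are fetched from page zero.
    return mem[ptr] | (mem[(ptr + 1) & 0xFF] << 8)


def indirect_indexed(mem, operand, y):
    """($nn),Y: read a pointer from zero page, then add Y to it."""
    base = mem[operand] | (mem[(operand + 1) & 0xFF] << 8)
    return (base + y) & 0xFFFF
```

The two indirect modes differ in when the index is applied: ($nn,X) indexes before the pointer is read, while ($nn),Y indexes after. That is why the second form is the usual choice for walking through an array via a zero page pointer.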

The Instruction Set

The 6502 has 56 official instructions. They break down into a few logical groups:

Load and Store

Opcode  Name               Description
LDA     Load Accumulator   Load a value into A
LDX     Load X             Load a value into X
LDY     Load Y             Load a value into Y
STA     Store Accumulator  Store A into memory
STX     Store X            Store X into memory
STY     Store Y            Store Y into memory

Arithmetic

Opcode  Name                 Description
ADC     Add with Carry       A = A + operand + carry flag
SBC     Subtract with Carry  A = A - operand - (1 - carry flag)
INC     Increment Memory     Add 1 to a memory location
INX     Increment X          X = X + 1
INY     Increment Y          Y = Y + 1
DEC     Decrement Memory     Subtract 1 from a memory location
DEX     Decrement X          X = X - 1
DEY     Decrement Y          Y = Y - 1

Logic

Opcode  Name          Description
AND     Logical AND   A = A & operand
ORA     Logical OR    A = A | operand
EOR     Exclusive OR  A = A ^ operand
BIT     Bit Test      Set Z from A & operand and copy bits 7 and 6 of the operand into N and V (A is unchanged)

Shift and Rotate

Opcode  Name                   Description
ASL     Arithmetic Shift Left  Shift bits left, bit 0 becomes 0, bit 7 goes to carry
LSR     Logical Shift Right    Shift bits right, bit 7 becomes 0, bit 0 goes to carry
ROL     Rotate Left            Shift left through carry, carry goes to bit 0, bit 7 goes to carry
ROR     Rotate Right           Shift right through carry, carry goes to bit 7, bit 0 goes to carry
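The difference between a shift and a rotate is only what gets fed into the vacated bit. Here is a rough Python sketch of the four operations as pure functions that return the 8-bit result and the new carry. The function signatures are my own for illustration, and N/Z flag updates are left out.

```python
def asl(value):
    """Arithmetic shift left: returns (result, carry_out)."""
    return (value << 1) & 0xFF, (value >> 7) & 1


def lsr(value):
    """Logical shift right: returns (result, carry_out)."""
    return (value >> 1) & 0x7F, value & 1


def rol(value, carry):
    """Rotate left through carry: the old carry enters bit 0."""
    return ((value << 1) | carry) & 0xFF, (value >> 7) & 1


def ror(value, carry):
    """Rotate right through carry: the old carry enters bit 7."""
    return ((value >> 1) | (carry << 7)) & 0xFF, value & 1
```

Chaining ASL on a low byte with ROL on a high byte is how a 16-bit left shift is built: the carry out of the first becomes the carry in of the second.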

Compare

Opcode  Name                 Description
CMP     Compare Accumulator  Compare A with operand (sets flags, doesn’t store result)
CPX     Compare X            Compare X with operand
CPY     Compare Y            Compare Y with operand
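A compare is effectively a subtraction whose result is thrown away, keeping only the flags. Here is a small Python sketch of the flag outcome. The compare function and its dictionary return value are illustrative, not how any specific emulator structures it.

```python
def compare(register, operand):
    """Flags produced by CMP/CPX/CPY for 8-bit unsigned inputs."""
    diff = (register - operand) & 0xFF
    return {
        "C": register >= operand,   # carry set means no borrow was needed
        "Z": register == operand,   # equal values give a zero result
        "N": bool(diff & 0x80),     # bit 7 of the discarded difference
    }
```

This is why BEQ/BNE after a compare test equality, and BCS/BCC act as unsigned "greater than or equal" and "less than".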

Branch

Opcode  Name                      Description
BCC     Branch on Carry Clear     Branch if C = 0
BCS     Branch on Carry Set       Branch if C = 1
BEQ     Branch on Equal           Branch if Z = 1
BNE     Branch on Not Equal       Branch if Z = 0
BMI     Branch on Minus           Branch if N = 1
BPL     Branch on Plus            Branch if N = 0
BVC     Branch on Overflow Clear  Branch if V = 0
BVS     Branch on Overflow Set    Branch if V = 1
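All eight branches use relative addressing, so the operand byte is a signed offset rather than an address. Here is a minimal Python sketch of the target calculation for a taken branch. The function name is hypothetical, and pc is assumed to be the address of the branch opcode itself.

```python
def branch_target(pc, offset_byte):
    """Target address of a taken branch located at address pc."""
    # Interpret the operand as a signed 8-bit value (-128..127).
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    # The offset is relative to the instruction after the 2-byte branch.
    return (pc + 2 + offset) & 0xFFFF
```

An offset byte of $FE therefore branches back to the branch instruction itself, which is a common way to write an idle loop.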

Jump and Subroutine

Opcode  Name                    Description
JMP     Jump                    Set PC to address
JSR     Jump to Subroutine      Push the return address minus 1 onto the stack, then jump
RTS     Return from Subroutine  Pull the address from the stack, add 1, and jump to it
RTI     Return from Interrupt   Pull status and return address from stack
BRK     Break                   Trigger a software interrupt
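JSR and RTS have one quirk worth sketching: the address JSR pushes is that of its own last byte, and RTS adds 1 after pulling it. The Python below is an illustrative model only. The Cpu class, its method names, and the initial stack pointer of $FD are assumptions of mine.

```python
class Cpu:
    def __init__(self):
        self.mem = bytearray(0x10000)
        self.pc = 0
        self.sp = 0xFD  # assumed starting value, typical after reset

    def push(self, value):
        # The stack lives in page $01 and grows downwards.
        self.mem[0x0100 | self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x0100 | self.sp]

    def jsr(self, target):
        """Assumes self.pc holds the address of the 3-byte JSR instruction."""
        ret = (self.pc + 2) & 0xFFFF   # last byte of the JSR, not the next opcode
        self.push(ret >> 8)            # high byte first
        self.push(ret & 0xFF)
        self.pc = target

    def rts(self):
        lo = self.pull()
        hi = self.pull()
        self.pc = (((hi << 8) | lo) + 1) & 0xFFFF
```

Forgetting the off-by-one on either side still lets JSR/RTS pairs round-trip correctly, so the bug only shows up in code that manipulates return addresses on the stack by hand.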

Transfer

Opcode  Name              Description
TAX     Transfer A to X   X = A
TAY     Transfer A to Y   Y = A
TXA     Transfer X to A   A = X
TYA     Transfer Y to A   A = Y
TSX     Transfer SP to X  X = SP
TXS     Transfer X to SP  SP = X

Stack

Opcode  Name                   Description
PHA     Push Accumulator       Push A onto the stack
PHP     Push Processor Status  Push the status flags onto the stack
PLA     Pull Accumulator       Pop the top of the stack into A
PLP     Pull Processor Status  Pop the top of the stack into the status flags

Flag Control

Opcode  Name                   Description
CLC     Clear Carry            C = 0
CLD     Clear Decimal          D = 0
CLI     Clear Interrupt Disable  I = 0
CLV     Clear Overflow         V = 0
SEC     Set Carry              C = 1
SED     Set Decimal            D = 1
SEI     Set Interrupt Disable  I = 1

Miscellaneous

Opcode  Name          Description
NOP     No Operation  Does nothing, advances PC by 1

That’s the full set. No multiply, no divide, no floating point. If you need any of that, you build it yourself out of shifts, adds, and loops. That constraint is part of what makes 6502 programming interesting: you learn to think in terms of what the hardware actually gives you.
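As an illustration of building multiplication from shifts and adds, here is the classic shift-and-add algorithm in Python. It mirrors the structure a 6502 routine would use (shift the multiplier right, add when a 1 falls out, shift the multiplicand left), but it is a sketch of the technique rather than a translation of any particular routine.

```python
def multiply_8x8(a, b):
    """Shift-and-add multiply of two 8-bit values, giving a 16-bit product."""
    product = 0
    multiplicand = a & 0xFF
    multiplier = b & 0xFF
    for _ in range(8):                  # one pass per multiplier bit
        if multiplier & 1:              # the bit an LSR would move into carry
            product = (product + multiplicand) & 0xFFFF
        multiplier >>= 1                # LSR on the multiplier
        multiplicand = (multiplicand << 1) & 0xFFFF  # ASL/ROL on a 16-bit pair
    return product
```

On real hardware each of those 16-bit operations becomes a pair of 8-bit instructions linked through the carry flag, which is where most of the cycle cost goes.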

Building an Emulator

The core loop of a 6502 emulator is surprisingly straightforward. The CPU does the same thing over and over:

  1. Fetch: read the byte at the current Program Counter (PC). This is the opcode.
  2. Decode: look up the opcode to determine which instruction it is, what addressing mode it uses, and how many bytes the full instruction occupies (1, 2, or 3).
  3. Execute: run the logic for that instruction. This might update registers, modify memory, change flags, or alter the PC itself (in the case of jumps and branches).
  4. Advance: move the PC forward past the instruction bytes (unless the instruction already changed the PC).

That’s it. Fetch, decode, execute, advance. Every CPU works this way at a fundamental level; the 6502 just makes it easy to see because there’s so little abstraction in the way.

In code, the emulator’s main loop looks something like this (pseudocode):

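while running:
    opcode = memory[PC]                                  # fetch
    instruction = decode(opcode)                         # decode
    operands = memory[PC + 1 : PC + instruction.length]  # operand bytes, if any
    pc_changed = instruction.execute(operands)           # execute; true for jumps and taken branches
    if not pc_changed:                                   # advance
        PC += instruction.length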

The decode step is typically a lookup table: a 256-entry array (one for each possible byte value) that maps opcodes to their instruction handler, addressing mode, byte length, and cycle count. On the original NMOS 6502, 151 of those 256 slots map to documented instructions; the rest are “illegal” opcodes that the original hardware handled in undocumented (and sometimes useful) ways.
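Here is one way such a table might look in Python, filled in for just a handful of opcodes. The OpInfo record, the OPCODES list, and decode itself are hypothetical names for this sketch. The byte lengths and cycle counts shown are the documented values for these particular opcodes.

```python
from typing import NamedTuple


class OpInfo(NamedTuple):
    mnemonic: str
    mode: str
    length: int   # bytes, including the opcode
    cycles: int   # base cycle count


# One slot per possible opcode byte; None marks "not implemented here".
OPCODES = [None] * 256
OPCODES[0xA9] = OpInfo("LDA", "immediate", 2, 2)
OPCODES[0xA2] = OpInfo("LDX", "immediate", 2, 2)
OPCODES[0xA0] = OpInfo("LDY", "immediate", 2, 2)
OPCODES[0xAA] = OpInfo("TAX", "implied", 1, 2)
OPCODES[0xEA] = OpInfo("NOP", "implied", 1, 2)
OPCODES[0x08] = OpInfo("PHP", "implied", 1, 3)
OPCODES[0x4C] = OpInfo("JMP", "absolute", 3, 3)


def decode(opcode):
    """Look up an opcode byte, failing loudly on empty slots."""
    info = OPCODES[opcode & 0xFF]
    if info is None:
        raise ValueError(f"unimplemented or illegal opcode ${opcode:02X}")
    return info
```

Raising on empty slots is a deliberate choice for early development: a missing instruction is much easier to spot as an exception than as silently wrong state a few hundred instructions later.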

Each instruction handler is a small function. For example, LDA in immediate mode just copies the operand byte into the A register and updates the Zero and Negative flags. ADC is more involved; it needs to handle the carry flag, check for overflow, and optionally deal with BCD mode. But none of them are individually complex.
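To give a feel for the ADC logic, here is a binary-mode sketch in Python (BCD mode is deliberately left out). The adc and sbc functions and their tuple return values are my own framing. The overflow test uses the common bit trick: overflow occurs when both inputs share a sign and the result’s sign differs.

```python
def adc(a, operand, carry):
    """Binary-mode ADC. Returns (result, carry_out, overflow)."""
    total = a + operand + carry
    result = total & 0xFF
    carry_out = 1 if total > 0xFF else 0
    # Signed overflow: inputs have the same sign, result has a different one.
    overflow = 1 if (~(a ^ operand) & (a ^ result) & 0x80) else 0
    return result, carry_out, overflow


def sbc(a, operand, carry):
    """Binary-mode SBC, expressed as ADC of the one's complement."""
    return adc(a, operand ^ 0xFF, carry)
```

Writing SBC as ADC with an inverted operand also explains the carry convention: carry set going in means "no borrow", and carry clear coming out means a borrow happened.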

The trickiest parts of getting an emulator right tend to be:

  • Flag behaviour: getting the exact flag updates correct for every instruction. The N and Z flags are straightforward, but the V (overflow) flag for ADC/SBC trips people up. The carry flag’s role in subtraction (it acts as an inverted “borrow” flag) is another common source of bugs.
  • Addressing mode edge cases: indirect addressing has a famous hardware bug. JMP ($xxFF) fetches the high byte of the target from $xx00 instead of crossing into the next page. If you don’t emulate this bug, some real 6502 programs won’t work correctly.
  • Cycle accuracy: if you just want to run programs, you can ignore cycle counts. But if you’re emulating a full system (like an NES or C64), you need accurate cycle timing because the CPU, video chip, and sound hardware are all synchronized.
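The indirect JMP bug is small enough to show in full. This Python sketch computes the jump target the way the original NMOS 6502 does. The function name is hypothetical, and mem is assumed to be a 64 KB bytearray.

```python
def jmp_indirect_target(mem, pointer):
    """Target of JMP (pointer), including the NMOS page-wrap bug."""
    lo = mem[pointer]
    # Only the low byte of the pointer is incremented, so a pointer at
    # $xxFF takes its high byte from $xx00 rather than the next page.
    hi_addr = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    hi = mem[hi_addr]
    return (hi << 8) | lo
```

For any pointer that does not end in $FF this behaves exactly like a normal 16-bit read, so the quirk costs nothing to emulate.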

For this series, we’ll start simple: get the instructions working correctly, then layer on features in later parts.

A Test Program

Here’s a simple 6502 assembly program that exercises a handful of instructions. Nothing fancy: it loads values into registers, shuffles them around with transfers, increments and decrements, and pushes the status flags to the stack.

; Simple 6502 test program
; Start at address $0200

.org $0200

start:
    LDA #$42    ; Load 0x42 into A
    LDX #$10    ; Load 0x10 into X
    LDY #$20    ; Load 0x20 into Y
    TAX         ; Transfer A to X (X = 0x42)
    TAY         ; Transfer A to Y (Y = 0x42)
    INX         ; Increment X (X = 0x43)
    INY         ; Increment Y (Y = 0x43)
    DEX         ; Decrement X (X = 0x42)
    DEY         ; Decrement Y (Y = 0x42)
    NOP         ; No operation
    PHP         ; Push status flags to stack

; Pad to 64KB
.org $FFFF
    .byte $00

When assembled into machine code, the program itself (ignoring the padding up to $FFFF) becomes 14 bytes:

A9 42 A2 10 A0 20 AA A8 E8 C8 CA 88 EA 08

Each instruction is either 1 byte (implied addressing, the opcode is the whole instruction) or 2 bytes (immediate addressing, opcode plus one operand byte). There are no 3-byte instructions in this program since we’re not using any absolute addresses.
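Splitting the byte stream back into instructions only needs each opcode’s length. Here is a tiny Python disassembler covering just the eleven opcodes used in this program. The MNEMONICS table and disassemble function are illustrative names, and any other opcode would raise a KeyError.

```python
# opcode byte -> (mnemonic, total length in bytes)
MNEMONICS = {
    0xA9: ("LDA", 2), 0xA2: ("LDX", 2), 0xA0: ("LDY", 2),
    0xAA: ("TAX", 1), 0xA8: ("TAY", 1), 0xE8: ("INX", 1),
    0xC8: ("INY", 1), 0xCA: ("DEX", 1), 0x88: ("DEY", 1),
    0xEA: ("NOP", 1), 0x08: ("PHP", 1),
}


def disassemble(code):
    """Return one text line per instruction in the byte string."""
    lines = []
    i = 0
    while i < len(code):
        name, length = MNEMONICS[code[i]]
        if length == 2:
            # The only 2-byte form used here is immediate addressing.
            lines.append(f"{name} #${code[i + 1]:02X}")
        else:
            lines.append(name)
        i += length
    return lines
```

Feeding it bytes.fromhex("A9 42 A2 10 A0 20 AA A8 E8 C8 CA 88 EA 08") gives back the eleven instructions of the listing in order.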

Decoding the Hex

Let’s walk through the machine code byte by byte and see exactly what the CPU does. This is the process our emulator will automate.

A9 42, LDA #$42

A9 is the opcode for LDA in immediate mode. The next byte, 42, is the operand. The CPU loads the value $42 into the accumulator. Since $42 is non-zero and bit 7 is clear, the Zero flag is cleared and the Negative flag is cleared. PC advances by 2.

After: A = $42, X = $00, Y = $00

A2 10, LDX #$10

A2 is LDX immediate. The CPU loads $10 into the X register. Same flag logic: non-zero, bit 7 clear, so Zero and Negative are both cleared. PC advances by 2.

After: A = $42, X = $10, Y = $00

A0 20, LDY #$20

A0 is LDY immediate. Loads $20 into Y. PC advances by 2.

After: A = $42, X = $10, Y = $20

AA, TAX

AA is TAX (Transfer A to X). This is a single-byte implied instruction, no operand needed. The value in A ($42) is copied to X. The old value of X ($10) is gone. Flags update based on the new value of X. PC advances by 1.

After: A = $42, X = $42, Y = $20

A8, TAY

A8 is TAY (Transfer A to Y). Same idea, A is copied to Y. PC advances by 1.

After: A = $42, X = $42, Y = $42

E8, INX

E8 is INX (Increment X). X goes from $42 to $43. Flags update based on the new value. PC advances by 1.

After: A = $42, X = $43, Y = $42

C8, INY

C8 is INY (Increment Y). Y goes from $42 to $43. PC advances by 1.

After: A = $42, X = $43, Y = $43

CA, DEX

CA is DEX (Decrement X). X goes from $43 back to $42. PC advances by 1.

After: A = $42, X = $42, Y = $43

88, DEY

88 is DEY (Decrement Y). Y goes from $43 back to $42. PC advances by 1.

After: A = $42, X = $42, Y = $42

EA, NOP

EA is NOP. Does nothing. PC advances by 1. Registers and flags unchanged.

After: A = $42, X = $42, Y = $42

08, PHP

08 is PHP (Push Processor Status). A copy of the P register (status flags) is written to the stack at $0100 + SP, with the B flag and the unused bit 5 set in the pushed byte. The stack pointer then decrements by 1. PC advances by 1.

After: A = $42, X = $42, Y = $42, status flags on stack.

And that’s the whole program. Eleven instructions, fourteen bytes. The emulator just needs to repeat this fetch-decode-execute cycle for each one.
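As a sanity check on the walkthrough, here is a deliberately tiny Python interpreter that handles only these eleven opcodes and runs the 14 bytes. It is a throwaway sketch, not the emulator design for this series. The dictionary-based CPU state and the starting values (registers at zero, SP = $FD, P = $24) are assumptions I have picked for the example.

```python
PROGRAM = bytes.fromhex("A9 42 A2 10 A0 20 AA A8 E8 C8 CA 88 EA 08")


def run(program, origin=0x0200):
    mem = bytearray(0x10000)
    mem[origin:origin + len(program)] = program
    # Assumed initial state; real hardware leaves A, X and Y undefined.
    cpu = {"A": 0, "X": 0, "Y": 0, "SP": 0xFD, "P": 0x24, "PC": origin}
    end = origin + len(program)

    def load(reg, value):
        """Store an 8-bit value in a register and update N and Z."""
        value &= 0xFF
        cpu[reg] = value
        cpu["P"] = (cpu["P"] & 0x7D) | (0x02 if value == 0 else 0) | (value & 0x80)

    def php():
        # The pushed copy of P has B (bit 4) and bit 5 set.
        mem[0x0100 | cpu["SP"]] = cpu["P"] | 0x30
        cpu["SP"] = (cpu["SP"] - 1) & 0xFF

    # opcode -> (length in bytes, handler taking the operand byte or None)
    table = {
        0xA9: (2, lambda op: load("A", op)),
        0xA2: (2, lambda op: load("X", op)),
        0xA0: (2, lambda op: load("Y", op)),
        0xAA: (1, lambda op: load("X", cpu["A"])),
        0xA8: (1, lambda op: load("Y", cpu["A"])),
        0xE8: (1, lambda op: load("X", cpu["X"] + 1)),
        0xC8: (1, lambda op: load("Y", cpu["Y"] + 1)),
        0xCA: (1, lambda op: load("X", cpu["X"] - 1)),
        0x88: (1, lambda op: load("Y", cpu["Y"] - 1)),
        0xEA: (1, lambda op: None),
        0x08: (1, lambda op: php()),
    }

    while cpu["PC"] < end:
        length, handler = table[mem[cpu["PC"]]]           # fetch and decode
        operand = mem[cpu["PC"] + 1] if length == 2 else None
        handler(operand)                                  # execute
        cpu["PC"] += length                               # advance (no jumps here)
    return cpu, mem


cpu, mem = run(PROGRAM)
print({name: f"${value:02X}" for name, value in cpu.items()})
```

None of these instructions modify the PC, which is why the loop can advance unconditionally. A fuller emulator needs the "unless the instruction already changed the PC" case from the four steps described earlier.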

What’s Next

In Part 2, we’ll start building the emulator for real, setting up the CPU state, implementing the instruction handlers, and getting this test program to run. We’ll also add a simple debugger so we can step through instructions and inspect registers, which is invaluable when things inevitably don’t work on the first try.