Vintage Adventures - MOS 6502 - Part 1
This next set of posts is a bit of a detour from the usual security-themed articles: we’ll explore some vintage computer hardware.
The MOS 6502 is a classic CPU that drove the home computer revolution in the late 1970s and early 1980s. Along with the Zilog Z80, it brought computing to the masses. The 6502 and its close variants powered some of the most iconic machines of the era: the Apple II, the Commodore 64 (using the 6510), the Atari 2600 (using the cut-down 6507), and the British-built BBC Micro, among others. It even found its way into the original Nintendo Entertainment System (as the Ricoh 2A03, a modified 6502).
What made the 6502 special wasn’t raw power; it was simplicity and cost. Suddenly, hobbyists and small companies could afford to build computers around it.
The Architecture
The 6502 is an 8-bit CPU with a 16-bit address bus, giving it the ability to directly address 64KB of memory. By modern standards that’s tiny, but in the late ’70s it was plenty of room to build games, word processors, and even early networking software.
Registers
The register set is minimal, just six registers in total:
| Register | Name | Size | Purpose |
|---|---|---|---|
| A | Accumulator | 8-bit | Main working register for arithmetic and logic operations |
| X | Index Register X | 8-bit | General purpose, often used for loop counters and indexed addressing |
| Y | Index Register Y | 8-bit | General purpose, similar to X but with different addressing mode support |
| SP | Stack Pointer | 8-bit | Low byte of the next free stack location (the stack lives in page $01, addresses $0100–$01FF) |
| PC | Program Counter | 16-bit | Holds the address of the next instruction to execute |
| P | Processor Status | 8-bit | Flags register, each bit represents a condition flag |
The processor status register (P) packs a lot of information into 8 bits:
7 6 5 4 3 2 1 0
N V - B D I Z C
- N, Negative: set when the result of an operation has bit 7 set
- V, Overflow: set when an arithmetic operation produces a signed overflow
- B, Break: distinguishes BRK instructions from hardware interrupts; it only exists in the copy of P pushed to the stack (set by BRK and PHP, clear for IRQ and NMI)
- D, Decimal: enables Binary Coded Decimal (BCD) mode for ADC and SBC
- I, Interrupt Disable: when set, maskable interrupts (IRQ) are ignored
- Z, Zero: set when the result of an operation is zero
- C, Carry: used for multi-byte arithmetic and as a borrow flag for subtraction
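As a minimal sketch of how an emulator might represent these flags, the bit layout above maps directly onto a set of masks. The names `FLAG_*` and `update_nz` are hypothetical choices for illustration, not part of any existing emulator.

```python
# Bit masks for the processor status register P (bit 5 is unused).
FLAG_C = 0x01  # Carry
FLAG_Z = 0x02  # Zero
FLAG_I = 0x04  # Interrupt Disable
FLAG_D = 0x08  # Decimal
FLAG_B = 0x10  # Break
FLAG_V = 0x40  # Overflow
FLAG_N = 0x80  # Negative


def update_nz(p, value):
    """Return P with N and Z recomputed from an 8-bit result."""
    value &= 0xFF
    p &= ~(FLAG_N | FLAG_Z) & 0xFF   # clear both flags first
    if value == 0:
        p |= FLAG_Z
    if value & 0x80:                 # bit 7 set means "negative"
        p |= FLAG_N
    return p
```

Most load, transfer, increment, and decrement instructions end with exactly this kind of N/Z update, so it pays to have it in one place.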
If you’re coming from modern architectures, the thing that stands out is how few registers there are. There’s no general-purpose register file; almost everything flows through the accumulator. The X and Y registers help with addressing and loops, but the A register does the heavy lifting.
Addressing Modes
One area where the 6502 punches above its weight is addressing modes. Despite the simple register set, the CPU supports 13 addressing modes that give programmers a surprising amount of flexibility:
| Mode | Syntax | Example | Description |
|---|---|---|---|
| Immediate | #$nn | LDA #$42 | Operand is the literal value |
| Zero Page | $nn | LDA $80 | Address in the first 256 bytes (page zero), fast access |
| Zero Page,X | $nn,X | LDA $80,X | Zero page address offset by X register |
| Zero Page,Y | $nn,Y | LDX $80,Y | Zero page address offset by Y register |
| Absolute | $nnnn | LDA $1234 | Full 16-bit address |
| Absolute,X | $nnnn,X | LDA $1234,X | Absolute address offset by X |
| Absolute,Y | $nnnn,Y | LDA $1234,Y | Absolute address offset by Y |
| Indirect | ($nnnn) | JMP ($FFFC) | Address points to a pointer (JMP only) |
| (Indirect,X) | ($nn,X) | LDA ($80,X) | Indexed indirect, zero page pointer offset by X |
| (Indirect),Y | ($nn),Y | LDA ($80),Y | Indirect indexed, zero page pointer, then offset by Y |
| Implied | (none) | INX | No operand, the instruction implies what it operates on |
| Accumulator | A | ROL A | Operates directly on the accumulator |
| Relative | $nn | BEQ $05 | Signed offset from current PC (branch instructions only) |
Zero page addressing is worth calling out: it’s one of the 6502’s clever tricks. Because zero page addresses only need one byte instead of two, these instructions are both smaller and faster. For example, LDA $80 takes 2 bytes and 3 cycles, while LDA $1234 takes 3 bytes and 4 cycles. Experienced 6502 programmers treat the zero page like an extended register file, storing frequently-used variables there for speed.
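To make a few of the addressing modes concrete, here is a rough sketch of how an emulator could compute the effective address for each. The function names and the module-level `memory` array are assumptions made for this example; the 6502 stores 16-bit pointers low byte first.

```python
memory = bytearray(0x10000)  # the full 64KB address space


def addr_zero_page_x(operand, x):
    """Zero Page,X: the sum wraps around inside page zero."""
    return (operand + x) & 0xFF


def addr_absolute_x(operand, x):
    """Absolute,X: 16-bit base address plus X."""
    return (operand + x) & 0xFFFF


def addr_indexed_indirect(operand, x):
    """(Indirect,X): add X to the zero page address, then read the pointer."""
    ptr = (operand + x) & 0xFF
    lo = memory[ptr]
    hi = memory[(ptr + 1) & 0xFF]       # the pointer fetch stays in page zero
    return (hi << 8) | lo


def addr_indirect_indexed(operand, y):
    """(Indirect),Y: read the zero page pointer, then add Y to it."""
    lo = memory[operand]
    hi = memory[(operand + 1) & 0xFF]
    return (((hi << 8) | lo) + y) & 0xFFFF
```

Note the difference between the last two: with (Indirect,X) the index selects which pointer to use, while with (Indirect),Y the index is applied to the address the pointer holds.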
The Instruction Set
The 6502 has 56 official instructions. They break down into a few logical groups:
Load and Store
| Opcode | Name | Description |
|---|---|---|
| LDA | Load Accumulator | Load a value into A |
| LDX | Load X | Load a value into X |
| LDY | Load Y | Load a value into Y |
| STA | Store Accumulator | Store A into memory |
| STX | Store X | Store X into memory |
| STY | Store Y | Store Y into memory |
Arithmetic
| Opcode | Name | Description |
|---|---|---|
| ADC | Add with Carry | A = A + operand + carry flag |
| SBC | Subtract with Carry | A = A - operand - (1 - carry flag) |
| INC | Increment Memory | Add 1 to a memory location |
| INX | Increment X | X = X + 1 |
| INY | Increment Y | Y = Y + 1 |
| DEC | Decrement Memory | Subtract 1 from a memory location |
| DEX | Decrement X | X = X - 1 |
| DEY | Decrement Y | Y = Y - 1 |
Logic
| Opcode | Name | Description |
|---|---|---|
| AND | Logical AND | A = A & operand |
| ORA | Logical OR | A = A \| operand |
| EOR | Exclusive OR | A = A ^ operand |
| BIT | Bit Test | Set Z from A & operand; copy bits 7 and 6 of the operand into N and V |
Shift and Rotate
| Opcode | Name | Description |
|---|---|---|
| ASL | Arithmetic Shift Left | Shift bits left, bit 0 becomes 0, bit 7 goes to carry |
| LSR | Logical Shift Right | Shift bits right, bit 7 becomes 0, bit 0 goes to carry |
| ROL | Rotate Left | Shift left through carry, carry goes to bit 0, bit 7 goes to carry |
| ROR | Rotate Right | Shift right through carry, carry goes to bit 7, bit 0 goes to carry |
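The rotate instructions are easiest to understand as a 9-bit rotation, with the carry flag acting as the ninth bit. A small sketch, with hypothetical helper names `rol` and `ror` that return the new value and the new carry:

```python
def rol(value, carry):
    """Rotate left through carry. Returns (result, new_carry)."""
    result = ((value << 1) | carry) & 0xFF   # old carry enters bit 0
    new_carry = (value >> 7) & 1             # old bit 7 leaves into carry
    return result, new_carry


def ror(value, carry):
    """Rotate right through carry. Returns (result, new_carry)."""
    result = ((value & 0xFF) >> 1) | (carry << 7)   # old carry enters bit 7
    new_carry = value & 1                           # old bit 0 leaves into carry
    return result, new_carry
```

Because nine bits take part, applying `rol` nine times in a row brings both the value and the carry back to where they started.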
Compare
| Opcode | Name | Description |
|---|---|---|
| CMP | Compare Accumulator | Compare A with operand (sets flags, doesn’t store result) |
| CPX | Compare X | Compare X with operand |
| CPY | Compare Y | Compare Y with operand |
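The compare instructions behave like a subtraction whose result is thrown away. As a sketch (the function name `compare` is made up for this example), the three affected flags could be computed like this:

```python
def compare(register, operand):
    """Return (carry, zero, negative) as CMP, CPX, or CPY would set them."""
    diff = (register - operand) & 0xFF
    carry = 1 if register >= operand else 0   # set when no borrow was needed
    zero = 1 if diff == 0 else 0              # set when the values are equal
    negative = (diff >> 7) & 1                # bit 7 of the discarded result
    return carry, zero, negative
```

In practice this means BEQ/BNE test for equality after a compare, and BCS/BCC test for unsigned "greater than or equal" and "less than".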
Branch
| Opcode | Name | Description |
|---|---|---|
| BCC | Branch on Carry Clear | Branch if C = 0 |
| BCS | Branch on Carry Set | Branch if C = 1 |
| BEQ | Branch on Equal | Branch if Z = 1 |
| BNE | Branch on Not Equal | Branch if Z = 0 |
| BMI | Branch on Minus | Branch if N = 1 |
| BPL | Branch on Plus | Branch if N = 0 |
| BVC | Branch on Overflow Clear | Branch if V = 0 |
| BVS | Branch on Overflow Set | Branch if V = 1 |
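All eight branches use relative addressing, so the operand is a signed 8-bit offset rather than an address. A minimal sketch of the target calculation, assuming `pc` holds the address of the branch opcode itself (the helper name is hypothetical):

```python
def branch_target(pc, offset_byte):
    """Compute where a taken branch lands.

    The offset is relative to the instruction after the branch,
    which is two bytes past the branch opcode.
    """
    # Interpret the operand byte as a signed value in the range -128..127.
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (pc + 2 + offset) & 0xFFFF
```

For example, an offset byte of $FE is -2, which lands back on the branch itself and makes a tight infinite loop.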
Jump and Subroutine
| Opcode | Name | Description |
|---|---|---|
| JMP | Jump | Set PC to address |
| JSR | Jump to Subroutine | Push the return address minus one to the stack, then jump |
| RTS | Return from Subroutine | Pull the address from the stack, add one, and jump to it |
| RTI | Return from Interrupt | Pull status and return address from stack |
| BRK | Break | Trigger a software interrupt |
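One detail worth sketching is how JSR and RTS cooperate through the stack. On the 6502, JSR pushes the address of its own last byte (high byte first) and RTS adds one after pulling it. The `Cpu` class below is a hypothetical stand-in with just enough state to show this, and the starting SP of $FF is an assumption.

```python
class Cpu:
    def __init__(self):
        self.memory = bytearray(0x10000)
        self.pc = 0
        self.sp = 0xFF                      # assumed starting stack pointer

    def push(self, value):
        self.memory[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF      # stack grows downward in page $01

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.memory[0x0100 + self.sp]

    def jsr(self, target):
        """self.pc points at the 3-byte JSR instruction."""
        ret = (self.pc + 2) & 0xFFFF        # address of the last byte of JSR
        self.push(ret >> 8)                 # high byte first
        self.push(ret & 0xFF)               # then low byte
        self.pc = target

    def rts(self):
        lo = self.pull()
        hi = self.pull()
        self.pc = (((hi << 8) | lo) + 1) & 0xFFFF
```

A JSR at $0200 therefore leaves $0202 on the stack, and the matching RTS resumes at $0203.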
Transfer
| Opcode | Name | Description |
|---|---|---|
| TAX | Transfer A to X | X = A |
| TAY | Transfer A to Y | Y = A |
| TXA | Transfer X to A | A = X |
| TYA | Transfer Y to A | A = Y |
| TSX | Transfer SP to X | X = SP |
| TXS | Transfer X to SP | SP = X |
Stack
| Opcode | Name | Description |
|---|---|---|
| PHA | Push Accumulator | Push A onto the stack |
| PHP | Push Processor Status | Push the status flags onto the stack |
| PLA | Pull Accumulator | Pop the top of the stack into A |
| PLP | Pull Processor Status | Pop the top of the stack into the status flags |
Flag Control
| Opcode | Name | Description |
|---|---|---|
| CLC | Clear Carry | C = 0 |
| CLD | Clear Decimal | D = 0 |
| CLI | Clear Interrupt Disable | I = 0 |
| CLV | Clear Overflow | V = 0 |
| SEC | Set Carry | C = 1 |
| SED | Set Decimal | D = 1 |
| SEI | Set Interrupt Disable | I = 1 |
Miscellaneous
| Opcode | Name | Description |
|---|---|---|
| NOP | No Operation | Does nothing, advances PC by 1 |
That’s the full set. No multiply, no divide, no floating point. If you need any of that, you build it yourself out of shifts, adds, and loops. That constraint is part of what makes 6502 programming interesting: you learn to think in terms of what the hardware actually gives you.
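To illustrate what "building it yourself" looks like, here is the classic shift-and-add multiply, written in Python rather than assembly so the structure is easy to follow. It is a sketch of the general technique, not a transcription of any particular 6502 routine, and the function name is made up.

```python
def multiply_8x8(a, b):
    """Multiply two 8-bit values into a 16-bit product using shifts and adds."""
    product = 0
    multiplicand = a & 0xFF
    multiplier = b & 0xFF
    for _ in range(8):              # one pass per bit of the multiplier
        if multiplier & 1:          # low bit set: add the shifted multiplicand
            product += multiplicand
        multiplicand <<= 1          # on a 6502: ASL/ROL across two bytes
        multiplier >>= 1            # on a 6502: LSR
    return product & 0xFFFF
```

On real hardware each of those steps becomes a handful of instructions, with the 16-bit values split across two bytes and the carry flag linking them.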
Building an Emulator
The core loop of a 6502 emulator is surprisingly straightforward. The CPU does the same thing over and over:
- Fetch: read the byte at the current Program Counter (PC). This is the opcode.
- Decode: look up the opcode to determine which instruction it is, what addressing mode it uses, and how many bytes the full instruction occupies (1, 2, or 3).
- Execute: run the logic for that instruction. This might update registers, modify memory, change flags, or alter the PC itself (in the case of jumps and branches).
- Advance: move the PC forward past the instruction bytes (unless the instruction already changed the PC).
That’s it. Fetch, decode, execute, advance. Every CPU works this way at a fundamental level; the 6502 just makes it easy to see because there’s so little abstraction in the way.
In code, the emulator’s main loop looks something like this (pseudocode):
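    while running:
        opcode = memory[PC]
        instruction = decode(opcode)
        # Operand bytes follow the opcode (zero, one, or two of them)
        operands = memory[PC + 1 : PC + instruction.length]
        # execute() reports whether it set PC itself (jumps, taken branches)
        pc_changed = instruction.execute(operands)
        if not pc_changed:
            PC += instruction.length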
The decode step is typically a lookup table: a 256-entry array (one for each possible byte value) that maps opcodes to their instruction handler, addressing mode, byte length, and cycle count. Only 151 of those 256 slots are documented opcodes (the 56 instructions across their addressing modes); the rest are “illegal” opcodes that the original hardware handled in undocumented (and sometimes useful) ways.
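A decode table along those lines might be sketched as follows. The `OpInfo` tuple, the `ILLEGAL` placeholder, and the `decode` function are hypothetical names for this example, and only a handful of the documented opcodes are filled in.

```python
from collections import namedtuple

OpInfo = namedtuple("OpInfo", "mnemonic mode length cycles")

# Placeholder for every slot we have not filled in yet.
ILLEGAL = OpInfo("???", "implied", 1, 2)

# One entry per possible opcode byte.
OPCODES = [ILLEGAL] * 256
OPCODES[0xA9] = OpInfo("LDA", "immediate", 2, 2)
OPCODES[0xA5] = OpInfo("LDA", "zero_page", 2, 3)
OPCODES[0xAD] = OpInfo("LDA", "absolute", 3, 4)
OPCODES[0xAA] = OpInfo("TAX", "implied", 1, 2)
OPCODES[0xEA] = OpInfo("NOP", "implied", 1, 2)


def decode(opcode):
    """Look up the table entry for an opcode byte."""
    return OPCODES[opcode & 0xFF]
```

A fuller version would also store a reference to the handler function in each entry, so the main loop can dispatch with a single index operation.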
Each instruction handler is a small function. For example, LDA in immediate mode just copies the operand byte into the A register and updates the Zero and Negative flags. ADC is more involved: it needs to handle the carry flag, check for overflow, and optionally deal with BCD mode. But none of them are individually complex.
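As a rough sketch of those two handlers, here they are as pure functions that take and return register values. The function names are invented for illustration, and the ADC shown covers binary mode only; BCD mode is deliberately left out.

```python
FLAG_C, FLAG_Z, FLAG_V, FLAG_N = 0x01, 0x02, 0x40, 0x80


def lda_immediate(p, operand):
    """Load the operand into A. Returns (a, p)."""
    a = operand & 0xFF
    p &= ~(FLAG_Z | FLAG_N) & 0xFF
    if a == 0:
        p |= FLAG_Z
    if a & 0x80:
        p |= FLAG_N
    return a, p


def adc_binary(a, operand, p):
    """Binary-mode ADC (decimal mode not handled). Returns (a, p)."""
    total = a + operand + (p & FLAG_C)
    result = total & 0xFF
    p &= ~(FLAG_C | FLAG_Z | FLAG_V | FLAG_N) & 0xFF
    if total > 0xFF:
        p |= FLAG_C                      # unsigned carry out of bit 7
    if result == 0:
        p |= FLAG_Z
    if result & 0x80:
        p |= FLAG_N
    # Signed overflow: both inputs have the same sign, and the result differs.
    if (a ^ result) & (operand ^ result) & 0x80:
        p |= FLAG_V
    return result, p


def sbc_binary(a, operand, p):
    """Binary-mode SBC expressed as ADC of the inverted operand."""
    return adc_binary(a, operand ^ 0xFF, p)
```

Writing SBC as ADC with the operand inverted is a common emulator shortcut for binary mode, and it gives the "carry set means no borrow" behaviour for free.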
The trickiest parts of getting an emulator right tend to be:
- Flag behaviour: getting the exact flag updates correct for every instruction. The N and Z flags are straightforward, but the V (overflow) flag for ADC/SBC trips people up. The carry flag’s role in subtraction is another common source of bugs: it acts as an inverted “borrow” flag, so C = 1 means no borrow.
- Addressing mode edge cases: indirect addressing has a famous hardware bug. JMP ($xxFF) wraps within the page instead of crossing a page boundary, so the high byte of the target is read from $xx00 rather than from the first byte of the next page. If you don’t emulate this bug, some real 6502 programs won’t work correctly.
- Cycle accuracy: if you just want to run programs, you can ignore cycle counts. But if you’re emulating a full system (like an NES or C64), you need accurate cycle timing because the CPU, video chip, and sound hardware are all synchronized.
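The JMP ($xxFF) quirk mentioned above is simple to reproduce once you see that only the low byte of the pointer gets incremented. A sketch, using a hypothetical helper that takes the memory array and the pointer address:

```python
def jmp_indirect_target(memory, pointer):
    """Resolve the target of JMP (pointer), including the page-wrap bug."""
    lo = memory[pointer]
    # Only the low byte of the pointer is incremented, so a pointer at $xxFF
    # fetches its high byte from $xx00 instead of the next page.
    hi_addr = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    hi = memory[hi_addr]
    return (hi << 8) | lo
```

For any pointer that does not end in $FF, this gives the same result as a plain 16-bit read.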
For this series, we’ll start simple, get the instructions working correctly, then layer on features in later parts.
A Test Program
Here’s a simple 6502 assembly program that exercises a handful of instructions. Nothing fancy, it loads values into registers, shuffles them around with transfers, increments and decrements, and pushes the status flags to the stack.
    ; Simple 6502 test program
    ; Start at address $0200
        .org $0200
    start:
        LDA #$42    ; Load 0x42 into A
        LDX #$10    ; Load 0x10 into X
        LDY #$20    ; Load 0x20 into Y
        TAX         ; Transfer A to X (X = 0x42)
        TAY         ; Transfer A to Y (Y = 0x42)
        INX         ; Increment X (X = 0x43)
        INY         ; Increment Y (Y = 0x43)
        DEX         ; Decrement X (X = 0x42)
        DEY         ; Decrement Y (Y = 0x42)
        NOP         ; No operation
        PHP         ; Push status flags to stack
    ; Pad to 64KB
        .org $FFFF
        .byte $00
When assembled into machine code, the program itself (ignoring the padding) becomes 14 bytes:
A9 42 A2 10 A0 20 AA A8 E8 C8 CA 88 EA 08
Each instruction is either 1 byte (implied addressing, the opcode is the whole instruction) or 2 bytes (immediate addressing, opcode plus one operand byte). There are no 3-byte instructions in this program since we’re not using any absolute addresses.
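Knowing each opcode’s length is enough to split the byte stream back into instructions. The sketch below does this for the test program only; the `LENGTHS` table and `split_instructions` function are hypothetical and cover just the eleven opcodes used here.

```python
# Byte lengths for the opcodes used by the test program.
LENGTHS = {
    0xA9: 2, 0xA2: 2, 0xA0: 2,   # LDA, LDX, LDY immediate
    0xAA: 1, 0xA8: 1,            # TAX, TAY
    0xE8: 1, 0xC8: 1,            # INX, INY
    0xCA: 1, 0x88: 1,            # DEX, DEY
    0xEA: 1, 0x08: 1,            # NOP, PHP
}


def split_instructions(code):
    """Split raw machine code into one bytes object per instruction."""
    instructions = []
    i = 0
    while i < len(code):
        length = LENGTHS[code[i]]            # the opcode decides the length
        instructions.append(bytes(code[i:i + length]))
        i += length
    return instructions


program = bytes.fromhex("A9 42 A2 10 A0 20 AA A8 E8 C8 CA 88 EA 08")
instructions = split_instructions(program)
```

This is the same walk we are about to do by hand: read an opcode, take as many operand bytes as it needs, and move on.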
Decoding the Hex
Let’s walk through the machine code byte by byte and see exactly what the CPU does. This is the process our emulator will automate.
A9 42, LDA #$42
A9 is the opcode for LDA in immediate mode. The next byte, 42, is the operand. The CPU loads the value $42 into the accumulator. Since $42 is non-zero and bit 7 is clear, the Zero flag is cleared and the Negative flag is cleared. PC advances by 2.
After: A = $42, X = $00, Y = $00
A2 10, LDX #$10
A2 is LDX immediate. The CPU loads $10 into the X register. Same flag logic, non-zero, bit 7 clear. PC advances by 2.
After: A = $42, X = $10, Y = $00
A0 20, LDY #$20
A0 is LDY immediate. Loads $20 into Y. PC advances by 2.
After: A = $42, X = $10, Y = $20
AA, TAX
AA is TAX (Transfer A to X). This is a single-byte implied instruction, no operand needed. The value in A ($42) is copied to X. The old value of X ($10) is gone. Flags update based on the new value of X. PC advances by 1.
After: A = $42, X = $42, Y = $20
A8, TAY
A8 is TAY (Transfer A to Y). Same idea, A is copied to Y. PC advances by 1.
After: A = $42, X = $42, Y = $42
E8, INX
E8 is INX (Increment X). X goes from $42 to $43. Flags update based on the new value. PC advances by 1.
After: A = $42, X = $43, Y = $42
C8, INY
C8 is INY (Increment Y). Y goes from $42 to $43. PC advances by 1.
After: A = $42, X = $43, Y = $43
CA, DEX
CA is DEX (Decrement X). X goes from $43 back to $42. PC advances by 1.
After: A = $42, X = $42, Y = $43
88, DEY
88 is DEY (Decrement Y). Y goes from $43 back to $42. PC advances by 1.
After: A = $42, X = $42, Y = $42
EA, NOP
EA is NOP. Does nothing. PC advances by 1. Registers and flags unchanged.
After: A = $42, X = $42, Y = $42
08, PHP
08 is PHP (Push Processor Status). The current value of the P register (status flags) is pushed onto the stack, with bits 4 (B) and 5 set in the pushed copy. The stack pointer decrements by 1. PC advances by 1.
After: A = $42, X = $42, Y = $42, status flags on stack.
And that’s the whole program. Eleven instructions, fourteen bytes. The emulator just needs to repeat this fetch-decode-execute cycle for each one.
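As a preview, the whole walkthrough fits in a very small amount of code. The sketch below is not the emulator we will build in Part 2; `TinyCpu` is a hypothetical throwaway class that handles only these eleven opcodes, and the starting values for SP ($FF) and P ($20) are assumptions chosen to keep the example simple.

```python
PROGRAM = bytes.fromhex("A9 42 A2 10 A0 20 AA A8 E8 C8 CA 88 EA 08")
FLAG_Z, FLAG_N = 0x02, 0x80


class TinyCpu:
    """Only the eleven opcodes the test program uses."""

    def __init__(self):
        self.memory = bytearray(0x10000)
        self.a = self.x = self.y = 0
        self.sp = 0xFF      # assumed starting stack pointer
        self.p = 0x20       # assumed starting flags: only unused bit 5 set
        self.pc = 0x0200

    def nz(self, value):
        """Mask to 8 bits, update N and Z, and return the value."""
        value &= 0xFF
        self.p &= ~(FLAG_Z | FLAG_N) & 0xFF
        if value == 0:
            self.p |= FLAG_Z
        if value & 0x80:
            self.p |= FLAG_N
        return value

    def step(self):
        opcode = self.memory[self.pc]                   # fetch
        operand = self.memory[(self.pc + 1) & 0xFFFF]   # used by 2-byte forms
        length = 1
        if opcode == 0xA9:                              # LDA #imm
            self.a, length = self.nz(operand), 2
        elif opcode == 0xA2:                            # LDX #imm
            self.x, length = self.nz(operand), 2
        elif opcode == 0xA0:                            # LDY #imm
            self.y, length = self.nz(operand), 2
        elif opcode == 0xAA:                            # TAX
            self.x = self.nz(self.a)
        elif opcode == 0xA8:                            # TAY
            self.y = self.nz(self.a)
        elif opcode == 0xE8:                            # INX
            self.x = self.nz(self.x + 1)
        elif opcode == 0xC8:                            # INY
            self.y = self.nz(self.y + 1)
        elif opcode == 0xCA:                            # DEX
            self.x = self.nz(self.x - 1)
        elif opcode == 0x88:                            # DEY
            self.y = self.nz(self.y - 1)
        elif opcode == 0xEA:                            # NOP
            pass
        elif opcode == 0x08:                            # PHP
            self.memory[0x0100 + self.sp] = self.p | 0x30
            self.sp = (self.sp - 1) & 0xFF
        else:
            raise ValueError(f"unhandled opcode ${opcode:02X}")
        self.pc = (self.pc + length) & 0xFFFF           # advance


cpu = TinyCpu()
cpu.memory[0x0200:0x0200 + len(PROGRAM)] = PROGRAM
for _ in range(11):
    cpu.step()
```

After the eleven steps, A, X, and Y all hold $42 and PC sits at $020E, just past the last byte of the program, which matches the hand trace above.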
What’s Next
In Part 2, we’ll start building the emulator for real, setting up the CPU state, implementing the instruction handlers, and getting this test program to run. We’ll also add a simple debugger so we can step through instructions and inspect registers, which is invaluable when things inevitably don’t work on the first try.