
This project implements a compact 8-bit CPU in Verilog.
Instead of relying on a fixed hardcoded program, the processor receives a short instruction sequence through SPI, stores it into an internal program memory, and then executes that sequence autonomously.
The design is intended for Tiny Tapeout and aims to balance:
The final architecture uses:
- a main accumulator register (ACC)
- an auxiliary register (R1)

The system works in two clearly separated phases.
When RUN = 0, an external controller sends 24-bit SPI frames to the chip.
Each frame contains an 8-bit address (bits [23:16]) and a 16-bit instruction word (bits [15:0]).
The chip stores those instructions into its internal program memory.
When RUN = 1, the CPU starts executing from address 0 and runs autonomously from its internal instruction memory.
This means the chip itself does not parse files or access a filesystem.
A file such as a .hex program exists outside the ASIC, and an external controller converts that file into SPI frames and sends them into the chip.
The project is organized into the following RTL blocks.
- alu8.v: 8-bit arithmetic and logic unit used by the CPU. It provides the datapath for arithmetic and logic instructions.
- tiny8_cpu.v: CPU core responsible for fetching instructions from the program memory and executing them.
- tiny8_prgmem.v: internal instruction memory. The final implementation uses 10 valid instruction slots, at addresses 0 through 9.
- tiny8_spi_loader.v: SPI loader that receives 24-bit frames and converts them into write addresses and instruction words for the internal program memory.
- tt_um_tiny8_risclike.v: top-level wrapper used for external integration. It connects the CPU, program memory, and SPI loader to the Tiny Tapeout I/O pins.
The internal program memory stores ten 16-bit instruction words. Only addresses 0..9 are valid.
If the CPU reads an address outside the valid range, the memory returns 16'hD000, which corresponds to HALT.
This prevents undefined execution if an invalid address is ever reached.
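The read behaviour above can be modelled in a few lines of Python. This is a minimal behavioural sketch, not the tiny8_prgmem.v RTL: the class name ProgramMemory is hypothetical, and two details are assumptions not stated in the source (slots start out as NOP, and writes to invalid addresses are ignored).

```python
NUM_SLOTS = 10      # valid addresses are 0..9
HALT_WORD = 0xD000  # returned for any out-of-range read


class ProgramMemory:
    """Hypothetical behavioural model of the 10-slot program memory."""

    def __init__(self) -> None:
        # Assumption: slots start out as NOP (0x0000); the source does not say.
        self.slots = [0x0000] * NUM_SLOTS

    def write(self, addr: int, instr: int) -> None:
        # Assumption: writes to addresses outside 0..9 are silently ignored.
        if 0 <= addr < NUM_SLOTS:
            self.slots[addr] = instr & 0xFFFF

    def read(self, addr: int) -> int:
        # Out-of-range fetches return HALT, so the CPU stops instead of
        # executing undefined data.
        if 0 <= addr < NUM_SLOTS:
            return self.slots[addr]
        return HALT_WORD


mem = ProgramMemory()
mem.write(0, 0x2033)
print(hex(mem.read(0)))   # 0x2033
print(hex(mem.read(12)))  # 0xd000
```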
The CPU uses a very small internal state.
- ACC: main accumulator register. This is the primary working register of the CPU. Most arithmetic and logic operations write their result back into ACC.
- R1: auxiliary register. This is typically used as the second operand for ALU operations.
- Z: zero flag. It records whether the result of the last flag-updating instruction (for example LDI_ACC, ADD, SUB, CMP, or MOV_ACC_R1) was zero. It is used by conditional branches such as BZ and BNZ.
- port_out: visible output register. This is the value that appears externally on uo[7:0].
- PC: program counter. It selects the current instruction address.
What ACC means

ACC stands for accumulator.
An accumulator is a register used to hold the current working result of the processor.
In simple CPUs, instead of having many general-purpose registers, one register is treated as the main operand/result register.
In this project:
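- ACC holds the value the CPU is currently working on
- ALU operations combine ACC and R1
- results are written back into ACC
- OUT sends ACC to the visible output

A simple way to think about it is:
- ACC = the main working register
- R1 = helper register

Example:
- load 5 into ACC
- load 3 into R1
- execute ADD
- result: ACC = 8

So the accumulator is where the CPU keeps the main result it is currently working on.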
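The same 5 + 3 example can be written as machine code using the encodings from the ISA section below (LDI_ACC = 0x1, LDI_R1 = 0x2, ADD = 0x3, opcode in instr[15:12]). The helper `encode` is hypothetical, a host-side sketch that assumes the immediate sits in the low bits of the word, as the 0x102A → ACC = 0x2A example suggests.

```python
def encode(opcode: int, operand: int = 0) -> int:
    """Hypothetical helper: 4-bit opcode in [15:12], operand in the low bits."""
    return ((opcode & 0xF) << 12) | (operand & 0xFFF)


program = [
    encode(0x1, 5),  # LDI_ACC 5 -> ACC = 5
    encode(0x2, 3),  # LDI_R1 3  -> R1 = 3
    encode(0x3),     # ADD       -> ACC = ACC + R1
]
print([f"0x{w:04X}" for w in program])  # ['0x1005', '0x2003', '0x3000']

# What ADD leaves in the 8-bit accumulator:
acc = (5 + 3) & 0xFF
print(acc)  # 8
```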
After reset:
- ACC = 0
- R1 = 0
- port_out = 0
- PC = 0
- halted = 0

The CPU fetches an instruction from the current address and executes it.
If execution advances sequentially past address 9, it wraps back to address 0.
Branch instructions take their target address from the low bits of the instruction word (shown as a in the encodings below).
If the requested branch address is invalid (outside 0..9), execution is redirected to address 0.
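The two program-counter rules above (sequential wrap after address 9, and invalid branch targets redirected to 0) fit in one small function. This is a sketch of the described behaviour in Python; `next_pc` and its parameters are illustrative names, not taken from tiny8_cpu.v.

```python
NUM_SLOTS = 10  # valid instruction addresses are 0..9


def next_pc(pc: int, branch_taken: bool = False, target: int = 0) -> int:
    """Return the next program-counter value.

    Sequential execution wraps from address 9 back to 0; a taken branch
    to an address outside 0..9 is redirected to address 0.
    """
    if branch_taken:
        return target if 0 <= target < NUM_SLOTS else 0
    return (pc + 1) % NUM_SLOTS


print(next_pc(3), next_pc(9))                    # 4 0
print(next_pc(2, True, 7), next_pc(2, True, 12))  # 7 0
```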
When HALT is executed, halted becomes 1.

When RUN = 0, the design is in program load mode.
When RUN = 1, the design is in execution mode.
This separation is important:
- RUN = 0 → load instructions
- RUN = 1 → execute program

The opcode is stored in instr[15:12]. The ISA includes 16 instructions:
NOP, LDI_ACC, LDI_R1, MOV_ACC_R1, ADD, SUB, AND, OR, XOR, CMP, OUT, OUT_R1, JMP, BZ, BNZ, HALT.

NOP
Opcode: 0x0
Encoding: 0x0000
No operation is performed.
The CPU simply advances to the next instruction.
Useful for:
LDI_ACC
Opcode: 0x1
Encoding: 0x1xxx (the 8-bit immediate occupies instr[7:0])
Loads an 8-bit immediate value into ACC.
Effect:
ACC <- imm8

Also updates Z according to whether the loaded value is zero.
Example:
0x102A → ACC = 0x2A

LDI_R1
Opcode: 0x2
Encoding: 0x2xxx (the 8-bit immediate occupies instr[7:0])
Loads an 8-bit immediate value into R1.
Effect:
R1 <- imm8

Example:
0x2033 → R1 = 0x33

ADD
Opcode: 0x3
Encoding: 0x3000
Adds R1 to ACC.
Effect:
ACC <- ACC + R1

Also updates Z based on the result.

SUB
Opcode: 0x4
Encoding: 0x4000
Subtracts R1 from ACC.
Effect:
ACC <- ACC - R1

Also updates Z based on the result.
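Because ACC and R1 are 8-bit registers, ADD and SUB results wrap around. The sketch below models that in Python, assuming that the result is simply truncated to 8 bits and that there is no carry or borrow flag (the source lists only Z). `add8` and `sub8` are hypothetical names.

```python
from typing import Tuple

MASK8 = 0xFF  # ACC and R1 are 8-bit registers


def add8(acc: int, r1: int) -> Tuple[int, int]:
    """ADD: return (new ACC, new Z) for ACC <- ACC + R1."""
    result = (acc + r1) & MASK8  # assumption: carry out of bit 7 is discarded
    return result, int(result == 0)


def sub8(acc: int, r1: int) -> Tuple[int, int]:
    """SUB: return (new ACC, new Z) for ACC <- ACC - R1 (two's-complement wrap)."""
    result = (acc - r1) & MASK8
    return result, int(result == 0)


print(add8(0x05, 0x03))  # (8, 0)
print(add8(0xFF, 0x01))  # (0, 1): wraps to zero, so Z = 1
print(sub8(0x03, 0x05))  # (254, 0), i.e. ACC = 0xFE
```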
AND
Opcode: 0x5
Encoding: 0x5000
Bitwise AND between ACC and R1.
Effect:
ACC <- ACC & R1

Useful for:

OR
Opcode: 0x6
Encoding: 0x6000
Bitwise OR between ACC and R1.
Effect:
ACC <- ACC | R1

Useful for:

XOR
Opcode: 0x7
Encoding: 0x7000
Bitwise XOR between ACC and R1.
Effect:
ACC <- ACC ^ R1

Useful for:

CMP
Opcode: 0x8
Encoding: 0x8000
Comparison through the subtraction path.
This instruction does not store the subtraction result into ACC.
Instead, it uses the subtraction result internally to update Z.
In practice, it is useful to test whether:
ACC == R1

If ACC - R1 == 0, then Z = 1; otherwise, Z = 0.

This instruction is mainly intended to support BZ and BNZ.

OUT
Opcode: 0x9
Encoding: 0x9000
Copies ACC into the visible output register.
Effect:
port_out <- ACC

Externally, this value appears on uo[7:0].

JMP
Opcode: 0xA
Encoding: 0xA00a
Unconditional jump.
Effect:
PC <- a

If the target address is invalid, execution is redirected to address 0.

BZ
Opcode: 0xB
Encoding: 0xB00a
Branch if zero.
Effect:
If Z = 1, jump to address a; otherwise, execution continues with the next instruction.

Useful after CMP.

BNZ
Opcode: 0xC
Encoding: 0xC00a
Branch if not zero.
Effect:
If Z = 0, jump to address a; otherwise, execution continues with the next instruction.

HALT
Opcode: 0xD
Encoding: 0xD000
Stops execution.
Effect:
halted = 1

Externally, the halt status is visible on the HALTED output (uio[4]).
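CMP, BZ, and BNZ are meant to work together: CMP sets Z without touching ACC, and the branch opcodes read Z. The Python sketch below models only that decision logic; `cmp_z` and `branch_taken` are hypothetical names, and target-address validation is left to the PC rules described earlier.

```python
def cmp_z(acc: int, r1: int) -> int:
    """CMP: compute ACC - R1 internally (ACC unchanged) and return the new Z."""
    return int(((acc - r1) & 0xFF) == 0)


def branch_taken(opcode: int, z: int) -> bool:
    """JMP (0xA) always branches; BZ (0xB) needs Z = 1; BNZ (0xC) needs Z = 0."""
    if opcode == 0xA:
        return True
    if opcode == 0xB:
        return z == 1
    if opcode == 0xC:
        return z == 0
    return False  # not a branch opcode


z = cmp_z(0x33, 0x33)  # ACC == R1, so Z = 1
print(z, branch_taken(0xB, z), branch_taken(0xC, z))  # 1 True False
```

A typical pattern is therefore CMP followed by BZ (jump when ACC equals R1) or BNZ (jump when they differ).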
MOV_ACC_R1
Opcode: 0xE
Encoding: 0xE000
Copies R1 into ACC.
Effect:
ACC <- R1

Also updates Z depending on the value copied.

Useful for moving R1 into the main working register, ACC.

OUT_R1
Opcode: 0xF
Encoding: 0xF000
Copies R1 directly into the visible output register.
Effect:
port_out <- R1

Useful when R1 already contains the value you want to show, so it does not need to be copied into ACC first.

Instructions are loaded using 24-bit SPI frames.
- [23:16] = address byte
- [15:0] = instruction word

Program writes are accepted only when RUN = 0.
When RUN = 1, execution mode is active and memory writes are blocked.
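The frame layout can be illustrated from both ends of the link in Python. This is a sketch only. The source fixes the field layout ([23:16] address, [15:0] instruction), but the MSB-first bit order and the idea of one 24-bit frame per SPI_CS_N-low transaction are assumptions, and `make_frame`, `frame_to_bits`, and `receive` are hypothetical names.

```python
from typing import List, Tuple


def make_frame(addr: int, instr: int) -> int:
    """Pack an 8-bit address ([23:16]) and a 16-bit instruction ([15:0])."""
    return ((addr & 0xFF) << 16) | (instr & 0xFFFF)


def frame_to_bits(frame: int) -> List[int]:
    """Serialise a 24-bit frame for SPI_MOSI, MSB first (assumed bit order)."""
    return [(frame >> i) & 1 for i in range(23, -1, -1)]


def receive(bits: List[int]) -> Tuple[int, int]:
    """Model a 24-bit receive shift register on the chip side.

    Each bit is shifted in at the LSB end; after 24 bits the register holds
    the whole frame, which is split back into address and instruction.
    """
    shift = 0
    for b in bits:
        shift = ((shift << 1) | b) & 0xFFFFFF
    return (shift >> 16) & 0xFF, shift & 0xFFFF


frame = make_frame(0x01, 0xE000)
print(f"0x{frame:06X}")  # 0x01E000
addr, instr = receive(frame_to_bits(frame))
print(addr, f"0x{instr:04X}")  # 1 0xE000
```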
The CPU does not load a .hex file directly.
Instead:
1. Reset the design. This initializes the CPU state (ACC, R1, port_out, PC, and halted).
2. Set RUN = 0.
3. Send one SPI frame per instruction.
4. Observe PROGRAM_LOADED.
5. Set RUN = 1. The CPU begins execution from address 0.
6. Monitor uo[7:0], HALTED, and RUN_ECHO.

Pin mapping:
- ui[0] = RUN
- uo[7:0] = CPU output port
- uio[0] = SPI_SCK input
- uio[1] = SPI_CS_N input
- uio[2] = SPI_MOSI input
- uio[3] = PROGRAM_LOADED output
- uio[4] = HALTED output
- uio[5] = RUN_ECHO output
- uio[6] = unused
- uio[7] = unused

Example:
| Address | Instruction | Meaning |
|---|---|---|
| 0 | 0x2033 | LDI_R1 0x33 |
| 1 | 0xE000 | MOV_ACC_R1 |
| 2 | 0x9000 | OUT |
| 3 | 0xF000 | OUT_R1 |
| 4 | 0xD000 | HALT |
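Expected result:
- R1 = 0x33
- ACC = 0x33
- uo[7:0] shows 0x33 (written first from ACC by OUT, then from R1 by OUT_R1)
- HALTED becomes active

For the example above, the SPI frames are: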
| Address | Instruction | 24-bit frame |
|---|---|---|
| 0 | 0x2033 | 0x002033 |
| 1 | 0xE000 | 0x01E000 |
| 2 | 0x9000 | 0x029000 |
| 3 | 0xF000 | 0x03F000 |
| 4 | 0xD000 | 0x04D000 |
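To check the expected result of this example without hardware, the ISA can be run in a small instruction-level simulator. The Python sketch below follows the instruction descriptions in this document; it is not cycle-accurate, and three details are assumptions: unused slots hold NOP, branch targets sit in instr[3:0] (as the 0xA00a encoding suggests), and AND/OR/XOR also update Z. `CpuState` and `run` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

NUM_SLOTS = 10      # valid program addresses 0..9
HALT_WORD = 0xD000  # what the program memory returns out of range


@dataclass
class CpuState:
    acc: int = 0
    r1: int = 0
    z: int = 0
    port_out: int = 0
    pc: int = 0
    halted: bool = False
    outputs: List[int] = field(default_factory=list)  # every port_out write


def run(program: List[int], max_steps: int = 100) -> CpuState:
    """Execute a program image (word i at address i) until HALT or max_steps."""
    mem = (list(program) + [0x0000] * NUM_SLOTS)[:NUM_SLOTS]  # assumption: NOP fill
    s = CpuState()
    for _ in range(max_steps):
        instr = mem[s.pc] if s.pc < NUM_SLOTS else HALT_WORD
        op = (instr >> 12) & 0xF          # opcode in instr[15:12]
        imm8 = instr & 0xFF               # immediate in instr[7:0]
        target = instr & 0xF              # assumption: branch target in instr[3:0]
        next_pc = (s.pc + 1) % NUM_SLOTS  # sequential execution wraps 9 -> 0

        if op == 0x1:                     # LDI_ACC
            s.acc = imm8
            s.z = int(s.acc == 0)
        elif op == 0x2:                   # LDI_R1
            s.r1 = imm8
        elif 0x3 <= op <= 0x7:            # ADD, SUB, AND, OR, XOR
            results = {0x3: s.acc + s.r1, 0x4: s.acc - s.r1,
                       0x5: s.acc & s.r1, 0x6: s.acc | s.r1,
                       0x7: s.acc ^ s.r1}
            s.acc = results[op] & 0xFF
            s.z = int(s.acc == 0)         # assumed for AND/OR/XOR as well
        elif op == 0x8:                   # CMP: update Z only
            s.z = int(((s.acc - s.r1) & 0xFF) == 0)
        elif op == 0x9:                   # OUT
            s.port_out = s.acc
            s.outputs.append(s.port_out)
        elif op in (0xA, 0xB, 0xC):       # JMP, BZ, BNZ
            taken = op == 0xA or (op == 0xB and s.z == 1) or (op == 0xC and s.z == 0)
            if taken:
                next_pc = target if target < NUM_SLOTS else 0
        elif op == 0xD:                   # HALT
            s.halted = True
            break
        elif op == 0xE:                   # MOV_ACC_R1
            s.acc = s.r1
            s.z = int(s.acc == 0)
        elif op == 0xF:                   # OUT_R1
            s.port_out = s.r1
            s.outputs.append(s.port_out)
        # op == 0x0 (NOP): nothing to do
        s.pc = next_pc
    return s


state = run([0x2033, 0xE000, 0x9000, 0xF000, 0xD000])
print(hex(state.acc), hex(state.r1), hex(state.port_out), state.halted)
# 0x33 0x33 0x33 True
```

The max_steps limit exists because a program without HALT wraps from address 9 back to 0 and would otherwise run forever.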
The chip does not open or parse files.
A file such as a .hex program is stored outside the chip, on an external controller.
That controller converts the file into SPI frames and sends them into the ASIC.
So the ASIC receives SPI frames, not files.
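On the controller side, the file-to-frame conversion is a short loop. The Python sketch below assumes the simple text layout of the example program file shown later in this document (one line per instruction: a hexadecimal address, then a 16-bit hexadecimal instruction). The function name `hex_to_frames` and its validation rules are hypothetical.

```python
from typing import List

NUM_SLOTS = 10  # valid program addresses are 0..9


def hex_to_frames(text: str) -> List[int]:
    """Turn 'AA IIII' lines (hex address, hex instruction) into 24-bit SPI frames."""
    frames = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        addr_txt, instr_txt = line.split()
        addr, instr = int(addr_txt, 16), int(instr_txt, 16)
        if not 0 <= addr < NUM_SLOTS:
            raise ValueError(f"address {addr} is outside 0..{NUM_SLOTS - 1}")
        if not 0 <= instr <= 0xFFFF:
            raise ValueError(f"instruction 0x{instr:X} does not fit in 16 bits")
        frames.append((addr << 16) | instr)  # [23:16] address, [15:0] instruction
    return frames


example = """\
00 2033
01 E000
02 9000
03 F000
04 D000
"""
print([f"0x{f:06X}" for f in hex_to_frames(example)])
# ['0x002033', '0x01E000', '0x029000', '0x03F000', '0x04D000']
```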
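In a physical setup, the external controller acts as the SPI master: it drives SPI_SCK, SPI_CS_N, and SPI_MOSI.
The ASIC acts as the SPI peripheral that receives the frames and then executes the loaded program.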
A practical demo-board flow is:
1. Set RUN = 0 and send the program frames.
2. Wait for PROGRAM_LOADED.
3. Set RUN = 1.
4. Observe uo[7:0], HALTED, and RUN_ECHO.

If the CPU runs too fast, a person cannot observe intermediate states directly.
At high frequency, execution may complete before a human can visually follow the output transitions.
Possible approaches:
- Use a slow external clock so changes can be seen by eye.
- Advance execution one clock pulse at a time.
- Run normally, then show only the final state.
A human can easily observe:
- PROGRAM_LOADED becoming active
- RUN_ECHO reflecting the execution state
- HALTED becoming active
- the final value on uo[7:0]

If a human must observe intermediate instruction results, the external controller should provide a slow clock or single-step clocking.
Suppose the external program file contains:
00 2033
01 E000
02 9000
03 F000
04 D000
The controller converts that into these frames:
- 0x002033
- 0x01E000
- 0x029000
- 0x03F000
- 0x04D000

Then:
1. Keep RUN = 0 while the frames are sent.
2. Wait for PROGRAM_LOADED = 1.
3. Set RUN = 1.

This architecture is good for:
It is not intended for:
The validation strategy is intended to cover:
This project extends a simple ALU-style concept into a compact programmable system.
Instead of manually forcing each operation from outside, the ALU is controlled by a small CPU that:
This makes the design more representative of a real programmable digital block while still remaining compact enough for Tiny Tapeout.
| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | RUN | OUT[0] | SPI_SCK input |
| 1 | | OUT[1] | SPI_CS_N input |
| 2 | | OUT[2] | SPI_MOSI input |
| 3 | | OUT[3] | PROGRAM_LOADED output |
| 4 | | OUT[4] | HALTED output |
| 5 | | OUT[5] | RUN_ECHO output |
| 6 | | OUT[6] | |
| 7 | | OUT[7] | |