354 8-bit RISC CPU

Overview

This project is an 8-bit RISC CPU written in Verilog for Tiny Tapeout GF. The CPU is intentionally small and simple: it has no internal program ROM, so an external controller provides one 16-bit instruction at a time through the Tiny Tapeout input pins.

The design executes the instruction and returns either the instruction result or the current program counter on the 8-bit output bus.

Main building blocks:

  • 8-bit program counter
  • 8 x 8-bit register file
  • 8-bit ALU
  • immediate generator
  • branch-control logic
  • 16-byte data memory

The target clock configured for this project is 5 MHz.

Pin Interface

The instruction bus is 16 bits wide and is split across ui_in and uio_in. The bidirectional uio pins are used only as inputs.

Tiny Tapeout signal  Direction  CPU signal
ui_in[7:0]           input      instruction[15:8]
uio_in[7:0]          input      instruction[7:0]
uo_out[7:0]          output     CPU result / PC output
uio_out[7:0]         output     Always 0
uio_oe[7:0]          output     Always 0
clk                  input      system clock
rst_n                input      active-low reset
ena                  input      Tiny Tapeout enable, unused internally

To drive an instruction:

ui_in  = instruction[15:8]
uio_in = instruction[7:0]

For example, instruction 0x1843 is driven as:

ui_in  = 0x18
uio_in = 0x43

This means ui_in[0] carries instruction[8], and uio_in[0] carries instruction[0].
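
The byte split described above can be sketched in a few lines of Python. The function name split_instruction is hypothetical and is not part of the project's testbench; it only illustrates the mapping.

```python
def split_instruction(instruction: int) -> tuple[int, int]:
    """Split a 16-bit instruction into (ui_in, uio_in) byte values."""
    if not 0 <= instruction <= 0xFFFF:
        raise ValueError("instruction must fit in 16 bits")
    ui_in = (instruction >> 8) & 0xFF   # instruction[15:8]
    uio_in = instruction & 0xFF         # instruction[7:0]
    return ui_in, uio_in


ui_in, uio_in = split_instruction(0x1843)
print(f"ui_in=0x{ui_in:02X} uio_in=0x{uio_in:02X}")
```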

Execution Timing

The CPU alternates between two internal phases:

  1. NOP / PC phase
  2. execute phase

For normal use, keep each instruction stable for two clock cycles.

During the execute phase, uo_out shows the instruction result. On the next clock edge, register, memory, and PC updates are committed. During the following NOP / PC phase, uo_out shows the current program counter.

One instruction slot looks like this:

Moment               External action / observation
Before clock edge 1  Drive the 16-bit instruction and keep it stable
After clock edge 1   uo_out shows the instruction result
Clock edge 2         The register file, data memory, and PC update
After clock edge 2   uo_out shows the updated PC

Typical external-controller sequence:

  1. Drive rst_n = 0 for reset.
  2. Release reset with rst_n = 1.
  3. Put a 16-bit instruction on {ui_in, uio_in}.
  4. Hold the instruction stable for two clock cycles.
  5. Read the result from uo_out after the first clock edge; after the second edge, uo_out shows the updated PC.
  6. Change to the next instruction and repeat.
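
As a rough illustration of this sequence, the sketch below produces one pin state per clock cycle for a list of instruction words. It is a plain-Python planning helper, not the project's cocotb testbench. The names PinState and pin_schedule, the default of two reset cycles, and driving zeros on the instruction pins during reset are all assumptions made for the example.

```python
from typing import Iterable, Iterator, NamedTuple


class PinState(NamedTuple):
    rst_n: int
    ui_in: int
    uio_in: int


def pin_schedule(program: Iterable[int], reset_cycles: int = 2) -> Iterator[PinState]:
    """Yield one PinState per clock cycle for the given instruction words."""
    # Assumption: hold reset low for a couple of cycles with the bus at zero.
    for _ in range(reset_cycles):
        yield PinState(rst_n=0, ui_in=0, uio_in=0)
    for instruction in program:
        ui_in = (instruction >> 8) & 0xFF   # instruction[15:8]
        uio_in = instruction & 0xFF         # instruction[7:0]
        # Keep each instruction stable for two clock cycles.
        for _ in range(2):
            yield PinState(rst_n=1, ui_in=ui_in, uio_in=uio_in)


for cycle, pins in enumerate(pin_schedule([0x2204, 0x1843])):
    print(cycle, pins)
```

A real controller would sample uo_out between the two cycles of each instruction slot to capture the result, and after the second cycle to capture the PC.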

Architecture Details

Register File

There are eight 8-bit general-purpose registers, r0 through r7.

After reset:

Register  Value
r0        0
r1        0
r2        0
r3        0
r4        0
r5        0
r6        0
r7        0

Data Memory

The data memory contains 16 bytes. Address calculations are 8-bit, but only the lower 4 address bits select the memory entry. In other words, data-memory addresses wrap modulo 16.

Example:

0x08, 0x18, and 0xF8 all access memory index 8
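
The wrap-around can be modelled as follows. This is a behavioural sketch of the description above, not the RTL; data_mem_index is a hypothetical helper name, and imm6 is passed here as an already signed Python integer.

```python
DATA_MEM_SIZE = 16  # bytes, as stated above


def data_mem_index(base: int, imm6: int = 0) -> int:
    """Return the data-memory entry selected by base + imm6."""
    address = (base + imm6) & 0xFF          # 8-bit address calculation
    return address & (DATA_MEM_SIZE - 1)    # only the lower 4 bits are used


for address in (0x08, 0x18, 0xF8):
    print(f"0x{address:02X} -> index {data_mem_index(address)}")
```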

Program Counter

The program counter is 8 bits wide. Normal instructions increment the PC by 1. Taken branches add the sign-extended imm6 offset to the current PC:

next_pc = pc + imm6

PC arithmetic wraps modulo 256.
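
A minimal model of this update rule, assuming imm6 is given as the raw 6-bit field and treated as a signed offset. The helper names are hypothetical, and the sketch does not model NOP or unsupported opcodes.

```python
def sign_extend6(imm6: int) -> int:
    """Interpret a 6-bit field as a signed value in the range -32..31."""
    imm6 &= 0x3F
    return imm6 - 0x40 if imm6 & 0x20 else imm6


def next_pc(pc: int, imm6: int = 0, branch_taken: bool = False) -> int:
    """Return the next PC: pc + imm6 for a taken branch, otherwise pc + 1."""
    step = sign_extend6(imm6) if branch_taken else 1
    return (pc + step) & 0xFF  # PC arithmetic wraps modulo 256


print(next_pc(1, imm6=4, branch_taken=True))
print(next_pc(255))
```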

Instruction Encoding

This is a compact custom ISA for this Tiny Tapeout design, not a standard RISC-V-compatible encoding.

The opcode is always stored in instruction[15:12].

R-Type Format

R-type instructions use two source registers, one destination register, and a 3-bit ALU function field.

Bits     Field
[15:12]  opcode, always 0000
[11:9]   destination register rd
[8:6]    source register rs1
[5:3]    source register rs2
[2:0]    ALU function funct3

Encoding formula:

instruction = (rd << 9) | (rs1 << 6) | (rs2 << 3) | funct3
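
The formula translates directly into Python. The sketch below adds range checks; encode_r and the FUNCT3 name table are hypothetical helpers, with funct3 values taken from the ALU function table in this document.

```python
FUNCT3 = {"ADD": 0b000, "SUB": 0b001, "AND": 0b010, "OR": 0b011,
          "XOR": 0b100, "SLL": 0b101, "SRL": 0b110}


def encode_r(rd: int, rs1: int, rs2: int, funct3: int) -> int:
    """Encode an R-type instruction (opcode 0000)."""
    for name, value in (("rd", rd), ("rs1", rs1), ("rs2", rs2), ("funct3", funct3)):
        if not 0 <= value <= 7:
            raise ValueError(f"{name} must be in 0..7")
    return (rd << 9) | (rs1 << 6) | (rs2 << 3) | funct3


# ADD r6, r4, r5
print(f"0x{encode_r(6, 4, 5, FUNCT3['ADD']):04X}")
```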

Immediate Format

Most non-R-type instructions use a source/base register and a 6-bit immediate.

Bits     Field
[15:12]  opcode
[11:9]   destination register rd, or unused depending on instruction
[8:6]    source/base register rs1
[5:0]    immediate imm6

Encoding formula:

instruction = (opcode << 12) | (rd << 9) | (rs1 << 6) | (imm6 & 0x3F)

The imm6 value is sign-extended to 8 bits before ALU use. For example, imm6 = 0x3F represents -1, which becomes 0xFF in the 8-bit datapath.

The LI instruction is a special case: it has no source register and carries an 8-bit immediate, imm8, in instruction[7:0]:

instruction = (0x2 << 12) | (rd << 9) | imm8
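
The two formulas and the sign-extension rule can be sketched as below. The function names are hypothetical. Accepting negative Python integers for imm6 is a convenience of this sketch, not something the hardware interface defines.

```python
def sign_extend6_to_8(imm6: int) -> int:
    """Sign-extend a 6-bit field to an 8-bit datapath value."""
    imm6 &= 0x3F
    return imm6 | 0xC0 if imm6 & 0x20 else imm6


def encode_imm(opcode: int, rd: int, rs1: int, imm6: int) -> int:
    """Encode an immediate-format instruction; imm6 may be given as -32..63."""
    if not -32 <= imm6 <= 63:
        raise ValueError("imm6 does not fit in 6 bits")
    return ((opcode & 0xF) << 12) | ((rd & 0x7) << 9) | ((rs1 & 0x7) << 6) | (imm6 & 0x3F)


def encode_li(rd: int, imm8: int) -> int:
    """Encode LI rd, imm8 (opcode 0010)."""
    return (0x2 << 12) | ((rd & 0x7) << 9) | (imm8 & 0xFF)


print(f"0x{encode_imm(0x1, 7, 6, -1):04X}")   # ADDI r7, r6, -1
print(f"0x{encode_li(6, 42):04X}")            # LI r6, 42
print(f"0x{sign_extend6_to_8(0x3F):02X}")     # imm6 = 0x3F in the datapath
```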

Store And Branch Format

Store and branch instructions use a split 6-bit immediate so that rs2 can be encoded independently.

Bits     Field
[15:12]  opcode
[11:9]   immediate imm6[5:3]
[8:6]    source/base register rs1
[5:3]    source register rs2
[2:0]    immediate imm6[2:0]

Encoding formula:

instruction = (opcode << 12)
            | (((imm6 >> 3) & 0x7) << 9)
            | (rs1 << 6)
            | (rs2 << 3)
            | (imm6 & 0x7)

For SW, rs2 selects the register whose value is written to data memory. For branch instructions, rs1 and rs2 are compared, while imm6 is used as the signed PC offset when the branch is taken.
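
A small Python version of the split-immediate formula is shown below. The encode_sb name is hypothetical; the opcode values come from the instruction set table in this document.

```python
def encode_sb(opcode: int, rs1: int, rs2: int, imm6: int) -> int:
    """Encode a store or branch instruction with a split 6-bit immediate."""
    if not -32 <= imm6 <= 63:
        raise ValueError("imm6 does not fit in 6 bits")
    imm6 &= 0x3F
    return (((opcode & 0xF) << 12)
            | (((imm6 >> 3) & 0x7) << 9)   # imm6[5:3] goes to bits [11:9]
            | ((rs1 & 0x7) << 6)
            | ((rs2 & 0x7) << 3)
            | (imm6 & 0x7))                # imm6[2:0] goes to bits [2:0]


print(f"0x{encode_sb(0x4, rs1=4, rs2=1, imm6=8):04X}")   # SW r1, [r4 + 8]
print(f"0x{encode_sb(0x5, rs1=1, rs2=1, imm6=4):04X}")   # BEQ r1, r1, +4
```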

Instruction Set

Opcode  Mnemonic  Operation
0000    R-type    ALU operation selected by funct3
0001    ADDI      rd = rs1 + imm6
0010    LI        rd = imm8
0011    LW        rd = memory[rs1 + imm6]
0100    SW        memory[rs1 + imm6] = rs2
0101    BEQ       branch when rs1 == rs2
0110    BNE       branch when rs1 != rs2
0111    BLT       branch when rs1 < rs2
1000    ANDI      rd = rs1 & imm6
1001    ORI       rd = rs1 | imm6
1010    XORI      rd = rs1 ^ imm6
1011    SLLI      rd = rs1 << imm6
1100    SRLI      rd = rs1 >> imm6
1111    NOP       no operation

NOP and unsupported opcodes leave the PC, registers, and data memory unchanged.
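
To show how the three formats map onto the opcode table, here is a sketch of a field decoder. It is a reading aid built from the tables in this document, not a model of the RTL decoder; the decode function and its dictionary layout are hypothetical.

```python
MNEMONICS = {0x0: "R-type", 0x1: "ADDI", 0x2: "LI", 0x3: "LW", 0x4: "SW",
             0x5: "BEQ", 0x6: "BNE", 0x7: "BLT", 0x8: "ANDI", 0x9: "ORI",
             0xA: "XORI", 0xB: "SLLI", 0xC: "SRLI", 0xF: "NOP"}


def decode(instruction: int) -> dict:
    """Split a 16-bit instruction into named fields according to its opcode."""
    opcode = (instruction >> 12) & 0xF
    hi3 = (instruction >> 9) & 0x7    # rd, or imm6[5:3] for store/branch
    rs1 = (instruction >> 6) & 0x7
    mid3 = (instruction >> 3) & 0x7   # rs2 for R-type and store/branch
    lo3 = instruction & 0x7           # funct3, or imm6[2:0] for store/branch
    fields = {"mnemonic": MNEMONICS.get(opcode, "unsupported")}
    if opcode == 0x0:
        fields.update(rd=hi3, rs1=rs1, rs2=mid3, funct3=lo3)
    elif opcode == 0x2:
        fields.update(rd=hi3, imm8=instruction & 0xFF)
    elif opcode in (0x4, 0x5, 0x6, 0x7):
        fields.update(rs1=rs1, rs2=mid3, imm6=(hi3 << 3) | lo3)
    elif opcode in MNEMONICS and opcode != 0xF:
        fields.update(rd=hi3, rs1=rs1, imm6=instruction & 0x3F)
    return fields


print(decode(0x1843))
print(decode(0x4308))
```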

ALU Functions

R-type instructions use funct3.

funct3  Operation
000     ADD
001     SUB
010     AND
011     OR
100     XOR
101     Shift left
110     Shift right
111     Defaults to ADD

Shift operations use the lower three bits of the second ALU operand as the shift amount, giving a range from 0 to 7 positions.

The branch instructions use the SUB path internally to compare rs1 and rs2 and generate comparison flags. The less-than comparison is signed, so both operands are interpreted as 8-bit two's-complement values.
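
The ALU and branch behaviour described above can be approximated with the behavioural model below. It is a sketch under stated assumptions: the right shift is assumed to be logical (zero-filling), the signed comparison is done by converting operands to signed integers rather than by deriving flags from the SUB path, and the function names are hypothetical.

```python
def to_signed8(value: int) -> int:
    """Interpret an 8-bit value as two's complement."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def alu(funct3: int, a: int, b: int) -> int:
    """Return the 8-bit ALU result for the given funct3 code."""
    a &= 0xFF
    b &= 0xFF
    shamt = b & 0x7                 # lower three bits of the second operand
    if funct3 == 0b001:
        result = a - b
    elif funct3 == 0b010:
        result = a & b
    elif funct3 == 0b011:
        result = a | b
    elif funct3 == 0b100:
        result = a ^ b
    elif funct3 == 0b101:
        result = a << shamt
    elif funct3 == 0b110:
        result = a >> shamt         # assumption: logical shift right
    else:
        result = a + b              # 000 is ADD, 111 defaults to ADD
    return result & 0xFF


def branch_taken(mnemonic: str, rs1_value: int, rs2_value: int) -> bool:
    """Evaluate the branch condition for BEQ, BNE, or BLT."""
    if mnemonic == "BEQ":
        return (rs1_value & 0xFF) == (rs2_value & 0xFF)
    if mnemonic == "BNE":
        return (rs1_value & 0xFF) != (rs2_value & 0xFF)
    if mnemonic == "BLT":
        return to_signed8(rs1_value) < to_signed8(rs2_value)
    raise ValueError(f"not a branch: {mnemonic}")


print(alu(0b000, 7, 9))
print(branch_taken("BLT", 0xFF, 0x01))
```

In this model 0xFF compares as -1, so BLT with rs1 = 0xFF and rs2 = 0x01 is taken, while the reverse comparison is not.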

Worked Examples

Example 1: Basic Register And ALU Sequence

Step  Instruction      Encoding  Result on uo_out
1     LI r1, 4         0x2204    4
2     ADDI r4, r1, 3   0x1843    7
3     ADDI r5, r4, 2   0x1B02    9
4     ADD r6, r4, r5   0x0D28    16

Explanation:

r1 = 4
r4 = 4 + 3 = 7
r5 = 7 + 2 = 9
r6 = 7 + 9 = 16

Example 2: Load Immediate And Negative Immediate

Instruction       Encoding  Result
LI r6, 42         0x2C2A    r6 = 42
ADDI r7, r6, -1   0x1FBF    r7 = 41

The second instruction uses imm6 = 0x3F, which sign-extends to 0xFF and acts as -1 in the 8-bit datapath.

Example 3: Store And Load

Step  Instruction        Encoding  Effect
1     LI r1, 4           0x2204    r1 = 4
2     ADDI r4, r4, 16    0x1910    r4 = 16
3     SW r1, [r4 + 8]    0x4308    writes 4 to memory index 8
4     LW r6, [r4 + 8]    0x3D08    r6 = 4

The address is 16 + 8 = 24. Because data memory uses only the lower 4 address bits, address 24 accesses memory index 8.

Example 4: Taken Branch

Step  Instruction       Encoding  Behavior
1     LI r1, 4          0x2204    r1 = 4
2     BEQ r1, r1, +4    0x504C    branch is taken

Starting from reset (PC = 0), the PC is 1 after LI, so the taken branch updates it to:

pc + imm6 = 1 + 4 = 5

Running The RTL Simulation

The repository includes a cocotb end-to-end testbench that drives instructions through the Tiny Tapeout top-level pins and checks reset behavior, ALU instructions, immediate instructions, memory access, branches, and PC updates.

From the repository root:

cd test
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
make

The expected result is:

TESTS=1 PASS=1 FAIL=0

Running Gate-Level Simulation

After local hardening or a successful GDS flow, copy the powered netlist into the test directory and run the test with GATES=yes.

For GF180 local hardening, the netlist is normally:

runs/wokwi/final/pnl/tt_um_8bit_risc_cpu.pnl.v

Example:

export PDK=gf180mcuD
export PDK_ROOT=/path/to/pdk/root

cp runs/wokwi/final/pnl/tt_um_8bit_risc_cpu.pnl.v test/gate_level_netlist.v
cd test
make -B GATES=yes

The expected result is again:

TESTS=1 PASS=1 FAIL=0

Running Local Hardening

This project targets Tiny Tapeout GF, so Tiny Tapeout tooling should be run with the --gf flag.

If the tt/ support-tools directory is not present, clone it first:

git clone https://github.com/TinyTapeout/tt-support-tools tt

Typical local setup:

python3 -m venv ~/ttsetup/venv
source ~/ttsetup/venv/bin/activate
pip install --upgrade pip
pip install -r tt/requirements.txt

export PDK_ROOT=~/ttsetup/pdk
export PDK=gf180mcuD
export LIBRELANE_TAG=3.0.3
pip install librelane==$LIBRELANE_TAG

Create the merged config and run hardening:

./tt/tt_tool.py --create-user-config --gf
./tt/tt_tool.py --harden --gf
./tt/tt_tool.py --print-warnings --gf

If the PDK is installed through Ciel but gate-level simulation cannot find gf180mcuD, enable the installed GF180 PDK:

ciel enable --pdk-root ~/ttsetup/pdk --pdk-family gf180mcu <version>

Then rerun the hardening or gate-level simulation command.

If hardening succeeds, the final GDS, LEF, netlists, and metrics are placed under:

runs/wokwi/final/

Hardware Usage

On real Tiny Tapeout hardware, an external controller such as a microcontroller, FPGA, or test setup must provide the clock, reset, and instruction input.

The most important rule is to hold each instruction stable for two clock cycles. Then read uo_out during the execute phase for the result and during the next NOP / PC phase for the updated program counter.

IO

#  Input            Output      Bidirectional
0  instruction[8]   CPU_Out[0]  instruction[0]
1  instruction[9]   CPU_Out[1]  instruction[1]
2  instruction[10]  CPU_Out[2]  instruction[2]
3  instruction[11]  CPU_Out[3]  instruction[3]
4  instruction[12]  CPU_Out[4]  instruction[4]
5  instruction[13]  CPU_Out[5]  instruction[5]
6  instruction[14]  CPU_Out[6]  instruction[6]
7  instruction[15]  CPU_Out[7]  instruction[7]
