
This project is a compact 16-bit MIPS-like single-cycle CPU for Tiny Tapeout. The design is intentionally small: it uses a 3-bit program counter, a 4-register file, a 2-word data memory, and a load/run interface on the Tiny Tapeout pins so instructions can be written into internal instruction memory at runtime.
The top-level module, tt_um_top_module_16_mips, exposes a very small control interface. When ui_in[7] = 0, the design is in load mode and the instruction memory can be written one byte at a time through uio_in[7:0]. When ui_in[7] = 1, the CPU runs and executes one instruction per clock cycle.
In run mode, the lower 8 bits of the ALU result appear on uo_out[7:0] and the upper 8 bits appear on uio_out[7:0]. In load mode, uo_out[7:0] mirrors the input byte for easier bring-up, and uio_oe[7:0] is deasserted so the bidirectional pins behave as inputs.
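The output behavior in the two modes can be modeled in a few lines of Python. This is only an illustrative sketch, not the RTL: the function name `drive_outputs` is hypothetical, and the value of uio_out in load mode is an assumption (the pins are inputs then, so the value is not observable).

```python
def drive_outputs(run_mode: bool, alu_result: int, uio_in: int) -> dict:
    """Model uo_out, uio_out and uio_oe for one mode (hypothetical helper)."""
    alu_result &= 0xFFFF  # the ALU result is treated as 16 bits wide
    if run_mode:
        return {
            "uo_out": alu_result & 0xFF,          # lower byte of the ALU result
            "uio_out": (alu_result >> 8) & 0xFF,  # upper byte of the ALU result
            "uio_oe": 0xFF,                       # bidirectional pins driven as outputs
        }
    return {
        "uo_out": uio_in & 0xFF,  # load mode echoes the input byte
        "uio_out": 0x00,          # assumption: value is irrelevant while pins are inputs
        "uio_oe": 0x00,           # bidirectional pins behave as inputs
    }
```

For example, `drive_outputs(True, 0x1234, 0)` gives 0x34 on uo_out and 0x12 on uio_out.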
Pin usage, with results driven on uo_out[7:0] and uio_out[7:0]:

- ui_in[7]: mode select, 0 = load, 1 = run
- ui_in[6]: byte select, 0 = low byte, 1 = high byte
- ui_in[2:0]: instruction address during load mode
- uio_in[7:0]: instruction byte during load mode
- uo_out[7:0]: lower 8 bits of ALU result in run mode, or the load-mode echo of uio_in[7:0]
- uio_out[7:0]: upper 8 bits of ALU result in run mode
- uio_oe[7:0]: output enable mask, 0x00 in load mode and 0xFF in run mode

To load one 16-bit instruction:

1. Set ui_in[7] = 0 to enter load mode.
2. Place the instruction address on ui_in[2:0].
3. Write the low byte on uio_in[7:0] with ui_in[6] = 0.
4. Write the high byte on uio_in[7:0] with ui_in[6] = 1.
5. Set ui_in[7] = 1 to enter run mode.

The instruction memory is written on the rising edge of clk whenever ena = 1 and the design is in load mode.
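The loading procedure above can be sketched as a helper that lists the pin values to present on successive rising clock edges. This is a minimal sketch under stated assumptions: the name `load_sequence` is hypothetical, the instruction memory is assumed to hold 8 words (inferred from the 3-bit address), and ena is assumed to be 1 throughout.

```python
def load_sequence(program):
    """Return (ui_in, uio_in) pairs, one per rising clock edge (hypothetical helper).

    Each 16-bit word is written as a low byte and then a high byte;
    the final pair switches the design to run mode.
    """
    if len(program) > 8:
        # Assumption: 8 instruction words, addressed by ui_in[2:0].
        raise ValueError("program does not fit in an 8-word instruction memory")
    steps = []
    for addr, word in enumerate(program):
        low = word & 0xFF
        high = (word >> 8) & 0xFF
        # ui_in[7] = 0 (load), ui_in[6] = 0 (low byte), ui_in[2:0] = address
        steps.append(((0 << 7) | (0 << 6) | addr, low))
        # ui_in[7] = 0 (load), ui_in[6] = 1 (high byte), ui_in[2:0] = address
        steps.append(((0 << 7) | (1 << 6) | addr, high))
    # ui_in[7] = 1 enters run mode; the uio_in value is ignored from here on
    steps.append((1 << 7, 0x00))
    return steps
```

A testbench could iterate over the returned pairs, drive ui_in and uio_in with each one, and wait for one clock edge per pair.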
The decoder uses the following bit layout:

- R-type: opcode[15:12] | rs[11:10] | rt[9:8] | rd[7:6] | unused[5:0]
- I-type: opcode[15:12] | rs[11:10] | rt[9:8] | imm[7:0]
- J-type: opcode[15:12] | unused[11:3] | target[2:0]

Supported opcodes:

- 0000: add
- 0001: sub
- 0010: xor
- 0011: or
- 0100: lw
- 0101: sw
- 0110: addi
- 0111: jump

The CPU is single-cycle: each instruction is fetched, decoded, executed, and written back in one clock cycle. The program counter increments by 1 on each enabled rising edge unless a jump instruction redirects it.
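The bit layouts and opcode values listed above can be turned into a small encoder for hand-assembling programs. This is a sketch, not part of the repository: the function names are hypothetical, and the pairing of mnemonics with layouts (add, sub, xor, or with the rd layout; lw, sw, addi with the imm layout; jump with the target layout) is an assumption based on the usual MIPS conventions.

```python
# Opcode values as listed in the text.
OPCODES = {
    "add": 0b0000, "sub": 0b0001, "xor": 0b0010, "or": 0b0011,
    "lw": 0b0100, "sw": 0b0101, "addi": 0b0110, "jump": 0b0111,
}

def encode_r(op, rs, rt, rd):
    """opcode[15:12] | rs[11:10] | rt[9:8] | rd[7:6] | unused[5:0]"""
    return (OPCODES[op] << 12) | ((rs & 0b11) << 10) | ((rt & 0b11) << 8) | ((rd & 0b11) << 6)

def encode_i(op, rs, rt, imm):
    """opcode[15:12] | rs[11:10] | rt[9:8] | imm[7:0]"""
    # Masking with 0xFF stores a negative immediate in two's complement.
    return (OPCODES[op] << 12) | ((rs & 0b11) << 10) | ((rt & 0b11) << 8) | (imm & 0xFF)

def encode_j(target):
    """opcode[15:12] | unused[11:3] | target[2:0]"""
    return (OPCODES["jump"] << 12) | (target & 0b111)
```

For instance, `encode_i("addi", 0, 1, 5)` yields 0x6105 and `encode_j(5)` yields 0x7005.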
The register file contains four architectural registers addressed by 2-bit register indices. Register r0 is hard-wired to zero on reads and cannot be overwritten.
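The r0 behavior can be illustrated with a small software model. This is a sketch only: the class name `RegisterFile` is hypothetical, and the 16-bit register width and the all-zero initial state are assumptions rather than facts taken from the RTL.

```python
class RegisterFile:
    """Four registers with 2-bit indices; r0 always reads as zero."""

    def __init__(self):
        self._regs = [0, 0, 0, 0]  # assumption: registers start at zero

    def read(self, index):
        index &= 0b11
        # r0 is hard-wired to zero on reads
        return 0 if index == 0 else self._regs[index]

    def write(self, index, value):
        index &= 0b11
        if index != 0:  # writes to r0 are ignored
            self._regs[index] = value & 0xFFFF  # assumption: 16-bit registers
```

Writing to index 0 leaves the model unchanged, so a later read of r0 still returns 0.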
The ALU supports add, sub, or, and xor. For load and store instructions, the ALU computes the effective address using the base register plus a sign-extended 8-bit immediate.
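The effective-address computation can be sketched as follows. The helper names are hypothetical, and wrapping the sum to 16 bits is an assumption; how the resulting address is mapped onto the 2-word data memory is not described here and is left out of the sketch.

```python
def sign_extend8(imm):
    """Interpret the low 8 bits of imm as a signed two's-complement value."""
    imm &= 0xFF
    return imm - 0x100 if imm & 0x80 else imm

def effective_address(base, imm):
    """Base register plus sign-extended 8-bit immediate, wrapped to 16 bits."""
    return (base + sign_extend8(imm)) & 0xFFFF
```

With a base of 0x0001 and an immediate of 0xFF (that is, -1), the effective address is 0x0000.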
The repository includes both a Verilog behavioral testbench and a cocotb test. They both:
Run the tests from the test/ directory with:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    make

- uio_out is not a program-counter output; it carries the upper byte of the ALU result in run mode.
- src/config.json: PL_TARGET_DENSITY_PCT was increased from 60 to 80.

| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | Instruction address bit 0 during load mode | ALU_out[0] - Result bit 0 | Load input byte bit 0 / ALU_out[8] in run mode |
| 1 | Instruction address bit 1 during load mode | ALU_out[1] - Result bit 1 | Load input byte bit 1 / ALU_out[9] in run mode |
| 2 | Instruction address bit 2 during load mode | ALU_out[2] - Result bit 2 | Load input byte bit 2 / ALU_out[10] in run mode |
| 3 | Unused | ALU_out[3] - Result bit 3 | Load input byte bit 3 / ALU_out[11] in run mode |
| 4 | Unused | ALU_out[4] - Result bit 4 | Load input byte bit 4 / ALU_out[12] in run mode |
| 5 | Unused | ALU_out[5] - Result bit 5 | Load input byte bit 5 / ALU_out[13] in run mode |
| 6 | Byte select: 0=low byte, 1=high byte | ALU_out[6] - Result bit 6 | Load input byte bit 6 / ALU_out[14] in run mode |
| 7 | Mode select: 0=load, 1=run | ALU_out[7] - Result bit 7 | Load input byte bit 7 / ALU_out[15] in run mode |