
This project implements a Mini Tensor Processing Unit (Mini-TPU) on the Tiny Tapeout open-source ASIC platform. It features an SPI interface for instruction/memory and a compact 3×3 systolic array optimized for efficient matrix multiplication, making it ideal for resource-constrained AI inference tasks.
✨ Built using Tiny Tapeout and Skywater 130nm PDK
🎯 Educational, efficient, and open-source
The Mini-TPU is structured around a weight-stationary systolic array for accelerating matrix multiplication tasks.
Key components:
Once data has been loaded into the processing elements (PEs) over SPI, the TPU performs the multiplication by propagating inputs through the systolic array and accumulating the results in place.
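As a rough illustration of this dataflow, the Python sketch below models a 3×3 array in which each PE multiplies the two operands passing through it and adds the product to a local accumulator. The input skew, the rightward and downward flow directions, and the name `systolic_matmul` are assumptions made for illustration; they are not taken from the project's RTL.

```python
def systolic_matmul(a, b, n=3, cycles=7):
    """Cycle-level model of an n x n systolic array (illustrative only).

    Assumptions: rows of A enter from the left edge, columns of B enter
    from the top edge, and row i / column j are delayed by i / j cycles.
    """
    a_reg = [[0] * n for _ in range(n)]  # A operands, moving right
    b_reg = [[0] * n for _ in range(n)]  # B operands, moving down
    acc = [[0] * n for _ in range(n)]    # per-PE accumulators

    for t in range(cycles):
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:
                    # Left edge: feed A[i][k] with a skew of i cycles.
                    k = t - i
                    new_a[i][j] = a[i][k] if 0 <= k < n else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:
                    # Top edge: feed B[k][j] with a skew of j cycles.
                    k = t - j
                    new_b[i][j] = b[k][j] if 0 <= k < n else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b

        # Every PE multiplies its current operands and accumulates in place.
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    B = [[1, 0, 2], [0, 1, 3], [4, 0, 1]]
    for row in systolic_matmul(A, B):
        print(row)
```

In this model the operand pair for index k reaches PE (i, j) at cycle i + j + k, so the last product lands at cycle 6 and seven cycles complete the 3×3 result.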
The Mini-TPU supports a minimal 12-bit instruction set for memory access and computation:
| Instruction | Format (Binary) | Description |
|---|---|---|
| `LOAD m, r, c, x` | `10m0 rrcc xxxxxxxx` | Load 4-bit data x into memory m (0 = A, 1 = B) at row r, column c |
| `STORE r, c` | `1100 rrcc 00000000` | Store result from array row r, column c |
| `RUN` | `0100 0000 00000000` | Trigger systolic array to compute for 7 cycles |
This simple ISA allows deterministic control over all TPU behavior, suitable for small-scale AI inference use cases.
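To make the encoding concrete, here is a minimal Python sketch that packs each instruction into an integer by following the bit positions written in the Format column, read left to right as most significant bit first. The helper names (`encode_load`, `encode_store`, `encode_run`, `matmul_program`), the MSB-first reading, and the load-run-store ordering of the example program are assumptions for illustration, not part of the project.

```python
def _check(value, max_value, name):
    """Reject values that do not fit the field (hypothetical helper)."""
    if not 0 <= value <= max_value:
        raise ValueError(f"{name}={value} out of range 0..{max_value}")


def encode_load(m, r, c, x):
    """LOAD m, r, c, x -> 10m0 rrcc xxxxxxxx"""
    _check(m, 1, "m")      # 0 = memory A, 1 = memory B
    _check(r, 2, "r")      # 3x3 array: rows 0..2
    _check(c, 2, "c")      # 3x3 array: columns 0..2
    # The x field is 8 bits wide in the format; the description mentions
    # 4-bit data, so values above 15 may not be meaningful on hardware.
    _check(x, 0xFF, "x")
    return 0x8000 | (m << 13) | (r << 10) | (c << 8) | x


def encode_store(r, c):
    """STORE r, c -> 1100 rrcc 00000000"""
    _check(r, 2, "r")
    _check(c, 2, "c")
    return 0xC000 | (r << 10) | (c << 8)


def encode_run():
    """RUN -> 0100 0000 00000000"""
    return 0x4000


def matmul_program(a, b):
    """Assumed sequence: load A and B, run once, then store every result."""
    words = []
    for m, matrix in enumerate((a, b)):
        for r in range(3):
            for c in range(3):
                words.append(encode_load(m, r, c, matrix[r][c]))
    words.append(encode_run())
    for r in range(3):
        for c in range(3):
            words.append(encode_store(r, c))
    return words


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    B = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    for word in matmul_program(A, B):
        print(format(word, "016b"))
```

How these words are framed on the SPI bus (bit order, clock mode, chip-select timing) is not specified here, so the sketch stops at producing the instruction words.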
cocotb or SystemVerilog testbenches

| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | mosi | result[0] | miso |
| 1 | cs | result[1] | |
| 2 | sclk | result[2] | |
| 3 | | result[3] | |
| 4 | | result[4] | |
| 5 | | result[5] | |
| 6 | | result[6] | |
| 7 | | result[7] | |