41 Mini TPU v2

  • Author: Miloudi Adel Hani and Sotehi Yacine and Yassine Hamzi and Talhi Mehdi Boudraa Nadjib and Dennis Du and Rick Gao
  • Description: The Mini TPU v2 for Tiny Tapeout is built around a compact systolic array, a parallel architecture optimized for matrix multiplication that uses a 3x3 array of processing elements to perform multiply-accumulate (MAC) operations. Inputs and weights are fetched externally over SPI. Built with funding from IEEE.
  • Clock: 5000000 Hz

🧠 Mini-TPU: A Tiny Tapeout-Based Systolic Array Accelerator

This project implements a Mini Tensor Processing Unit (Mini-TPU) on the Tiny Tapeout open-source ASIC platform. It features an SPI interface for instruction/memory and a compact 3×3 systolic array optimized for efficient matrix multiplication, making it ideal for resource-constrained AI inference tasks.

✨ Built using Tiny Tapeout and the SkyWater 130nm PDK
🎯 Educational, efficient, and open-source


🔍 How it Works

The Mini-TPU is structured around an output-stationary systolic array for accelerating matrix multiplication tasks.

Key components:

  • 3×3 Processing Element (PE) array for 4-bit MAC operations
  • SPI off-chip memory for activations (Memory A) and weights (Memory B)
  • Control Unit to execute custom instructions and orchestrate computation
  • Output-stationary dataflow with pipelined MAC accumulation

Once data has been loaded into the PEs over SPI, the TPU executes the multiplication by propagating the inputs through the systolic array, with each PE accumulating its result in place.
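The dataflow described above can be illustrated with a small cycle-by-cycle software model. This is only a sketch under stated assumptions, not the project's RTL: it assumes activations enter from the left and weights from the top, each skewed by one cycle per row or column, and it does not model accumulator width or overflow. The names `systolic_matmul_3x3` and `matmul_ref` are hypothetical.

```python
N = 3  # array dimension, matching the 3x3 PE array


def matmul_ref(a, b):
    """Plain triple-loop matrix multiply used as a reference."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def systolic_matmul_3x3(a, b, cycles=7):
    """Cycle-by-cycle model of a 3x3 output-stationary systolic array.

    Assumption: row i of A enters the left edge delayed by i cycles and
    column j of B enters the top edge delayed by j cycles.
    """
    acc = [[0] * N for _ in range(N)]    # one stationary accumulator per PE
    a_reg = [[0] * N for _ in range(N)]  # activations, shifted right each cycle
    b_reg = [[0] * N for _ in range(N)]  # weights, shifted down each cycle

    for t in range(cycles):
        new_a = [[0] * N for _ in range(N)]
        new_b = [[0] * N for _ in range(N)]
        for i in range(N):
            for j in range(N):
                if j == 0:
                    k = t - i  # skewed injection at the left edge
                    new_a[i][j] = a[i][k] if 0 <= k < N else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:
                    k = t - j  # skewed injection at the top edge
                    new_b[i][j] = b[k][j] if 0 <= k < N else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        # Every PE multiplies the operands it currently holds and accumulates.
        for i in range(N):
            for j in range(N):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc
```

Under these assumptions the bottom-right PE receives its last operand pair on cycle index 6, so the model needs 3N - 2 = 7 cycles for N = 3, which agrees with the 7-cycle RUN instruction. Calling `systolic_matmul_3x3(a, b)` on two 3×3 matrices of 4-bit values should return the same result as `matmul_ref(a, b)`.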


🔧 Instruction Format

The Mini-TPU supports a minimal 12-bit instruction set for memory access and computation:

Instruction | Format (Binary) | Description
------------|-----------------|------------
LOAD m, r, c, x | 10m0 rrcc xxxxxxxx | Load 4-bit data x into memory m (0 = A, 1 = B) at row r, column c
STORE r, c | 1100 rrcc 00000000 | Store result from array row r, column c
RUN | 0100 0000 00000000 | Trigger systolic array to compute for 7 cycles

This simple ISA allows deterministic control over all TPU behavior, suitable for small-scale AI inference use cases.
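As an illustration of the encodings listed above, the following sketch packs each instruction into an integer. It makes several assumptions that the source does not confirm: the leftmost bit of each pattern is the most significant bit, the three fields form one 16-bit word (the patterns shown are 16 bits wide, although the text describes a 12-bit instruction set), and the 4-bit value x is zero-extended into the 8-bit data field. The function names are hypothetical and are not part of the project's tooling.

```python
N = 3  # valid row/column indices are 0..2 for the 3x3 array


def _check(name, value, limit):
    """Reject field values that do not fit the instruction format."""
    if not 0 <= value < limit:
        raise ValueError(f"{name}={value} out of range 0..{limit - 1}")


def encode_load(m, r, c, x):
    """LOAD m, r, c, x -> 10m0 rrcc xxxxxxxx"""
    _check("m", m, 2)
    _check("r", r, N)
    _check("c", c, N)
    _check("x", x, 16)  # 4-bit data, assumed zero-extended to 8 bits
    opcode = 0b1000 | (m << 1)
    return (opcode << 12) | (((r << 2) | c) << 8) | x


def encode_store(r, c):
    """STORE r, c -> 1100 rrcc 00000000"""
    _check("r", r, N)
    _check("c", c, N)
    return (0b1100 << 12) | (((r << 2) | c) << 8)


def encode_run():
    """RUN -> 0100 0000 00000000"""
    return 0b0100 << 12


def matmul_program(a, b):
    """Instruction words for one 3x3 multiply: load A and B, run, read back."""
    words = [encode_load(0, r, c, a[r][c]) for r in range(N) for c in range(N)]
    words += [encode_load(1, r, c, b[r][c]) for r in range(N) for c in range(N)]
    words.append(encode_run())
    words += [encode_store(r, c) for r in range(N) for c in range(N)]
    return words
```

With this encoding, one full multiply takes 9 + 9 LOAD words, one RUN, and 9 STORE words. How the words are framed on the SPI bus (bit order, chip-select timing) is not described here and would need to be taken from the RTL.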


🧪 How to Test

🖥️ Simulation
  • Simulate the RTL using cocotb or SystemVerilog testbenches
  • Use included Python reference model for golden comparisons
  • Testbench components:
    • Driver: sends LOAD, RUN, STORE sequences
    • Monitor: samples outputs
    • Scoreboard: compares with expected values
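The scoreboard role listed above can be sketched independently of any simulator. The example below is a hypothetical minimal version, not the project's included reference model: `golden_matmul` and `Scoreboard` are illustrative names, and overflow behaviour on the 8-bit result bus is not modelled, so test matrices should be chosen so that every result fits in 8 bits.

```python
def golden_matmul(a, b):
    """Reference result for a square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


class Scoreboard:
    """Compares values observed by a monitor against the golden model."""

    def __init__(self, a, b):
        self.expected = golden_matmul(a, b)
        self.seen = set()        # (row, col) positions already checked
        self.mismatches = []     # (row, col, expected, observed) tuples

    def check(self, r, c, observed):
        """Record one STORE result; returns True when it matches."""
        self.seen.add((r, c))
        expected = self.expected[r][c]
        if observed != expected:
            self.mismatches.append((r, c, expected, observed))
            return False
        return True

    def passed(self):
        """True only if every output position was checked and matched."""
        n = len(self.expected)
        return len(self.seen) == n * n and not self.mismatches
```

In a cocotb testbench, the monitor would call `check(r, c, value)` each time it samples a result after a STORE instruction, and the test would assert `passed()` at the end of the sequence.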

IO

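# | Input | Output | Bidirectional
--|-------|--------|--------------
0 | mosi | result[0] | miso
1 | cs | result[1] |
2 | sclk | result[2] |
3 | | result[3] |
4 | | result[4] |
5 | | result[5] |
6 | | result[6] |
7 | | result[7] |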
