207 2x2 Systolic Array TPU

207 : 2x2 Systolic Array TPU

Design render
  • Author: PFW Tiny Tapeout Team
  • Description: A 2x2 systolic-array Tensor Processing Unit that performs signed 8-bit matrix multiplication with 16-bit accumulation, optional fused transpose, and ReLU activation. Pipelined streaming allows overlapped input/output.
  • GitHub repository
  • Open in 3D viewer
  • Clock: 50000000 Hz

How it works

This project implements a small-scale Tensor Processing Unit (TPU) that performs 2x2 matrix multiplications using a systolic array of Multiply-Accumulate (MAC) units. It is designed for the Tiny Tapeout ASIC flow.

Architecture

  • Processing Element (PE): Each PE executes a multiply-accumulate operation and propagates partial data to its neighbors in the systolic array.
  • Systolic Array (2x2): A mesh of 4 PEs wired in systolic fashion. Data flows left-to-right (weights) and top-to-bottom (inputs).
  • Unified Memory: An 8-byte register bank storing both weight and input matrices (4 bytes each).
  • Control Unit: A finite-state machine (FSM) sequences the load, compute, and output phases.
  • MMU Feeder: Schedules data flow between memory, the systolic array, and the output port.

Features

  • Signed 8-bit inputs, 16-bit outputs: Handles signed integers (-128 to 127) and accumulates products in 16-bit precision.
  • Pipelined streaming: Supports overlapped input of the next matrix pair while outputting results of the current multiplication.
  • Fused transpose: Optional on-chip transpose of the second input matrix (enabled via uio_in[1]).
  • ReLU activation: Optional ReLU (clamp negatives to 0) on the output (enabled via uio_in[2]).

Pin Mapping

Pin Direction Signal
ui_in[7:0] Input 8-bit matrix data
uo_out[7:0] Output 8-bit result data (half of 16-bit element)
uio_in[0] Input LOAD_EN: enable loading elements into memory
uio_in[1] Input TRANSPOSE: enable fused transpose of second operand
uio_in[2] Input ACTIVATION: enable ReLU activation on output
uio_out[5] Output STATE0: FSM state bit 0
uio_out[6] Output STATE1: FSM state bit 1
uio_out[7] Output DONE: high when results are ready

How to test

  1. Reset: Assert rst_n low for several clock cycles, then release.
  2. Load matrices: Set uio_in[0] (LOAD_EN) high and send 8 bytes via ui_in one per clock cycle: 4 weight elements followed by 4 input elements (row-major order).
  3. Wait for computation: Deassert LOAD_EN and wait. The DONE signal (uio_out[7]) goes high when results are available.
  4. Read results: Each of the 4 output elements is 16 bits, streamed as 2 consecutive 8-bit values on uo_out (high byte first). 8 clock cycles to read all 4 elements.
  5. Continuous operation: You can begin loading the next matrix pair while reading results (pipelined).

Simulation

cd test
pip install -r requirements.txt
make

This runs cocotb tests via Icarus Verilog covering identity multiply, general multiply, signed values, and ReLU activation.

External hardware

No external hardware required for basic operation. Connect ui_in and uio_in to a microcontroller or FPGA for data input and control. Connect uo_out and uio_out to read results and status.

IO

#InputOutputBidirectional
0DATA[0] matrix element bit 0OUT[0] result data bit 0LOAD_EN (input) enable matrix element loading
1DATA[1] matrix element bit 1OUT[1] result data bit 1TRANSPOSE (input) fused transpose of second operand
2DATA[2] matrix element bit 2OUT[2] result data bit 2ACTIVATION (input) enable ReLU on output
3DATA[3] matrix element bit 3OUT[3] result data bit 3(unused)
4DATA[4] matrix element bit 4OUT[4] result data bit 4(unused)
5DATA[5] matrix element bit 5OUT[5] result data bit 5STATE0 (output) FSM state bit 0
6DATA[6] matrix element bit 6OUT[6] result data bit 6STATE1 (output) FSM state bit 1
7DATA[7] matrix element bit 7OUT[7] result data bit 7DONE (output) high when results are ready

Chip location

Controller Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Analog Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Analog Mux Mux Mux Mux Analog Mux Mux Mux Mux Mux Mux tt_um_chip_rom (Chip ROM) tt_um_factory_test (Tiny Tapeout Factory Test) tt_um_oscillating_bones (Oscillating Bones) tt_um_tetrahedral_oscillator (Tetrahedral Oscillator) tt_um_urish_simon (Simon Says memory game) tt_um_c4m_spsram_direct (TTSKY-SPSRAM-direct) tt_um_bgr (sky130 Bandgap Reference) tt_um_floating_bulk_test_2 (Floating-bulk-test-2) tt_um_sker (Bomberman) tt_um_pzhu2 (Hardware Triangle Rasterizer with VGA Output) tt_um_nlanderso_morse_code (Morse Code Translator) tt_um_peterhan_ReactionGame (Reaction Time Game) tt_um_minmanrox_drone (Drone Flight Controller) tt_um_tiny3d_kevinqian11 (Tiny3D) tt_um_abhinavputhran_raycast (raycaster) tt_um_sillylad_top (Tiny Rainbow Snake Game) tt_um_akim_tinydma (TinyDMA-2C) tt_um_jenny82121027_axi4lite (AXI4-Lite Slave Register Demo) tt_um_Edward2005lol_Slot_Machine_Top (Slot Machine) tt_um_amin_hong_ooo_cpu (tiny OoO CPU) tt_um_flappy_vga_Akul18 (Flappy VGA) tt_um_vidishac2004_calc (Keypad Calculator) tt_um_eric_lcc (Tiny_Tapeout_Launch_Controller) tt_um_rwnt_vgatest (Intro_VGA_Playground) tt_um_gurtej_randhawa1_pulsemon8 (PulseMon8) tt_um_28add11_latchup (latchup2026-28add11) tt_um_noah_azz_demo (My First TT Demo) tt_um_llhtimlam_movingscreen (movingscreen) tt_um_harveywong85_harveywilly (harveywilly) tt_um_theandelope_checkers (Checkers) tt_um_rajum_iterativeMAC (Iterative MAC LATCHUP2026) tt_um_calebulboaca_calebcheckers (Caleb's Checkers) tt_um_vga_yusefkarim (ttsky-verilog-yk) tt_um_lfearn_latchup (Latch Up Tiny Tapeout) tt_um_ww_charlieplex (7x8 Charlieplex Array Controller) tt_um_zlj8800_tiny_tapeout_v2 (Chipping Away to Learn about The Chips) tt_um_ocpu (OCPU) tt_um_aelobo (TinyPomodoro) tt_um_jasonbrave_terre (Terre VGA) tt_um_mosbius (mini mosbius) tt_um_fabulous_sky_26b (Tiny FABulous FPGA) tt_um_cycho (Mini Memory Controller) tt_um_erika24 (TinyFarm) tt_um_wokwi_463101366305871873 (Tiny Laura L) tt_um_pcs_link_lite (PCS Lite: Asynchronous 8b10b SerDes) tt_um_sienahlee (18244-s26-tiny-nn) tt_um_basic8 (Basic8 CPU) tt_um_basic_na (basic_national_anthem_buzzer) tt_um_datiuemm (IEEE MBIST & ECC for RAM 8x32) tt_um_CFG_WDT (Configurable WDT) tt_um_top (IEEE_henon) tt_um_fidel_makatia_digital_tapeout (8-bit Accumulator CPU SoC) tt_um_garage_project (IEEE_UPP_Garage unit control) tt_um_wokwi_462595774777167873 (Bypass Universal) tt_um_pwm_4ch (IEEE Multi-Channel PWM Controller ) tt_um_amarjay (mini_cpu) tt_um_blackjack (ttsky-blackjack) tt_um_bartu_kripto (Tiny Crypto Core) tt_um_umitanik_matmul3x3 (3x3 Serial Matrix Multiplier (4-bit)) tt_um_tnt_mosbius (tnt's variant of SKY130 mini-MOSbius) tt_um_rule30_vga (IEEE Rule 30 Cellular Automaton VGA Display) tt_um_authQV (authQV RISC-V CPU) tt_um_nn_3x3 (3x3 Hardware Neural Network (Programmable TPU)) tt_um_top_module_16_mips (16-bit MIPS Single Cycle Processor) tt_um_auth_dmac (AUTh DMA Controller) tt_um_puf (IEEE Ring Oscillator PUF) tt_um_jacob_kebaso_4bit_cpu (Nibble - 4-bit CPU) tt_um_IEEE_perceptron (1-bit Perceptron - Hardware Neuron) tt_um_wokwi_458569964697822209 (Full Adder: Binary Addition Circuit) tt_um_yfoong86_chasey (Chasey) tt_um_dsp_top (Configurable 8-bit Streaming DSP Core) tt_um_processor_top (TinyCrypto-8) tt_um_pro_clk (Programmable Clock Generator) tt_um_wokwi_458951258752539649 (a tour-in the haunted house) tt_um_vga (IEEE Multi-Mode Procedural VGA Graphics Engine) tt_um_happy_birthday (IEEE Happy Birthday Detector) tt_um_galois_lfsr16 (16 bit Galois LFSR based Random number generator-IEEE) tt_um_cordic_ieee (Cordic-based Math processor-IEEE) tt_um_wokwi_462089659615737857 (Mines live or die) tt_um_arfanghani_design2_top (Multi-Mode Sensor Signal Processor) tt_um_arfanghani_design3_top (Heat Stress Alert ASIC) tt_um_arfanghani_design1_top (Water Quality Classifier Core) tt_um_zed_analog (Analog design) tt_um_gen_onda (DDS Waveform Generator - IEEE) tt_um_Richard_Tarqui_contador_uart_simple (UART - Controlled Frecuency Meter & Timer - IEEE) tt_um_iporre_rm121 (IEEE PONG IPORRE VGA) tt_um_digitalclock (Digital Clock!) tt_um_wokwi_462089398612533249 (Sunblock Holiday) tt_um_wokwi_458477197787547649 (FULL SUBTRACTOR) tt_um_wokwi_462165147286899713 (PSI Open IC 2026) tt_um_RaphRaphyRofl_VerilogIEEEBounce (IEEE Letters Screensaver) tt_um_leongamboa_OpenSilicon_SubmissionChapterLogo (Open Silicon 2026: SKY26a Submission - Chapter Logo) tt_um_SollysLe_mac_8bits (8-bit Multiply-Accumulate (MAC) with 2-Cycle Serial Interface) tt_um_AlephNaNsea_decentvgachipIEEEIESIPSPH (Galvantronix, DLSU, and me!) tt_um_IEEE_OpenSilicon_SubmissionCredits (Open Silicon 2026: SKY26a Submission - Chapter Logo) tt_um_Mitchell_s_Approximation_based_EML (IEEE Mitchell-s_Approximation_based_EML) tt_um_tiny_8bit_cpu (IEEE Tiny 8bit CPU) tt_um_dco (Digitally Controlled Oscillator) tt_um_thunder (Ford Thunderbird Rearlights Controller - IEEE OpenSilicon Bootcamp) tt_um_tiny8_risclike (IEEE_CPU with SPI program load and internal execution) tt_um_coffee_chip (IEEEcoffee_chip) tt_um_vga_glyph_mode_clone (Philippine IC Design Boot Camp 2026!) tt_um_alu7b (IEEE 7-bit ALU - Serial Input / Parallel Output) tt_um_AlephNaNsea_space_time_waves_and_filaments (Space-Time Waves and Filaments) tt_um_BFD100_Logic (BDF1000 Line folower) tt_um_Floppy_LIGHT (Floppy LIGHT) tt_um_okforth_ieee (SUBLEQ CPU IEEE) tt_um_magnetofield_ieee (Hackerspace logo IEEE) tt_um_krv8_ieee (A simple 8-bit RISC-V style CPU) tt_um_tile_growth_simulator_NoahW (Tile Growth Simulator) tt_um_prog_clk_router (Programmable Clock Router (IEEE)) tt_um_snk_smart_io_hub (UART Smart I/O Hub) tt_um_rom_vga_screensaver (VGA Screensaver with embedded bitmap ROM) tt_um_eml_gate (EML Serial Coprocessor) tt_um_Nay0805_detector_de_patrones_generados_aleatoreamente (tt_um_Nay0805_detector_de_patrones_generados_aleatoreamente) tt_um_DlynchR_spi_display (tt_um_DlynchR_spi_display) tt_um_scisneros29_BCR (tt_um_scisneros29_BCR) tt_um_sqrt8_ieee (A simple 8-bit square root calculator.) tt_um_ieee_opensilicon_bootcamp (Guess the Number Game - IEEE OpenSilicon Bootcamp) tt_um_wokwi_461639934990157825 (4 bit unlock (IEEE)) tt_um_wokwi_461620354455920641 (4-Bit High-Security Password System (IEEE)) tt_um_KK_VGA01 (KK Zuzel Motocross IEEE) tt_um_wokwi_461622504612675585 (Tiny Tapeout : Lock system v2 (IEEE)) tt_um_riscv_alu (rv32i RISC-V ALU) tt_um_the_siliconimist_chip1 (The Siliconimist Chip1) tt_um_william_pll (Smartcard PLL Clock Generator) tt_um_william_adc8 (Sigma-Delta Bitstream ADC (8-bit)) tt_um_wlmoi_bcd_to_7segment (TTSKY26A BCD to 7-Segment Decoder) tt_um_BillNace_SumItUp (SumItUp Hardware Thread (18-341)) tt_um_sandsim_Alden_G878 (SandSim) tt_um_dma_multi_channel (dma_multi_channel) tt_um_Halcy0nnnn_1 (IEEE_MMU_Cybertron_Logo) tt_um_8_bit_cpu (8-bit CPU) tt_um_morse_code (Translator) tt_um_unified_error_detection (8-Bit Error Detection Engine) tt_um_sobel (Streaming Sobel Edge Detection Accelerator) tt_um_NUPlace2 (VAK FSM) tt_um_youweiterrylu (DMA) tt_um_joo111emad_BGR (Analog BGR) tt_um_izh_neuron (SKY130 Spiking Neuron) tt_um_izh_neuron_4pins (SKY130 Spiking Neuron) tt_um_pmendoza_ieee_tinyscan (Tiny SCAN chain tester) tt_um_rajkamal_analog (IEEE Multi-Stage Configurable Ring Oscillator) tt_um_isalopez9_memory_game (Simon Memory Game Chip) tt_um_usp_didactic ((IEEE) USP OpenSilicio Didactic Testchip) tt_um_bn_lif_evan (Bernoulli Stochastic Multiplier + LIF Neuron) tt_um_advun (tinyWorkshop) tt_um_wokwi_460983138943099905 (Trial IB) tt_um_pfw_tpu (2x2 Systolic Array TPU) tt_um_riscv_gpu (4x4 BitNet b1.58 Matrix Multiply Accelerator) tt_um_tt08_axis_fifo_fwft_bkenololo (IEEE 8-bit AXI4-Stream FWFT FIFO) tt_um_analog_ota_v3_IEEE (TTSKY26a_Miller_OTA(IEEE)) tt_um_quadpulse_pwm (QuadPulse — 4-Channel Servo/Motor PWM ASIC) tt_um_advaittej_stopwatch (V-SPACE Demo: Command & Control Chronograph) tt_um_snn_afib_detector (SNN AFib Detector — Spiking Reservoir Computing Core) tt_um_Halcy0nnnn (IEEE_MMU_Cybertron_Logo) tt_um_baby_cpu (Baby CPU) tt_um_wokwi_462285560117329921 (BCD ID Wowki) tt_um_LAT (Automation Laboratory Logo with author Image) tt_um_dean_foulds_ai_accelerator (Systolic Binary Neural Network Accelerator) tt_um_kazan_rqpu (tt_um_kazan_rqpu) tt_um_ultrasage_danz (IEEE Open-Silicon 2026 x NITHUB: Soil Moisture Irrigation Controller) tt_um_traffic_ctrl (IEEE Open-Silicon 2026: Adaptive Traffic Light Controller with Emergency Override) tt_um_lpf_ieee (Moving average Digital Low pass filter (IEEE open silicon)) tt_um_array_mult_vga (4x4 Array Multiplier with VGA Visualization) tt_um_bfloat16 (IEEE bfloat16_accelerator) tt_um_silicon_art_vga_screensaver (VGA Screensaver with Silicon Art ROM) tt_um_seapanda0 (DSP_FIR) tt_um_datdt_charizard (IEEE VGA Charizard Flamethrower) tt_um_ocd_charlieplex (Charlieplex array controller) tt_um_bytex64_wave_hi (wave_hi) tt_um_STDCELL_LDO (STDCELL_LDO) tt_um_devil_nyancat (Devil Nyan Cat VGA) tt_um_ieee_pwd (PWM Generator) tt_um_petros (TTNN: Pre-trained BNN for 8x8 MNIST) tt_um_Medidor_Jitter (Jitter Metrics & Pulse Analyzer) tt_um_CNN4IC_sky (CNN4IC — Convolutional Neural Network (CNN) for Image Classification on Chip (IEEE)) tt_um_Madd_CS_Ring_Osc (CSRO with 8-bit DAC) tt_um_reaction_game (Reaction game on Simon Says board) tt_um_load_priority_controller (IEEE Open-Silicon 2026: Load Priority Controller) tt_um_ctw_ldo (LDO Regulator Skywater 130nm) tt_um_c4m_legacyspsram_direct (TTSKY-SPSRAM-legacy-direct) tt_um_tpu (Mini TPU v2) tt_um_rcyaon (bandgap-ptat) tt_um_5tOTA (Operational Transconductance Amplifier) tt_um_wokwi_461554799001985025 (inec_voting) tt_um_systolic_array (Custom 3 by 3 Systolic Array) tt_um_chronoINAAL (Digital Stopwatch with LAP mode) tt_um_pree (UART_Analog_IC) tt_um_thorsten_shiftregister (Shiftregister Challenge 40 Bit) tt_um_hamming74 (Hamming(7,4) Encoder/Decoder) tt_um_prathiba_finite_sbox (Finite Field AES S-box) tt_um_maw_game (MAW Bird Shooter VGA Game) tt_um_vga_ascii (ascii_typewriter) tt_um_lstm_wakeword (TTSKY26A Neural Network - LSTM Wake Word Detector) tt_um_bad_apple (test) tt_um_riscv_branch (rv32i RISC-V Branch Condition Unit) tt_um_alu8bit (8-bit Tiny ALU) tt_um_chaotic_rng (C0haotic RNG) tt_um_ik_0_ptat_bgr (Pseudo-PTAT cell based bandgap reference) tt_um_er_ring_osc (Simple Ring Oscillator) tt_um_wokwi_462290658621740033 (IEEE IC Bootcamp Khalifa University) tt_um_ross_systolic (2x2 Systolic Array Matrix Multiplier) tt_um_27jorge05_crc_fifo (CRC_FIFO: CRC-32 Engine with 8-Byte FIFO and VGA Display) tt_um_jonathanbytes_alu8_serial (ALU8 Serial (IEEE)) tt_um_vmm_bnn (Nano-Bnn-Accelerator) tt_um_Onchip_TrafficLight (Onchip-UIS Traffic Light) tt_um_rebeccargb_universal_decoder (Universal Binary to Segment Decoder) tt_um_db_PWM (Onchip-UIS PWM Generator ) tt_um_ccollatz_SO (Onchip-UIS Collatz Conjecture) tt_um_rebeccargb_hardware_utf8 (Hardware UTF Encoder/Decoder) tt_um_rebeccargb_intercal_alu (INTERCAL ALU) tt_um_rebeccargb_vga_pride (VGA Pride) tt_um_wokwi_462349004652630017 (IEEE Logic Locked Reversible 2-Bit ALU) tt_um_andriansyah_capless_ldo (capless LDO regulator with 51.1dB PSRR at 100kHz) tt_um_ramp_adc (ttsky26b-ramp-adc) tt_um_alu_7bits (ALU 7 Bits) tt_um_ALU_Porca (Onchip-UIS 8-bit ALU with Status Flags) tt_um_oreoluwa_water_level (IEEE Open-silicon 2026 x NITHUB: Fluid Level Detector and Controller) tt_um_wokwi_464171439964087297 (First Silicon) tt_um_wokwi_464173578877001729 (Tiny Tapeout Template - PJ v2) tt_um_krisjdev_artwork (Silicon Artwork) tt_um_wokwi_464171399090591745 (tiny-tapeout-2026-05-16) tt_um_wokwi_464176621517795329 (Tiny Tapeout Run1) tt_um_wokwi_464178664603376641 (Tiny Tapetest) tt_um_wokwi_464171361019935745 (Tiny Tapeout Template Copy) tt_um_wokwi_464177144942873601 (TinyTapeout_Hackaday_Daniel) tt_um_wokwi_464171521208810497 (Daniel's first chip (Tiny Tapeout)) tt_um_wokwi_464171464939073537 (Claire's first Wokwi design) tt_um_wokwi_464176181065476097 (8-bit counter) tt_um_hackin7_coprocessor (AoC Hardcaml Coprocessor) tt_um_wokwi_464171453853527041 (Tiny Tapeout Hackaday 2026) tt_um_wokwi_464171864719209473 (Everton - Tiny Tapeout Workshop LC26) tt_um_ml_coprocessor (Kunal ML co-processor) tt_um_rahulbhagwat_brainamp_lna (brainamp-ac-coupled-lna) tt_um_Onchip_adder_NM (Onchip-UIS 4-bit Ripple Carry Adder) tt_um_wokwi_463557428446691329 (3Bit_yALU_IEEE_V2) tt_um_Onchip_Trimmed_BandGap (Onchip-UIS 3-bit Trimmed 1.2V BandGap) tt_um_ascon_cxof_chain (ASCON-CXOF128 Hash-Chain Accelerator) tt_um_Onchip_Freq_Divider_Dig (Onchip-UIS CLK Frequency Divider) tt_um_bleeptrack_cc2 (Recursive Rectangles) tt_um_enjimneering_spi_mem (SPI Memory Test) tt_um_voltrare (UART SPI ASCII Art) tt_um_enrico_glr (Secret Guessing Game) tt_um_gitragi_rng (Logic-Locked 5-Bit RNGy) tt_um_ece298A_analog_r4 (ECE298A analog tile) tt_um_trinity_nano (TRI-1 Phi — Trinity φ-anchor 1×1 Lucas POST + CLARA Gap-4) tt_um_ghtag_trinity_gf16 (TRI-1 Euler — Trinity e-engine 8×2 SUPER-CROWN + 10 CLARA Gaps) tt_um_lujji_ulogic_analyzer (ulogic_analyzer) tt_um_catalinlazar_adpll_125m_sky130 (127-stage Coarse-Tapped ADPLL) tt_um_vga_sharc_demo (SHaRC VGA Demo) tt_um_digit_serial_divider (IEEE | 24-Bit Serial Fixed-Point Binary Divider) tt_um_xeniarose_sbox (AES S-Box / PRESENT) tt_um_main_fsm_anbui_uci (Swarm Microrobot Drug Delivery FSM) tt_um_RO_aging (Onchip-UIS Ring Oscillators for Aging) tt_um_trinity_max_true (TRI-1 Gamma — MAX-TRUE NEUROMORPHIC FLAGSHIP 32-tile 8-column) tt_um_gray_sobel (tt_um_sobel_threshold) tt_um_c0d3d1_ldo (tt26b-Babies-First-LDO) tt_um_Bio_SSG_ (Bio-SSG) tt_um_nezumi_tech_adc_sq_compare (TT ADC SQ Compare) tt_um_c4m_spsram_direct_librelane (TTSKY-SPSRAM-direct-librelane) tt_um_tinycgra (tinyCGRA 2x2) tt_um_opensilicio_5g_rectifier (5 GHz RF-DC Rectifier) tt_um_sky_pll (SKY PLL test project) tt_um_rv32_vga (Systolic VGA Visualizer) tt_um_tron_game (TRON: Light Cycles game with VGA support (IEEE)) tt_um_wearlevel_controller (Hardware EEPROM Wear-Leveling Controller) tt_um_enjimneering_bss_uart (BSS UART) tt_um_wokwi_458489231265343489 (EDS workshop 4bit adder) tt_um_wokwi_464171612496799745 (Tiny Tapeout Exercise) tt_um_wokwi_464178459384432641 (Tiny Tapeout Template Copy) tt_um_leozqi_onetile (OneTile!) tt_um_d_4_array_multiplier (3020 Test Repo 4x4 Array Multiplier) tt_um_adithya_selvakumar_vco (4-Stage Differential Ring VCO) tt_um_snk_pwm_uart (PWM UART Controller) Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available