835 3x3 Serial Matrix Multiplier (4-bit)

How it works

This design is a 3×3 unsigned 4-bit matrix multiplier. It computes C = A · B where A and B are 3×3 matrices of 4-bit unsigned integers (values 0–15). Each element of C is the sum of three 8-bit products, so the result is 10 bits wide and cannot overflow: 3 · 15 · 15 = 675 < 2^10.
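A few lines of Python can serve as a golden model for this arithmetic and confirm the bit-width bound. This is a sketch; `matmul3x3` is a hypothetical name, not a function from the project's testbench.

```python
from typing import List

Matrix = List[List[int]]


def matmul3x3(a: Matrix, b: Matrix) -> Matrix:
    """Reference model: C = A * B for 3x3 matrices of 4-bit unsigned values."""
    for m in (a, b):
        if len(m) != 3 or any(len(row) != 3 for row in m):
            raise ValueError("expected a 3x3 matrix")
        if any(not 0 <= v <= 15 for row in m for v in row):
            raise ValueError("elements must be 4-bit unsigned (0-15)")
    # Each C[i][j] is a dot product of three 4-bit x 4-bit (8-bit) products.
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


# Worst case: every product is 15 * 15 = 225, and three of them sum to 675.
worst = max(matmul3x3([[15] * 3] * 3, [[15] * 3] * 3)[0])
assert worst == 675 and worst < 2 ** 10  # 10 result bits are enough...
assert worst >= 2 ** 9                   # ...and 9 bits would not be
```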

Because the chip has only 24 user I/O pins (8 inputs, 8 outputs, 8 bidirectional), the matrices cannot be loaded in parallel. Instead, the design exposes a simple nibble-stream protocol driven by a small finite-state machine:

S_LOAD_A → S_LOAD_B → S_WAIT → S_COMPUTE → S_READ → S_DONE
                                   ▲         │
                                   └─────────┘
                               9 passes, one per C[i][j]
  • S_LOAD_A / S_LOAD_B: every rising edge of LOAD_EN latches the nibble on DATA_IN[3:0] into the next slot of the on-chip A or B memory (row-major, 9 nibbles each).
  • S_WAIT: both matrices are loaded; the FSM idles until START rises.
  • S_COMPUTE: the multiplier accumulates one 4-bit × 4-bit product per clock for three cycles, producing one 10-bit element of C.
  • S_READ: OUT_VALID is asserted and the host reads the result as two bytes on uo_out[7:0], pulsing READ_EN after each byte. Byte 0 holds bits [9:8] (so it is 0x00 to 0x03); byte 1 holds bits [7:0]. After the second READ_EN pulse the FSM automatically advances to the next (i, j) and computes again.
  • S_DONE: after the 9th element has been read, DONE is asserted and stays high until reset.
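The state machine above can be sketched as a clock-level Python model. It is a behavioural illustration, not the RTL: the names `State`, `MatMulModel` and `clock` are hypothetical, inputs are assumed to be sampled once per clock with a rising edge meaning "high now, low on the previous clock", results are assumed to come out in row-major (i, j) order, uo_out is assumed to read 0 outside S_READ, and BUSY is assumed to be high in S_COMPUTE and S_READ.

```python
from enum import Enum, auto


class State(Enum):
    LOAD_A = auto()
    LOAD_B = auto()
    WAIT = auto()
    COMPUTE = auto()
    READ = auto()
    DONE = auto()


class MatMulModel:
    """Clock-level sketch of the nibble-stream FSM (assumed behaviour)."""

    def __init__(self):
        self.reset()

    def reset(self):
        # rst_n low: clear all state and return to S_LOAD_A.
        self.state = State.LOAD_A
        self.a = [0] * 9   # row-major A memory
        self.b = [0] * 9   # row-major B memory
        self.idx = 0       # load slot, or output element index i*3 + j
        self.k = 0         # MAC step within one dot product (0..2)
        self.acc = 0       # 10-bit accumulator
        self.byte_sel = 0  # 0 -> bits [9:8], 1 -> bits [7:0]
        self.prev = {"load_en": 0, "start": 0, "read_en": 0}

    def clock(self, data_in=0, load_en=0, start=0, read_en=0):
        """Advance one rising clock edge with the given input levels."""
        now = {"load_en": load_en, "start": start, "read_en": read_en}
        rose = {n: now[n] and not self.prev[n] for n in now}  # edge detect
        self.prev = now
        s = self.state
        if s in (State.LOAD_A, State.LOAD_B) and rose["load_en"]:
            mem = self.a if s is State.LOAD_A else self.b
            mem[self.idx] = data_in & 0xF  # only DATA_IN[3:0] is used
            self.idx += 1
            if self.idx == 9:  # auto-switch after the 9th nibble
                self.idx = 0
                self.state = State.LOAD_B if s is State.LOAD_A else State.WAIT
        elif s is State.WAIT and rose["start"]:
            self.idx, self.k, self.acc = 0, 0, 0
            self.state = State.COMPUTE
        elif s is State.COMPUTE:
            # One 4-bit x 4-bit product per clock, three clocks per element.
            i, j = divmod(self.idx, 3)
            self.acc += self.a[i * 3 + self.k] * self.b[self.k * 3 + j]
            self.k += 1
            if self.k == 3:
                self.byte_sel = 0
                self.state = State.READ
        elif s is State.READ and rose["read_en"]:
            if self.byte_sel == 0:
                self.byte_sel = 1  # first pulse: switch to bits [7:0]
            else:
                self.idx += 1      # second pulse: next (i, j)
                self.k, self.acc = 0, 0
                self.state = State.DONE if self.idx == 9 else State.COMPUTE

    @property
    def out_valid(self):
        return self.state is State.READ

    @property
    def done(self):
        return self.state is State.DONE

    @property
    def busy(self):  # assumption: high from START until DONE
        return self.state in (State.COMPUTE, State.READ)

    @property
    def uo_out(self):
        if not self.out_valid:
            return 0
        return (self.acc >> 8) & 0x3 if self.byte_sel == 0 else self.acc & 0xFF
```

Feeding it the host sequence from "How to test" (each LOAD_EN, START and READ_EN pulse is one clock high, one clock low) walks it through all six states.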

The status signals BUSY, DONE and OUT_VALID are driven on uio[2:0] of the bidirectional bus, so a host MCU can poll progress with plain GPIO reads.

Reset (rst_n low) clears all state and returns to S_LOAD_A.

How to test

The host (MCU, FPGA, USB-GPIO, etc.) drives the chip in this order:

  1. Hold rst_n low for a few clocks, then release.
  2. Load A: for each of the 9 elements (row-major), put the nibble on ui_in[3:0], then pulse LOAD_EN high for one clock and low for one clock.
  3. Load B: same as A. The FSM switches from A to B on its own after the 9th nibble of A, so no extra command is needed.
  4. Start: pulse START high for one clock. BUSY stays high.
  5. Read 9 results: for each element, wait for OUT_VALID = 1, read byte 0 on uo_out[7:0], pulse READ_EN, read byte 1, then pulse READ_EN again. After that second pulse the FSM computes the next element by itself and raises OUT_VALID again.
  6. After the 18th byte, DONE goes high.
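The host sequence in steps 1 to 6 can be sketched in Python against a GPIO wrapper. `Pins`, its methods and `run_matmul` are hypothetical placeholders for whatever MCU, FPGA or USB-GPIO API is actually in use; the uio bit positions come from the IO table, holding reset for 4 clocks stands in for "a few clocks", and A and B are passed as flat row-major lists of 9 nibbles.

```python
from typing import List, Protocol

# uio bit positions, taken from the IO table
BUSY, DONE, OUT_VALID = 0, 1, 2
LOAD_EN, START, READ_EN = 3, 4, 5


class Pins(Protocol):
    """Hypothetical host GPIO layer; adapt to your own hardware API."""
    def set_rst_n(self, level: int) -> None: ...
    def set_ui(self, value: int) -> None: ...
    def set_uio(self, value: int) -> None: ...   # drives uio[5:3] only
    def get_uo(self) -> int: ...
    def get_uio(self) -> int: ...
    def tick(self) -> None: ...                  # one full clock cycle


def pulse(p: Pins, bit: int) -> None:
    """Drive one uio input high for one clock, then low for one clock."""
    p.set_uio(1 << bit)
    p.tick()
    p.set_uio(0)
    p.tick()


def run_matmul(p: Pins, a: List[int], b: List[int], timeout: int = 100) -> List[int]:
    """Reset, load A and B, start, then read the 9 elements of C."""
    if len(a) != 9 or len(b) != 9:
        raise ValueError("A and B must each be 9 row-major nibbles")
    p.set_uio(0)
    p.set_rst_n(0)                      # step 1: hold reset a few clocks
    for _ in range(4):
        p.tick()
    p.set_rst_n(1)
    for nibble in a + b:                # steps 2-3: 18 nibbles, A first
        p.set_ui(nibble & 0xF)
        pulse(p, LOAD_EN)
    pulse(p, START)                     # step 4
    results = []
    for _ in range(9):                  # step 5
        for _ in range(timeout):        # poll OUT_VALID
            if (p.get_uio() >> OUT_VALID) & 1:
                break
            p.tick()
        else:
            raise TimeoutError("OUT_VALID never rose")
        hi = p.get_uo() & 0x3           # byte 0: bits [9:8]
        pulse(p, READ_EN)
        lo = p.get_uo()                 # byte 1: bits [7:0]
        pulse(p, READ_EN)
        results.append((hi << 8) | lo)
    if not (p.get_uio() >> DONE) & 1:   # step 6
        raise RuntimeError("DONE not asserted after 18 bytes")
    return results
```

Polling OUT_VALID with a timeout keeps the host from hanging if a pulse is missed; BUSY is not needed for this flow.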

Sanity-check vectors:

  • A = I_3 (identity), any B → C == B.
  • A = B = [[15]*3]*3 (every element 15) → every C[i][j] = 675 (0x2A3: byte 0 = 0x02, byte 1 = 0xA3).
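The expected byte stream for these vectors can be precomputed on the host. This sketch assumes results are read out in row-major (i, j) order; `expected_stream` is a hypothetical helper, not part of the project's testbench.

```python
def expected_stream(a, b):
    """18 bytes the host should read: (bits [9:8], bits [7:0]) per C[i][j]."""
    out = []
    for i in range(3):
        for j in range(3):
            c = sum(a[i][k] * b[k][j] for k in range(3))
            out += [c >> 8, c & 0xFF]
    return out


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
B = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]
F = [[15] * 3 for _ in range(3)]

# A = I_3 -> C == B: every high byte is 0, low bytes are B in row-major order.
assert expected_stream(I3, B) == [x for row in B for v in row for x in (0, v)]
# All-15s -> 675 = 0x2A3 for every element.
assert expected_stream(F, F) == [0x02, 0xA3] * 9
```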

The cocotb testbench in test/test.py covers identity, a small hand-computed case, the maximum-value case, and a randomized case.

External hardware

None. A microcontroller, FPGA, or USB-GPIO interface is enough to drive the nibble-stream protocol. No buffers, level shifters, or analog parts are required.

IO

#  Input       Output       Bidirectional
0  DATA_IN[0]  DATA_OUT[0]  BUSY (out)
1  DATA_IN[1]  DATA_OUT[1]  DONE (out)
2  DATA_IN[2]  DATA_OUT[2]  OUT_VALID (out)
3  DATA_IN[3]  DATA_OUT[3]  LOAD_EN (in)
4  DATA_IN[4]  DATA_OUT[4]  START (in)
5  DATA_IN[5]  DATA_OUT[5]  READ_EN (in)
6  DATA_IN[6]  DATA_OUT[6]
7  DATA_IN[7]  DATA_OUT[7]
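On the host side, the Bidirectional column maps to bit positions on the uio bus. The sketch below, with hypothetical names `UIO_OUT`, `UIO_IN`, `decode_status` and `encode_controls`, packs and unpacks those bits. It relies on the Tiny Tapeout convention that a 1 in uio_oe marks a pin the design drives, so one would expect this design to set uio_oe to 0x07; that value is inferred from the table, not read from the source.

```python
# Bit positions on the bidirectional bus (uio), from the IO table.
UIO_OUT = {"BUSY": 0, "DONE": 1, "OUT_VALID": 2}   # driven by the chip
UIO_IN = {"LOAD_EN": 3, "START": 4, "READ_EN": 5}  # driven by the host

# Tiny Tapeout convention: uio_oe bit = 1 means the design drives that pin.
UIO_OE = sum(1 << bit for bit in UIO_OUT.values())  # 0b0000_0111


def decode_status(uio: int) -> dict:
    """Split a sampled uio byte into the three status flags."""
    return {name: (uio >> bit) & 1 for name, bit in UIO_OUT.items()}


def encode_controls(**levels: int) -> int:
    """Build the uio byte the host drives, e.g. encode_controls(LOAD_EN=1)."""
    value = 0
    for name, level in levels.items():
        value |= (level & 1) << UIO_IN[name]
    return value
```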
