
TeenyTPU is a 2x2 INT8 systolic array TPU design. At its core is a 2x2 grid of Processing Elements (PEs) that performs matrix multiplication on INT8 data and produces 16-bit partial sums.
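
When checking hardware results, a small software reference model is handy. The sketch below multiplies two 2x2 INT8 matrices and keeps each sum in 16 bits. It is not taken from the TeenyTPU source: the function names are hypothetical, and it assumes signed two's-complement operands and wrap-around on 16-bit overflow. How the rows and columns of the result map onto the array's columns is also an assumption.

```python
def to_int16(value: int) -> int:
    """Wrap an integer into the signed 16-bit range, as a 16-bit register would."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def matmul_2x2_int8(activations, weights):
    """Return activations @ weights for 2x2 signed INT8 matrices.

    Assumption: operands are signed INT8 and sums wrap at 16 bits.
    """
    for matrix in (activations, weights):
        for row in matrix:
            for v in row:
                if not -128 <= v <= 127:
                    raise ValueError(f"{v} does not fit in signed INT8")
    result = [[0, 0], [0, 0]]
    for i in range(2):
        for j in range(2):
            acc = 0
            for k in range(2):
                # Each step models one multiply-accumulate in a PE.
                acc = to_int16(acc + activations[i][k] * weights[k][j])
            result[i][j] = acc
    return result


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    w = [[5, 6], [7, 8]]
    print(matmul_2x2_int8(a, w))  # [[19, 22], [43, 50]]
```

Under these assumptions the only input that exceeds 16 bits is two products of -128 * -128 in the same sum, which gives 32768 and wraps to -32768.
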
SPI Pin Mapping
- ui[0]: SPI SCLK
- ui[1]: SPI CS_N
- ui[2]: SPI MOSI
- uo[0]: SPI MISO
- uo[1]: BUSY flag
- uo[2]: DONE flag

Architecture Overview
- 0x01 WRITE_WEIGHT: load an 8-bit weight into a selected column/row.
- 0x02 LOAD_INPUT: load an 8-bit activation into a selected row.
- 0x03 CMD_START: trigger the compute FSM.
- 0x04 READ_RESULT: read back two 16-bit partial sums from a selected column.
- 0x05 READ_STATUS: read the {done, busy} status byte.

The compute FSM steps through the states IDLE → LOAD_W → SWITCH → FEED → DRAIN → DONE.

To test the TeenyTPU, you must implement an SPI master to drive the designated ui pins.
1. Apply a clock (clk) and assert rst_n low briefly to reset the FSM and the systolic array.
2. Pull CS_N low, send the 0x01 opcode, followed by the column index (e.g., 0x00), and then the two 8-bit weights for that column. Deassert CS_N. Repeat this for column 1.
3. Pull CS_N low, send the 0x02 opcode, followed by the row index (e.g., 0x00), and the 8-bit activation value. Deassert CS_N. Repeat this for row 1.
4. Send the 0x03 opcode to initiate computation. The busy pin (uo[1]) will assert high.
5. Wait for the done pin (uo[2]) to assert high, which indicates that the systolic array has finished computation. Alternatively, you can use the 0x05 opcode to read the {6'b0, done, busy} status byte repeatedly.
6. Send the 0x04 opcode followed by the column index (0x00 or 0x01). The TPU will respond by shifting out 16 bits of result data on MISO representing the partial sums of that column.

Testing requires an SPI master connected to ui[0] (SCLK), ui[1] (CS_N), ui[2] (MOSI), and uo[0] (MISO).
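
The SPI part of the test procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's test bench. The `transfer(tx, rx_len)` callback is hypothetical: it stands for whatever SPI master you use, and is expected to pull CS_N low, send `tx`, clock in `rx_len` response bytes, and deassert CS_N. The reset step is left to the caller. The read-back length is a parameter because the byte count and byte order of the result are not pinned down here; the default of 4 bytes follows the READ_RESULT description (two 16-bit partial sums).

```python
OP_WRITE_WEIGHT = 0x01
OP_LOAD_INPUT = 0x02
OP_CMD_START = 0x03
OP_READ_RESULT = 0x04
OP_READ_STATUS = 0x05


def _u8(value: int) -> int:
    """Encode a signed or unsigned 8-bit value as one byte."""
    if not -128 <= value <= 255:
        raise ValueError(f"{value} does not fit in 8 bits")
    return value & 0xFF


def write_weight_frame(column: int, w0: int, w1: int) -> bytes:
    """Opcode, column index, then the two 8-bit weights for that column."""
    return bytes([OP_WRITE_WEIGHT, column, _u8(w0), _u8(w1)])


def load_input_frame(row: int, activation: int) -> bytes:
    """Opcode, row index, then the 8-bit activation for that row."""
    return bytes([OP_LOAD_INPUT, row, _u8(activation)])


def decode_status(status: int):
    """Split the {6'b0, done, busy} byte: busy is bit 0, done is bit 1."""
    return bool(status & 0x02), bool(status & 0x01)


def run_matmul(transfer, weights_by_column, activations,
               result_bytes=4, max_polls=1000):
    """Load weights and activations, start, poll status, read both columns.

    Returns the raw result bytes per column; decoding is left to the caller
    because the byte order is an open assumption.
    """
    for column, (w0, w1) in enumerate(weights_by_column):
        transfer(write_weight_frame(column, w0, w1), 0)
    for row, activation in enumerate(activations):
        transfer(load_input_frame(row, activation), 0)
    transfer(bytes([OP_CMD_START]), 0)
    for _ in range(max_polls):
        done, _busy = decode_status(transfer(bytes([OP_READ_STATUS]), 1)[0])
        if done:
            break
    else:
        raise TimeoutError("TPU never reported done")
    return [transfer(bytes([OP_READ_RESULT, column]), result_bytes)
            for column in range(2)]
```

Polling with 0x05 is used here instead of watching uo[2] so that the sketch only needs the four SPI pins; a master with access to the done pin could wait on that instead.
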
| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | spi_sclk | spi_miso | |
| 1 | spi_cs_n | busy | |
| 2 | spi_mosi | done | |
| 3 | | | |
| 4 | | | |
| 5 | | | |
| 6 | | | |
| 7 | | | |
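
For a simulation or a bit-banged master, the pin assignments in the table can be packed into values for the 8-bit ui input bus. The sketch below turns an SPI frame into a sequence of ui values and unpacks the uo outputs. The helper names are hypothetical, and the timing is an assumption: SPI mode 0 (clock idles low, data set up before the rising edge) with the most significant bit first. Check the design's SPI slave before relying on either.

```python
SCLK_BIT = 0  # ui[0]
CS_N_BIT = 1  # ui[1]
MOSI_BIT = 2  # ui[2]


def ui_word(sclk: int, cs_n: int, mosi: int) -> int:
    """Pack the three SPI inputs into one value for the ui bus."""
    return (sclk << SCLK_BIT) | (cs_n << CS_N_BIT) | (mosi << MOSI_BIT)


def frame_to_ui_sequence(frame: bytes) -> list[int]:
    """Expand a frame into ui values, two per bit (clock low, then high).

    Assumes SPI mode 0 and MSB-first bit order.
    """
    seq = [ui_word(0, 1, 0)]  # idle: CS_N high, clock low
    for byte in frame:
        for bit_index in range(7, -1, -1):
            mosi = (byte >> bit_index) & 1
            seq.append(ui_word(0, 0, mosi))  # set up data while clock is low
            seq.append(ui_word(1, 0, mosi))  # rising edge
    seq.append(ui_word(0, 0, 0))  # return the clock low
    seq.append(ui_word(0, 1, 0))  # deassert CS_N
    return seq


def decode_uo(uo: int) -> dict:
    """Unpack uo[0] (MISO), uo[1] (busy), and uo[2] (done)."""
    return {"miso": uo & 1, "busy": (uo >> 1) & 1, "done": (uo >> 2) & 1}


if __name__ == "__main__":
    # One-byte CMD_START frame (0x03) as ui values.
    print(frame_to_ui_sequence(bytes([0x03])))
```
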