
This is a 1-bit Vector-Matrix Multiplier (VMM) coprocessor designed to accelerate Binary Neural Networks (BNNs). Because a 1x1 Sky130 tile only gives us about 1,000 gates, fitting a full neural network on the chip is impossible.
Instead, this design uses a bottom-up time-multiplexing architecture. The silicon contains 8 physical neurons, each with an 8-bit weight shift register and a 10-bit accumulator. The math is extremely cheap: each 1-bit multiplication is a single XNOR gate, and accumulation is done via a popcount adder tree.
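To make the arithmetic concrete, here is a minimal software sketch of what one neuron does with one input byte: XNOR the 8 input bits against the 8 weight bits, then count the matching bits. The function names `popcount8` and `xnor_popcount8` are illustrative only and do not come from the design itself.

```c
#include <stdint.h>

/* Count the set bits in one byte. This loop is a software stand-in for
 * the popcount adder tree; it is not how the hardware is built. */
static unsigned popcount8(uint8_t v)
{
    unsigned count = 0;
    while (v) {
        count += v & 1u;
        v >>= 1;
    }
    return count;
}

/* One neuron step (hypothetical helper): XNOR each input bit with its
 * weight bit, then count how many positions agree. */
static unsigned xnor_popcount8(uint8_t weights, uint8_t inputs)
{
    uint8_t matches = (uint8_t)~(weights ^ inputs);  /* bitwise XNOR */
    return popcount8(matches);
}
```

For example, `xnor_popcount8(0xF0, 0xFF)` gives 4, because only the upper four bits agree. Under the common BNN encoding where a 1 bit means +1 and a 0 bit means -1 (an assumption here, since the text does not state the encoding), the host can recover the signed dot product of an n-bit chunk as `2 * matches - n`.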
There is no internal state machine. The ASIC acts as a raw, ultra-fast math engine, while the looping, routing, and layer management are handled by a host processor in software.
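Because the looping lives on the host, it can help to keep a software reference of what a single neuron should accumulate when a long input vector is fed in 8-bit chunks. The sketch below is one possible golden model for checking hardware results. The names `match_count8` and `neuron_reference` are hypothetical, and wrapping at 10 bits is an assumption about overflow behavior that the text does not specify.

```c
#include <stddef.h>
#include <stdint.h>

#define ACC_MASK 0x3FFu  /* 10-bit accumulator width, from the description */

/* XNOR-popcount of one byte: number of bit positions where the
 * weight and the input agree. */
static unsigned match_count8(uint8_t weights, uint8_t inputs)
{
    uint8_t matches = (uint8_t)~(weights ^ inputs);
    unsigned count = 0;
    for (; matches; matches >>= 1)
        count += matches & 1u;
    return count;
}

/* Reference result for one neuron after n_bytes chunks (hypothetical
 * helper). Each iteration mirrors one time-multiplexed step: one weight
 * byte, one input byte, one addition into the accumulator. */
static uint16_t neuron_reference(const uint8_t *weights,
                                 const uint8_t *inputs,
                                 size_t n_bytes)
{
    uint16_t acc = 0;  /* accumulator value right after reset */
    for (size_t i = 0; i < n_bytes; i++) {
        /* ASSUMPTION: the accumulator wraps at 10 bits on overflow. */
        acc = (uint16_t)((acc + match_count8(weights[i], inputs[i])) & ACC_MASK);
    }
    return acc;
}
```

Since each byte adds at most 8, a 10-bit accumulator (maximum 1023) can take up to 127 chunks between resets without any risk of overflow.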
The chip expects to be driven by a host microcontroller running a C/C++ driver. The testing flow goes like this:
1. Pull rst_n LOW to clear the 10-bit accumulators.
2. Load the weights through ui_in.
3. Present the input data on ui_in.
4. Pulse compute_en (uio_in[4]). The internal edge detector ensures exactly one addition per byte.
5. Read the lower 8 bits of the result from uo_out and the top 2 bits from uio_out[7:6].

To run this, you need a host microcontroller (Raspberry Pi Pico / RP2040, ESP32, etc.) or an FPGA SoC to hold the trained model weights, stream the 1-bit data, and compute the final Float32 Softmax output layer.

| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | data_in_0 | acc_out_0 | load_weights |
| 1 | data_in_1 | acc_out_1 | addr_0 |
| 2 | data_in_2 | acc_out_2 | addr_1 |
| 3 | data_in_3 | acc_out_3 | addr_2 |
| 4 | data_in_4 | acc_out_4 | compute_en |
| 5 | data_in_5 | acc_out_5 | unused |
| 6 | data_in_6 | acc_out_6 | acc_out_8 |
| 7 | data_in_7 | acc_out_7 | acc_out_9 |
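
On the host side, the pinout above maps naturally onto a few bit masks. The following is a minimal sketch of helpers a driver might use to build the control byte written to the bidirectional pins and to reassemble the 10-bit accumulator value. The helper names `uio_control` and `acc_from_pins` are hypothetical, and treating addr_0 as the least significant address bit is an assumption based only on the pin naming.

```c
#include <stdint.h>

/* Bit positions on the bidirectional pins, taken from the pinout table. */
#define UIO_LOAD_WEIGHTS  (1u << 0)               /* uio[0]: load_weights */
#define UIO_ADDR_SHIFT    1u                      /* uio[3:1]: addr_2..addr_0 */
#define UIO_ADDR_MASK     (0x7u << UIO_ADDR_SHIFT)
#define UIO_COMPUTE_EN    (1u << 4)               /* uio[4]: compute_en */

/* Build the control byte for uio_in (hypothetical helper).
 * ASSUMPTION: addr_0 is the least significant bit of the 3-bit address. */
static uint8_t uio_control(unsigned addr, int load_weights, int compute_en)
{
    uint8_t v = (uint8_t)((addr << UIO_ADDR_SHIFT) & UIO_ADDR_MASK);
    if (load_weights)
        v = (uint8_t)(v | UIO_LOAD_WEIGHTS);
    if (compute_en)
        v = (uint8_t)(v | UIO_COMPUTE_EN);
    return v;
}

/* Reassemble the 10-bit accumulator: bits 7..0 come from uo_out,
 * bits 9..8 come from uio_out[7:6]. Other uio_out bits are ignored. */
static uint16_t acc_from_pins(uint8_t uo_out, uint8_t uio_out)
{
    uint16_t high = (uint16_t)((uio_out >> 6) & 0x3u);
    return (uint16_t)((high << 8) | uo_out);
}
```

For example, `uio_control(5, 0, 1)` yields `0x1A` (address 5 with compute_en set), and `acc_from_pins(0xFF, 0xC0)` yields 1023, the largest value a 10-bit accumulator can hold.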