
This design wraps a structurally optimized gate-level multiply-accumulate (MAC) unit in a TinyTapeout-compatible clocked interface. The inner MAC core was produced through ML-guided design-space exploration of arithmetic architectures. It computes y[16:0] = a[7:0] * b[7:0] + c[15:0] as a purely combinational circuit using a ripple-carry accumulation structure.
The wrapper adds clocked operand registers, a 17-bit accumulator, and a serial command interface. On each MAC operation, the accumulator feeds back into the MAC's addend input, enabling running accumulation across multiple multiply-add cycles.
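
As a rough behavioral reference (not the gate-level netlist), the arithmetic can be sketched in Python. The function names `mac` and `accumulate` are hypothetical, and the feedback of only acc[15:0] into the 16-bit addend is an assumption, since the text does not say how the 17th accumulator bit is handled.

```python
def mac(a: int, b: int, c: int) -> int:
    """Behavioral reference for y[16:0] = a[7:0] * b[7:0] + c[15:0]."""
    assert 0 <= a <= 0xFF, "a is an 8-bit operand"
    assert 0 <= b <= 0xFF, "b is an 8-bit operand"
    assert 0 <= c <= 0xFFFF, "c is a 16-bit addend"
    return (a * b + c) & 0x1FFFF  # result is 17 bits wide


def accumulate(pairs):
    """Running accumulation over (a, b) operand pairs, starting from zero."""
    acc = 0
    for a, b in pairs:
        # Assumption: only acc[15:0] feeds the MAC's 16-bit addend input.
        acc = mac(a, b, acc & 0xFFFF)
    return acc


# The largest possible result fits in 17 bits, so a single MAC never wraps.
worst_case = mac(0xFF, 0xFF, 0xFFFF)
assert worst_case == 130560 and worst_case < (1 << 17)
```

For example, `accumulate([(2, 3), (4, 5)])` evaluates 2*3 + 0 and then 4*5 + 6.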
Specifications:
1. rst_n low then high.
2. cmd=001, put value on ui_in, clock once.
3. cmd=010, put value on ui_in, clock once. This triggers acc = a*b + acc.
4. cmd=011 for acc[7:0], cmd=100 for acc[15:8], cmd=101 for acc[16].
5. cmd=110 zeros the accumulator.

The gate-level MAC netlist originates from a semester project on design-space exploration of AI hardware architectures, where ML-driven optimization was used to explore Pareto-optimal arithmetic unit implementations across power, performance, and area (PPA) tradeoffs.
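
The command sequence (reset, load, MAC, read back, clear) can be approximated with a small untimed Python model, which may be useful as a golden reference in a testbench. This is a sketch under several assumptions: the class and constant names are hypothetical, cmd=001 is taken to load operand a and cmd=010 to load operand b, each command completes in one call to `clock`, the busy signal is ignored, and only acc[15:0] feeds back into the addend.

```python
class MacWrapperModel:
    """Untimed behavioral sketch of the command interface (names are hypothetical)."""

    CMD_LOAD_A = 0b001    # assumption: loads operand a from ui_in
    CMD_LOAD_B = 0b010    # loads operand b (assumed) and triggers acc = a*b + acc
    CMD_READ_LO = 0b011   # output acc[7:0]
    CMD_READ_HI = 0b100   # output acc[15:8]
    CMD_READ_OVF = 0b101  # output acc[16]
    CMD_CLEAR = 0b110     # zero the accumulator

    def __init__(self):
        self.reset()

    def reset(self):
        """Models pulsing rst_n low then high."""
        self.a = 0
        self.acc = 0
        self.data_out = 0

    def clock(self, cmd: int, ui_in: int = 0) -> int:
        """Apply one command for one clock and return the modeled data_out."""
        ui_in &= 0xFF
        if cmd == self.CMD_LOAD_A:
            self.a = ui_in
        elif cmd == self.CMD_LOAD_B:
            # Assumption: acc[15:0] is the 16-bit addend fed back into the MAC.
            self.acc = (self.a * ui_in + (self.acc & 0xFFFF)) & 0x1FFFF
        elif cmd == self.CMD_READ_LO:
            self.data_out = self.acc & 0xFF
        elif cmd == self.CMD_READ_HI:
            self.data_out = (self.acc >> 8) & 0xFF
        elif cmd == self.CMD_READ_OVF:
            self.data_out = (self.acc >> 16) & 0x1
        elif cmd == self.CMD_CLEAR:
            self.acc = 0
        return self.data_out


# Usage: compute 12 * 10, then read the result back one byte at a time.
dut = MacWrapperModel()
dut.reset()
dut.clock(MacWrapperModel.CMD_LOAD_A, 12)
dut.clock(MacWrapperModel.CMD_LOAD_B, 10)
low = dut.clock(MacWrapperModel.CMD_READ_LO)
high = dut.clock(MacWrapperModel.CMD_READ_HI)
result = (high << 8) | low
```

The real design may need extra clocks between commands (the busy pin suggests this), so the model only captures the expected values, not the timing.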
None required.
| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | data_in[0] | data_out[0] | cmd[0] |
| 1 | data_in[1] | data_out[1] | cmd[1] |
| 2 | data_in[2] | data_out[2] | cmd[2] |
| 3 | data_in[3] | data_out[3] | busy |
| 4 | data_in[4] | data_out[4] | overflow (acc[16]) |
| 5 | data_in[5] | data_out[5] | |
| 6 | data_in[6] | data_out[6] | |
| 7 | data_in[7] | data_out[7] | |
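
When logging the bidirectional byte in a testbench, the bit positions in the table can be unpacked with a small helper. This is only an illustrative sketch: `decode_uio` is a hypothetical name, and the table does not state pin directions, so the helper just splits the fields.

```python
def decode_uio(uio: int) -> dict:
    """Split the bidirectional byte into the fields listed in the pinout table."""
    return {
        "cmd": uio & 0b111,                # bits 2:0, cmd[2:0]
        "busy": bool((uio >> 3) & 1),      # bit 3
        "overflow": bool((uio >> 4) & 1),  # bit 4, mirrors acc[16]
    }


# Example: cmd=010 with the overflow bit set and busy clear.
fields = decode_uio(0b0001_0010)
```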