
This design implements a 64-point Radix-2 Single Delay Feedback (SDF) Decimation-In-Frequency (DIF) FFT in silicon, targeting the Tiny Tapeout 6×2 tile footprint.
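The pipeline consists of six cascaded sdf_stage modules, one per radix-2 stage (log₂(64) = 6). Each stage contains:

- a delay line (delay_line.v), a shift register whose depth N/2^(stage+1) halves from one stage to the next (32, 16, …, 1);
- a Radix-2 DIF butterfly (Butterfly.v) that produces the sum A+B and the twiddle-multiplied difference (A−B)·W, where W = W₆₄^k = e^(−2πik/64);
- a twiddle ROM (twiddle_rom.v) holding the W₆₄ coefficients.

Data routing through each stage is controlled by a single sel bit derived from the shared 6-bit master_cnt counter in top.v.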
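Behaviourally, the six stages perform the classic radix-2 DIF algorithm, in which butterfly inputs are 32, 16, 8, 4, 2 and finally 1 sample apart, the same distances as the delay-line depths. The floating-point Python sketch below shows that structure; it ignores fixed-point effects and SDF timing, and dif_fft, bit_reverse and dft are hypothetical helper names. Like the hardware, it leaves the bins in bit-reversed order:

```python
import cmath

N = 64  # transform size of this design


def bit_reverse(i: int, bits: int = 6) -> int:
    """Reverse the lowest `bits` bits of i, e.g. 0b000001 -> 0b100000."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def dif_fft(x):
    """Radix-2 DIF FFT in floating point.

    Each pass of the while loop is one stage: butterflies whose inputs are
    `span` apart (32, 16, 8, 4, 2, 1). Output index i holds bin bit_reverse(i).
    """
    a = [complex(v) for v in x]
    n = len(a)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for j in range(span):
                # Twiddle W_n^m with m = j * n / (2 * span); m always stays below n / 2.
                m = j * (n // (2 * span))
                w = cmath.exp(-2j * cmath.pi * m / n)
                top, bottom = a[start + j], a[start + j + span]
                a[start + j] = top + bottom               # A + B
                a[start + j + span] = (top - bottom) * w  # (A - B) * W
        span //= 2
    return a


def dft(x):
    """Naive O(n^2) DFT with the same sign convention as numpy.fft.fft."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


x = [((7 * t) % 13) - 6 for t in range(N)]  # arbitrary integer test signal
scrambled = dif_fft(x)
# Bit reversal is its own inverse, so bin k sits at position bit_reverse(k).
natural = [scrambled[bit_reverse(k)] for k in range(N)]
```

Every twiddle exponent m in this algorithm stays below 32, which is consistent with a 32-entry ROM of W₆₄ coefficients.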
All data paths are 8-bit signed two's complement integers. To prevent overflow, each butterfly stage scales its outputs by 1/2 with an arithmetic right shift, rounding by adding 0.5 LSB before truncation (computed in 16-bit intermediates). Across the six stages this gives a cumulative pipeline gain of 2⁻⁶ = 1/64, so each output bin approximates X_k/64. The output order is bit-reversed, which is the natural output order of a Radix-2 DIF FFT.
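The per-stage scaling can be checked in isolation. This Python sketch models only the real-valued sum/difference arithmetic with the +0.5 LSB rounding described above; the twiddle multiply is left out, and round_half_shift, butterfly_real and dc_bin are illustrative names, not modules from the RTL.

```python
def round_half_shift(v: int, shift: int = 1) -> int:
    """Divide by 2**shift: add 0.5 output LSB, then arithmetic-shift right.

    Python's >> on a negative int rounds toward minus infinity, which is what
    an arithmetic right shift of a two's complement value does.
    """
    return (v + (1 << (shift - 1))) >> shift


def butterfly_real(a: int, b: int) -> tuple[int, int]:
    """Scaled sum and difference of two int8 samples (twiddle multiply omitted)."""
    assert -128 <= a <= 127 and -128 <= b <= 127
    return round_half_shift(a + b), round_half_shift(a - b)


def dc_bin(c: int, stages: int = 6) -> int:
    """Follow a constant input c down the sum path of all six stages.

    Unscaled, X[0] would be 64 * c; the six halvings (gain 2**-6 = 1/64)
    bring it back to c.
    """
    v = c
    for _ in range(stages):
        v, _diff = butterfly_real(v, v)
    return v
```

One property worth knowing: adding 0.5 LSB and truncating sends exact ties toward +∞ (1.5 → 2, −1.5 → −1, 2.5 → 3), whereas round-half-to-even, the usual meaning of convergent rounding, would give 2, −2 and 2.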
A 128-cycle frame controller lives in project.v:
| Phase | Cycles | Description |
|---|---|---|
| Idle | — | System waits for the trigger byte 0xAA on ui_in while ena is high |
| Ingest | 0–63 | 64 real samples are clocked in from ui_in |
| Sync emit | 63 | A 2-stage slip buffer injects the sync marker 0xAA on both outputs |
| Flush | 64–127 | ui_in is internally forced to 0x00; 64 FFT bins stream out on uo_out (real) and uio_out (imaginary) simultaneously |
| Halt | 128 | running deasserts; system returns to idle |
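Separating the ingest and flush phases into a strict 128-cycle frame keeps back-to-back transforms independent: while the previous frame's 64 bins drain out of the pipeline during the flush phase, ui_in is forced to 0x00, so no new samples can mix with them.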
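To make the frame timing concrete, the table can be read as a pure function of the frame-relative cycle count. The Python sketch below is a behavioural model, not a translation of project.v; frame_phase, sync_emitted and pipeline_input are hypothetical names, and the cycle numbering (0 = first sample cycle after the trigger) follows the table.

```python
def frame_phase(cycle: int) -> str:
    """Phase of the 128-cycle frame at a frame-relative cycle index.

    Assumed numbering: cycle 0 is the first sample cycle after the 0xAA trigger.
    """
    if 0 <= cycle <= 63:
        return "ingest"   # ui_in carries sample n = cycle
    if 64 <= cycle <= 127:
        return "flush"    # ui_in forced to 0x00 while the 64 bins stream out
    if cycle == 128:
        return "halt"     # running deasserts; back to idle
    raise ValueError(f"cycle {cycle} is outside the frame")


def sync_emitted(cycle: int) -> bool:
    """True on the cycle where the slip buffer puts 0xAA on both outputs."""
    return cycle == 63


def pipeline_input(cycle: int, ui_in: int) -> int:
    """Byte the FFT pipeline sees: ui_in while ingesting, 0x00 otherwise."""
    return ui_in if frame_phase(cycle) == "ingest" else 0x00


print([frame_phase(c) for c in (0, 63, 64, 127, 128)])
# ['ingest', 'ingest', 'flush', 'flush', 'halt']
```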
```text
tt_um_fft_adityaamehra (project.v) — 128-cycle FSM, sync slip buffer, output gating
└── top (top.v) — master_cnt counter, 6× sdf_stage instantiation
    └── sdf_stage × 6 (sdf_stage.v) — delay line + butterfly + twiddle ROM per stage
        ├── delay_line (delay_line.v) — parameterised shift register (depth = N/2^(stage+1))
        ├── Butterfly (Butterfly.v) — Radix-2 DIF complex butterfly with convergent rounding
        └── twiddle_rom (twiddle_rom.v) — 32-entry async ROM of W₆₄ coefficients
```
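The per-stage delay depths and the twiddle contents follow directly from N = 64. The Python sketch below derives both; the integer format of the table (values scaled by 127, roughly a signed Q1.7 byte) is an assumption for illustration only, since the word format used by twiddle_rom.v is not given here, and DEPTHS and twiddle_table are hypothetical names.

```python
import cmath

N = 64
STAGES = 6  # log2(64)

# delay_line depth per stage: depth = N / 2**(stage + 1)
DEPTHS = [N >> (stage + 1) for stage in range(STAGES)]


def twiddle_table(scale: int = 127) -> list[tuple[int, int]]:
    """Hypothetical integer (real, imag) table for W_64^k = exp(-2*pi*i*k/64), k = 0..31.

    Scaling by 127 is an assumed format, not taken from twiddle_rom.v.
    """
    table = []
    for k in range(N // 2):  # a radix-2 DIF of size 64 only needs k < 32
        w = cmath.exp(-2j * cmath.pi * k / N)
        table.append((round(scale * w.real), round(scale * w.imag)))
    return table


print(DEPTHS)       # [32, 16, 8, 4, 2, 1]
print(sum(DEPTHS))  # 63, the total delay-line storage in samples
```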
1. Hold rst_n low for at least one clock cycle to reset all pipeline registers and the frame counter.
2. Set ena high.
3. Apply the trigger byte 0xAA (decimal 170) on ui_in. On the next rising clock edge the running flag asserts and the 128-cycle frame begins.
4. Feed 64 signed 8-bit samples on ui_in, one per clock cycle, starting from the cycle immediately following the trigger. Sample order is time-sequential (n = 0, 1, …, 63).
5. Wait for the sync marker: the design drives 0xAA on both uo_out and uio_out for one cycle. This signals that the next 64 output cycles carry valid FFT data.
6. Read one bin per cycle from uo_out (real part of X_k) and uio_out (imaginary part of X_k). Note: the bins arrive in bit-reversed order. To reconstruct natural frequency order, reverse the 6-bit index of each output sample: the value read at output position i belongs to bin k = bitrev₆(i) (for example, position 1 = 0b000001 holds bin 32 = 0b100000).
7. When the frame ends, running deasserts and the design returns to idle, ready for the next 0xAA trigger.
8. … 0b000000).

The included CocoTB test suite (test/test.py) validates the design against numpy.fft.fft. It exercises DC, unit-impulse, single-tone sine, single-tone cosine, Nyquist, and pseudo-random noise inputs, and performs a point-by-point LSB-deviation check after correcting for bit reversal.
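On the host side, reading a frame amounts to locating the sync cycle, reinterpreting the raw pin bytes as signed values, and undoing the bit reversal. A minimal Python sketch, assuming the host records one (uo_out, uio_out) pair per clock starting before the sync cycle and that no earlier cycle shows 0xAA on both buses; decode_frame, to_signed8 and bit_reverse6 are hypothetical names:

```python
N = 64


def to_signed8(b: int) -> int:
    """Interpret a raw 8-bit pin value as a two's complement integer."""
    return b - 256 if b & 0x80 else b


def bit_reverse6(i: int) -> int:
    """Reverse a 6-bit index, e.g. 0b000001 -> 0b100000."""
    return int(f"{i:06b}"[::-1], 2)


def decode_frame(re_bytes, im_bytes, sync=0xAA):
    """Convert per-cycle captures of uo_out / uio_out into 64 bins in natural order.

    Assumes the capture begins before the sync cycle and that no earlier cycle
    shows the sync value on both buses. Results keep the pipeline's 1/64 scaling.
    """
    start = None
    for t, (re_b, im_b) in enumerate(zip(re_bytes, im_bytes)):
        if re_b == sync and im_b == sync:
            start = t + 1  # the first bin follows the sync cycle
            break
    if start is None or start + N > min(len(re_bytes), len(im_bytes)):
        raise ValueError("sync marker not found or capture too short")
    scrambled = [
        complex(to_signed8(re_bytes[start + i]), to_signed8(im_bytes[start + i]))
        for i in range(N)
    ]
    # Output position i carries bin bit_reverse6(i); bit reversal is its own
    # inverse, so bin k is found at position bit_reverse6(k).
    return [scrambled[bit_reverse6(k)] for k in range(N)]
```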
To run the tests locally:
```sh
cd test
make
```
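The point-by-point check can be reproduced without numpy: scale a reference DFT by 1/64 to match the pipeline gain and take the worst absolute difference over the real and imaginary parts. This is a dependency-free sketch, not the code in test/test.py (which uses numpy.fft.fft and whose tolerance is not stated here); reference_bins and max_lsb_error are hypothetical names.

```python
import cmath


def reference_bins(samples):
    """Naive DFT (numpy.fft.fft sign convention) scaled by 1/N to match the 1/64 gain."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]


def max_lsb_error(samples, hw_bins):
    """Worst-case |hardware - reference| over real and imaginary parts, in output LSBs.

    hw_bins must already be in natural order (bit reversal undone).
    """
    ref = reference_bins(samples)
    return max(max(abs(h.real - r.real), abs(h.imag - r.imag))
               for h, r in zip(hw_bins, ref))


dc = [10] * 64
ideal_hw = [complex(10)] + [0j] * 63  # what a perfect pipeline would output for DC
print(max_lsb_error(dc, ideal_hw) < 1e-9)  # True
```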
The design requires an external controller (e.g., an FPGA or microcontroller) to:
- Assert ena, then send 0xAA on ui_in followed by 64 data samples on successive clock cycles.
- Sample uo_out and uio_out on each of the 64 cycles following the 0xAA sync marker.
- Drive the system clock on the clk pin.

No analog or mixed-signal peripherals are required; the interface is entirely synchronous digital.
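For the controller's input side, the byte stream for one frame is just the trigger followed by the samples in two's complement. A Python sketch with a hypothetical build_stimulus helper; how each byte is clocked onto ui_in depends on the controller and is not shown:

```python
TRIGGER = 0xAA


def build_stimulus(samples):
    """Per-cycle bytes for ui_in: the 0xAA trigger followed by 64 samples.

    Samples are signed integers in [-128, 127], sent as two's complement bytes.
    """
    if len(samples) != 64:
        raise ValueError("exactly 64 samples are required")
    stream = [TRIGGER]
    for s in samples:
        if not -128 <= s <= 127:
            raise ValueError(f"sample {s} does not fit in int8")
        stream.append(s & 0xFF)  # two's complement encoding, e.g. -1 -> 0xFF
    return stream
```

Once running has asserted, a sample equal to 0xAA (−86) is presumably treated as ordinary data; while idle, however, any 0xAA on ui_in starts a frame, so parking ui_in at another value such as 0x00 between frames seems prudent.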
| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | Input real bit 0 | Output real bit 0 | Output imag bit 0 |
| 1 | Input real bit 1 | Output real bit 1 | Output imag bit 1 |
| 2 | Input real bit 2 | Output real bit 2 | Output imag bit 2 |
| 3 | Input real bit 3 | Output real bit 3 | Output imag bit 3 |
| 4 | Input real bit 4 | Output real bit 4 | Output imag bit 4 |
| 5 | Input real bit 5 | Output real bit 5 | Output imag bit 5 |
| 6 | Input real bit 6 | Output real bit 6 | Output imag bit 6 |
| 7 | Input real bit 7 | Output real bit 7 | Output imag bit 7 |