551 BRISQ


How it works

BRISQ computes an approximate IEEE-754 single-precision inverse square root, y ≈ 1/sqrt(x). It starts from an initial estimate y0 and refines it with one Newton-Raphson step:

y1 = y0 * (1.5 - 0.5 * x * y0 * y0)
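As a quick numerical illustration, the following Python sketch evaluates the Newton-Raphson equation above in double precision. It is not a bit-accurate model of the hardware datapath, and the function name newton_rsqrt_step and the starting guess are hypothetical.

```python
import math


def newton_rsqrt_step(x: float, y0: float) -> float:
    """One Newton-Raphson refinement of an estimate y0 of 1/sqrt(x)."""
    return y0 * (1.5 - 0.5 * x * y0 * y0)


# Refine a deliberately rough guess for 1/sqrt(4) = 0.5.
x = 4.0
y0 = 0.45
y1 = newton_rsqrt_step(x, y0)
print(f"y0 = {y0}, y1 = {y1:.5f}, exact = {1 / math.sqrt(x)}")
```

The error drops from 0.05 to about 0.007, which reflects the roughly quadratic convergence of Newton-Raphson. The hardware evaluates the same expression with its truncated FP32 datapath, so its results differ slightly from this double-precision model.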

The Tiny Tapeout top module is tt_um_brisq. The design receives one 32-bit FP32 operand over an 8-bit byte-serial input bus, runs one Newton-Raphson refinement, and returns one 32-bit FP32 result over an 8-bit byte-serial output bus.

The initial estimate is generated in src/top.sv with a Quake-style integer seed:

seed_y0 = WTF - (input_magnitude >> 1)

WTF is a localparam in the RTL, currently set to 31'h5F3759DF. The cocotb testbench reads this value from the design, so the expected-value model automatically tracks any change to the RTL seed constant.
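The seed arithmetic can be modelled in Python by reinterpreting the FP32 bit pattern as an integer. This is a sketch, not the cocotb testbench's own model: it assumes input_magnitude is the 31-bit magnitude (the input encoding with the sign bit cleared), and it hard-codes WTF instead of reading it from the design. The helper names are hypothetical.

```python
import math
import struct

WTF = 0x5F3759DF  # same value as the RTL localparam 31'h5F3759DF (hard-coded here)


def f32_to_bits(value: float) -> int:
    """Reinterpret an FP32 value as its 32-bit integer encoding."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit integer as an FP32 value."""
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]


def seed_y0(x: float) -> float:
    """Quake-style seed for 1/sqrt(x), intended for positive normal x."""
    input_magnitude = f32_to_bits(x) & 0x7FFFFFFF  # assumed: 31-bit magnitude, sign dropped
    return bits_to_f32(WTF - (input_magnitude >> 1))


for x in (1.0, 2.0, 4.0, 10.0):
    print(x, seed_y0(x), 1 / math.sqrt(x))
```

For x = 4.0 the seed is about 0.483, within a few percent of the exact 0.5; the Newton-Raphson step then reduces that error further.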

The implementation is optimized for small area:

  • src/top.sv contains the Tiny Tapeout wrapper, byte SerDes, FSM, seed logic, and special-case handling.
  • src/fp32_mul.sv implements a truncated positive-magnitude FP32-style multiplier using the high PRECISION_BITS fraction bits.
  • src/fp32_sub.sv implements the parametric 1.5 - b subtractor used by the Newton correction term.

The default precision is 11 high fraction bits. This is an approximate datapath, not a fully rounded IEEE-754 FPU.
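The effect of keeping only the high fraction bits can be approximated in Python. This is a hypothetical behavioral model, not a description of src/fp32_mul.sv: it assumes both operands and the product are truncated toward zero to PRECISION_BITS fraction bits, and it ignores exponent overflow, underflow, and the exact normalization the RTL performs.

```python
import struct

PRECISION_BITS = 11  # default number of high fraction bits kept


def truncate_fraction(value: float, bits: int = PRECISION_BITS) -> float:
    """Round an FP32 value toward zero by keeping only its top `bits` fraction bits."""
    raw = struct.unpack(">I", struct.pack(">f", value))[0]
    drop = 23 - bits  # FP32 has 23 fraction bits
    raw &= ~((1 << drop) - 1) & 0xFFFFFFFF  # clear the low fraction bits
    return struct.unpack(">f", struct.pack(">I", raw))[0]


def approx_mul(a: float, b: float, bits: int = PRECISION_BITS) -> float:
    """Hypothetical model: truncate both operands, multiply, then truncate the product."""
    return truncate_fraction(truncate_fraction(a, bits) * truncate_fraction(b, bits), bits)


a, b = 1.2345, 6.789
print(approx_mul(a, b), a * b)
```

Because each truncation rounds toward zero, this model's product is biased low, with a relative error below a few units of 2^-11 for normal operands.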

I/O protocol

All FP32 values are transferred most-significant byte first.

Pin           Direction   Description
ui_in[7:0]    input       Input FP32 byte
uio_in[0]     input       Input byte valid
uo_out[7:0]   output      Output FP32 byte
uio_out[7]    output      Output byte valid
uio_out[6]    output      Final output byte
uio[5:1]      unused      Unused

To send an operand, drive each byte on ui_in[7:0] and assert uio_in[0] for the clock edge that accepts that byte. The receive FSM waits when uio_in[0] is low, so the input stream can pause between bytes.

After the fourth input byte is accepted, the accelerator computes for four clock cycles. It then drives four output bytes on uo_out[7:0]. uio_out[7] is high while output bytes are valid, and uio_out[6] is high with the fourth and final output byte.
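The byte ordering and handshake described above can be modelled on the host side with Python's struct module. The per-cycle (ui_in, uio_in) and (uo_out, uio_out) tuples are a hypothetical representation of sampled pin values, not the cocotb testbench's API; uio_in and uio_out are treated as 8-bit integers, so bits 0, 7, and 6 correspond to uio_in[0], uio_out[7], and uio_out[6].

```python
import struct


def fp32_to_bytes_msb_first(value: float) -> list[int]:
    """Split an FP32 value into the four bytes driven on ui_in[7:0], MSB first."""
    return list(struct.pack(">f", value))


def bytes_msb_first_to_fp32(data: list[int]) -> float:
    """Reassemble four bytes read from uo_out[7:0], MSB first, into an FP32 value."""
    if len(data) != 4:
        raise ValueError("expected exactly four bytes")
    return struct.unpack(">f", bytes(data))[0]


def input_schedule(value: float, idle_between: int = 0) -> list[tuple[int, int]]:
    """Per-cycle (ui_in, uio_in) values; uio_in[0] = 1 marks a byte to be accepted."""
    cycles = []
    for byte in fp32_to_bytes_msb_first(value):
        cycles.append((byte, 0x01))
        cycles.extend([(0x00, 0x00)] * idle_between)  # optional pause with valid low
    return cycles


def collect_output(samples: list[tuple[int, int]]) -> float:
    """Collect (uo_out, uio_out) samples until the byte flagged as final."""
    out = []
    for uo_out, uio_out in samples:
        if uio_out & 0x80:  # uio_out[7]: output byte valid
            out.append(uo_out)
            if uio_out & 0x40:  # uio_out[6]: final output byte
                break
    return bytes_msb_first_to_fp32(out)
```

Inserting idle cycles with valid low exercises the receive FSM's ability to pause between bytes.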

Special cases

Input class          Output
zero or subnormal    +inf
negative value       quiet NaN
+inf                 +0.0
NaN                  quiet NaN
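The special-case table can be expressed as a classifier on the FP32 bit pattern. This sketch makes two assumptions the table does not settle: the quiet NaN is encoded as 0x7FC00000, and zero or subnormal inputs map to +inf regardless of sign (so -0.0 returns +inf). The RTL in src/top.sv may encode or order these checks differently.

```python
from typing import Optional

POS_INF = 0x7F80_0000
POS_ZERO = 0x0000_0000
QUIET_NAN = 0x7FC0_0000  # assumed canonical quiet-NaN encoding


def special_case(bits: int) -> Optional[int]:
    """Return the special-case output encoding, or None for the normal datapath."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7F_FFFF
    if exponent == 0xFF and fraction != 0:
        return QUIET_NAN  # NaN in -> quiet NaN out
    if exponent == 0:
        return POS_INF  # zero or subnormal -> +inf (sign ignored: assumption)
    if sign:
        return QUIET_NAN  # negative value, including -inf -> quiet NaN
    if exponent == 0xFF:
        return POS_ZERO  # +inf -> +0.0
    return None  # positive normal: seed plus Newton-Raphson path
```

A None result means the input is a positive normal number and goes through the seed and Newton-Raphson datapath.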

How to test

From the test directory, install dependencies once:

python3 -m pip install -r requirements.txt

Run the self-checking cocotb RTL vector test:

make -B

Run the sweep test, which generates sweep_results/isqrt_sweep.csv and sweep_results/isqrt_sweep.png:

make sweep

Use more sweep samples with:

make sweep SWEEP_POINTS=4096

For gate-level simulation, copy the hardened netlist to test/gate_level_netlist.v and run:

make -B GATES=yes

The cocotb testbench writes tb.fst, which can be opened with GTKWave or Surfer.

External hardware

None.

IO

#   Input             Output             Bidirectional
0   fp32_in_byte[0]   fp32_out_byte[0]   fp32_in_valid
1   fp32_in_byte[1]   fp32_out_byte[1]
2   fp32_in_byte[2]   fp32_out_byte[2]
3   fp32_in_byte[3]   fp32_out_byte[3]
4   fp32_in_byte[4]   fp32_out_byte[4]
5   fp32_in_byte[5]   fp32_out_byte[5]
6   fp32_in_byte[6]   fp32_out_byte[6]   fp32_out_last
7   fp32_in_byte[7]   fp32_out_byte[7]   fp32_out_valid
