103 GOA - grogu on ASIC

GOA stands for grogu on ASIC. It is a reduced version of the CORTEZ chip, targeting the Tiny Tapeout 6 run. The grogu part of the name comes from grogu, the register file design utility.

The GOA design consists of a single neuron with 2 (two) inputs. The register file contains a number of registers for control and observation. The neuron core uses 8-bit (eight) fixed-point arithmetic, with 5 (five) bits reserved for the fraction.

Neuron internals

The next figure shows the simplified block diagram of the Neuron.

Figure: Neuron architecture

The arithmetic pipeline is composed of a number of fixed-point units: a multiplier, an adder for the accumulator and the bias, and the activation function. These primitives are shared: a centralized control engine (NCE) dispatches one value at a time to the proper block. The WEIGHTS matrix is stored outside the pipeline, in a local register file that is an instance of grogu.

The NCE expects exactly NUM_INPUTS input values. For each of them, the following process is executed through the pipeline:

  1. Multiply current value by its respective weight;
  2. Accumulate the value.

Once all values have been received, the bias is added and the non-linear activation function is applied to determine the output solution.
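The sequence above can be sketched in a few lines of Python. This is only a behavioral illustration in floating point, not the hardware implementation: the function name neuron is hypothetical, and math.tanh stands in for the on-chip activation block.

```python
import math

NUM_INPUTS = 2  # the GOA neuron has 2 (two) inputs


def neuron(values, weights, bias):
    """Behavioral model of one neuron pass (floating point, illustrative only)."""
    if len(values) != NUM_INPUTS or len(weights) != NUM_INPUTS:
        raise ValueError("the NCE expects exactly NUM_INPUTS values")
    acc = 0.0
    for value, weight in zip(values, weights):
        product = value * weight  # step 1: multiply by the respective weight
        acc += product            # step 2: accumulate
    # After the last value: add the bias, then apply the activation function.
    return math.tanh(acc + bias)


result = neuron([0.5, 0.25], [1.0, -0.5], 0.125)
```

Values are consumed one at a time, mirroring the way the NCE dispatches a single value to the shared multiplier and adder.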

Fixed-point

The entire pipeline works with fixed-point arithmetic. This reduces the complexity of the design. For the Tiny Tapeout run, the fixed-point configuration is an 8-bit (eight) word with 5 (five) bits reserved for the fractional part. All fixed-point operations wrap on overflow.
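As an illustration of this configuration, the following Python sketch models an 8-bit word with 5 fractional bits and wrapping arithmetic. It assumes a signed two's complement encoding and truncation of the product by an arithmetic right shift; neither detail is stated in this document, and all helper names are hypothetical.

```python
WORD_BITS = 8            # word width, from the text
FRAC_BITS = 5            # fractional bits, from the text
SCALE = 1 << FRAC_BITS   # 32 raw steps per unit


def wrap(raw):
    """Wrap an integer into the signed 8-bit range (assumed two's complement)."""
    raw &= (1 << WORD_BITS) - 1
    if raw >= 1 << (WORD_BITS - 1):
        raw -= 1 << WORD_BITS
    return raw


def to_fixed(x):
    """Convert a real number to its raw fixed-point value."""
    return wrap(round(x * SCALE))


def to_float(raw):
    """Convert a raw fixed-point value back to a real number."""
    return raw / SCALE


def fixed_add(a, b):
    return wrap(a + b)


def fixed_mul(a, b):
    # The full product carries 2 * FRAC_BITS fractional bits; shift back
    # (assumed truncation), then wrap.
    return wrap((a * b) >> FRAC_BITS)


overflow = to_float(fixed_add(to_fixed(3.5), to_fixed(1.0)))  # wraps to -3.5
```

Under these assumptions the representable range is -4.0 to 3.96875 in steps of 0.03125, and a sum that exceeds the range wraps around instead of saturating.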

Non-linear activation function

tanh() is the chosen non-linear activation function. Its mathematical properties make it a good candidate for a fully digital filter that implements a piecewise approximation.

Linear interpolation is used between successive points, which are carefully chosen to minimize the error. The points where the tanh() function is split are chosen by looking at its derivatives, up to the 4th. Since the tanh() function is odd symmetric, the digital implementation focuses on half of the problem, in the 1st quadrant. The other half, in the 3rd quadrant, is derived from it. The output is shown in the next figure.

Figure: Derivatives
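The idea of a piecewise-linear tanh() that exploits odd symmetry can be sketched as follows. The breakpoints below are placeholders chosen for illustration only; the actual split points used in GOA are derived from the derivatives and are not listed in this document. Function names are hypothetical.

```python
import math
from bisect import bisect_right

# Hypothetical breakpoints in the 1st quadrant (NOT the ones used on chip).
BREAKPOINTS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
VALUES = [math.tanh(x) for x in BREAKPOINTS]


def tanh_first_quadrant(x):
    """Linear interpolation between successive breakpoints, for x >= 0."""
    if x >= BREAKPOINTS[-1]:
        return VALUES[-1]  # hold the last value beyond the final breakpoint
    i = bisect_right(BREAKPOINTS, x) - 1
    x0, x1 = BREAKPOINTS[i], BREAKPOINTS[i + 1]
    y0, y1 = VALUES[i], VALUES[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def tanh_pwl(x):
    """Derive the 3rd quadrant from the 1st using tanh(-x) = -tanh(x)."""
    if x < 0:
        return -tanh_first_quadrant(-x)
    return tanh_first_quadrant(x)


# Largest deviation from math.tanh over a grid spanning -4.0 to 4.0.
max_error = max(abs(tanh_pwl(k / 100) - math.tanh(k / 100))
                for k in range(-400, 401))
```

Only the segments for non-negative inputs need to be stored; negative inputs reuse them with a sign flip.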

Scalable Configuration Interface

The SCI interface has been designed for the CORTEZ chip to address dense register files with a fairly large number of registers. The SCI is designed to reduce wires and congestion. It consists of a half-duplex communication mechanism with request/ack pairs, useful for low-speed peripheral register access. It is also latency insensitive. The SCI is inspired by the SPI protocol, with a tri-state bus and an active-low chip select.

For the single-neuron case, the SCI bus is not tri-stated, since there is only one peripheral. This simplifies the implementation.

In general, the SCI interface for one Master and N Slaves is composed of 4 (four) signals (mapping to the tt_um_scorbetta_goa pins is reported in the Pinout section).

SIGNAL    WIDTH  DIRECTION        ROLE
SCI_CSN   N      Master-to-Slave  Active-low peripheral select
SCI_REQ   1      Master-to-Slave  Request
SCI_RESP  1      Slave-to-Master  Response
SCI_ACK   1      Slave-to-Master  Ack

For detailed information on the SCI protocol please refer to this page.

Examples of Write and Read accesses are shown.

Figure: Write access

Figure: Read access

Network emulation

An unconventional use of the single-neuron design can emulate an entire network made of a number of layers, each with a number of neurons. This is possible thanks to the way the neuron is designed. Basically, the 2-input neuron is repeatedly fed with iterative data, coming from either the external world (i.e., input values) or intermediate results (i.e., from the inner core). Mathematically, the MAC operation is distributed in time.

Figure: Network emulation
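A minimal sketch of this time-multiplexed use is given below. It is a floating-point behavioral model under simplifying assumptions: every emulated neuron has exactly 2 inputs, so each layer is at most 2 neurons wide except the last, and math.tanh stands in for the on-chip activation. The names neuron and emulate_network are hypothetical.

```python
import math


def neuron(values, weights, bias):
    """One pass of the single 2-input neuron (behavioral, floating point)."""
    acc = 0.0
    for value, weight in zip(values, weights):
        acc += value * weight
    return math.tanh(acc + bias)


def emulate_network(layers, inputs):
    """Evaluate a network by reusing the same neuron once per emulated neuron.

    layers: list of layers; each layer is a list of (weights, bias) pairs.
    inputs: the two external input values.
    """
    current = list(inputs)
    for layer in layers:
        # Each emulated neuron is a separate pass through the one physical
        # neuron, reloaded with its own weights and bias.
        current = [neuron(current, weights, bias) for weights, bias in layer]
    return current


# A 2-2-1 network: two hidden neurons, then one output neuron.
layers = [
    [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)],  # hidden layer
    [([1.0, 1.0], 0.0)],                     # output layer
]
output = emulate_network(layers, [0.5, -0.5])
```

The first layer is fed with external input values, while later layers are fed with the intermediate results of earlier passes, so the whole network costs one neuron pass per emulated neuron.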

Pinout

PIN         DIRECTION  ROLE
ui_in[0]    input      FPGA clock
ui_in[1]    input      Active-low FPGA reset
ui_in[2]    input      Loopback data
ui_in[3]    input      Unused
ui_in[4]    input      Unused
ui_in[5]    input      Unused
ui_in[6]    input      Debug select [0]
ui_in[7]    input      Debug select [1]
uo_out[0]   output     Shared debug output dbug_out[0]
uo_out[1]   output     Shared debug output dbug_out[1]
uo_out[2]   output     Shared debug output dbug_out[2]
uo_out[3]   output     Shared debug output dbug_out[3]
uo_out[4]   output     Shared debug output dbug_out[4]
uo_out[5]   output     Shared debug output dbug_out[5]
uo_out[6]   output     Shared debug output dbug_out[6]
uo_out[7]   output     Shared debug output dbug_out[7]
uio_in[0]   input      SCI_CSN
uio_in[1]   input      SCI_REQ
uio_out[2]  output     SCI_RESP
uio_out[3]  output     SCI_ACK
uio_in[4]   input      Unused, configured as input
uio_in[5]   input      Unused, configured as input
uio_in[6]   input      Unused, configured as input
uio_in[7]   input      Unused, configured as input

Debug signals are mapped to the output pins uo_out. In total, 32 (thirty-two) signals are exposed to the debug interface. Inputs ui_in[7:6] select which of them are driven onto uo_out, according to the following table.

Debug signals mux

Configuration

The configuration of the neuron is implemented by means of local registers that hold the values for the weights and the bias. In addition, control registers are used to trigger the neuron operations. All registers are 8 (eight) bits wide.

REGISTER         OFFSET  TYPE  CONTENTS
WEIGHT_0         0x0     R/W   Weight of input #0
WEIGHT_1         0x1     R/W   Weight of input #1
BIAS             0x2     R/W   Bias
VALUE_IN         0x3     R/W   Input value
CTRL             0x4     R/W   Control register
STATUS           0x5     R     Status register
RESULT           0x6     R     Neuron solution
MULT_RESULT      0x7     R     Intermediate multiplier result
ADD_RESULT       0x8     R     Intermediate adder result w/o bias
BIAS_ADD_RESULT  0x9     R     Intermediate adder result w/ bias
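For host-side software, the register map above could be captured as follows. This is a sketch only: the offsets come from the table, while the Reg, encode and config_writes names are hypothetical, and the byte encoding assumes signed two's complement values with 5 fractional bits. The CTRL and STATUS bit layouts are not described in this document, so they are not modeled.

```python
from enum import IntEnum

FRAC_BITS = 5  # fractional bits of the 8-bit fixed-point word


class Reg(IntEnum):
    """Register offsets, copied from the table above."""
    WEIGHT_0 = 0x0
    WEIGHT_1 = 0x1
    BIAS = 0x2
    VALUE_IN = 0x3
    CTRL = 0x4
    STATUS = 0x5
    RESULT = 0x6
    MULT_RESULT = 0x7
    ADD_RESULT = 0x8
    BIAS_ADD_RESULT = 0x9


def encode(x):
    """Encode a real number as an 8-bit register value (assumed two's complement)."""
    return round(x * (1 << FRAC_BITS)) & 0xFF


def config_writes(weight_0, weight_1, bias):
    """Return the (offset, byte) pairs that load the weights and the bias."""
    return [
        (Reg.WEIGHT_0, encode(weight_0)),
        (Reg.WEIGHT_1, encode(weight_1)),
        (Reg.BIAS, encode(bias)),
    ]


writes = config_writes(1.0, -0.5, 0.125)
```

Each pair would then be sent as one write access over the SCI interface by whatever driver the host uses.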

External hardware

The main clock clk is generated by the on-board RP2040 chip. It is used solely for debug purposes. It is mirrored to uo_out[1].

The core clock is instead drawn from ui_in[0]. This is generated by an FPGA residing on an external board. ui_in[0] and clk are mesochronous, and they never interact.

The use of an external clock is required, since the SCI interface (also driven by the FPGA) needs proper synchronization. The FPGA also drives the active-low core reset through ui_in[1]. All control and status information is sent to and retrieved from the ASIC through the SCI interface.

IO

#  Input                  Output                           Bidirectional
0  FPGA clock             Shared debug output dbug_out[0]  SCI_CSN
1  Active-low FPGA reset  Shared debug output dbug_out[1]  SCI_REQ
2  Loopback data          Shared debug output dbug_out[2]  SCI_RESP
3                         Shared debug output dbug_out[3]  SCI_ACK
4                         Shared debug output dbug_out[4]
5                         Shared debug output dbug_out[5]
6  Debug select [0]       Shared debug output dbug_out[6]
7  Debug select [1]       Shared debug output dbug_out[7]
