39 Programmable_Pipeline-RISC-V

How it works

Block Diagram

This project implements a compact 32-bit RISC-V processor with a five-stage pipeline architecture consisting of Instruction Fetch (IF), Decode (ID), Execute (EX), Memory (MEM), and Write-Back (WB) stages. The pipelined design allows multiple instructions to be processed concurrently, improving throughput while maintaining a small hardware footprint.

Peripheral Interfaces

The system includes the following communication and control interfaces:

  1. UART1: Bootloader interface for program loading via serial protocol
  2. UART2: General-purpose UART communication interface during execution
  3. SPI Master: Peripheral communication interface (Mode 0, ~4.17 MHz)
  4. GPIO1: General-purpose output (LED control or similar)
  5. GPIO2: Hardware Chip Select control for SPI slave peripherals

Operating Modes

Bootloader Mode (Reset): After reset, the processor enters bootloader mode via UART1. Instructions are received serially as bytes through the UART1 RX pin and stored sequentially into instruction memory. Once the bootloader detects the sentinel value (0xBAADF00D), it automatically transitions to execution mode.

Execution Mode: The processor fetches and executes instructions through the five-stage pipeline. Peripheral access is controlled via memory-mapped I/O registers.

Clock and Timing Specifications

  • System Clock: 25 MHz
  • UART Baud Rate: 115,200 baud with x16 oversampling (both UART1 and UART2)
  • SPI Clock Frequency: ~4.17 MHz (CLK_DIV = 3, Mode 0: CPOL = 0, CPHA = 0)
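
As a rough cross-check of these figures, the sketch below derives the SPI clock and the per-byte transfer times. It assumes the SPI clock equals the system clock divided by 2 × CLK_DIV (which reproduces the ~4.17 MHz quoted above) and that UART frames are 8N1, i.e. 10 bit periods per byte. Neither assumption is confirmed by the design files, and the function names are purely illustrative.

```python
SYS_CLK_HZ = 25_000_000   # system clock from the list above
UART_BAUD = 115_200       # UART1/UART2 baud rate from the list above
SPI_CLK_DIV = 3           # CLK_DIV from the list above


def spi_clock_hz(sys_clk_hz: int, clk_div: int) -> float:
    """SPI clock, ASSUMING f_sclk = f_sys / (2 * CLK_DIV)."""
    return sys_clk_hz / (2 * clk_div)


def spi_byte_time_s(sys_clk_hz: int, clk_div: int) -> float:
    """Time to shift one 8-bit SPI word at the derived clock."""
    return 8 / spi_clock_hz(sys_clk_hz, clk_div)


def uart_byte_time_s(baud: int, bits_per_frame: int = 10) -> float:
    """Time for one UART frame, ASSUMING 8N1 (start + 8 data + stop)."""
    return bits_per_frame / baud


if __name__ == "__main__":
    print(f"SPI clock : {spi_clock_hz(SYS_CLK_HZ, SPI_CLK_DIV) / 1e6:.2f} MHz")
    print(f"SPI byte  : {spi_byte_time_s(SYS_CLK_HZ, SPI_CLK_DIV) * 1e6:.2f} us")
    print(f"UART byte : {uart_byte_time_s(UART_BAUD) * 1e6:.2f} us")
```

Under these assumptions a UART frame takes roughly 45 times longer than an SPI byte, so in any SPI-to-UART relay the UART is the limiting link.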

Peripheral Communication

Peripherals are accessed through memory-mapped I/O registers using a polling-based architecture. The CPU continuously reads status registers to determine when operations can proceed, rather than using interrupts. This simplifies the design while maintaining deterministic behavior.

The SPI master interface supports full-duplex communication with external peripherals such as sensors or external memory devices via MOSI, MISO, and SCLK, with Chip Select (CS) driven in software through GPIO2.


Peripheral Address Map & Polling Schemes

1. UART2 Control Interface

Memory-mapped registers for UART2 (general-purpose communication):

| Register | Address | Description |
|---|---|---|
| UART_TX | 0x1000_0000 | Write data byte to transmit |
| UART_RX | 0x1000_0004 | Read received data byte |
| UART_TX_STATUS | 0x1000_0008 | TX buffer status (Bit 0: 1 = ready/empty, 0 = busy) |
| UART_RX_STATUS | 0x1000_000C | RX buffer status (Bit 0: 1 = data available, 0 = empty) |

Polling Protocol:

  • TX: Poll UART_TX_STATUS Bit 0 until high, then write to UART_TX
  • RX: Poll UART_RX_STATUS Bit 0 until high, then read from UART_RX
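
The polling protocol can be illustrated with a small software model. This is only a sketch: `MockUartBus` is a hypothetical stand-in for the memory-mapped bus (it is not part of the design), and the number of "busy" polls after a write is an arbitrary value chosen for the example. Only the register addresses and the meaning of Bit 0 come from the table above.

```python
UART_TX = 0x1000_0000
UART_RX = 0x1000_0004
UART_TX_STATUS = 0x1000_0008
UART_RX_STATUS = 0x1000_000C


class MockUartBus:
    """Hypothetical MMIO model used only to exercise the polling loops."""

    def __init__(self, rx_bytes=b"", busy_polls=2):
        self.rx_queue = list(rx_bytes)  # bytes waiting in the receiver
        self.sent = []                  # bytes written to UART_TX
        self.busy_polls = busy_polls    # status reads reporting "busy" after a write
        self._busy = 0

    def read(self, addr):
        if addr == UART_TX_STATUS:
            if self._busy > 0:
                self._busy -= 1
                return 0                # Bit 0 = 0: transmitter busy
            return 1                    # Bit 0 = 1: ready/empty
        if addr == UART_RX_STATUS:
            return 1 if self.rx_queue else 0
        if addr == UART_RX:
            return self.rx_queue.pop(0)
        raise ValueError(f"unmapped read at {addr:#010x}")

    def write(self, addr, value):
        if addr != UART_TX:
            raise ValueError(f"unmapped write at {addr:#010x}")
        self.sent.append(value & 0xFF)
        self._busy = self.busy_polls


def uart_send(bus, byte):
    """TX: poll UART_TX_STATUS Bit 0 until high, then write UART_TX."""
    while (bus.read(UART_TX_STATUS) & 1) == 0:
        pass
    bus.write(UART_TX, byte)


def uart_recv(bus):
    """RX: poll UART_RX_STATUS Bit 0 until high, then read UART_RX."""
    while (bus.read(UART_RX_STATUS) & 1) == 0:
        pass
    return bus.read(UART_RX) & 0xFF


if __name__ == "__main__":
    bus = MockUartBus(rx_bytes=b"\x42")
    uart_send(bus, 0x55)
    print(bus.sent, hex(uart_recv(bus)))
```

Note that `uart_recv` blocks until a byte arrives, which mirrors the busy-wait behavior of a polling-only design.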

2. SPI Master Control Interface

Memory-mapped registers for SPI master mode (Mode 0):

| Register | Address | Description |
|---|---|---|
| SPI_TX | 0x4000_0000 | Write data byte to transmit |
| SPI_RX | 0x4000_0008 | Read received data byte |
| SPI_TX_STATUS | 0x4000_0004 | TX status (Bit 0: 1 = idle/ready, 0 = shifting) |
| SPI_RX_STATUS | 0x4000_000C | RX status (Bit 0: 1 = data available, 0 = empty) |

Polling Protocol:

  • TX: Poll SPI_TX_STATUS Bit 0 until high, then write to SPI_TX to initiate transfer
  • RX: Poll SPI_RX_STATUS Bit 0 until high, then read from SPI_RX
  • Timing: SPI transfers are full-duplex; RX data becomes available after the TX completes
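
A full-duplex byte exchange following these three rules might look like the sketch below. `MockSpiMaster` is a hypothetical register model, not the actual hardware: it completes transfers instantly, pretends the attached slave returns the bitwise inverse of each byte, and assumes that reading SPI_RX clears the data-available flag. Only the addresses and status-bit meanings are taken from the table above.

```python
SPI_TX = 0x4000_0000
SPI_TX_STATUS = 0x4000_0004
SPI_RX = 0x4000_0008
SPI_RX_STATUS = 0x4000_000C


class MockSpiMaster:
    """Hypothetical register model; the 'slave' echoes each byte inverted."""

    def __init__(self):
        self._rx = None  # byte captured by the last transfer, if still unread

    def read(self, addr):
        if addr == SPI_TX_STATUS:
            return 1                               # always idle in this model
        if addr == SPI_RX_STATUS:
            return 1 if self._rx is not None else 0
        if addr == SPI_RX:
            value, self._rx = self._rx, None       # ASSUMPTION: read clears the flag
            return value
        raise ValueError(f"unmapped read at {addr:#010x}")

    def write(self, addr, value):
        if addr != SPI_TX:
            raise ValueError(f"unmapped write at {addr:#010x}")
        self._rx = (~value) & 0xFF                 # full-duplex: RX captured during TX


def spi_exchange(bus, tx_byte):
    """Send one byte and return the byte clocked in during the same transfer."""
    while (bus.read(SPI_TX_STATUS) & 1) == 0:      # wait until idle/ready
        pass
    bus.write(SPI_TX, tx_byte)                     # writing SPI_TX starts the transfer
    while (bus.read(SPI_RX_STATUS) & 1) == 0:      # wait for received data
        pass
    return bus.read(SPI_RX) & 0xFF


if __name__ == "__main__":
    print(hex(spi_exchange(MockSpiMaster(), 0xAA)))
```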

3. GPIO Configuration

Memory-mapped GPIO output registers:

| Register | Address | Description |
|---|---|---|
| GPIO1 | 0x3000_0000 | General-purpose output (LED or other peripheral control) |
| GPIO2 | 0x3000_0004 | SPI Chip Select control (typically active-low) |

How to test

The design can be tested using both simulation and hardware deployment.

Simulation Environment

Use Icarus Verilog and cocotb for functional verification:

  1. Compilation: Standard testbench compiles the Verilog design with timing parameters
  2. Stimulus: Apply clock and reset signals; execute functional test sequences
  3. Waveform Analysis: Use GTKWave to inspect signal transitions, pipeline activity, and register updates

UART Testing (Bootloading)

  1. Program Loading: Use UART1 to send a sequence of 32-bit instructions serialized as bytes
  2. Acknowledgment: Verify the bootloader correctly stores instructions in instruction memory
  3. Sentinel Detection: Once the sentinel value (0xBAADF00D) is received, the bootloader halts and execution begins
  4. Execution Verification: Monitor pipeline progression, register state changes, and instruction completions
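
On the host side, the byte stream for step 1 could be prepared as in the sketch below. It assumes each 32-bit instruction is sent least-significant byte first (in line with the little-endian ordering this document specifies) and that the sentinel is sent in the same byte order; both points should be checked against the bootloader RTL. The function name `build_boot_image` is illustrative.

```python
import struct

SENTINEL = 0xBAADF00D  # bootloader halt marker from this document


def build_boot_image(words):
    """Pack 32-bit instruction words and append the sentinel.

    ASSUMPTION: words are serialized little-endian (LSB first).
    """
    for index, word in enumerate(words):
        if word & 0xFFFFFFFF == SENTINEL:
            # An instruction equal to the sentinel would end loading early.
            raise ValueError(f"word {index} collides with the sentinel")
    payload = b"".join(struct.pack("<I", w & 0xFFFFFFFF) for w in words)
    return payload + struct.pack("<I", SENTINEL)


if __name__ == "__main__":
    image = build_boot_image([0x00000073])  # a single ecall instruction
    print(image.hex(" "))
```

The resulting bytes can then be written to the UART1 RX pin at 115,200 baud with any serial tool.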

SPI Testing

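  1. External Device: Connect an external SPI slave (or a testbench model) that receives SCLK, MOSI, and CS from the processor and drives MISO
  2. Data Observation: Monitor the MOSI output from the processor and the bytes it captures from MISO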
  3. Timing Verification: Confirm Mode 0 timing compliance (CPOL=0, CPHA=0)
  4. Data Integrity: Compare transmitted and received byte streams

Functional Verification Checklist

  • [ ] Bootloader Integrity: Confirm UART1 correctly receives binary instruction bytes, populates instruction memory, detects sentinel value 0xBAADF00D, and transitions to execution mode cleanly
  • [ ] Pipeline Progression: Verify concurrent execution across IF/ID/EX/MEM/WB stages; confirm that data dependencies, branches, and register write-backs produce correct results despite pipeline hazards
  • [ ] Memory-Mapped I/O Polling: Validate that the processor correctly handles status polling for UART2 and SPI without deadlocks
  • [ ] SPI-to-UART Relay: Test the interlock routine under load conditions; ensure no data is dropped and transfers complete reliably

Sample Verification Program: SPI-to-UART Polling Loop

The following assembly routine demonstrates a practical I/O sequence:

  • Assert SPI Chip Select (GPIO2)
  • Poll and write dummy data (0xAA) to SPI
  • Poll and read the SPI response byte
  • Poll and relay the byte over UART2 to the host
  • Repeat for 200 iterations, then release Chip Select

# ==============================================================================
# RISC-V MMIO SPI Polling and UART2 Relay Loop
# Target: 32-Bit Pipelined Core (RV32I)
# SPI to UART test
# ==============================================================================

    # --- Setup Base Address Pointers ---
    lui   x10, 0x40000        # x10 = SPI Base Address      (0x4000_0000)
    lui   x18, 0x30000        # x18 = GPIO Base Address     (0x3000_0000)
    lui   x19, 0x10000        # x19 = UART Base Address     (0x1000_0000)
    
    # --- Calculate Register Offsets ---
    addi  x11, x10, 8         # x11 = SPI_RX Data Offset    (0x4000_0008)
    addi  x12, x10, 4         # x12 = SPI_TX Status Offset  (0x4000_0004)
    addi  x13, x10, 12        # x13 = SPI_RX Status Offset  (0x4000_000C)
    addi  x21, x19, 8         # x21 = UART_TX Status Offset (0x1000_0008)
    addi  x14, x18, 4         # x14 = GPIO2 CS Offset       (0x3000_0004)
    
    # --- Initialize Loop Constants ---
    addi  x3,  x0, 1          # x3  = GPIO2 HIGH (CS de-assert)
    addi  x16, x0, 0          # x16 = GPIO2 LOW (CS assert)
    addi  x5,  x0, 200        # x5  = Transfer counter (200 iterations)
    addi  x22, x0, 1          # x22 = Status bit mask (Bit 0)
    addi  x7,  x0, 0xAA       # x7  = Dummy data byte
    
    # --- Assert SPI Chip Select ---
    sw    x16, 0(x14)         # GPIO2 = 0 (CS asserted to peripheral)
    
    loop:
        beq   x5, x0, release_cs   # If counter == 0, exit loop
    
    # --- Wait for SPI TX Ready ---
    wait_spi_tx:
        lw    x6, 0(x12)         # Load SPI_TX_STATUS
        and   x6, x6, x22        # Isolate Bit 0 (ready flag)
        beq   x6, x0, wait_spi_tx # If 0 (busy), loop back
        sw    x7, 0(x10)         # Write 0xAA to SPI_TX, initiate transfer
    
    # --- Wait for SPI RX Data ---
    wait_spi_rx:
        lw    x8, 0(x13)         # Load SPI_RX_STATUS
        and   x8, x8, x22        # Isolate Bit 0 (data available flag)
        beq   x8, x0, wait_spi_rx # If 0 (no data), loop back
        lw    x9, 0(x11)         # Read received byte from SPI_RX
    
    # --- Wait for UART TX Ready ---
    wait_uart_tx:
        lw    x6, 0(x21)         # Load UART_TX_STATUS
        and   x6, x6, x22        # Isolate Bit 0 (ready flag)
        beq   x6, x0, wait_uart_tx # If 0 (busy), loop back
        sw    x9, 0(x19)         # Write SPI byte to UART_TX
    
    # --- Next Iteration ---
        addi  x5, x5, -1         # Decrement counter
        jal   x0, loop           # Jump to next iteration
    
    release_cs:
        sw    x3, 0(x14)         # GPIO2 = 1 (CS de-asserted from peripheral)
        ecall                    # Halt simulation

Compiled Machine Code (SPI to UART)

Corresponding 32-bit hexadecimal instruction sequence (online assembler: https://riscvasm.lucasteske.dev/#):

0x40000537,  # lui   x10, 0x40000
0x30000937,  # lui   x18, 0x30000
0x100009b7,  # lui   x19, 0x10000
0x00850593,  # addi  x11, x10, 8
0x00450613,  # addi  x12, x10, 4
0x00c50693,  # addi  x13, x10, 12
0x00898a93,  # addi  x21, x19, 8
0x00490713,  # addi  x14, x18, 4
0x00100193,  # addi  x3,  x0, 1
0x00000813,  # addi  x16, x0, 0
0x0c800293,  # addi  x5,  x0, 200
0x00100b13,  # addi  x22, x0, 1
0x0aa00393,  # addi  x7,  x0, 0xAA
0x01072023,  # sw    x16, 0(x14)
0x02028e63,  # beq   x5, x0, release_cs   [loop]
0x00062303,  # lw    x6, 0(x12)           [wait_spi_tx]
0x01637333,  # and   x6, x6, x22
0xfe030ce3,  # beq   x6, x0, wait_spi_tx
0x00752023,  # sw    x7, 0(x10)
0x0006a403,  # lw    x8, 0(x13)           [wait_spi_rx]
0x01647433,  # and   x8, x8, x22
0xfe040ce3,  # beq   x8, x0, wait_spi_rx
0x0005a483,  # lw    x9, 0(x11)
0x000aa303,  # lw    x6, 0(x21)           [wait_uart_tx]
0x01637333,  # and   x6, x6, x22
0xfe030ce3,  # beq   x6, x0, wait_uart_tx
0x0099a023,  # sw    x9, 0(x19)
0xfff28293,  # addi  x5, x5, -1
0xfc9ff06f,  # jal   x0, loop
0x00372023,  # sw    x3, 0(x14)           [release_cs]
0x00000073,  # ecall
0xBAADF00D   # SENTINEL VALUE (bootloader halt marker)
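
Individual words in a listing like this can be cross-checked without an external assembler. The sketch below encodes the handful of RV32I instruction formats used by the test programs, following the standard RV32I bit layouts; the helper names (`lui`, `addi`, `sw`, and so on) are illustrative Python functions, not part of any toolchain, and no range checking is performed on immediates.

```python
def enc_u(opcode, rd, imm20):
    """U-type: imm[31:12] | rd | opcode."""
    return ((imm20 & 0xFFFFF) << 12) | (rd << 7) | opcode


def enc_i(opcode, funct3, rd, rs1, imm):
    """I-type: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def enc_r(opcode, funct3, funct7, rd, rs1, rs2):
    """R-type: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def enc_s(opcode, funct3, rs1, rs2, imm):
    """S-type: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode."""
    imm &= 0xFFF
    return ((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) \
        | ((imm & 0x1F) << 7) | opcode


def enc_b(opcode, funct3, rs1, rs2, offset):
    """B-type: imm[12|10:5] | rs2 | rs1 | funct3 | imm[4:1|11] | opcode."""
    imm = offset & 0x1FFF
    return (((imm >> 12) & 1) << 31) | (((imm >> 5) & 0x3F) << 25) | (rs2 << 20) \
        | (rs1 << 15) | (funct3 << 12) | (((imm >> 1) & 0xF) << 8) \
        | (((imm >> 11) & 1) << 7) | opcode


def enc_j(opcode, rd, offset):
    """J-type: imm[20|10:1|11|19:12] | rd | opcode."""
    imm = offset & 0x1FFFFF
    return (((imm >> 20) & 1) << 31) | (((imm >> 1) & 0x3FF) << 21) \
        | (((imm >> 11) & 1) << 20) | (((imm >> 12) & 0xFF) << 12) | (rd << 7) | opcode


# Mnemonic helpers; branch and jump offsets are byte offsets relative to the PC.
def lui(rd, imm20):      return enc_u(0x37, rd, imm20)
def addi(rd, rs1, imm):  return enc_i(0x13, 0, rd, rs1, imm)
def lw(rd, rs1, imm):    return enc_i(0x03, 2, rd, rs1, imm)
def and_(rd, rs1, rs2):  return enc_r(0x33, 7, 0, rd, rs1, rs2)
def sw(rs2, rs1, imm):   return enc_s(0x23, 2, rs1, rs2, imm)
def beq(rs1, rs2, off):  return enc_b(0x63, 0, rs1, rs2, off)
def bne(rs1, rs2, off):  return enc_b(0x63, 1, rs1, rs2, off)
def jal(rd, off):        return enc_j(0x6F, rd, off)


if __name__ == "__main__":
    for word in (lui(10, 0x40000), addi(11, 10, 8), sw(16, 14, 0)):
        print(f"0x{word:08x}")
```

Branch targets are the most error-prone part of a hand-edited listing, since every inserted or removed instruction changes the offsets, so they are worth re-encoding whenever the program is modified.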
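
Sample Verification Program: GPIO Blink Loop

The following assembly routine drives GPIO1 and GPIO2 low and then high together for 10 iterations, with a software delay after each transition, and finally leaves both outputs low.

# ==============================================================================
# RISC-V MMIO GPIO Blink Loop
# Target: 32-Bit Pipelined Core (RV32I)
# GPIO test
# ==============================================================================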

    lui   x13, 0x30000        /* x13 = GPIO1 address (0x3000_0000) */
    addi  x14, x13, 4         /* x14 = GPIO2 address (0x3000_0004) */

    addi  x5, x0, 10          /* loop counter */

    addi  x6, x0, 1           /* HIGH */
    addi  x7, x0, 0           /* LOW */

loop:

    /* BOTH OFF */
    sw    x7, 0(x13)
    sw    x7, 0(x14)

    /* delay */
    addi  x3, x0, 1000
delay1:
    addi  x3, x3, -1
    bne   x3, x0, delay1

    /* BOTH ON */
    sw    x6, 0(x13)
    sw    x6, 0(x14)

    /* delay */
    addi  x3, x0, 1000
delay2:
    addi  x3, x3, -1
    bne   x3, x0, delay2

    addi  x5, x5, -1
    bne   x5, x0, loop

    /* OFF both */
    sw    x7, 0(x13)
    sw    x7, 0(x14)

    ecall

Compiled Machine Code (GPIO test)

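0x300006B7,  # lui   x13, 0x30000
0x00468713,  # addi  x14, x13, 4
0x00A00293,  # addi  x5,  x0, 10
0x00100313,  # addi  x6,  x0, 1
0x00000393,  # addi  x7,  x0, 0
0x0076A023,  # sw    x7, 0(x13)       [loop]
0x00772023,  # sw    x7, 0(x14)
0x3E800193,  # addi  x3,  x0, 1000
0xFFF18193,  # addi  x3,  x3, -1      [delay1]
0xFE019EE3,  # bne   x3, x0, delay1
0x0066A023,  # sw    x6, 0(x13)
0x00672023,  # sw    x6, 0(x14)
0x3E800193,  # addi  x3,  x0, 1000
0xFFF18193,  # addi  x3,  x3, -1      [delay2]
0xFE019EE3,  # bne   x3, x0, delay2
0xFFF28293,  # addi  x5,  x5, -1
0xFC029AE3,  # bne   x5, x0, loop
0x0076A023,  # sw    x7, 0(x13)
0x00772023,  # sw    x7, 0(x14)
0x00000073,  # ecall
0xBAADF00D   # SENTINEL VALUE (bootloader halt marker)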

Bootloader Operation: When the bootloader receives the sentinel value 0xBAADF00D, it:

  1. Stops accepting instruction bytes
  2. Clears the bootloader hardware state
  3. Enables the execution pipeline
  4. Hands control to the processor, which begins fetching from address 0x0000_0000
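
The hand-off can be pictured with a small behavioral model. This is a sketch under stated assumptions, not a description of the RTL: it assumes bytes are assembled into words least-significant byte first, that the sentinel itself is not stored in instruction memory, and that bytes arriving after the sentinel are ignored. The class name `BootloaderModel` and its attributes are hypothetical.

```python
SENTINEL = 0xBAADF00D  # bootloader halt marker from this document


class BootloaderModel:
    """Hypothetical behavioral model of the UART1 bootloader."""

    def __init__(self):
        self.imem = []          # instruction memory, one 32-bit word per entry
        self.executing = False  # True once the pipeline has been enabled
        self._pending = []      # bytes of the word currently being assembled

    def feed(self, byte):
        """Consume one byte received on UART1 RX."""
        if self.executing:
            return                      # step 1: no further bytes are accepted
        self._pending.append(byte & 0xFF)
        if len(self._pending) < 4:
            return
        # ASSUMPTION: little-endian word assembly (LSB arrives first).
        word = int.from_bytes(bytes(self._pending), "little")
        self._pending.clear()           # step 2: clear loader state for the next word
        if word == SENTINEL:
            self.executing = True       # steps 3-4: enable pipeline, fetch from 0x0
        else:
            self.imem.append(word)      # stored sequentially from address 0


if __name__ == "__main__":
    loader = BootloaderModel()
    for b in bytes.fromhex("73000000" "0df0adba"):  # ecall, then the sentinel
        loader.feed(b)
    print([hex(w) for w in loader.imem], loader.executing)
```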

External Hardware Requirements

Standard external signals required for operation:

  1. Clock Input: 25 MHz reference clock
  2. Reset Signal: Active-low asynchronous reset (rst_n)
  3. Enable Signal: System enable (ena) from external controller
  4. UART1 Interface:
    • RX (input from bootloader tool)
    • TX (output, optional acknowledgment)
  5. UART2 Interface:
    • RX (input for general communication)
    • TX (output for general communication)
  6. SPI Interface:
    • MOSI (Master Out, Slave In) — output from processor
    • MISO (Master In, Slave Out) — input to processor
    • SCLK (Serial Clock) — output from processor
    • CS (Chip Select) — output from GPIO2
  7. GPIO1: General-purpose output (open-drain or push-pull driver)
  8. GPIO2: Chip Select output for SPI slave (open-drain or push-pull driver)

Notes

  • This project is an educational, hands-on RISC-V SoC design
  • It is developed for learning CPU architecture, pipeline execution, and hardware-software interaction
  • Intended for academic and experimental use in Nepal
  • Focus is on clarity and understanding rather than commercial optimization
  • The design prioritizes simplicity, determinism, and observability of CPU behavior
  • No interrupt system is implemented; all peripherals use polling only
  • Instruction format assumes little-endian byte ordering

IO

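| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | BOOT_UART1_RX | BOOT_UART1_TX | SPI_CS_GPIO2_TOP |
| 1 | PER_UART2_RX | PER_UART2_TX | SPI_MOSI_TOP |
| 2 | | | SPI_MISO_TOP |
| 3 | | | SPI_SCLK_TOP |
| 4 | | | GPIO1_TOP |
| 5 | | | SPI_CS_GPIO3_TOP |
| 6 | | | |
| 7 | | | |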
