554 Measurement of CMOS VLSI Design Problem 4.11

554 : Measurement of CMOS VLSI Design Problem 4.11

Project

Measure the delay of the implementations of problem 4.11 in Weste & Harris

https://pages.hmc.edu/harris/cmosvlsi/4e/index.html

Problem 4.11

Consider four designs of a 6-input AND gate shown below. Develop an expression for the delay of each path if the path electrical effort is H. What design is fastest for H = 1? For H = 5? For H = 20?

For this problem, H measures the capacitance the AND gate needs to drive. It’s the output capacitance divided by the input capacitance of one inverter. So H=20 means the output capacitance is the same as 20 inverters.

For an example of when this could occur, consider this AND gate as a row decoder, and H is the number of columns.

The Theory of Logical Effort

For more details see

This is the linear delay model and the basic theory is the delay of a gate can be determined by the equation:

D = G * H + P

Variable Name Description
H Electrical Effort The amount of output capacitance this gate needs to drive relative to the input capacitance of an inverter.
G Logical Effort Logical Effort is a rough measure of the gate’s complexity. It can be thought of as the amount of input capacitance relative to an inverter with equal drive strength, the amount of drive strength when the input capacitance is the same as an inverter, or the slope of the fanout line. More “complex” gates will have higher logic efforts.
P Parasitic Delay Parasitic delay measures the output capacitance of this gate relative to the output capacitance of an inverter of the same strength. More complex gates have more output capacitance.
D Stage Delay The delay of this stage relative to the delay of one inverter.

The linear delay model has several possible issues (Weste & Harris covers this), but critically, it ignores wires. It’s known to work well enough for 0.25u processes and above, where wire loading is less significant relative to the delay and loading of the transistors themselves. We’ll see how well it works at 130nm for Sky130A in the TinyTapeout/OpenLane flow, where placement density is not particularly high.

What I implemented

How I determined the drive values

The key is to notice that when we increase the drive of one cell, we increase the load on the previous cell. This changes the electrical effort of that cell. So, for part D we end up with the following chain

There are equations one can use to optimize this but I just used mathematica because it’s faster. Here is how I solved H=20 for part D.

Then I just used the cell with the nearest drive from the library and checked the result was within one inverter delay of the optimal value.

This procedure is roughly equivalent to what ABC does in the OpenLane flow after mapping. Remember that although ABC can optimize drive, it can’t modify structure. As we can see, the choice of structure does have implications on performance, even after the drive is optimized.

Calculated delays with optimized drive

Design H=1 H=5 H=20
A 10.7 14.8 22.7
B 8.3 12.5 20.0
C 9.7 14.5 23.0
D 12.0 14.8 18.6

These are in units of one inverter delay which is about 70pS in Sky130A.

What is drive?

Drive counts the number of equivalent parallel gates to increase the output current the resulting compound gate. More gates in parallel allows sourcing larger load capacitances with higher dv/dt at the cost of input capacitance, area, and power.

Opting for an integrated cell for the parallel gates is a practical choice. It empowers the layout engineer to implement strategies that effectively minimize wire loading and output capacitance of the overall structure, thereby enhancing the gate’s performance.

See the various incarnations of nand2 in Sky130A for an example.

https://diychip.org/sky130/sky130_fd_sc_hd/cells/nand2/

To those with a more analog bend you might think the previous gate is acting like a pre-amplifier, and you’re right. It is exactly the same. You might also notice the inverter is basically a class B stage, except it inverts. It is and that’s what makes it such a good reference when one considers driving loads.

This provides a bit of theory behind the ZtA video

When are 2 logic gates faster than 1?

The video is a bit different than this problem in that it kept the load constant but shrunk the period constraints.

Here we keep the period constraint the same but increase the load. At the end of the day it’s the same idea in that we’re changing I to get a different dv/dt for a given C.

Test structure

Raw Delay Calculations

In[1]:= NMinimize[{
(x 5/3+3)+(y/x*1+1)+(z/y*4/3+2)+(5/z*1+1),(*G and P from Table 4.2 and 4.3 in Weste & Harris *)
1<=x,x<8,1<=y,y<8,1<=z,z<8},{x,y,z}]
Out[1]= {14.303,{x->1.09546,y->2.00004,z->2.73865}}
In[2]:= (*Part A*)
In[3]:= Round[(H/x 1+1+(x (6+2)/3+6))/.{H->{1,5,20},x->{1,2,4}},0.1]
Out[3]= {10.7,14.8,22.7}
In[4]:= (*Part B*)
In[5]:= Round[((H/x 5/3+2)+(x 5/3+3))/.{H->{1,5,20},x->{1,2,4}},0.1]
Out[5]= {8.3,12.5,20.}
In[6]:= (*Part C*)
Round[((H/x 7/3+3)+(x 4/3+3))/.{H->{1,5,20},x->{1,2,4}},0.1]
Out[6]= {9.7,14.5,23.}
In[7]:= (*Part D*)
In[8]:= 
Round[((x 5/3+3)+(y/x*1+1)+(z/y*4/3+2)+(H/z*1+1))/.{H->{1,5,20},x->{1,1,1},y->{1,1,2},z->{1,2,6}},0.1]
Out[8]= {12.,14.8,18.}

Hopefully there are no typos or transcription errors.

IO

# Input Output Bidirectional
0 sel[0] b[0] a[0]
1 sel[1] b[1] a[1]
2 sel[2] b[2] a[2]
3 sel[3] b[3] a[3]
4 h[0] &(A[5:0]) a[4]
5 h[1] ntest a[5]
6 h[2] count a[6]
7 ntest overflow a[7]

Chip location

Controller Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Analog Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Analog Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux tt_um_chip_rom (Chip ROM) tt_um_factory_test (TinyTapeout 06 Factory Test) tt_um_analog_factory_test (TT06 Analog Factory Test) tt_um_analog_factory_test (TT06 Analog Factory Test) tt_um_urish_charge_pump (Dickson Charge Pump) tt_um_psychogenic_wowa (WoWA) tt_um_oscillating_bones (Oscillating Bones) tt_um_kevinwguan (Crossbar Array) tt_um_coloquinte_moosic (Moosic logic-locked design) tt_um_alexsegura_pong (Pong) tt_um_iron_violet_simon (Iron Violet) tt_um_tomkeddie_a (VGA Experiments in Tennis) tt_um_MichaelBell_tinyQV (TinyQV Risc-V SoC) tt_um_andychip1_sn74169 (sn74169) tt_um_mattvenn_r2r_dac (Analog 8bit R2R DAC) tt_um_thorkn_audiochip_v2 (AudioChip_V2) tt_um_faramire_gate_guesser (Gate Guesser) tt_um_urish_simon (Simon Says game) tt_um_TT06_SAR_wulffern (TT06 8-bit SAR ADC) tt_um_soundgen (soundgen) tt_um_ledcontroller_Gatsch (ledcontroller) tt_um_digitaler_filter_rathmayr (Digitaler Filter) tt_um_histefan_top (Snake Game) tt_um_mayrmichael_wave_generator (Wave Generator) tt_um_advanced_counter (jku-tt06-advanced-counter) tt_um_FanCTRL_DomnikBrandstetter (PI-Based Fan Controller) tt_um_ps2_morse_encoder_top (PS/2 Keyboard to Morse Code Encoder) tt_um_calculator_muehlbb (16-bit calculator) tt_um_hpretl_tt06_tempsens (Temperature Sensor NG) tt_um_haeuslermarkus_fir_filter (FIR Filter with adaptable coefficients) tt_um_mattvenn_rgb_mixer (RGB Mixer demo) tt_um_analog_loopback (Analog loopback) tt_um_entwurf_integrierter_schaltungen_hadner (Projekt KEIS Hadner Thomas) tt_um_seven_segment_fun1 (7-segment-FUN) tt_um_moving_average_master (Moving average filter) tt_um_rgbled_decoder (SPI to RGBLED Decoder/Driver) tt_um_4bit_cpu_with_fsm (4-Bit CPU mit FSM) tt_um_flappy_bird (Flappy Bird) tt_um_drops (drops) tt_um_enieman (UART-Programmable RISC-V 32I Core) tt_um_gabejessil_timer (2 Player Game) tt_um_wokwi_384804985843168257 (playwithnumbers) tt_um_wokwi_384711264596377601 (luckyCube) tt_um_hpretl_tt06_tdc (Synthesized Time-to-Digital Converter (TDC)) tt_um_wokwi_384437973887503361 (Asynchronous Down Counter) tt_um_spi_pwm_djuara (spi_pwm) tt_um_SteffenReith_PiMACTop (PiMAC) tt_um_mattvenn_relax_osc (Relaxation oscillator) tt_um_jv_sigdel (1st passive Sigma Delta ADC) tt_um_wokwi_392873974467527681 (PILIPINAS) tt_um_scorbetta_goa (GOA - grogu on ASIC) tt_um_sanojn_ttrpg_dice (TTRPG Dice + simple I2C peripheral) tt_um_urish_dffram (DFFRAM Example (128 bytes)) tt_um_lucaz97_monobit (Monobit Test) tt_um_noritsuna_i4004 (i4004 for TinyTapeout) tt_um_hpretl_tt06_tdc_v2 (Synthesized Time-to-Digital Converter (TDC) v2) tt_um_vaf_555_timer (A 555-Timer Clone for Tiny Tapeout 6) tt_um_obriensp_be8 (8-bit CPU with Debugger (Lite)) tt_um_toivoh_retro_console (Retro Console) tt_um_mattvenn_inverter (Double Inverter) tt_um_SteffenReith_ASGTop (ASG) tt_um_lucaz97_rng_tests (rng Test) tt_um_dieroller_nathangross1 (Die Roller) tt_um_kwilke_cdc_fifo (Clock Domain Crossing FIFO) tt_um_spiff42_exp_led_pwm (LED PWM controller) tt_um_devinatkin_fastreadout (Fast Readout Image Sensor Prototype) tt_um_ja1tye_tiny_cpu (Tiny 8-bit CPU) tt_um_7seg_animated (Animated 7-segment character display) tt_um_neurocore (Neurocore) tt_um_zhwa_rgb_mixer (RGB Mixer) tt_um_wokwi_394704587372210177 (Cambio de giro de motor CD) tt_um_ian_keypad_controller (Keypad controller) tt_um_urish_spell (SPELL) tt_um_vks_pll (PLL blocks) tt_um_fountaincoder_top (multimac) tt_um_dsatizabal_opamp (Simple FET OpAmp with Sky130.) tt_um_obriensp_be8_nomacro (8-bit CPU with Debugger) tt_um_LFSR_shivam (10-bit Linear feedback shift register) tt_um_shivam (Pulse Width Modulation) tt_um_algofoogle_tt06_grab_bag (TT06 Grab Bag) tt_um_meiniKi_tt06_fazyrv_exotiny (FazyRV-ExoTiny) tt_um_wokwi_394888799427677185 (4-bit stochastic multiplier traditional) tt_um_QIF_8bit (8 Bit Digital QIF) tt_um_MATTHIAS_M_PAL_TOP_WRAPPER (easy PAL) tt_um_andrewtron3000 (Rule 30 Engine!) tt_um_tommythorn_4b_cpu_v2 (Silly 4b CPU v2) tt_um_aerox2_jrb8_computer (The James Retro Byte 8 computer) tt_um_wokwi_394898807123828737 (4-bit Stochastic Multiplier Compact with Stochastic Resonator) tt_um_argunda_tiny_opamp (Tiny Opamp) tt_um_fdc_chip (Frequency to digital converters (asynchronous and synchronous)) tt_um_8bit_cpu (8-Bit CPU In a Week) tt_um_mitssdd (co processor for precision farming) tt_um_fstolzcode (Tiny Zuse) tt_um_liu3hao_rv32e_min_mcu (tt06-RV32E_MinMCU) tt_um_kianV_rv32ima_uLinux_SoC (KianV uLinux SoC) tt_um_wokwi_395444977868278785 (*NOT WORKING* HP 5082-7500 Decoder) tt_um_wokwi_394618582085551105 (Keypad Decoder) tt_um_wokwi_395054820631340033 (Workshop Hackaday Juli) tt_um_wokwi_395055035944909825 (Some_LEDs) tt_um_wokwi_395055351144787969 (Hack a day Tiny Tapeout project) tt_um_wokwi_395054823569451009 (First TT Project) tt_um_wokwi_395054823837887489 (Dice) tt_um_wokwi_395055341723330561 (Workshop_chip) tt_um_jduchniewicz_prng (8-bit PRNG) tt_um_wokwi_395054564978002945 (Bestagon LED matrix driver) tt_um_wokwi_395054466384583681 (1-Bit ALU 2) tt_um_wokwi_395058308283408385 (test for tiny tapeout hackaday) tt_um_s1pu11i_simple_nco (Simple NCO) tt_um_wokwi_395055359324730369 (Tiny_Tapeout_6_Frank) tt_um_disp1 (Display test 1) tt_um_pckys_game (PCKY´s Successive Approximation Game) tt_um_tiny_shader_mole99 (Tiny Shader) tt_um_wokwi_393815624518031361 (My Chip) tt_um_minibyte (Minibyte CPU) tt_um_emilian_rf_playground (IDAC8 based on divide current by 2) tt_um_triple_watchdog (Triple Watchdog) tt_um_wokwi_395142547244224513 (EFAB Demo 2) tt_um_chisel_hello_schoeberl (Chisel Hello World) tt_um_aiju_8080 (8080 CPU) tt_um_wokwi_395134712676183041 (Inverters) tt_um_nubcore_default_tape (DEFAULT) tt_um_wuehr1999_servotester (Servotester) tt_um_wokwi_395055722430895105 (Servo Signal Tester) tt_um_exai_izhikevich_neuron (Izhikevich Neuron) tt_um_lisa (LISA 8-Bit Microcontroller) tt_um_wokwi_394707429798790145 (32-Bit Galois Linear Feedback Shift Register) tt_um_CKPope_top (X/Y Controller) tt_um_MNSLab_BLDC (Universal Motor and Actuator Controller) tt_um_couchand_dual_deque (Dual Deque) tt_um_JamesTimothyMeech_inverter (Programmable Thing) tt_um_signed_unsigned_4x4_bit_multiplier (Signed Unsigned multiplyer) tt_um_lipsi_schoeberl (Lipsi: Probably the Smallest Processor in the World) tt_um_i_tree_batzolislefteris (Anomaly Detection using Isolation trees) tt_um_wokwi_394830069681034241 (Cyclic Redundancy Check 8 bit) tt_um_rng_3_lucaz97 (RNG3) tt_um_wokwi_395263962779770881 (Bivium-B Non-Linear Feedback Shift Register) tt_um_dvxf_dj8 (DJ8 8-bit CPU) tt_um_silicon_tinytapeout_lm07 (Digital Temperature Monitor) tt_um_htfab_flash_adc (Flash ADC) tt_um_chisel_pong (Chisel Pong) tt_um_wokwi_395414987024660481 (HELP for tinyTapeout) tt_um_jorgenkraghjakobsen_toi2s (SPDIF to I2S decoder) tt_um_cmerrill_pdm (Parallel / SPI modulation tester) tt_um_csit_luks (CSIT-Luks) tt_um_wokwi_395357890431011841 (Trivium Non-Linear Feedback Shift Register) tt_um_drburke3_top (SADdiff_v1) tt_um_cejmu_riscv (TinyRV1 CPU) tt_um_rejunity_current_cmp (Analog Current Comparator) tt_um_loco_choco (BF Processor) tt_um_qubitbytes_alive (It's Alive) tt_um_wokwi_395061443288867841 (BCD to single 7 segment display Converter) tt_um_SJ (SiliconJackets_Systolic_Array) tt_um_ejfogleman_smsdac (8-bit DEM R2R DAC) tt_um_wokwi_395055455727667201 (Hardware Trojan Part II) tt_um_ericsmi_weste_problem_4_11 (Measurement of CMOS VLSI Design Problem 4.11) tt_um_wokwi_395034561853515777 (2 bit Binary Calculator) tt_um_mw73_pmic (Power Management IC) tt_um_Counter_1_shivam (8-bit Binary Counter) tt_um_wokwi_395054508867644417 (SynchMux) tt_um_otp_encryptor (TT06 OTP Encryptor) tt_um_wokwi_395514572866576385 (Parity Generator) tt_um_ADPCM_COMPRESSOR (ADPCM Encoder Audio Compressor) tt_um_3515_sequenceDetector (Sequence detector using 7-segment) tt_um_faramire_stopwatch (Simple Stopwatch) tt_um_ks_pyamnihc (Karplus-Strong String Synthesis) tt_um_dlmiles_muldiv8 (MULDIV unit (8-bit signed/unsigned)) tt_um_dlmiles_muldiv8_sky130faha (MULDIV unit (8-bit signed/unsigned) with sky130 HA/FA cells) tt_um_tommythorn_ncl_lfsr (NCL LFSR) tt_um_lk_ans_top (ANS Encoder/Decoder) tt_um_MichaelBell_latch_mem (Latch RAM (64 bytes)) tt_um_wokwi_395179352683141121 (Combination Lock) tt_um_Uart_Transciver (UART Transceiver) tt_um_dgkaminski (4-Bit ALU) tt_um_DigitalClockTop (TDM Digital Clock) tt_um_wokwi_394640918790880257 (IFSC Keypad Locker) tt_um_wokwi_395355133883896833 (BIT COMPARATOR) tt_um_alu (SumLatchUART_System) tt_um_alfiero88_VCII (VCII) tt_um_ALU (3-bit ALU) tt_um_topTDC (Convertidor de Tiempo a Digital (TDC)) tt_um_UABCReloj (24 H Clock) tt_um_CDMA_Santiago (CDMA_2024) tt_um_dr_skyler_clock (Clock) tt_um_motor (motor a pasos) tt_um_mult_2b (mult_2b) tt_um_CodHex7seg (Decodificador binario a display 7 segmentos hexadecimal) tt_um_S2P (Serial to Parallel Register) tt_um_PWM (PWM) tt_um_ss_register (serie_serie_register) tt_um_stepper (Stepper) tt_um_g3f (Generador digital trifásico) tt_um_ALU_DECODERS (ALU with a Gray and Octal decoders) tt_um_ram (4 bit RAM) tt_um_sap_1 (SAP-1 Computer) tt_um_guitar_pedal (Integrated Distorion Pedal) tt_um_mbalestrini_usb_cdc_devices (Two ports USB CDC device) tt_um_adammaj (Tiny ALU) tt_um_wokwi_395567106413190145 (4-Bit Full Adder and Subtractor with Hardware Trojan) tt_um_gak25_8bit_cpu_ext (Most minimal extension of friend's 'CPU In a Week' in a day) tt_um_hsc_tdc (UCSC HW Systems Collective, TDC) tt_um_BoothMulti_hhrb98 (UACJ-MIE-Booth 4) tt_um_dlmiles_poc_fskmodem_hdlctrx (FSK Modem +HDLC +UART (PoC)) tt_um_simplez_rcoeurjoly (tt6-simplez) tt_um_nurirfansyah_alits01 (Analog Test Circuit ITS: VCO) tt_um_ppca (drEEm tEEm PPCA) tt_um_wokwi_395522292785089537 (Displays CIt) tt_um_fpu (Dgrid_FPU) tt_um_duk_lif (Leaky Integrate and fire neuron(LIF)) tt_um_bomba1 (Latin_bomba) tt_um_chatgpt_rsnn_paolaunisa (ChatGPT designed Recurrent Spiking Neural Network) tt_um_bit_ctrl (Bit Control) tt_um_array_multiplier_hhrb98 (Array Multiplier) tt_um_wallace_hhrb98 (UACJ-Wallace multiplier) tt_um_I2C_to_SPI (TinyTapeout SPI Master) tt_um_rng (Random number generator) tt_um_wokwi_395599496098067457 (EVEN AND ODD COUNTERS) tt_um_8bitALU (8bit ALU) tt_um_aleena (Analog Sigmoid) tt_um_rejunity_1_58bit (Ternary 1.58-bit x 8-bit matrix multiplier) tt_um_rejunity_fp4_mul_i8 (FP4 x 8-bit matrix multiplier) tt_um_PWM_Controller (PWM Controller) tt_um_couchand_cora16 (CORA-16) tt_um_frq_divider (clk frequency divider controled by rom) tt_um_wokwi_390913889347409921 (Notre Dame Dorms LED) tt_um_timer_counter_UGM (4-Digit Scanning Digital Timer Counter) tt_um_koconnor_kstep (kstep) tt_um_lancemitrex (DIP Switch to HEX 7-segment Display) tt_um_PWM_Sine_UART (PWM_Sinewave_UART) tt_um_nicklausthompson_twi_monitor (TWI Monitor) tt_um_wokwi_395615790979120129 (Cambio de giro de motor CD) tt_um_ancho (Circuito PWM con ciclo de trabajo configurable) tt_um_wokwi_395618714068432897 (32b Fibonacci Original) tt_um_voting_thingey (Voting thingey) tt_um_hsc_tdc_buf (UCSC HW Systems Collective, TDC - BUF2x1) tt_um_hsc_tdc_mux (UCSC HW Systems Collective, TDC - MUX2x1) tt_um_petersn_micro1 (14 Hour Simple Computer) tt_um_sanojn_tlv2556_interface (UART interface to ADC TLV2556 (VHDL Test)) tt_um_gray_sobel (Gray scale and Sobel filter) tt_um_wokwi_395614106833794049 (Universal gates) Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available