# Designing DDR3 system using Static Timing Analysis in conjunction with IBIS simulations

cādence

Taranjit Kukal, Zhangmin Zhong, Heiko Dudek Cadence Design Systems, Inc.

**Presented by: Kent Ho** 

Asian IBIS Summit Hsinchu, Taiwan November 13, 2012



- Key Design Challenges
  - DDR3 Timing and SI specifications
- Problem Statement
  - Piecemeal simulations do not guarantee optimal design
- Solution
  - Static Timing Analysis in conjunction with IBIS simulations
- Use-cases
  - Step-by-step method to optimally use EDA flows
- Summary










## Key Design Challenges: Timing Budget

- Set-up / Hold Times
  - Data write w.r.t strobe
  - Data read w.r.t strobe
  - Address w.r.t clock
- Strobe w.r.t clock
  - Data w.r.t Address
- Account for
  - Clock/strobe jitter and interconnect jitter
  - Slew rates and the resulting derating of setup/hold


## Key Design Challenges: Signal Quality

- Thresholds
  - DC and AC
  - Noise-Margins
- Overshoots/Undershoots
  - Magnitude
  - Area
- tVAC
  - Minimum time the signal must stay above the threshold
- Eye
  - Data-valid window after accounting for jitter
- Slew rates, which in turn affect timing
  - Rise/fall times



## Key Design Challenges: Component Selection

- Memory-Buffers
  - Trade-off between read-write cycles
- Controller Driver strength
  - Trade-off between read-write cycles
- Connector
  - Insertion loss
- Strobe/Clock differential buffers
  - Should satisfy tDVAC and overshoot/undershoot area requirements


## Key Design Challenges: Layout Constraints

- Trace-lengths
  - Relational propagation delays, Data-Strobe, for balanced setup/hold
  - Relational propagation delays, Address-Clock, for balanced setup/hold
  - Relational propagation delays, Strobe-Clock, for successful write-leveling
- Topology schedules
  - Point to Point for Data
  - FlyBy for Address
- Trace Impedance
  - Example: Lead-in section (45 ohm) to Load-in section (60 ohm) through neck-down (~5 to 10 mm) for clock
  - Percentage variation that can be tolerated
- Differential matching (CLK, STROBE)
  - Maximum uncoupled (non-parallel) length






## Problem Statement: Multiple constraints across Timing / SI

- DDR3 imposes several SI and timing constraints, and meeting all of them at once means exploring a large solution space.
- A designer who fixes a few constraints often pushes other measurements of interest out of specification in the process.

## Problem Statement: Timing Closure across read/write/address

- Timing closure is time-consuming because there are too many constraints to meet.
  - Etch delays that close timing during the Read cycle may not work during the Write cycle.
  - Positive setup/hold margins alone are not enough; an optimal design needs setup and hold margins that are equally distributed.
- The required relative delays between Data (Strobe) and Address (Clock) bring an additional challenge.
- It is also important to budget for signal and interconnect jitter on the various signals.
  - A design that appears to meet a constraint is likely to fail once jitter adds uncertainty to the signal.

## Problem Statement: SI affects timing

- Slew rates affect setup/hold time constraints.
  - SI simulations provide the signal slew rates, which in turn must be considered in the timing constraints. For example, the hold-time constraint could be 160 ps for a slew rate of 1 V/ns but 200 ps for a slew rate of 2 V/ns.
- Eye shape could indicate a need for different relative etch lengths to equalize setup/hold margins.
  - Static timing calculations provide one set of etch delays, but the eye shape (which could be narrow on one side) may force the designer to refine the relative delays for balanced setup/hold times.
- Stack-up variation and crosstalk cause interconnect jitter that must be accounted for in the timing checks.
  - This jitter should be estimated through SI simulations and then annotated to the timing models for timing closure.
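The slew-rate dependence of the hold constraint can be sketched as a small lookup with interpolation. This is only an illustration: the two points (160 ps at 1 V/ns, 200 ps at 2 V/ns) are the example values quoted above, while the function name `derated_hold_ps`, the linear interpolation, and the clamping outside the known range are assumptions. Real derating values come from the memory data sheet and have more entries.

```python
def derated_hold_ps(slew_v_per_ns: float) -> float:
    """Interpolate the hold-time constraint (ps) for a given slew rate (V/ns)."""
    # Example points from the slide text, not a real derating table.
    (s0, t0), (s1, t1) = (1.0, 160.0), (2.0, 200.0)
    # Assumption: clamp to the known range instead of extrapolating.
    s = min(max(slew_v_per_ns, s0), s1)
    return t0 + (t1 - t0) * (s - s0) / (s1 - s0)


print(derated_hold_ps(1.5))  # 180.0
```

The slew rate passed in would be the value measured in SI simulation, which is how an SI result changes the timing limit rather than only the measured margin.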

## Problem Statement (Current approach): Ad-hoc analysis and verification (Leading to Non-optimal design)

- Timing checks are sometimes done by hand calculation, after which the focus shifts to post-layout verification with SI simulations to ensure correctness.
- Limitations:
  - The goal is merely to meet constraints, rather than to reach an optimal design with enough margin on every constraint.
  - Manual timing-budget calculations are time-consuming and inefficient.
  - There is no way to include SI effects in the timing calculations.


## Problem Statement (Current approach): Ad-hoc analysis and verification (Leading to Non-optimal design)

- Designers use layout rules provided by device manufacturers.
- Limitations:
  - Flexibility is limited. PCB designers refrain from trying components from different vendors, different board dimensions, and different circuit configurations.
  - Boards are sometimes over-designed, because layout guidelines are usually on the strict side to ensure a working system.

## Problem Statement (Current approach): Ad-hoc analysis and verification (Leading to Non-optimal design)

- SI simulations are usually done as an audit at the verification step, following a piecemeal approach.
- Limitations:
  - Using real-time simulations for exhaustive timing verification is too time-consuming and difficult.
  - It is very difficult to manage optimal parameter selection when the constraints are spread across read/write and address cycles.






## Unified SI/STA flow for DDR3



## Solution: Static Timing analysis

- Static timing exploration, independent of time-expensive SI simulations, can provide a seed for etch estimation.
- Automatically updating constraint limits based on data rate, signal slew rate, and the threshold values selected for the design simplifies the computation.
- Automatically calculating relative etch delays for balanced setup/hold times, while accounting for signal uncertainty due to jitter, can save multiple iterations.
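The idea of solving for a relative etch delay that balances setup and hold can be sketched with a deliberately simplified write-cycle model. Everything here is an assumption made for illustration: the class `WriteBudget`, its field names, the margin equations, and the numbers in the usage lines are hypothetical and do not represent the tool's timing model.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class WriteBudget:
    setup_avail_ps: float  # data valid before the strobe edge, at the driver
    hold_avail_ps: float   # data valid after the strobe edge, at the driver
    t_ds_ps: float         # receiver setup requirement (after derating)
    t_dh_ps: float         # receiver hold requirement (after derating)
    jitter_ps: float       # uncertainty charged against each margin

    def margins(self, skew_ps: float) -> tuple[float, float]:
        """Return (setup, hold) margins; skew = strobe etch delay - data etch delay."""
        setup = self.setup_avail_ps + skew_ps - self.t_ds_ps - self.jitter_ps
        hold = self.hold_avail_ps - skew_ps - self.t_dh_ps - self.jitter_ps
        return setup, hold

    def balanced_skew(self) -> float:
        """Skew at which the setup and hold margins are equal."""
        return ((self.hold_avail_ps - self.t_dh_ps)
                - (self.setup_avail_ps - self.t_ds_ps)) / 2


# Hypothetical numbers, chosen only to show the calculation.
b = WriteBudget(400.0, 500.0, 200.0, 200.0, 50.0)
print(b.balanced_skew())   # 50.0
print(b.margins(50.0))     # (200.0, 200.0)
```

In this model jitter reduces both margins equally, so it does not move the balance point; it only decides whether the balanced margins stay positive.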


## Solution: Timing analysis feeds SI simulations

- Results of STA feed into SI simulations.
  - Estimated etch delays (flight times) of data, strobe, address, and clock map to interconnect flight times.
  - Estimated jitter becomes a constraint for crosstalk (interconnect and data-dependent).
- SI simulations with IBIS buffers
  - Build up interconnect details (vias, trace lengths, stack-up) while keeping the flight-time constraint from STA.
  - Improve interconnect topologies to meet SI constraints and better center the strobe w.r.t. data.

## Solution: SI simulations that feed back to STA

- Feed back updated flight times (switch delays), worst-case jitter, and slew rates from real-time SI simulations to the timing models to close timing constraints.
- Generate layout constraints from interconnect topologies.
  - Route the board based on the layout constraints.
- Run post-route SI simulations, followed by timing closure.















## Building Project



- Frequency of operation and AC threshold levels
  - Configures TD models
  - Configures custom measurements
- Address (1T / 2T)
  - Configures TD models
- New DIMMs (Or On-board) vs Existing DIMMs
  - Pre-created Topologies vs Extracted DIMM topologies
- DIMM Card Type
  - Configures topologies and ECSets




## Timing estimation

- Data-Strobe
  - Write
  - Read
- Address-Clock (1T or 2T)
- Decide etch delays that can meet timing specifications



## IO-model selection/Exploration

- For best noise-margins and Eye for read/write

DDR2/3 Module

- Controller Model
  - Impedance
- Memory Model
  - ODT
- Connector Model
- Strobe
  - tVAC, overshoot/undershoot area

| Slot 1<br>(DIMM 1) | Slot 2<br>(DIMM 2) | Write To | DRAM<br>Controller | Slot 1<br>Rank 1  | Slot 1<br>Rank 2 | Slot 2<br>Rank 1 | Slot 2<br>Rank 2 |
|--------------------|--------------------|----------|--------------------|-------------------|---------|-------------------|---------|
| Dual rank          | Dual rank          | Slot 1   | ODT off            | 120Ω              | ODT off | ODT off           | 30Ω     |
|                    |                    | Slot 2   | ODT off            | ODT off           | 30Ω     | 120Ω              | ODT off |
| Dual rank          | Single rank        | Slot 1   | ODT off            | 120Ω              | ODT off | 20Ω               | n/a     |
|                    |                    | Slot 2   | ODT off            | ODT off           | 20Ω     | 120Ω <sup>1</sup> | n/a     |
| Single rank        | Dual rank          | Slot 1   | ODT off            | 120Ω <sup>1</sup> | n/a     | ODT off           | 20Ω     |
|                    |                    | Slot 2   | ODT off            | 20Ω               | n/a     | 120Ω              | ODT off |
| Single rank        | Single rank        | Slot 1   | ODT off            | 120Ω <sup>1</sup> | n/a     | 30Ω               | n/a     |
|                    |                    | Slot 2   | ODT off            | 30Ω               | n/a     | 120Ω <sup>1</sup> | n/a     |
| Dual rank          | Empty              | Slot 1   | ODT off            | 40Ω               | ODT off | n/a               | n/a     |
| Empty              | Dual rank          | Slot 2   | ODT off            | n/a               | n/a     | 40Ω               | ODT off |
| Single rank        | Empty              | Slot 1   | ODT off            | 40Ω               | n/a     | n/a               | n/a     |
| Empty              | Single rank        | Slot 2   | ODT off            | n/a               | n/a     | 40Ω               | n/a     |



**On-Board DDR2/3** 


## SI Solution Space for Relational Topologies

- Explore data w.r.t strobe; address w.r.t clock




## Timing Verification after SI annotation

- Re-verify timing after importing flight delays and jitter from SI simulations

| Name                     | Formula                               | Min   | Nom | Max   | Margin          | Comment                                                         |
|--------------------------|---------------------------------------|-------|-----|-------|-----------------|-----------------------------------------------------------------|
| JitterSpecifications     | []                                    |       |     |       |                 | Pick from controller data sheet                                 |
| tPLL_PSERR               | 30                                    | 30    | 30  | 30    |                 | Phase Shift Error (On 90 degree clock output for data)          |
| tPLL_Jitter              | 0                                     | 0     |     | 0     |                 | No effect on margin as the same PLL generate both write cloc    |
| tCLOCK_SKEW_ADDER        | 20                                    | 20    | 20  | 20    |                 | Clock skew b/w two dedicated clock networks                     |
| InterconnectJitter       | []                                    |       |     |       |                 | Interconnect jitter on etch                                     |
| vClkJit                  | \$PCBLib:vClkJit                      | 20    | 20  | 20    |                 | Variable for Interconnect Clock Jitter control                  |
| vStbJit                  | \$PCBLib:vStbJit                      | 70    | 70  | 70    |                 | Variable for Interconnect Strobe Jitter control                 |
| vDatJit                  | \$PCBLib:vDatJit                      | 80    | 80  | 80    |                 | Variable for Interconnect Data Jitter control                   |
| InterconnectJitterClock  | 0                                     | 0     | 0   | 0     |                 | Interconnect jitter on clock etch                               |
| InterconnectJitterStrobe | 0                                     | 0     | 0   | 0     |                 | Interconnect jitter on strobe etch                              |
| InterconnectJitterData   | 0                                     | 0     | 0   | 0     |                 | Interconnect jitter on data etch                                |
| PropagationDelay         | []                                    |       |     |       |                 | Estimate or take from SI [Nearest DIMM, Farthest DIMM]          |
| Etch_Delay_ClkCtrl       | \$PCBLib:Etch_Delay_ClkCtrl           | 545   | 545 | 545   |                 | Propagation delay b/w driver and receiver ( Clock-to-Controlle  |
| Etch_Delay_ClkMem        | \$PCBLib:Etch_Delay_ClkMem            | 600   | 600 | 600   |                 | Propagation delay b/w driver and receiver ( Clock-to-Memory     |
| Etch_Delay_DQS           | DriverDESIGN.CONTROLLER_STROBE.1      | 674.4 |     | 803.9 |                 | Propagation delay b/w driver and receiver ( Strobe )            |
| Etch_Delay_DQ            | DriverDESIGN.CONTROLLER_DATA.1:Re     | 660.6 |     | 787   |                 | Propagation delay b/w driver and receiver ( Data )              |
| Constraints              | []                                    |       |     |       |                 | Function of above                                               |
| tDQSS                    | \$MemLib:Bin:sub:tDQSS                | -630  |     | 630   | <1005.1,35.4>   | Strobe rising time relative to rising clock edge                |
| tDSS                     | \$MemLib:Bin:sub:tDSS                 | 500   |     |       | <1125.1,>       | Strobe falling edge setup time to rising clock edge             |
| tDSH                     | \$MemLib:Bin:sub:tDSH                 | 500   |     |       | <180.4,>        | Strobe falling edge hold time to rising clock edge              |
| tDIPW                    | \$MemLib:Bin:sub:tDIPW                | 600   |     |       | <408.6,>        | Pulse width of data                                             |
| tDS                      | \$MemLib:Bin:sub:tDS:150(der_su)      | 200   |     |       | <163.4,><163.4, | Data Setup Time                                                 |
| tDH                      | \$MemLib:Bin:sub:tDH(der_hl)          | 200   |     |       | <275.7,><275.7; | Data Hold Time                                                  |
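As a rough illustration of what annotated min/max flight times imply, the sketch below derives the worst-case strobe-to-data skew range from the `Etch_Delay_DQS` and `Etch_Delay_DQ` rows of the table. It assumes the values are in picoseconds and that the min and max corners of the two nets are independent, which is pessimistic. The helper `skew_range` is hypothetical, and this is not the formula the tool uses to compute the Margin column.

```python
def skew_range(strobe, data):
    """Worst-case (min, max) of strobe delay minus data delay.

    Each argument is a (min, max) pair of flight times.
    Assumption: the corners of the two nets are uncorrelated.
    """
    s_min, s_max = strobe
    d_min, d_max = data
    return (s_min - d_max, s_max - d_min)


# Min/Max values taken from the Etch_Delay_DQS and Etch_Delay_DQ rows.
lo, hi = skew_range((674.4, 803.9), (660.6, 787.0))
print(round(lo, 1), round(hi, 1))  # -112.6 143.3
```

A skew range like this is the quantity the setup and hold checks consume: the early corner stresses one margin and the late corner stresses the other.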


## Setting up Layout constraints depending on SI exploration



- Impedance
- Relative Propagation delays
- Max Parallel

Differential Values from the Set Topology Constraints dialog (tabs: Switch-Settle, Prop Delay, Impedance, Rel Prop Delay, Diff Pair, Max Parallel, Wiring, User-Defined, Signal Integrity, Usage):

| Parameter             | Value                     |
|-----------------------|---------------------------|
| Primary Gap           | 6.00 MIL                  |
| Line Width            | 5.00 MIL                  |
| Neck Gap              | 4.00 MIL                  |
| Neck Width            | 0.10 MIL                  |
| Coupled Tolerance (+) | 0.10 MIL                  |
| Coupled Tolerance (-) | 0.10 MIL                  |
| Minimum Line Spacing  | 11.81 MIL                 |
| Gather Control        | Include                   |
| Max Uncoupled Length  | 400.00 MIL                |
| Static Phase Tol      | 100.00 MIL (Type: Length) |
| Dynamic Phase Tol     | (blank) (Type: Delay)     |
| Phase Max Length      | (blank)                   |
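To make the differential-pair constraints concrete, here is a minimal sketch of the kind of check a layout tool performs against them. The defaults use the Max Uncoupled Length (400 mil) and Static Phase Tol (100 mil) values shown in the dialog. The names `DiffPairRule` and `violations` and the routed lengths in the usage lines are hypothetical, and the real constraint system covers far more than these two rules.

```python
from dataclasses import dataclass


@dataclass
class DiffPairRule:
    max_uncoupled_mil: float = 400.0     # "Max Uncoupled Length" in the dialog
    static_phase_tol_mil: float = 100.0  # "Static Phase Tol", length type


def violations(rule, len_p_mil, len_n_mil, uncoupled_mil):
    """Return the names of the rules a routed pair breaks (empty list if none)."""
    problems = []
    if uncoupled_mil > rule.max_uncoupled_mil:
        problems.append("max uncoupled length")
    if abs(len_p_mil - len_n_mil) > rule.static_phase_tol_mil:
        problems.append("static phase tolerance")
    return problems


rule = DiffPairRule()
# Hypothetical routed lengths for a clock pair, in mils.
print(violations(rule, 2500.0, 2460.0, 350.0))  # []
print(violations(rule, 2500.0, 2380.0, 450.0))
# ['max uncoupled length', 'static phase tolerance']
```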



## Post-layout verification and Timing



## Use-Case: Reverse-engineer a board




## Use-Case: Correcting IC-PHY given board



models




## Summary





- DDR3 compliance requires meeting multiple specifications that cover timing and signal-integrity measurements.
- Meeting all specifications and exploring the solution space with SI simulations alone is difficult.
- Using tools in a piecemeal way can validate specifications but may not yield the optimal IC/package/board design.





- Use of STA in conjunction with SI simulations in a methodical manner is needed to achieve optimal design.
- Timing models should be able to handshake data with IBIS simulations at pre-route exploration and post-route verification stages to ensure that both SI and Timing constraints are met.




