
A look at common-clock bus timing concepts.

Ed.: Parts 1 and 2 were published in the April 2006 and June 2006 issues. All three parts are included here.

Part 1

In high-speed design, the need frequently arises for board-level (static) timing computations, both to maximize system performance (timing margins) and to avoid failures. Signal integrity analyses often involve evaluating both signal quality and timing margins. However, signal quality degradations, such as overshoot and ringback, can be tolerated in many cases provided they do not adversely affect timing. For example, on a PCI/PCI-X bus a large amount of ringing is acceptable if it does not cause timing violations.

This three-part series will address numerous high-speed timing concepts. Part 1 will discuss timing diagrams and parameters.

Timing diagrams (voltage vs. time) are graphical representations of circuit behavior over time and can aid system analyses.

Figure 1 diagrams two signal types commonly encountered in digital timing analyses.

Figure 1
FIGURE 1. Diagram of a pulse (a), and a data signal (b).

Figure 1a illustrates a pulse with width (Pw), rise time (Tr), fall time (Tf), period (T), low voltage (Lo) and high voltage (Hi). The signal makes Lo-to-Hi transitions at times t1 and t3, and a Hi-to-Lo transition at t2. For a pulse train, T is the reciprocal of the frequency F, and the duty cycle (defined as the fraction of time occupied by the pulse)1 equals the product of frequency and pulse width. Here, the rise and fall times are defined as the times between the 0% and 100% points. However, Tr and Tf may also be measured2 between the 10% and 90% points, or between the 20% and 80% points. Figure 1b shows a data pulse of duration Tw with state transitions occurring at the crossing points (i.e., t1 and t2), which may be Lo to Hi or vice versa.
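The relations above (T = 1/F, duty cycle = F × Pw) are easy to check numerically. The sketch below uses the pulse of the Figure 5 example later in this part (T = 10 ns, Pw = 4.5 ns, Tr = 0.5 ns); the helper names are illustrative, and the conversion from a 0%-100% edge time to a 10%-90% or 20%-80% edge time assumes a perfectly linear ramp, which real edges only approximate.

```python
def pulse_train_params(freq_hz: float, pw_s: float) -> tuple[float, float]:
    """Return (period T, duty cycle) for a pulse train of frequency F and width Pw."""
    period = 1.0 / freq_hz   # T = 1/F
    duty = freq_hz * pw_s    # duty cycle = F * Pw (= Pw / T)
    return period, duty


def linear_edge_time(t_0_100_s: float, lo_pct: float, hi_pct: float) -> float:
    """Edge time between lo_pct and hi_pct of the swing, given the 0%-100% time.

    Assumes a perfectly linear ramp; real edges are not linear, so treat the
    result as an approximation when converting between conventions.
    """
    return t_0_100_s * (hi_pct - lo_pct) / 100.0


T, duty = pulse_train_params(100e6, 4.5e-9)  # F = 100 MHz, Pw = 4.5 ns
print(f"T = {T * 1e9:.1f} ns, duty cycle = {duty:.2f}")       # T = 10.0 ns, duty cycle = 0.45
print(f"10%-90%: {linear_edge_time(0.5e-9, 10, 90) * 1e9:.2f} ns")  # 10%-90%: 0.40 ns
print(f"20%-80%: {linear_edge_time(0.5e-9, 20, 80) * 1e9:.2f} ns")  # 20%-80%: 0.30 ns
```

The same 0.5 ns edge therefore reads as 0.4 ns or 0.3 ns depending on the measurement convention, which is why a datasheet's definition should be checked before comparing edge rates.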

Figure 2 is a functional timing diagram showing causal relations (indicated by curved lines) that lead from input transitions to the resulting output transitions. The figure shows switching as instantaneous (vertical lines), although in reality transitions take a non-zero time.

Figure 2
FIGURE 2. An illustration of causality.

Timing analysis of a high-speed net usually requires taking the timing specifications of the driver and receiver chips into account. These IC specifications are of two types: timing requirements and guaranteed responses. Timing requirements (constraints) include setup time, hold time and pulse width. A typical guaranteed response is a chip's propagation delay. A constraint normally has either a maximum or a minimum value (but not both), unlike delays, which almost always have both minimum and maximum values.

Figure 3 displays several timing relationships. In the shaded regions of Figure 3a the data can change, but during the remaining interval it must remain stable. The data setup (Ts) and hold (Th) times are also defined. Figure 3b depicts input and output signals together with the associated propagation delay specification (the maximum delay between input and output signal changes).
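The stability rule of Figure 3a can be expressed as a simple window check. This is a minimal sketch: the function name and the example values (capture edge at 10 ns, Ts = 1.0 ns, Th = 0.5 ns) are hypothetical, and a transition falling exactly on either window boundary is treated as passing.

```python
def check_setup_hold(t_data: float, t_clk: float, t_su: float, t_h: float) -> str:
    """Classify one data transition against the capture edge at t_clk.

    Data must be stable from (t_clk - t_su) to (t_clk + t_h). A transition
    strictly between those boundaries violates setup if it occurs before the
    edge and hold if it occurs at or after it. All arguments share one unit.
    """
    if t_clk - t_su < t_data < t_clk:
        return "setup violation"
    if t_clk <= t_data < t_clk + t_h:
        return "hold violation"
    return "ok"


# Hypothetical values in ns: capture edge at 10 ns, Ts = 1.0 ns, Th = 0.5 ns
for t_data in (8.5, 9.5, 10.2, 10.6):
    print(t_data, check_setup_hold(t_data, 10.0, 1.0, 0.5))
```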

Figure 3
FIGURE 3. Timing relations between: (a) a data "DQ" and clock "CLK," and (b) an input and output signal.

Timing parameters and reference test loads are normally obtainable from the AC (dynamic) specifications in a device's datasheet3, as exemplified by Figure 4.

Figure 4
FIGURE 4. Section of AC timing table of a high-speed IC (a), and its timing reference load (b).

Figure 4a depicts an AC timing table for an Infineon Technologies DDR SDRAM4. Figure 4b shows the AC output load circuit diagram (timing test load) for that chip.

Important timing parameters include the clock-to-output delay Tco (the time from the clock rising edge3 until the output reaches the measurement voltage Vmeas into the test load), setup/hold requirements and jitter.

To calculate timing budgets, it is frequently necessary to determine propagation time via simulation, as demonstrated by Figure 5. A free version of Cadence Design Systems' PSpice was used for this simulation.

The value of C_load was determined assuming that the receiver is an Infineon Technologies DDR SDRAM (32Mbx4), part number HYB25D128400C[C/E/T], in a 66-pin P-TSOPII (plastic thin small outline package, Type II). Its associated IBIS model (128m_d11.ibs) indicates a typical C_pkg of 0.434 pF and a nominal C_comp of 4.1 pF (for the IO_FULL buffer). Consequently, C_load, which is the sum5 of C_pkg and C_comp, is typically 4.534 pF, the value used in this simulation.

The driver U1 is a pulse source of 2.5 V amplitude with parameters (as defined by Figure 1a) Tr = Tf = 0.5 ns, Pw = 4.5 ns and T = 10 ns. These are related by:

Pw = (T/2) - (Tr + Tf)/2
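This relation follows from T = Tr + Pw + Tf + (low time) when the high and low plateaus are equal. A quick check with the U1 values (the helper name is illustrative):

```python
def pulse_width(period: float, t_rise: float, t_fall: float) -> float:
    """Flat-top width for equal high and low plateaus: Pw = T/2 - (Tr + Tf)/2."""
    return period / 2 - (t_rise + t_fall) / 2


T, TR, TF = 10.0, 0.5, 0.5          # U1 parameters, ns
pw = pulse_width(T, TR, TF)
low_time = T - (TR + pw + TF)       # remainder of the period spent at Lo
print(f"Pw = {pw} ns, low plateau = {low_time} ns")  # Pw = 4.5 ns, low plateau = 4.5 ns
```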

Figure 5b shows the resulting waveforms at the driver (red) and receiver (blue). The rising-edge propagation delay is measured from the midpoint of the driver rising step (t = 10.24 ns, v = 0.8 V) to the midpoint of the corresponding receiver rising step (t = 11.12 ns, v = 1.25 V), as indicated by the green marker line, yielding a delay of ~0.88 ns. Similarly, the falling-edge propagation delay, measured from the midpoint of the driver falling edge (t = 15.25 ns, v = 1.7 V) to the midpoint of the receiver falling edge (t = 16.12 ns, v = 1.25 V), is ~0.87 ns.
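The midpoint-to-midpoint measurement can be automated on sampled waveforms by locating each threshold crossing with linear interpolation. In this sketch the waveforms are idealized piecewise-linear ramps invented for illustration (not the PSpice data of Figure 5); only the thresholds mirror the ones used above (0.8 V at the driver, 1.25 V at the receiver).

```python
def crossing_time(t, v, threshold, rising=True):
    """First time a sampled waveform crosses `threshold` in the given direction.

    Uses linear interpolation between the two samples that bracket the
    crossing. Raises ValueError if no such crossing exists.
    """
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        if rising:
            crossed = v0 < threshold <= v1
        else:
            crossed = v0 > threshold >= v1
        if crossed:
            frac = (threshold - v0) / (v1 - v0)
            return t[i - 1] + frac * (t[i] - t[i - 1])
    raise ValueError("threshold not crossed")


# Idealized edges for illustration only (times in ns, voltages in V)
t_drv, v_drv = [9.5, 10.0, 10.5, 11.0], [0.0, 0.0, 1.6, 1.6]
t_rcv, v_rcv = [10.5, 10.9, 11.4, 12.0], [0.0, 0.0, 2.5, 2.5]

t_a = crossing_time(t_drv, v_drv, 0.8)    # midpoint of the driver step
t_b = crossing_time(t_rcv, v_rcv, 1.25)   # midpoint of the receiver swing
print(f"rising-edge delay = {t_b - t_a:.2f} ns")  # rising-edge delay = 0.90 ns
```

Passing rising=False measures falling edges the same way.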

Figure 5
FIGURE 5. A topology for determining the propagation time of a transmission line (a), and simulation waveforms at the driver (red) and receiver (blue) (b).

This delay exceeds the line's TD of 0.7 ns because of the loading effect of receiver U2. Interestingly, at v = 0 V (the start of the transition) the receiver loading has a negligible effect, and the delay between driver and receiver is ~0.7 ns. However, as the signal transitions, the effect of receiver loading on delay becomes evident, as indicated by the ~0.88 ns midpoint-to-midpoint delay.

Part 2

The central focus of Part 2 will be the common-clock timing scheme. In this approach (Figure 6), the bus driver and receiver ICs share the same clock.6

Figure 6
FIGURE 6. Common-clock bus topology block diagram.

The setup and hold equations for a common-clock interface are defined by Table 1.

It is assumed that the effect on Tcyc of the clock generator's crystal frequency variation (typically 100 ppm or less) is negligible. The notation Tsu,min and Thld,min implies that the receiver's setup and hold times are usually minimum timing requirements, with no associated maximums Tsu,max or Thld,max.

Several events must occur within one clock cycle. The clock period must be budgeted among various operations7, such as gate switching and signal propagation. For data to be latched correctly, the receiver's setup and hold requirements must not be violated.

PCI-X address/data lines provide an example of a common-clock bus. The setup and hold parameters are obtainable from Tables 9-11 and 9-12 of the PCI-X bus specification.8

For 133 MHz operation, the setup margin parameters include: Tco_dq,max = 3.8 ns, Tclk_skw,max = 0.5 ns (this includes Tjtr), Tsu,min = 1.2 ns, Tcyc = 7.5 ns.

Utilizing Equation 1 of Table 1 yields:

Tsu_mrg = 2.0 ns - Tflt_dq,max

Consequently, to produce a non-negative setup margin, the maximum flight time must not exceed 2.0 ns.

Likewise, the hold margin parameters8 for 133 MHz PCI-X are:

Tco_dq,min = 0.7 ns, Tclk_skw,max = 0.5 ns, Thld,min = 0.5 ns.

From Equation 2 of Table 1:

Thld_mrg = Tflt_dq,min - 0.3 ns

This implies that the minimum flight time must be at least 0.3 ns to avoid hold timing violations. The PCI-X example shows that bus specifications can also be a good source of timing parameters (and test loads), in addition to the AC section of device data sheets (see Part 1 of this column).
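Table 1 is not reproduced here, so the sketch below assumes the commonly used common-clock forms of Equations 1 and 2; with the 133 MHz PCI-X values they reproduce both results above (2.0 ns - Tflt_dq,max and Tflt_dq,min - 0.3 ns). Function and variable names are illustrative.

```python
def setup_margin(t_cyc, tco_max, tflt_max, tsu_min, tskw_max):
    """Assumed form of Eq. 1: data launched on one edge must arrive Tsu
    before the next edge, with worst-case clock skew (incl. jitter) against it."""
    return t_cyc - (tco_max + tflt_max + tsu_min + tskw_max)


def hold_margin(tco_min, tflt_min, thld_min, tskw_max):
    """Assumed form of Eq. 2: the fastest new data must not corrupt the
    capture made on the same edge; note that Tcyc does not appear."""
    return tco_min + tflt_min - (thld_min + tskw_max)


# 133 MHz PCI-X values quoted above, in ns
T_CYC, TCO_MAX, TCO_MIN = 7.5, 3.8, 0.7
TSU_MIN, THLD_MIN, TSKW_MAX = 1.2, 0.5, 0.5

# Both margins are linear in flight time, so evaluating them at Tflt = 0
# gives the allowed flight-time window directly.
tflt_max_allowed = setup_margin(T_CYC, TCO_MAX, 0.0, TSU_MIN, TSKW_MAX)
tflt_min_required = -hold_margin(TCO_MIN, 0.0, THLD_MIN, TSKW_MAX)
print(f"{tflt_min_required:.1f} ns <= Tflt <= {tflt_max_allowed:.1f} ns")  # 0.3 ns <= Tflt <= 2.0 ns
```

Because t_cyc appears only in setup_margin, lengthening the clock period raises the setup margin and leaves the hold margin unchanged.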

Timing specs furnished by manufacturer data sheets are based on test loads that usually differ from system interconnect loading. Therefore, propagation (or flight) times ascertained through system simulations require certain adjustments before insertion in timing margin computations.

It is necessary to simulate with the reference load in addition to the system loading9, as implied by Figure 7. Doing so eliminates the double-counted portion of the buffer delay3, compensates for system loading effects and produces accurate timing data.

Figure 7
FIGURE 7. Simulation topology applying: (a) reference test load; and (b) system loading.

In Figure 7a, Cref and Rref form the test load used by semiconductor vendors when specifying the propagation delay10 and/or the output switching time of the device. Vref is the test load voltage of the timing specification.

In Figure 7b, R_pkg1, L_pkg1 and C_pkg1 represent the package resistance, inductance and capacitance10 of the driver pin. Similarly, R_pkg2, L_pkg2 and C_pkg2 define the package parasitics of the receiver pin.
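One common way to apply the reference-load correction (assumed here; tools and references differ in the details) is to measure two times from the same driver stimulus edge: when the driver output crosses Vmeas into the reference load (a buffer delay already contained in Tco), and when the receiver input crosses its threshold in the system simulation. Their difference is the flight time that enters the margin equations. The helper name and the numbers below are hypothetical.

```python
def compensated_flight_time(t_rcv_system: float, t_drv_ref: float) -> float:
    """Flight time with the buffer delay already counted in Tco removed.

    t_rcv_system: time from the stimulus edge until the receiver input crosses
                  its threshold in the system-load simulation.
    t_drv_ref:    time from the same stimulus edge until the driver output
                  crosses Vmeas into the vendor reference load (Cref/Rref).
    """
    return t_rcv_system - t_drv_ref


# Hypothetical values for illustration only, in ns
t_drv_ref = 0.45
t_rcv_system = 1.60
print(f"Tflt = {compensated_flight_time(t_rcv_system, t_drv_ref):.2f} ns")  # Tflt = 1.15 ns
```

Because the reference-load delay is subtracted, the compensated flight time can be shorter than the bare interconnect delay, and can even be negative when the system load is much lighter than the vendor's test load.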

In Part 1 (Figure 5a), the driver was modeled as a pulse source in series with a resistor (representing driver output impedance), and the receiver was modeled as a capacitor.

Figure 7 shows that, to produce more accurate simulations, the actual buffer models should be used for the driver and receiver devices, and the package parasitics need to be accounted for. Furthermore, the simulation of Figure 5 considered only the typical corner.

More complete data can be produced by also analyzing the fast and slow corners (best and worst cases) to verify the design over the full range of manufacturing process variation. This requires varying PCB and buffer modeling parameters, such as trace impedance, signal velocity and buffer strength, within the tolerances dictated by fabrication process variations.

In static timing analysis11, signal paths are ascertained by tracing the design connections and summing the worst (or best) case delays. This approach can account for delay relations between clock and data signals, and detect several types of timing violations including setup and hold, period and duty cycle, race conditions and skew checks. However, static timing analysis differs from functional debug and does not consider functional behavior of the circuit.
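The path-tracing idea can be sketched in a few lines, assuming an ideal (zero-skew) clock and a hypothetical timing graph whose edges carry (minimum, maximum) delays. Arrival windows are propagated in topological order, taking the smallest sum for the earliest arrival and the largest for the latest, and are then checked against hypothetical setup and hold requirements. Production static timing tools also model clock paths, slew dependence and many other effects that this sketch omits.

```python
from graphlib import TopologicalSorter


def arrival_windows(edges, source):
    """Propagate (earliest, latest) arrival times through an acyclic timing graph.

    edges maps (from_node, to_node) -> (min_delay, max_delay). Times are
    relative to the launching clock edge at `source` (ideal clock assumed).
    """
    preds = {}
    for a, b in edges:
        preds.setdefault(a, set())
        preds.setdefault(b, set()).add(a)
    arrival = {source: (0.0, 0.0)}
    for node in TopologicalSorter(preds).static_order():
        reached = [(arrival[p][0] + edges[(p, node)][0],
                    arrival[p][1] + edges[(p, node)][1])
                   for p in preds[node] if p in arrival]
        if reached:
            arrival[node] = (min(lo for lo, _ in reached),
                             max(hi for _, hi in reached))
    return arrival


# Hypothetical graph, delays in ns as (min, max)
EDGES = {
    ("clk", "drv_out"): (0.7, 3.8),     # driver clock-to-output
    ("drv_out", "rcv_in"): (0.4, 1.2),  # direct interconnect path
    ("drv_out", "buf"): (0.1, 0.3),     # alternate path through a buffer
    ("buf", "rcv_in"): (0.2, 0.5),
}
arr = arrival_windows(EDGES, "clk")
earliest, latest = arr["rcv_in"]
T_CYC, T_SU, T_H = 7.5, 1.2, 0.5        # hypothetical capture requirements, ns
print(f"rcv_in arrives between {earliest:.1f} and {latest:.1f} ns")
print(f"setup slack = {T_CYC - T_SU - latest:.1f} ns, "
      f"hold slack = {earliest - T_H:.1f} ns")
```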

Figure 8 illustrates that, for the common-clock scheme, skew can influence both the setup (Ts) and hold (Th) margins, but jitter affects only the setup margin, because the hold check compares launch and capture on the same clock edge. CLK@Driver and CLK@Receiver are the clock signals probed at the data driver and receiver pins, respectively. The clock edges at the upper left, used for skew (Tskw) measurements, serve as the reference. DQ@Driver and DQ@Receiver are the data signals at the driver and receiver ICs.

Figure 8
FIGURE 8. Effects of clock skew and jitter on timing margins.

As a further example of jitter and skew effects, consider a common-clock bus operating at 133 MHz (minimum clock period = 7.5 ns).

Allow a maximum skew of 225 ps and an edge-to-edge jitter of 175 ps.

The minimum effective period (= minimum period - maximum jitter - maximum skew) is

(7.5 ns - 0.175 ns - 0.225 ns) = 7.1 ns.

Consequently, the maximum delay allowed for the on-chip (silicon) and interconnect paths combined is 7.1 ns.
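The effective-period budget above as a short calculation. The helper names are illustrative, and the delay split passed to fits_budget (clock-to-out, flight time and setup time) is a hypothetical example rather than data from the text.

```python
def effective_period(t_period_min, jitter_max, skew_max):
    """Minimum effective period = minimum period - max jitter - max skew."""
    return t_period_min - jitter_max - skew_max


def fits_budget(delays, t_eff):
    """True if the summed worst-case delays fit within the effective period."""
    return sum(delays) <= t_eff


t_eff = effective_period(7.5, 0.175, 0.225)    # 133 MHz example, ns
print(f"effective period = {t_eff:.1f} ns")     # effective period = 7.1 ns

# Hypothetical split in ns: clock-to-out, flight time, setup time
print(fits_budget([3.8, 2.0, 1.2], t_eff))      # True (7.0 ns <= 7.1 ns)
```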

Given the timing relationship between data and clock signals, one way to improve the setup margin is to shorten the data trace (or lengthen the clock trace), at the price of a reduced hold margin. Conversely, the hold margin can be improved by lengthening the data trace or shortening the clock trace, at the expense of the setup margin. Another way to increase the setup margin is to increase the clock period (lower the operating frequency); this does not affect the hold margin12, which is independent of clock frequency.

Common-clock timing techniques have certain limitations. The minimum cycle time (which defines the highest frequency) is limited by the maximum delays, so performance depends on absolute delays. Common-clock timing is therefore limited to medium-speed buses (i.e., frequencies below roughly 200 to 300 MHz).6

Therefore, an alternative approach such as source synchronous signaling is required to achieve higher operational frequencies.

Part 3

For successful high-speed bus design, it is important to understand the timing analysis methodologies (at system level) applicable to various signaling schemes. It has been justly stated13: "Ninety percent of signal integrity problems are timing problems."

Part 2 (PCD&M, June 2006) of this series treated common-clock timing concepts. This part is devoted to source synchronous design, which offers higher frequency/speed capability. As depicted in Figure 9, a strobe (clock) is transmitted by the driver IC rather than by a separate clock chip. Here DQ, DQS and BCLK represent data, strobe and bus clock, respectively. First a DQ bit is transmitted, and a short delay later a DQS edge is sent to latch the data into the receiver IC.

Figure 9
FIGURE 9. Source synchronous bus topology block diagram.

The timing path starts at the driver's flip-flop and ends at the receiver's flip-flop, with DQS serving as the clock input of the receiver's flip-flop. The driver transmits DQS and DQ with a defined phase relationship. Normally, DQS is phase-shifted by half a cycle from the DQ signal; this shift is sometimes applied in the receiver. Frequently (but not always), there are one or two strobes per byte of data signals. A central clock is not essential for controlling the driver/receiver signal flow.

The setup and hold equations for a source synchronous interface are defined in Table 2. To ensure proper source synchronous operation, the strobe transmission must be timed to meet the setup and hold requirements of the receiver (latch). Figure 10 presents the source synchronous setup/hold timing diagrams.

Figure 10
FIGURE 10. Source synchronous bus timing diagrams.

One source synchronous example is the DDR (double data rate) memory bus, in which data are sampled on both the rising and falling edges of the strobe. DDR is an excellent choice for high-speed interconnects14 because the clock bandwidth is halved for a given data rate. DDR-1 (Double Data Rate, version 1) can operate15 at 2.5 V (typical) and at data rates approaching 400 Mb/s. DDR-2 can operate at 1.8 V (nominal) and at up to 800 Mb/s. Figure 11 shows an example DDR-1 topology suitable for DQ or DQS signals.
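Because data are captured on both strobe edges, the strobe/clock frequency is half the per-pin data rate and each bit occupies half a clock period. A short sketch for the two data rates quoted above (the helper name is illustrative):

```python
def ddr_timing(data_rate_mbps):
    """Clock frequency (MHz), clock period (ns) and bit time (ns) for a DDR bus.

    Data are captured on both strobe edges, so the clock runs at half the
    per-pin data rate and each bit occupies half a clock period.
    """
    clock_mhz = data_rate_mbps / 2
    clock_period_ns = 1e3 / clock_mhz
    bit_time_ns = 1e3 / data_rate_mbps
    return clock_mhz, clock_period_ns, bit_time_ns


for name, rate in (("DDR-1", 400), ("DDR-2", 800)):
    f, tck, tbit = ddr_timing(rate)
    print(f"{name}: {rate} Mb/s -> {f:.0f} MHz clock, "
          f"{tck:.2f} ns period, {tbit:.2f} ns per bit")
```

At 800 Mb/s, for example, the whole data valid window (setup plus hold plus all skews) must fit inside 1.25 ns.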

Figure 11
FIGURE 11. DQ/DQS topology for a DDR-1 memory (source synchronous) bus.

It is advantageous for the strobe and data topologies to be identical, to minimize skew6 and preserve the DQ-DQS phase relationship. The traces can then be quite long, limited only by losses, latency, etc.

The driver/receiver buffer strengths are defined in Figure 11. The typical Z0 and Tpd (reciprocal of velocity) values for each line segment (TL1, TL2, TL3, TL4) are also given. The SDRAMs (which receive during write cycles and drive during read cycles) are on the DIMMs, which plug into J1 and J2. As explained in Part 2, it is frequently insufficient to consider only the typical corner; the fast and slow cases also need to be analyzed. The combinations of parameters for fast, typical and slow corner simulations are outlined below.

Fast Corner: Driver/Receiver: strong, Z0 = 55 Ω, Tpd = 1.8 ns/ft (fastest velocity), Rs = 11.88 Ω, Rp = 22.22 Ω

Typ Corner: Driver/Receiver: nominal, Z0 = 50 Ω, Tpd = 2.0 ns/ft, Rs = 12 Ω, Rp = 22 Ω

Slow Corner: Driver/Receiver: weak, Z0 = 45 Ω, Tpd = 2.2 ns/ft (slowest velocity), Rs = 12.12 Ω, Rp = 21.78 Ω
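The three parameter sets follow a regular pattern: each value is the nominal one moved to one end of its tolerance band, the end listed for that corner. The sketch below generates them. The tolerances (±10% on Z0 and Tpd, ±1% on Rs and Rp) are inferred from the listed numbers rather than stated in the text, and the table and function names are illustrative.

```python
# Nominal values and fractional tolerances inferred from the corner list above
# (assumed: Z0 and Tpd +/-10 %, Rs and Rp +/-1 %). Units: ohm, ns/ft, ohm, ohm.
NOMINAL = {"Z0": 50.0, "Tpd": 2.0, "Rs": 12.0, "Rp": 22.0}
TOL = {"Z0": 0.10, "Tpd": 0.10, "Rs": 0.01, "Rp": 0.01}
# End of the band used for the fast corner: +1 = high end, -1 = low end
FAST_SIDE = {"Z0": +1, "Tpd": -1, "Rs": -1, "Rp": +1}
BUFFER = {"fast": "strong", "typ": "nominal", "slow": "weak"}


def corner(name):
    """Parameter set for the 'fast', 'typ' or 'slow' corner."""
    sign = {"fast": 1, "typ": 0, "slow": -1}[name]
    params = {k: round(NOMINAL[k] * (1 + sign * FAST_SIDE[k] * TOL[k]), 4)
              for k in NOMINAL}
    params["buffer"] = BUFFER[name]
    return params


for name in ("fast", "typ", "slow"):
    print(name, corner(name))
```

Generating corners from one nominal table keeps the fast and slow sets consistent when a nominal value or tolerance changes.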

The package model (Pkg_u1) may be lumped (consisting of R_pkg, C_pkg and L_pkg), as in Figure 7 (Part 2). However, at higher speeds (fast rise times) it is preferable to use a distributed model, represented by a section of transmission line with Z0 = sqrt(L_pkg / C_pkg) and delay TD = sqrt(L_pkg * C_pkg), or an S-parameter model (well suited to GHz-speed designs).

When using a lumped package model, the largest C_pkg (which provides the highest loading) is applied in slow corner simulations. When using a distributed package model, the smallest Z0 and longest TD are used for slow corner runs. The opposite applies to fast corner simulations. S-parameter package models can also be generated from the longest, nominal or shortest package traces of the IC, making them suited to slow, typical or fast corner analyses, respectively.
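The lumped-to-distributed conversion can be sketched directly from Z0 = sqrt(L_pkg / C_pkg) and TD = sqrt(L_pkg * C_pkg). The package values below (2.5 nH, 1.0 pF and 1.2 pF) are hypothetical; holding L_pkg fixed, the larger capacitance yields the lower Z0 and longer TD that the slow-corner rule above calls for.

```python
import math


def package_tline(l_pkg_h, c_pkg_f):
    """Distributed package equivalent: Z0 = sqrt(L/C) [ohm], TD = sqrt(L*C) [s]."""
    return math.sqrt(l_pkg_h / c_pkg_f), math.sqrt(l_pkg_h * c_pkg_f)


z0, td = package_tline(2.5e-9, 1.0e-12)          # hypothetical 2.5 nH, 1.0 pF
print(f"Z0 = {z0:.1f} ohm, TD = {td * 1e12:.1f} ps")  # Z0 = 50.0 ohm, TD = 50.0 ps

# Larger C_pkg with the same L_pkg (slow-corner direction)
z0_slow, td_slow = package_tline(2.5e-9, 1.2e-12)
print(f"Z0 = {z0_slow:.1f} ohm, TD = {td_slow * 1e12:.1f} ps")
```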

Note that for the fast corner the smallest Rs and largest Rp (as dictated by each resistor's nominal value and tolerance) are used; the opposite holds for the slow corner. Furthermore, for the fast and slow corners, the rising and falling flight time measurements are made at the receiver's Vinh and Vinl (input logic "high" and "low" DC voltages) rather than at the midpoint of the rising/falling edges, as was done in the typical analysis of Figure 5b (Part 1).

One reason source synchronous signaling is advantageous over a common-clock bus is that its performance depends on relative rather than absolute delays (although the receivers' setup and hold requirements must still be met). In practical systems, the delay differences between signals can be held much smaller than the absolute delays. Consequently, source synchronous signaling permits longer traces and superior performance.

At very fast edge rates, effects due to reflection (overshoot, ringback, etc.) and crosstalk can adversely affect timing; hence, special measures may be needed for margin improvement. DDR-2 can operate faster than DDR-1 (reaching 800 Mb/s) and includes ODT (on die termination) feature. ODT is a dynamic termination incorporated into the SDRAM and memory controller, and can be enabled or disabled depending on write/read modes and addressing conditions16. DDR-2 designs can also demand slew rate derating16 (an advanced concept for timing enhancement).

At GHz frequencies, multi-drop buses become prohibitive, and the topology of choice is point-to-point (as in PCI Express serial links), which avoids stubs.

Eye diagrams provide a preferred means of timing analysis for high-speed differential serial links (with embedded clocks). Eye diagrams can also be applied6 to determine setup/hold for source synchronous signals, to center the clock transitions in the middle of the data eye13 and to derive timing equations. PCD&M

Abe (Abbas) Riazi is a senior staff electronic design scientist with ServerWorks (a Broadcom company) in Santa Clara, CA.

ACKNOWLEDGEMENTS

Thanks to Peter Arnold, Clement Yuen, Richard Kuo, Jeremy Plunkett and Dean Gonzales of ServerWorks and Oliver Kiehl of Infineon Technologies.

REFERENCES

1. J.A. Coekin, "High-Speed Pulse Techniques," Pergamon Press, 1975, P. 3.
2. William R. Blood, Jr., "MECL System Design Handbook," 4th edition, Motorola Inc., 1988, P. 19.
3. Todd Westerhoff, "Closing the Loop Between Timing Analysis and Signal Integrity," Cadence Online Seminar, August 28, 2000.
4. Infineon Technologies HYB25D128 [400/800/160]C[C/E/T] (L), 128-Mbit Double-Data-Rate SDRAM, Data Sheet, Rev. 1.4, Nov. 2005, P. 18, P. 68, P. 73, P. 94.
5. Abe Riazi, "Stub or No Stub?," Printed Circuit Design and Manufacture, October 2004, P. 18.
6. Stephen H. Hall, Garrett W. Hall, James A. McCall, "High-Speed Digital Systems Design, A Handbook of Interconnect Theory and Design Practices," John Wiley and Sons Inc. 2000, PP. 178-193.
7. Eric Bogatin, "Signal Integrity - Simplified," Prentice Hall, 2004, P. 3.
8. "PCI-X Addendum to the PCI Local Bus Specification," Revision 1.0, June 17, 1999, PP. 186-187.
9. Lynne Green, "Timing Correction for Flight Time Compensation," Application Note
10. "IBIS (I/O Buffer Information Specification)" Version 4.1, January 30, 2004, P. 12, PP. 26-28, P. 48.
11. Bruno A. Messina, "Timing Your PCB Design," Printed Circuit Design, May 1998, PP. 31-34.
12. Bob Kirstein, "Practical timing analysis for 100-MHz digital designs", EDN, August 2002, PP. 95-104.
13. Jim Peterson, "Timing Numbers from ICX-What Do We Do With Them?" Mentor Graphics International User Conference, May 2-5, 2006.
14. Brian Young, "Digital Signal Integrity: Modeling and Simulation with Interconnects and Packages," Prentice Hall, 2000, P. 121.
15. Lee W. Ritchey, "Right the First Time: A Practical Handbook on High Speed PCB and System Design," Vol. 1, Speeding Edge, 2003, P. 122.
16. Steve Mckinney, "Successful DDR2 Design," Memory Interfaces Solution Guide, March 2006, PP. 9-13.
