2013 Articles

I visited Convertech Japan 2013, another high-tech trade show at Tokyo Big Sight. The three-day event began Jan. 30 and ended Feb. 1. The show was about 30% smaller than the InterNepcon electronics manufacturing exhibition held at the same venue two weeks earlier.

Read more: NanoTech at Convertech Japan

One of the truly fun diversions of the electronics manufacturing community has been the ongoing Friday Element Quiz on the IPC TechNet email listserv.

For nearly two years, a few clues have been posed to TechNet members each week, and members then try to guess the corresponding element. (No element has been repeated.)

The quiz was the brainchild of Dave Hillman, an engineer at Rockwell Collins and one of the longtime contributors to the listserv. Each week, Hillman (with the help of a few reference manuals) poses a question to the group. For example:

This element has no biological role for humans. History shows that the mineral containing this element was encountered in silver mines in Bohemia (now the Czech Republic) in the Middle Ages and was given a name that combines the words "ill luck" and "deceiver" because it was found to have no use. This element plays a significant role today in several different industry segments and is more abundant than tin in the Earth's crust. What element is being described?*

For those not keeping track, the first winner was Lamar Young of SCS Coatings; the most recent was Hillman’s colleague Doug Pauls. Over the 96 weeks the quiz has run, there have been several repeat winners. The leaders to date are Dr. Bev Christian of RIM, who has picked the correct answer eight times, nosing out Leland Woodall of CSTech, who has correctly named seven elements.

Given there are 112 elements, the FEQ should be winding down. By popular demand, however, Hillman has stocked up on new reference books and pledges to start over.

Let the good times roll.

 

*The answer is Uranium (U).

The cost of improving and maintaining reliability can be minimized by a model that quantifies the relationships between product cost-effectiveness and availability.

A repairable component (piece of equipment, subsystem) is characterized by its availability, i.e., the ability of the item to perform its required function at a given moment or over a stated period of time. Availability can also be defined as the probability that the item (piece of equipment, system) is available to the user when needed. A large and complex system or a complicated piece of equipment that is supposed to be available to users for a long period of time (e.g., a switching system or a highly complex communication/transmission system, whose “end-to-end reliability,” including the performance of the software, is important) is characterized by an “operational availability.” This is defined as the probability that the system is available today and will remain available to the user in the foreseeable future for the given period of time (see, e.g., Suhir1). High availability can be assured by the most effective combination of adequate dependability (probability of non-failure) and repairability (probability that a failure, if any, is swiftly and effectively removed). Availability of a consumer product determines, to a great extent, customer satisfaction.

Intuitively, it is clear that the total reliability cost, defined as the sum of the cost of improving reliability and the cost of removing failures (repair), can be minimized, bearing in mind that the first cost category increases and the second decreases with an increase in the reliability level (Figure 1)2. The objective of the analysis that follows is to quantify this intuitively more or less obvious relationship and to show that the total cost of improving and maintaining reliability can indeed be minimized.



Availability index. In the theory of reliability of repairable items, one can consider failures and restorations (repairs) as a flow of events that starts at random moments of time and lasts for random durations of time. Let us assume that failures are rare events, that the process of failures and restorations is characterized by a constant failure rate λ (steady-state portion of the bathtub curve), that the probability of occurrence of n failures during the time t follows Poisson's distribution

$$P_n(t) = \frac{(\lambda t)^n}{n!}\,e^{-\lambda t} \qquad (1)$$
(see, e.g., Suhir1), and that the restoration time t is an exponentially distributed random variable, so that its probability density function is

$$f(t) = \mu\,e^{-\mu t} \qquad (2)$$

where the intensity $\mu = 1/\bar{t}$ of the restoration process is the reciprocal of the mean restoration time $\bar{t}$. The distribution (2) is particularly applicable when restorations are carried out swiftly, and the number of restorations (repairs) decreases as their duration increases.

Let K(t) be the probability that the product is in the working condition, and k(t) the probability that it is in the idle condition. When considering random processes with discrete states and continuous time, it is assumed that the transitions of the system S from the state $s_i$ to the state $s_j$ are defined by transition probabilities $\lambda_{ij}$. If the governing flow of events is of Poisson type, the random process is a Markovian process, and the probability of state $p_i(t) = P\{S(t) = s_i\}$, $i = 1, 2, \dots, n$, of such a process, i.e., the probability that the system S is in the state $s_i$ at the moment of time t, can be found from Kolmogorov's equation (see, e.g., Suhir1)

$$\frac{dp_i(t)}{dt} = \sum_{j \ne i}\lambda_{ji}\,p_j(t) - p_i(t)\sum_{j \ne i}\lambda_{ij} \qquad (3)$$

Applying this equation to the processes (1) and (2), one can obtain the following equations for the probabilities K(t) and k(t):

$$\frac{dK(t)}{dt} = -\lambda K(t) + \mu k(t), \qquad \frac{dk(t)}{dt} = \lambda K(t) - \mu k(t) \qquad (4)$$

The probability normalization condition requires that the relationship K(t) + k(t) = 1 hold at any moment of time. Then the probabilities K(t) and k(t) in the equations (4) can be separated:

$$\frac{dK(t)}{dt} + (\lambda+\mu)K(t) = \mu, \qquad \frac{dk(t)}{dt} + (\lambda+\mu)k(t) = \lambda \qquad (5)$$

These equations have the following solutions:

$$K(t) = \frac{\mu}{\lambda+\mu} + C\,e^{-(\lambda+\mu)t}, \qquad k(t) = \frac{\lambda}{\lambda+\mu} - C\,e^{-(\lambda+\mu)t} \qquad (6)$$

The constant C of integration is determined from the initial conditions, depending on whether the item is in the working or in the idle condition at the initial moment of time. If it is in the working condition, the initial conditions K(0) = 1 and k(0) = 0 should be used, and

$$C = \frac{\lambda}{\lambda+\mu}.$$

If the item is in the idle condition, the initial conditions K(0) = 0 and k(0) = 1 should be used, and

$$C = -\frac{\mu}{\lambda+\mu}.$$

Thus, the availability function can be expressed as

$$K(t) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\,e^{-(\lambda+\mu)t} \qquad (7)$$

if the item is in the working condition at the initial moment of time, and as

$$K(t) = \frac{\mu}{\lambda+\mu}\left[1 - e^{-(\lambda+\mu)t}\right] \qquad (8)$$

if the item is idle at the initial moment of time. The constant part

$$K = \frac{\mu}{\lambda+\mu} = \frac{T}{T+\tau} \qquad (9)$$

of the equations (7) and (8) is known as the availability index. It determines the percentage of time in which the item is in workable (available) condition. In the formula (9),

$$T = \frac{1}{\lambda}$$

is the mean time to failure (MTTF), and

$$\tau = \frac{1}{\mu}$$

is the mean time to repair (MTTR). If the system consists of many items, the formula (9) can be generalized as follows:

$$K = \prod_{i=1}^{n} K_i = \prod_{i=1}^{n} \frac{T_i}{T_i + \tau_i} \qquad (10)$$
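As a quick numerical illustration of the single-item formula (9), the short Python sketch below (not part of the original article; the MTTF and MTTR values are arbitrary examples) simulates the alternating exponential failure/repair process assumed above and compares the observed fraction of uptime with the analytic availability index.

```python
# Minimal sketch (illustrative only): verify K = MTTF / (MTTF + MTTR) from
# formula (9) by simulating the alternating failure/repair process with the
# assumed exponential up-times (rate lambda) and repair times (rate mu).
# The numbers are arbitrary examples, not data from the authors.
import random

def simulated_availability(mttf_hours, mttr_hours, cycles=200_000, seed=1):
    """Fraction of time the item is up over many failure/repair cycles."""
    rng = random.Random(seed)
    up = down = 0.0
    for _ in range(cycles):
        up += rng.expovariate(1.0 / mttf_hours)    # time to failure
        down += rng.expovariate(1.0 / mttr_hours)  # restoration (repair) time
    return up / (up + down)

mttf, mttr = 5000.0, 8.0                # assumed example values, in hours
k_analytic = mttf / (mttf + mttr)       # formula (9)
k_simulated = simulated_availability(mttf, mttr)
print(f"K (formula 9): {k_analytic:.5f}")
print(f"K (simulated): {k_simulated:.5f}")
```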

Minimized reliability cost. Let us assume that the cost of achieving and improving reliability can be estimated by an exponential formula

$$C_R(R) = C_R(0)\,e^{\,r(R - R_0)} \qquad (11)$$

where R = MTTF is the reliability level, assessed by the actual level of the MTTF; $R_0$ is the specified MTTF value; $C_R(0)$ is the cost of achieving the $R_0$ level of reliability, and r is the factor of the reliability improvement cost. Similarly, let us assume that the cost of reliability restoration (repair) can also be assessed by an exponential formula
$$C_F(R) = C_F(0)\,e^{-f(R - R_0)} \qquad (12)$$

where $C_F(0)$ is the cost of restoring the product's reliability at the specified level $R_0$, and f is the factor of the reliability restoration (repair) cost. The formula (12) reflects the assumption that the cost of repair is smaller for an item of higher reliability.

The total cost

$$C(R) = C_R(R) + C_F(R) = C_R(0)\,e^{\,r(R - R_0)} + C_F(0)\,e^{-f(R - R_0)} \qquad (13)$$

has its minimum

$$C_{\min} = C_R(R)\left(1 + \frac{r}{f}\right) \qquad (14)$$

when the minimization condition $dC(R)/dR = 0$, i.e., $f\,C_F(R) = r\,C_R(R)$, is fulfilled. Let us further assume that the factor r of the reliability improvement cost is inversely proportional to the MTTF, and that the factor f of the reliability restoration cost is inversely proportional to the MTTR (with the same proportionality constant, so that $r/f = \tau/T$). Then the formula (14) yields

$$C_{\min} = C_R(R)\left(1 + \frac{\tau}{T}\right) = \frac{C_R(R)}{K} \qquad (16)$$
where the availability index K is expressed by the formula (9). This result establishes the relationship between the minimum total cost of achieving and maintaining (restoring) the adequate reliability level and the availability index. It quantifies the intuitively obvious fact that this cost depends on both the direct costs and the availability index. From (16) we have

$$\frac{C_F(R)}{C_R(R)} = \frac{\tau}{T} = \frac{1-K}{K} \qquad (17)$$

This formula indicates that if the availability index is high, the ratio of the cost of repairs to the cost aimed at improved reliability is low. When the availability index is low, this ratio is high. Again, this intuitively obvious result is quantified by the obtained simple relationship. The formula (16) can be used, in particular, to interpret the availability index from the cost-effectiveness point of view; the index reflects the ratio of the cost of improving reliability to the minimum total cost of the item associated with its reliability level.
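To make the minimization concrete, the hedged Python sketch below scans the total cost (13) over the reliability level, under the exponential cost model of formulas (11) and (12) and the assumption r = 1/T, f = 1/τ, and checks that the minimum agrees with $C_R(R^*)/K$ from formula (16). All numerical values are illustrative assumptions, not data from the authors.

```python
# Minimal sketch (illustrative assumptions only): minimize the total cost
# C(R) = C_R(0)*exp(r*(R - R0)) + C_F(0)*exp(-f*(R - R0)) by a simple scan
# over R, and compare the minimum with C_R(R*)/K from formula (16).
import math

C_R0, C_F0 = 100.0, 400.0   # assumed baseline costs (arbitrary units)
R0 = 2000.0                 # assumed specified MTTF, hours
T, tau = 5000.0, 8.0        # assumed MTTF and MTTR used to set r and f
r, f = 1.0 / T, 1.0 / tau   # factors taken inversely proportional to MTTF, MTTR
K = T / (T + tau)           # availability index, formula (9)

def total_cost(R):
    return C_R0 * math.exp(r * (R - R0)) + C_F0 * math.exp(-f * (R - R0))

# crude 1-hour-step scan for the minimizing reliability level R*
R_star = min((R0 + i for i in range(20_000)), key=total_cost)
C_min = total_cost(R_star)
C_R_star = C_R0 * math.exp(r * (R_star - R0))
print(f"R* = {R_star:.0f} h, C_min = {C_min:.2f}, C_R(R*)/K = {C_R_star / K:.2f}")
```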

The relationship between the availability index and cost-effectiveness of the product is quantified, assuming that the cost of improving reliability over its specified level increases, and the restoration (repair) cost decreases, when reliability level (assessed in our analysis by the mean-time-to-failure) increases. It has been shown that the total cost of improving and maintaining reliability can be minimized, and that such a minimized cost is inversely proportional to the availability index. The developed model can be of help when there is a need to minimize costs without compromising reliability.

References

1. E. Suhir, Applied Probability for Engineers and Scientists, McGraw-Hill, New York, 1997.
2. E. Suhir, R. Mahajan, A. Lucero and L. Bechou, “Probabilistic Design for Reliability (PDfR) and a Novel Approach to Qualification Testing (QT),” IEEE/AIAA Aerospace Conference, March 2012.

Ephraim Suhir, Ph.D., is Distinguished Member of Technical Staff (retired), Bell Laboratories’ Physical Sciences and Engineering Research Division, and is a professor with the University of California, Santa Cruz, the University of Maryland, and ERS Co. Laurent Bechou, Ph.D., is a professor at the University of Bordeaux IMS Laboratory, Reliability Group.

The combination of complete model libraries, advanced tools and engineering expertise addresses modern buses and data rates.

Memory interfaces are challenging signal integrity engineers from the chip level to the package, to the board, and across multiple boards. As the latest DDR3 and DDR4 speeds support multi-gigabit parallel bus interfaces with voltage swings smaller than those of previous-generation interfaces, there is no room for error in any modern memory interface design.

Designing a memory interface has always been about timing closure. Each data signal’s timing needs to be compared to its related strobe signal in such a way that the data can be captured on both the rising and falling edge of the strobe, hence the term double data rate (DDR). The increase in data rates to more than 2 Gbps has made the timing margin associated with each rising and falling edge much smaller (Figure 1).



However, today’s biggest challenge comes in accurately measuring timing while considering the fluctuations in power and ground rails that occur due to simultaneously switching signals. In the worst case, when all 64 bits of a data bus transition simultaneously, large instantaneous changes in current across the power distribution networks (PDNs) cause fluctuations in voltage levels that impact the timing margins of the transitioning signals. These signal switching variations are often called timing “push-out” or “pull-in.” If the time between data settling and the strobe transition becomes too small, metastable conditions can occur that impact data integrity (Figure 2).



Characterization of simultaneous switching noise (SSN) effects requires system-level transient analysis, including transmit and receive buffers and all the interconnect in between. Unlike in SPICE, real circuits do not have a global ground (node 0); all signals are referenced to local power/ground pads. Therefore, not just the interconnect but also the associated PDN must be included in this system model.

The system interconnect includes an on-chip path from the active silicon transmit buffers to external die pads, the package, a PCB and possibly a motherboard; the same components are on the receive side of the system. The on-chip portion of the system is typically modeled as a spatially distributed, lumped RC (more recently RLCK) SPICE circuit. Low-speed packages are represented by RLCK lumped models and higher frequency packages by S-parameters. PCBs are large enough that lumped element models rarely apply, and S-parameters are typically used. These non-lumped, broadband frequency domain models imply a difficult transient simulation even without the nonlinear buffers included.
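A rough back-of-the-envelope Python sketch (ours, not the authors'; the rise time, dielectric constant, and trace length are assumed, typical values) illustrates why lumped models rarely apply at the PCB scale: the common λ/10 lumped-model rule of thumb at a DDR3-class knee frequency works out to roughly a millimeter, far shorter than typical board traces.

```python
# Rough sketch (not from the article): compare a typical DDR3 trace length
# with lambda/10 at the signal's knee frequency. Rise time, dielectric
# constant and trace length are assumed example values.
C0 = 3.0e8            # speed of light in vacuum, m/s
t_rise = 35e-12       # assumed 35 ps signal rise time
er_eff = 4.0          # assumed effective dielectric constant (FR-4-like)
trace_len = 0.10      # assumed 10 cm board trace

f_knee = 0.5 / t_rise                         # ~0.5/t_r bandwidth rule of thumb
wavelength = C0 / (er_eff ** 0.5) / f_knee    # wavelength in the dielectric
print(f"knee frequency ~ {f_knee/1e9:.1f} GHz, wavelength ~ {wavelength*1e3:.1f} mm")
print(f"lumped-model limit (~lambda/10): {wavelength/10*1e3:.1f} mm "
      f"vs. trace length {trace_len*1e3:.0f} mm")
```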

Because most signal integrity (SI) software tools were created in an era when the timing effects of SSN could be ignored, many tools perform SI analysis assuming ideal power and ground rails. However, with margins becoming so tight, assuming ideal power and ground could cause prototypes to fail or, worse yet, lead to data integrity problems on production hardware in the field.

The trend in SI engineering is to analyze memory interfaces considering the effects of signal and non-ideal power/ground. This is now being referred to as “power-aware” SI analysis. Modeling of I/O buffers can now follow an updated IBIS standard (IBIS 5.0+) where power-aware IBIS models permit SI tools to consider the parasitics of the power and ground connections as well as the signals.

Here, we discuss the I/O modeling, interconnect modeling, simulation, and analysis challenges associated with power-aware SI of today’s high-speed memory interfaces and how modern tools can be used to address these challenges.

Power-Aware I/O Modeling

Transmit and receive buffers are critical intellectual property to both fabs and fabless design companies. They are either extracted at a detailed netlist level by cell characterization software or carefully crafted manually by I/O designers. These models are then encrypted and distributed only under strict nondisclosure agreements. Each individual buffer includes many transistors. These buffer circuits suffer from slow convergence during SPICE simulation even with ideal lumped loads.

Full-bus SSN characterization requires hundreds, in some cases literally thousands, of transistors combined with broadband frequency domain models. Such simulations are extremely resource-intensive and sensitive to SPICE convergence issues. Typical simulation times are measured in days, and memory consumption in double- or even triple-digit gigabytes, when performed on high-performance computing platforms.

IBIS buffer macromodels are commonly applied for system-level SI simulations instead of transistor-level netlists. Simulation time, memory consumption, and convergence issues are all dramatically reduced versus transistor-level simulation. However, IBIS models have long been considered unsuitable for SSN simulations, because version 4.2 and earlier did nothing to ensure proper power/ground buffer currents.

IBIS 5.0 was enhanced to address this situation. Updates called BIRD-95 and BIRD-98 were added to the specification to model power currents and their fluctuations with respect to PDN voltage noise. Together, these two updates provide an accurate modeling of buffer power currents and enable IBIS 5.0-compliant models to be applied for full-bus SSN characterization (Figure 3).



Not all SI software presently supports application of IBIS 5.0 buffer models for power-aware SI analysis, but it is becoming more common. Commercial solutions are now available to support conversion of transistor-level buffer models to IBIS 5.0 behavioral macromodels (Figure 4).



These are easily applied by fabs, fabless design companies with their own IP, and even designers who may be provided only transistor-level models. Semiconductor vendors are moving toward providing IBIS 5.0 models. If such models are not available from their websites, they may well be available for internal application or distribution to designers under nondisclosure. IBIS 5.0 buffer models address the IP sensitivity issues that exist for transistor-level netlists (even encrypted netlists) and eliminate the need to disclose process parameters.

Interconnect and PDN Modeling

One of the key challenges in enabling a power-aware SI methodology is extraction and modeling of the interconnect, for example, PCBs. Historically, this has been done by extracting transmission line models (e.g., SPICE “W” elements) for signal traces, while assuming they are routed adjacent to an infinite, solid reference plane. Signal vias are often modeled using a fast closed-form approach as isolated, uncoupled objects with only self-parasitics (i.e., ideal return paths). This kind of technique is very convenient mathematically, as it enables extractions that are relatively inexpensive from a computational standpoint. However, this approach completely ignores the power delivery network (PDN), forcing an undesirable “ideal power” assumption on the simulation and masking any PDN effects from the simulation results.

Incorporating the PDN into the extraction process is a significant challenge. This involves the extraction of the copper shapes that typically comprise the power and ground planes, as well as vias that run through them, along with the coupling to the signal traces. These vias essentially act as radial transmission lines that excite the parallel plate plane structures, perturb the power supplied to the chips, and couple noise back onto the signals as well.

Decoupling capacitors must also be modeled and incorporated into the extraction, as must models for the voltage regulator module (VRM), which is where power is brought onto the PCB from the external world. Once the extraction problem expands from “signals and vias” to “signals, planes and vias,” the simple transmission line extraction techniques historically used are no longer applicable, and the problem requires some kind of full-wave-based solution.

Traditional full-wave field-solvers address the full set of Maxwell’s equations, with no (computationally) simplifying assumptions. Full-wave engines are certainly able to handle all the structures discussed previously, but come at a major computational cost. From a practical standpoint in a typical design schedule, it may only be possible to extract a few signals and some small portion of the PDN using purely full-wave techniques. While this may be quite accurate for this small portion, it does not enable modeling on the scale desired for the power-aware SI problem. What is generally desired is to include a significant number of bus signals, for example 16 or 32 of them, to include the cumulative effects of simultaneously switching outputs (SSOs).

The entire PDN for the bus needs to be extracted as well, including the power and ground planes from the stack-up, and the associated decoupling caps. To provide extraction and modeling on this scale, a different approach must be taken.

Available technology attacks this daunting problem in a unique manner. Using a patented “hybrid solver” technique, the layout is decomposed into traces, vias, planes and circuits (e.g., for decoupling cap models). These elements are sent off to specifically tuned solvers optimized for these structures, and their results are integrated back together into comprehensive S-parameters. This technique provides nearly full-wave accuracy, while at the same time enables very large-scale problems to be handled in a reasonable amount of time. These S-parameters can be simulated directly in the time domain, or optionally converted into Broadband SPICE models, providing even better time-domain simulation performance (Figure 5).



Simulation Environment

The challenges with regard to simulation environment in the context of power-aware SI fall into several categories: bus modeling, the time-domain simulation itself, and post-processing of results. The bus modeling challenge comes once the models are available for the I/Os, decoupling caps, and passive interconnect. At that point, it becomes necessary to build out or represent the entire die-to-die topology of the bus of interest for simulation. It is generally convenient to do this in a schematic-like environment, where the bus is easily visualized across chip, package, and board fabrics.

There are a few enabling features of a schematic-like environment that facilitate analysis. One of these is hierarchical connectivity, which is in contrast to the “wire-by-wire” connectivity found in traditional schematic-based tools. Wire-by-wire connectivity, in which each individual wire is shown from terminal to terminal in the schematic, works fine with smaller topologies. But as you look to model large groups of coupled signals, together with multiple power and ground connections in each model, this approach quickly becomes impractical. In a hierarchical connectivity approach, only a single connection is shown between models, with the explicit wiring details available one level below. This enables very large bus topologies to be constructed easily for analysis (Figure 6).



With regard to the simulation itself, it might appear straightforward, but there are still some things to consider. SI tools have historically broken a bus-level problem into multiple piecemeal parts, such as running uncoupled single-line simulations on individual signals to gather delay data, then running other partially coupled subsets of the bus to gather some coupling-related effects, and then trying to combine the results afterward. (SSO is typically ignored altogether.) This kind of divide-and-conquer approach worked well when margins were relatively large, but the margins on a modern 1.6 Gbps DDR3 data bus are substantially different from those of the 333 Mbps DDR data buses of yesteryear, when those techniques were commonly deployed.

In hardware, reflections and inter-symbol interference (ISI) do not occur independently of crosstalk or SSO. These effects all happen together, where each affects the other. They cannot be cleanly separated. The simulation needs to much more closely emulate the behavior of the hardware, which boils down to essentially running the entire bus structure in one large simulation. In this manner, all the interplay and interactions between these major effects are captured in the results. The other benefit of this approach is that raw setup and hold measurements can be taken directly, the same way one would measure it in the lab with an oscilloscope.
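As a simple illustration of taking raw setup and hold measurements directly from a full-bus simulation, the following Python sketch (a hypothetical helper, not a Cadence API; the edge times are invented example numbers) computes per-cycle setup and hold from data-settling and strobe-crossing times, the same way one would place oscilloscope cursors in the lab.

```python
# Minimal sketch (hypothetical): given the times at which a data signal
# settles through its input thresholds and the times of the corresponding
# strobe crossings, compute raw per-cycle setup and hold.
def setup_hold_per_cycle(data_valid, data_invalid, strobe_edges):
    """data_valid[i]: time data is last settled before strobe_edges[i];
    data_invalid[i]: time data first changes after strobe_edges[i]."""
    margins = []
    for dv, di, st in zip(data_valid, data_invalid, strobe_edges):
        setup = st - dv          # data settled before the strobe samples it
        hold = di - st           # data held after the strobe samples it
        margins.append((setup, hold))
    return margins

# illustrative numbers only (ns)
strobes = [1.250, 2.500, 3.750]
valid = [1.010, 2.310, 3.490]
invalid = [1.530, 2.740, 4.020]
for i, (s, h) in enumerate(setup_hold_per_cycle(valid, invalid, strobes)):
    print(f"cycle {i}: setup = {s*1000:.0f} ps, hold = {h*1000:.0f} ps")
```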

Post-Processing and Analysis of Results

Once the simulation results are available, the next challenge is to automate the post-processing of the raw waveforms in order to take measurements, generate reports, and close timing. A multitude of measurements are called out in the latest JEDEC specifications for DDR memory interfaces. To do this comprehensively, measurements must be taken for each signal, on every cycle. This produces a tremendous amount of data very quickly, so plots of the data are far more useful for evaluating the design than spreadsheets with row upon row of numbers.

Another key aspect of the post-processing is to automate the derating of setup and hold times (Figure 7). Per the JEDEC specifications, the slew rates of the signals determine how much more or less setup and hold time is required at the memory, on top of the base setup and hold requirements. For a data bus, this means the slew rates of the data and strobe signals need to be measured automatically at each cycle. From those two values, a lookup table provides the incremental setup and hold delta that applies for that cycle, and a final setup and hold margin can be determined, again for that cycle. This needs to be repeated on each cycle for all signals. Again, the amount of data accumulates very quickly, so automation is critical.
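The short Python sketch below outlines how such a derating lookup might be automated. The slew-rate bins and delta values are invented placeholders, not JEDEC numbers; an actual implementation would use the tables from the applicable JEDEC specification.

```python
# Hedged sketch (illustrative only): derate a raw setup margin using a small
# slew-rate lookup table indexed by the measured data and strobe slew rates
# for that cycle. Table entries and slew bins are invented placeholders.
DERATE_PS = {                       # (data slew bin, strobe slew bin) -> delta tSU, ps
    ("1V/ns", "1V/ns"): 0,
    ("1V/ns", "2V/ns"): 25,
    ("2V/ns", "1V/ns"): -13,
    ("2V/ns", "2V/ns"): 8,
}

def bin_slew(slew_v_per_ns):
    """Map a measured slew rate to the nearest table bin (assumed 1 or 2 V/ns)."""
    return "1V/ns" if slew_v_per_ns < 1.5 else "2V/ns"

def derated_setup_margin(raw_setup_ps, base_tsu_ps, data_slew, strobe_slew):
    delta = DERATE_PS[(bin_slew(data_slew), bin_slew(strobe_slew))]
    return raw_setup_ps - (base_tsu_ps + delta)   # margin against derated requirement

# per-cycle example with assumed numbers
print(derated_setup_margin(raw_setup_ps=240, base_tsu_ps=125,
                           data_slew=1.8, strobe_slew=1.1))
```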



To handle the large quantity of data and close timing, the automated reporting needs to post-process the data and provide intelligent summaries to show critical results, such as:

  • Setup and hold margins are positive for address/command and control buses, and for data bus “write” transactions at the memory devices.
  • Skew requirements are met at the controller for data bus “read” transactions.
  • Strobe-to-clock skew requirements are met.

Summary

Moving from an ideal power assumption to a power-aware SI methodology requires some upgrades to modeling and simulation techniques, and is required for adequate SSN characterization of modern memory systems.

A key enabler is power-aware I/O modeling, allowing SSN simulations in minutes on a laptop instead of days on a large server. From the release of IBIS 5.0 and onward, there is an industry-standard way in which this can be done, and tools are available to automate the generation of these models from transistor-level netlists. Demand by systems engineers will quickly drive the broader availability of these I/O models from component suppliers.

Tools to perform efficient interconnect and PDN extraction have been available in the market for a number of years and are becoming increasingly mainstream for SI applications as the number of DDR3 and DDR4 design starts increases. Simulation environments also require advancement to handle complex bus topologies, comprehensive simulation, and highly automated post-processing to analyze today’s challenging interfaces.


Ken Willis is product engineering director, and Brad Brim is senior staff product engineer, High Speed Analysis Products, at Cadence Design Systems (cadence.com).

Operating mostly under the radar -- although not necessarily by design -- the High Density Packaging Users Group is now well into its second decade of pushing electronics manufacturing technology forward.

Executive director Marshall Andrews is no stranger to consortia work: He spent about 12 years at MCC, the US-based computer consortium, and later was the founding CEO of ITRI, a consortium for promoting and enhancing printed circuit board fabricators and technology. He spoke with editor in chief Mike Buetow this week.

Read more: Quietly, HDPUG Gaining Members, Performing Research

IPC-2221A, as most designers know, was released in 2003. Since that time, lead-free has gone from a niche technology to a mainstream one, and it's added a generous dose of complexity to the design decision tree.

Read more: Updating the Design Standard
