Reliability is as much a key to success in the microelectronics
industry as is performance. Not only must a product perform as
desired, it must also work for an extended period of time without
failing, typically 10 years or more. It does little good to make the
world's fastest microprocessor if it fails after two weeks of
operation. Except for very few applications, such as missile guidance
systems that only operate for a few seconds, anything other than
superb long-term reliability would be unacceptable.
With the complexity of today's microelectronics, a phenomenal
level of reliability must be maintained. For instance, if the
probability of failure for a transistor is one in a million, and
you have a million independent transistors, the chip is more likely
to fail than not (about 63%). And yet, a modern IC can have more
than 10 million circuit elements, which pushes failure very near
certainty. Therefore, for any acceptable reliability on the chip level,
today's circuit elements must be among the most reliable things
ever built. In addition, reliability must continue to increase as
the complexity increases.
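This short Python sketch makes the arithmetic concrete. It assumes, as an idealization not stated in the text, that every element fails independently with the same probability p. Under that assumption the chip survives only if all n elements survive, so P(chip fails) = 1 - (1 - p)ⁿ. The function name is illustrative.

```python
import math


def prob_any_element_fails(p_element: float, n_elements: int) -> float:
    """Probability that at least one of n independent elements fails.

    Assumes each element fails independently with probability p_element.
    Computes 1 - (1 - p)**n via log1p/expm1 so the result stays accurate
    when p is tiny and n is huge.
    """
    return -math.expm1(n_elements * math.log1p(-p_element))


if __name__ == "__main__":
    for n in (1_000_000, 10_000_000):
        print(f"n = {n:>10,}: P(chip fails) = {prob_any_element_fails(1e-6, n):.4f}")
```

With p = 10⁻⁶, one million elements give about a 63% chance of chip failure, and ten million give better than 99.99%. This is why per-element reliability must keep improving as element counts grow.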
The reliability we have enjoyed thus far has not come without
considerable cost. Billions of dollars and the equivalent in Yen,
Francs, Deutschmarks, and so on have been expended to solve the
daunting problems facing reliability engineers designing integrated
circuits. The few wear-out failure mechanisms that exist (hot
carrier, time-dependent dielectric breakdown, and electromigration)
are now understood well enough that we can incorporate them
into design tools.
We know which limits to apply in order to delay any wear-out
issues until long past the useful life. However, to apply these
limits effectively, one must understand the limitations of the
materials used to manufacture ICs, and work around them.
Overestimating the capabilities of the materials and the process
could spell disaster, and underestimating them could limit designs
so severely that nothing of commercial interest could be made.
Striking a balance between conservatism and judicious use of the
process capabilities is necessary for continuous advancements.
ICs must work rather hard. High currents, high temperatures, and
many thermal cycles eventually take their toll. Just as any
mechanical device, like an old car, boat or airplane, eventually
fails from repeated exposure to the everyday stress of operation,
electrical stresses cause similar problems in electronic
components. Two types of reliability issues plague the industry:
defect-related problems and wear-out. Defect-related problems are
caused by manufacturing defects, such as a missing process step,
dirt, or other unavoidable calamities. Even the best, most
efficient process lines suffer from an occasional defect-related
problem. Wear-out, by contrast, occurs when the circuit or product
simply wears out, without any initial defects being present.
Although redundancy and insensitivity to a failure mechanism may
be up to designers, defects are in the realm of the process
engineer. Improved processes and statistical process control
efforts often reduce such failures to a minimum. Wear-out, on the
other hand, which occurs due to limitations in the "perfect"
material, is a problem that lies squarely with the designer. One of
the principal wear-out failure mechanisms is electromigration.
Fortunately, although not completely understood in all its
subtleties, it is controllable by proper design and a firm
appreciation of where one can get into trouble.
Electromigration History
Electromigration is the mass transport of a metal due to the
momentum transfer between conducting electrons and diffusing metal
atoms. Discovered more than 100 years ago, it became a concern only
when the relatively severe conditions necessary for operation of
integrated circuits made it painfully visible. Although
electromigration, in principle, exists whenever current flows
through a metal wire, the conditions necessary for electromigration
to be a problem simply did not exist back then. In bulk wires, such
as those used for home circuitry, the maximum current density is
limited to about 10⁴ A/cm² by Joule heating. Any current
density even modestly exceeding this value will produce enough heat
to melt a metal wire, yet at or below it the driving force from
electrons colliding with diffusing metal atoms is too small to make
electromigration a significant problem. Only a research scientist
would pass enough current through a bulk metal wire to observe the
effects of electromigration, and only with great experimental
difficulty. Therefore, for at least 100 years, electromigration was
an interesting problem in solid state physics, fascinating grist
for the research mills at universities, but of no interest
whatsoever commercially.
All of this changed in 1966 when the IC made its commercial
appearance. Electromigration was rediscovered by a much larger
audience, and with a vengeance. In ICs, electricity is conducted
via thin film stripes that are in direct contact with an effective
heat sink. Because most of the heat generated by the current is
conducted away into the chip, thin film conductors can withstand
current densities at least two orders of magnitude greater than
traditional bulk wires. This allows current densities of nearly
10⁶ A/cm² with minimal Joule heating. At
these current densities electromigration becomes significant.
The first ICs were constructed with metal lines that were 10 µm
in width or more, wide by today's standards. At the same time
they were exceedingly thin, on the order of 3000 Å. Furthermore, the
conductors were made of pure aluminum, a material with a low
melting temperature, which implies fast diffusion at low
temperatures. Very thin films contain small grains and thus many
grain boundaries that are conduits for even more rapid diffusion.
This combination of high current density and fast diffusion at low
temperatures was a recipe for disaster.
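Current density is just the current divided by the conductor cross-section, j = I / (w·t). This minimal sketch uses an illustrative line 10 µm wide and 0.3 µm (3000 Å) thick. The function names are hypothetical.

```python
UM_TO_CM = 1e-4  # 1 micrometre = 1e-4 cm


def current_density_a_per_cm2(current_a: float, width_um: float, thickness_um: float) -> float:
    """Return j = I / (w * t) in A/cm^2; width and thickness are in micrometres."""
    area_cm2 = (width_um * UM_TO_CM) * (thickness_um * UM_TO_CM)
    return current_a / area_cm2


def current_for_density_a(j_a_per_cm2: float, width_um: float, thickness_um: float) -> float:
    """Inverse: the current (A) that produces a given current density in the line."""
    area_cm2 = (width_um * UM_TO_CM) * (thickness_um * UM_TO_CM)
    return j_a_per_cm2 * area_cm2


if __name__ == "__main__":
    # Illustrative 10 um x 0.3 um line at 1e6 A/cm^2.
    print(current_for_density_a(1e6, 10.0, 0.3))
```

At 10⁶ A/cm² such a line carries only about 30 mA. That is an unremarkable current, yet it sits in the regime where electromigration matters.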
"A billion here and a billion there, pretty soon..."
At IBM it was estimated that close to a billion 1966 dollars were
spent in the effort to understand and fix the problem of
electromigration failure. This was when a billion dollars was a lot
of money.
ICs were supposed to be very reliable and great hope was placed in
their use. When the first ICs were placed into service, they failed
within weeks. The shock to the industry was tremendous. IC
manufacturers were in a panic to understand why they failed.
When parts returned from the field were subsequently examined,
there was nothing visible, even under an optical microscope. A
relatively new research tool, the scanning electron microscope, was
used and failure sites were identified. The open circuits were very fine
"cracks" in the metal, sometimes only a few hundred angstroms wide.
When the culprit was identified, the immediate fix was simple: make
the metal thicker. Easy with 10 µm wide lines, but not so easy
today.
Since then, electromigration has not gone away, but it has come
under control. The first solution was to make the metal conductors
more resistant to electromigration by alloying the Al with copper
(Cu), initially up to 4%. Processing considerations have since
reduced this, but today 0.5% Cu is still generally alloyed with
Al. The addition of Cu, of course, raised the resistivity, so low
resistance was available only by using relatively thick metal,
1.0 µm or so. Today, fine pitch circuits cannot tolerate such thick
metal, and other schemes are used to ensure reliability.
The Physics of Electromigration
Electromigration is due to the momentum exchange between
conducting electrons and diffusing metal atoms. Simply stated,
perhaps, but how does it happen?
Designers Beware.
Many reliability engineers working in electromigration define the
current exactly opposite to the way you do. To them current is
electron flow and positive current flow is in the direction the
electrons are traveling.
Ben Franklin had a 50-50 chance of getting it right.
In a perfect lattice, there is no resistance. Electrons move about
in a periodic potential with no other interaction with the metal
atoms. This may sound like superconductivity, but it isn't. The
problem here is that a perfect lattice cannot exist above absolute
zero due to missing atoms ("vacancies"), impurities, boundaries
between crystals of different orientation ("grain boundaries"), and
regions of imperfection ("dislocations"). Perhaps even more
important, at any temperature above 0 K, atomic vibrations
occur. These vibrations ("phonons") put a metal atom out of its
perfect position about 10¹³ times each second and
disturb the periodic potential, causing electron scattering. The
scattering event makes the electron change direction; any change in
direction is accompanied by an acceleration; and for every
acceleration there is a force. After many collisions ("collision" is
another word for the scattering event), these forces average to a net
force on the metal atoms in the direction of electron flow.
The force due to collisions of electrons with metal atoms is
called the momentum exchange. In electromigration, momentum is
exchanged between the electrons and the metal atoms and a change in
momentum with time is called a force. To provide sufficient
momentum exchange to cause measurable effects, many electrons must
be available to collide with the atoms. This can only happen in a
metal. In metals, many electrons are easily accelerated in an
electric field.
Sign of the Charge Carriers
Heavily doped polycrystalline silicon was used to illustrate an
interesting property of electromigration physics. Both p-type and
n-type polysilicon resistors doped to approximately 1% were
stressed until failure in strong, Joule-heating-induced temperature
gradients. In the n-type material, failure was near the cathode and
in p-type material failure was near the anode, thus demonstrating
the role of the sign of the charge carrier in
electromigration.
Semiconductors have far fewer free charge carriers, and in a true
semiconductor, electromigration does not exist because there just
aren't enough of them. However, electromigration can occur
in semiconductor-like materials, such as silicon, when they are so
heavily doped that they act as if they were metals. At dopant
levels of around 1%, electromigration has been observed in
polycrystalline silicon, but at such doping levels the temperature
coefficient of resistance (TCR) is positive. A positive TCR is
probably the best definition of a metal.
The size of the momentum exchange will be proportional to the
distortion in the lattice at any given point. This distortion is
greatest when there is a vacancy nearby, or in the region of a
grain boundary. This is also where diffusion occurs. Vacancies or
grain boundaries must be present for metal atoms to move from their
fixed positions in the crystal lattice ("diffuse"). You can't have
two things in the same place at the same time, so for an atom to
move from site A to site B, site B must be vacant. In grain
boundaries the picture is less well defined, but the concept still
applies. A boundary is a region of distortion and open
space, and diffusing atoms can be accommodated there far more
easily than in the lattice. This creates a
fortuitous situation where the greatest momentum exchange occurs
only at the sites where it is possible for atoms to move.
For the design engineer, electromigration physics can be simply
stated. Electrons flow through a metal film and collide with metal
atoms. The collisions produce a force on the metal atoms in the
direction of electron flow (for n-type materials, opposite for
p-type materials). Electromigration is only significant at high
current densities and only in metals. The magnitude of the
electromigration force is proportional to the current density.
Materials Science
The flux of metal atoms due to electromigration can be expressed
rather simply, using an electrostatic analogue and Einstein's
equation for diffusion in a potential field.
$$ J = \frac{D C}{kT} Z^{*} e \rho j \qquad (1) $$
where J is the atomic flux, D is the diffusion coefficient for
the appropriate mass transport mechanism, C is the concentration of
diffusing atoms, Z* is a quantity called the effective valence or
the effective charge (although it is neither a charge nor a valence)
that represents the sign and the magnitude of the momentum exchange,
e is the electronic charge, ρ is the resistivity and j is the
current density. kT is the average thermal energy per atom. The
important observation from Equation 1 is that the
electromigration-induced mass flux is directly proportional to the
current density, to the diffusion coefficient and to the
concentration of diffusing atoms.
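Equation 1 translates directly into code, which makes its proportionalities explicit. This is a minimal sketch in SI units. The function name is illustrative, and it implies no material data: values for D, C, Z*, and ρ must come from the literature or from measurement.

```python
# Physical constants (SI units)
BOLTZMANN_J_PER_K = 1.380649e-23
ELEMENTARY_CHARGE_C = 1.602176634e-19


def em_atomic_flux(D: float, C: float, z_star: float, rho: float, j: float, temp_k: float) -> float:
    """Electromigration atomic flux from Equation 1: J = (D*C/kT) * Z* * e * rho * j.

    SI units: D [m^2/s], C [atoms/m^3], rho [ohm*m], j [A/m^2], temp_k [K].
    Returns J in atoms/(m^2*s). The sign follows Z*, whose sign convention
    (electron flow vs. conventional current) must match how j is defined.
    """
    wind_force_n = z_star * ELEMENTARY_CHARGE_C * rho * j  # force on each diffusing atom
    mobility = D / (BOLTZMANN_J_PER_K * temp_k)            # Einstein relation: mobility = D/kT
    return C * mobility * wind_force_n
```

Doubling j (or D) doubles the returned flux. The explicit 1/T factor is weak compared with the exponential temperature dependence hidden inside D.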
Just having an electromigration-induced mass flux is not enough
to cause a problem. For a problem to exist, either more or less
mass must be entering a region than leaving it. If more mass is
leaving than arriving, we can form voids and open circuits. If more
mass is entering than leaving, extrusions will form short circuits
or breaks in the passivation and provide an opportunity for
corrosion. These regions are called flux divergences.
Unfortunately, many opportunities exist for flux divergences in a
typical IC.
A principal source of trouble is the unavoidable contact to
silicon. The diffusion of Al into silicon (Si) is essentially zero,
and, hopefully, so is the diffusion of Si into Al. Therefore,
since electromigration will be driving the Al away from one Si
contact and attempting to stuff it into another, a serious problem
can result. Under the right circumstances, metal atoms will leave
and none will replace them, so voids will form at contacts where
electron current is entering the metal from the Si. Conversely,
extrusions will be generated where the electrons are entering the
Si.
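This minimal one-dimensional sketch models the contact problem just described. It assumes the line is divided into equal segments, the electromigration flux is uniform inside the Al (arbitrary units, positive in the direction of electron flow), and the flux is zero at both Si contacts. Each segment's net gain is the flux entering it minus the flux leaving it. The function name is illustrative.

```python
def net_accumulation(boundary_fluxes: list[float]) -> list[float]:
    """Net atoms gained per unit time in each segment of a 1-D line.

    boundary_fluxes[k] is the atomic flux across the k-th segment boundary,
    positive in the direction of electron flow. Segment i lies between
    boundaries i and i+1, so its net gain is flux_in - flux_out.
    Negative values mean depletion (voids); positive values mean
    accumulation (extrusions).
    """
    return [boundary_fluxes[i] - boundary_fluxes[i + 1]
            for i in range(len(boundary_fluxes) - 1)]


if __name__ == "__main__":
    # Electrons enter the Al from a Si contact on the left and leave into a Si
    # contact on the right. Al cannot cross either contact, so the flux there
    # is zero; inside the line it is uniform.
    fluxes = [0.0, 1.0, 1.0, 1.0, 0.0]
    print(net_accumulation(fluxes))  # [-1.0, 0.0, 0.0, 1.0]
```

The negative value in the cathode-end segment marks where voids form, and the positive value at the anode end marks where extrusions grow. The uniform interior shows no damage at all, which is why flux divergences, rather than flux itself, cause failure.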
Since contacts and other similar structures are unavoidable, the
potential for electromigration failure exists in any real circuit.
All we can do is design our circuits such that this inevitable
problem is delayed until it no longer matters—and this is the
circuit designer's responsibility.
Effect of Current Density on Conductor Lifetime
Black's Law
In the late sixties, Jim Black of Motorola was heavily involved in
understanding the "cracked stripe" problem that was later
identified as electromigration. Jim's pioneering work included the
first careful systematic investigations of electromigration failure
kinetics. His experiments uncovered the curious behavior that
electromigration failures followed kinetics that depended not on
the inverse of the current density, but on the inverse
square.
$$ t_{50} = A\, j^{-2} \exp\left(\frac{\Delta H}{kT}\right) \qquad (2) $$
where t₅₀ is the median time to failure in an ensemble
of samples, A is a constant that needs to be empirically determined
and ΔH is the activation energy for failure. The experimental
values found for the activation energy suggested grain boundary
diffusion as the mass transport mechanism. For nucleation dominated
failure, this equation has proven to be adequate even to the
present day. Only small corrections, often too small to be detected
experimentally, have been needed to keep Black's Law consistent with
the latest theoretical developments.
From Equation 1 we see that the electromigration driving
force is proportional to the current density. One might therefore
expect the time to failure to scale the same way, as 1/j, but that
is not always the case. Traditionally, it has been observed that
electromigration failure followed a 1/j² law rather than 1/j. This has
become known as Black's Law. However, whether this empirical
law holds or not depends entirely on whether the failures are
nucleation or growth dominated. This, in turn, depends heavily on
the process used to construct the metal lines. If there is no
refractory "shunt layer" such as TiN or TiW under the Al line,
failure is nucleation dominated and Black's Law holds. If, however,
the failures are growth dominated, as is usually the case for
W via failure in narrow lines with shunt layers, Black's Law is not
followed and failure times follow 1/j kinetics. Often, as
might be expected, the failure process involves both nucleation and
growth of damage, and the behavior is more complicated and cannot
be described by a simple power law in j.
Wherever growth dominates or is a significant part of the
failure time, we assume that 1/j kinetics hold. Most recent
experimental data where contacts or vias have been examined in the
presence of refractory conductive shunt layers has supported the
use of 1/j kinetics, whereas most data on conductor lines attached
to bond pads has supported 1/j² kinetics.
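Black's equation, and its 1/j variant for growth-dominated failure, is usually applied through an acceleration factor. That factor is the ratio of median lifetime at use conditions to median lifetime at stress conditions, and the empirical constant A cancels out of it. This minimal sketch assumes the same ΔH and exponent n hold at both stress and use conditions. The function names and example temperatures are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5


def black_t50(A: float, j: float, delta_h_ev: float, temp_k: float, n: float = 2.0) -> float:
    """Median time to failure, t50 = A * j**(-n) * exp(dH / kT).

    n = 2 reproduces Black's Law (nucleation-dominated failure);
    n = 1 gives the growth-dominated 1/j form. A must be fitted to data.
    """
    return A * j ** (-n) * math.exp(delta_h_ev / (BOLTZMANN_EV_PER_K * temp_k))


def acceleration_factor(j_stress: float, t_stress_k: float,
                        j_use: float, t_use_k: float,
                        delta_h_ev: float, n: float = 2.0) -> float:
    """t50(use) / t50(stress); the empirical constant A cancels out."""
    current_term = (j_stress / j_use) ** n
    thermal_term = math.exp(delta_h_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_stress_k))
    return current_term * thermal_term
```

For instance, `acceleration_factor(2e6, 523.15, 1e6, 378.15, 0.6, n=2)` uses the grain-boundary ΔH quoted in the activation-energy sidebar. Repeating the call with `n=1` gives a result exactly half as large, since j_stress/j_use = 2. Extrapolating with the wrong exponent therefore misestimates field lifetime by the full current-density ratio.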
To ensure that electromigration failure does not occur in the
field, we need to limit the current density such that
electromigration failure will not become significant until long
after the projected useful lifetime of the circuit. This is a
function of not only the current density in the metal lines and
contacts, which may behave differently, but also of temperature and
often process variations.
Effect of Temperature on Current Density Limits
The major effect of temperature on electromigration is in the
diffusion coefficient. Diffusion is a thermally activated process
characterized by the Arrhenius relation and it possesses an
activation energy.
$$ D = D_0 \exp\left(-\frac{\Delta H}{kT}\right) \qquad (3) $$
Activation Energy
The activation energy for self diffusion depends strongly on the
diffusion mechanism. Diffusion can proceed through the lattice, or
grain boundaries, and along interfaces or the surface. The lattice
is the most difficult path with the highest activation energy (for
Al, ΔH_lattice is about 1.4 eV), followed by the grain
boundary (for Al, ΔH_grain boundary is about 0.6 eV) and
then the surface. In Al, the surface is generally not available due
to the presence of a coherent oxide film. Interfacial diffusion
activation energies differ for every interface and can be either
greater or less than that for grain boundary diffusion. Adding
alloying elements generally has the paradoxical effect of
decreasing the lattice and increasing the grain boundary activation
energies. The effect on interfaces is unclear.
where D₀ is a pre-exponential factor that depends on the
diffusion mechanism and ΔH is the activation energy, also dependent
on the diffusion mechanism.
Equation 3 shows that electromigration is very sensitive
to temperature. For Al, a temperature increase of about 20 °C can
generally double the rate of electromigration. Therefore, the
current permitted in a thin film conductor is a function of
temperature. The higher the temperature, the less current can be
permitted and still remain safe from electromigration failure.
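Equation 3 makes the "20 degrees doubles the rate" rule easy to check. The ratio of diffusion coefficients at two temperatures depends only on ΔH, because D₀ cancels. This minimal sketch uses the 0.6 eV grain-boundary value for Al from the activation-energy sidebar. The 100 °C starting temperature is an arbitrary example.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5


def diffusivity_ratio(delta_h_ev: float, t_low_k: float, t_high_k: float) -> float:
    """D(t_high) / D(t_low) for D = D0 * exp(-dH / kT); D0 cancels."""
    return math.exp(delta_h_ev / BOLTZMANN_EV_PER_K * (1.0 / t_low_k - 1.0 / t_high_k))


if __name__ == "__main__":
    # Grain-boundary diffusion in Al (dH ~ 0.6 eV), 100 C -> 120 C.
    print(diffusivity_ratio(0.6, 373.15, 393.15))
```

This prints roughly 2.6. The ratio falls toward 2 at higher base temperatures (about 2.1 going from 150 °C to 170 °C), so the rule of thumb is only approximate.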
Just how much current can be permitted and still maintain
reliability as the temperature is changed will depend on whether
you have nucleation or growth dominated failure and what the
dominant diffusion mechanism is. If failure is growth dominated
and we increase the temperature such that we double the
diffusion coefficient (approximately 20 °C for Al alloys and
grain boundary diffusion), we must reduce the current density by
half. Conversely, if we want to increase the current density by a
factor of two, we must ensure that the temperature is at least
20 °C cooler. If failure is nucleation dominated, an approximately
30% reduction in current density is needed for a similar temperature
increase to maintain equal reliability.
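The derating rules in this paragraph follow from Black's form with the Arrhenius term written as 1/D. If t₅₀ is taken as proportional to 1/(D·jⁿ), then keeping t₅₀ fixed when D rises by a factor r requires scaling j by r^(-1/n). This minimal sketch relies on that assumption, which holds only when ΔH is unchanged. The function name is illustrative.

```python
def current_derating_factor(d_ratio: float, n: float) -> float:
    """Factor to multiply j by so that t50 is unchanged when D rises by d_ratio.

    Uses t50 proportional to 1 / (D * j**n): n = 1 for growth-dominated
    failure, n = 2 for nucleation-dominated failure (Black's Law).
    """
    return d_ratio ** (-1.0 / n)


if __name__ == "__main__":
    for n in (1, 2):
        print(n, current_derating_factor(2.0, n))
```

Doubling D gives 0.5 for n = 1 and about 0.71 for n = 2, which is the roughly 30% reduction quoted for nucleation-dominated failure.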
Whether failure is nucleation or growth dominated is a matter of
the process used to deposit the metal and the overlying dielectric.
Almost everything that happens consists of an initiation followed
by a continuation. Electromigration is no exception. First the
damage must be initiated, a void nucleated or an extrusion formed,
then the damage proceeds, such as void growth or continuing the
extrusion, until failure occurs. Sometimes nucleation is slow and
takes a long time and growth is fast. When this happens we have
nucleation dominated failure. Sometimes we have the converse, and
the nucleation is either very short or non-existent, and we then
have growth-dominated failure. Electromigration exhibits both types
of behavior.
Nucleation Dominated Failure
Nucleation-dominated failure will be most common in processes
that do not contain a redundant "shunt" layer. Void nucleation
occurs when sufficient stress is generated. To generate stress,
significant mass transport must take place. This takes time. At a
critical stress level, a void will form to reduce the stress in the
system. When the void forms, a tremendous release of strain energy
occurs that promotes very rapid void growth. In the absence of a
shunt layer, an open circuit develops almost immediately, and
failure follows 1/j² kinetics. At least two other
nucleation dominated failure mechanisms have been identified: the
stress buildup following Cu depletion in Al/Cu alloys, and
passivation cracking induced by compressive stresses which produce
extrusions. In all three scenarios, 1/j² kinetics
prevail.
Growth Dominated Failure
If there is a redundant shunt layer, the initial rapid growth of
the void will not produce an open circuit. The shunt layer, usually
of a refractory material such as W or TiN, can conduct electricity
even if a void exists in the primary Al conductor. These metals can
withstand extremely high current densities at high temperatures for
very long times. If failure is defined as an open circuit, they
don't fail. However, in most practical situations, an open circuit
is not a realistic definition of failure. Since a resistance change
of about 10% in global wiring can produce timing errors, a 10%
increase has often been chosen as a failure criterion.
Using a percentage increase as a failure criterion during a test
has some problems. The actual damage that causes a failure will be
a function of the precise geometry of the test structure and the
initial resistance. This is unsatisfying for evaluating real
circuits that don't look like test structures. It is recommended,
therefore, that failure criteria be based on an absolute change in
resistance, the maximum that a particular circuit can withstand
before problems arise.
It is necessary to use test structures that can measure a
resistance change without geometric effects, such as the Blech
length, distorting the data.
The Blech Length
In the 1970s Ilan Blech of the Technion in Israel performed one
of the most important series of experiments in the history of
electromigration science and technology. In these experiments he
created a test structure that consisted of islands of gold (Au)
deposited onto a refractory underlay. When current was passed
through these samples, the upstream side of the islands moved in
the direction of electron flow and the downstream edge stayed
stationary. If the island was long enough, extrusions formed on the
downstream edge, but if the island was short enough,
electromigration essentially stopped. Electromigration also stopped
when the longer islands shrank to a critical length. He discovered
that there is a critical product of the current density and the
length of the island, below which electromigration ceases. This is
the origin of the "Blech Length." For any given current density,
there is a length below which electromigration will not
occur.
This behavior occurred because a mechanical back stress, generated
by electromigration, resisted the electromigration force. The back
stress exists only in the presence of a flux divergence and it is
greater in the presence of a mechanically strong confining
passivation layer. For this reason, the Blech Length cannot be
easily pre-determined. It is a strong function of the process and
the physical design of the chip.
In principle, one could make a circuit immortal by designing all
the lines to be shorter than a Blech Length. However, the Blech
Product j × L is only on the order of a few thousand A/cm and is a
strong function of the thermal history, so this idea has not been
seriously considered.
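The critical product converts directly into a critical length for any current density: L_c = (j·L)_c / j. In this minimal sketch, the 3000 A/cm product is a hypothetical round number standing in for "a few thousand", not a measured value for any process. The function names are illustrative.

```python
def blech_length_um(blech_product_a_per_cm: float, j_a_per_cm2: float) -> float:
    """Critical (Blech) length L_c = (j*L)_c / j, returned in micrometres."""
    length_cm = blech_product_a_per_cm / j_a_per_cm2
    return length_cm * 1e4  # cm -> um


def below_blech_threshold(length_um: float, j_a_per_cm2: float,
                          blech_product_a_per_cm: float) -> bool:
    """True if the segment's j*L product is below the critical product."""
    return j_a_per_cm2 * (length_um * 1e-4) < blech_product_a_per_cm


if __name__ == "__main__":
    ASSUMED_PRODUCT = 3000.0  # A/cm, hypothetical; must be measured per process
    print(blech_length_um(ASSUMED_PRODUCT, 1e6))
```

With the assumed product, a line at 10⁶ A/cm² would have to be shorter than about 30 µm. This shows why designing every line below the Blech length is impractical, and why the process-dependent product has to be measured rather than assumed.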
The growth of a void depends on the rate that metal atoms leave the
void, or, equivalently, the rate at which vacancies enter it. The
flux of vacancies or atoms is linearly dependent on the current
density, and therefore the time required to attain a certain void
size will obey 1/j kinetics. Care must be taken in experimental
measurements, however, since inappropriate test structures can
result in just about any value for the current exponent.
For a given metallization, growth dominated failure must take
longer than nucleation dominated failure, since the damage needs to
nucleate before it can grow. However, the nucleation phase can be
very short, approaching zero. The kinetics of failure must be
evaluated experimentally and applied properly. This means that for
electromigration damage in real conductors, we can have either 1/j
or 1/j² kinetics. It has been observed that for wide
lines, defined as those where the average grain size is smaller
than the line width, 1/j² kinetics usually dominate,
whereas for narrow lines, 1/j kinetics dominate.
RMS Current and Temperature Gradients
When current is passed through a conductor, the interaction of
the electrons with the lattice dissipates power equal to the
product of the square of the current and the resistance (I²R). This
is called Joule heating. Metal lines will heat up whenever current
is passed through them. If the current is low, the heat is
effectively conducted away, but there must be some temperature
increase even if it is not detectable. If the current density
approaches 10⁶ A/cm², Joule heating can
produce enough energy to make the conductor lines heat up
appreciably. At first this does not appear to be a problem, since
current densities are almost always lower than this due to
limitations induced by electromigration. However, one must realize
that Joule heating is driven by the root mean square (RMS) current,
whereas electromigration is driven by the average current. For a
narrow pulse, the RMS current can be much higher than the average current.
The average current can be well within any guidelines that may be
set for electromigration considerations, yet significant Joule
heating can result. This can be more prevalent on upper level
metallization, where heat must be conducted through several layers
of interlevel dielectric, which is a poor thermal conductor.
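For a rectangular pulse train with duty cycle d, the average current is d·I_peak while the RMS current is √d·I_peak. The RMS-to-average ratio therefore grows as 1/√d. This minimal sketch computes both quantities from a sampled waveform. The 1% duty-cycle pulse is an illustrative assumption.

```python
import math


def average_and_rms(samples: list[float]) -> tuple[float, float]:
    """Average and RMS of a uniformly sampled current waveform."""
    n = len(samples)
    average = sum(samples) / n
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return average, rms


if __name__ == "__main__":
    # One 100 mA pulse per 100 samples (1% duty cycle).
    waveform = [100.0] + [0.0] * 99   # mA
    print(average_and_rms(waveform))  # (1.0, 10.0)
```

A 1% duty-cycle pulse has an RMS current ten times its average. A line that looks comfortable on an average-current (electromigration) check can therefore still see significant Joule heating.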
The problem with Joule heating is not the modest temperature
increase, but the temperature gradients that result. Typically, at
the current densities found in modern circuitry, temperature
increases would range between a few and a few tens of degrees
Celsius. This produces temperature profiles that decay within a few
microns, so that temperature gradients of 10⁴ to
10⁵ °C/cm will be found. Since
electromigration is thermally activated, the temperature gradients
produce flux divergences that approach those found at absolute
divergences such as contacts or microstructural features.
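The size of these gradients, and why they matter, can be estimated directly. A rise of 10 °C decaying over about 1 µm (10⁻⁴ cm) gives 10⁵ °C/cm. Combining Equation 1 with Equation 3, the flux at fixed C, Z*, ρ and j scales as exp(-ΔH/kT)/T, so two points a micron apart carry different fluxes: a flux divergence. This minimal sketch assumes ΔH = 0.6 eV and a 100 °C base temperature. It ignores the rise of resistivity with temperature, which would make the divergence larger.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5


def flux_ratio_across_gradient(delta_h_ev: float, t_cool_k: float, t_hot_k: float) -> float:
    """Ratio of electromigration flux at the hot point to that at the cool point.

    From Equation 1 with D = D0*exp(-dH/kT), J is proportional to exp(-dH/kT)/T
    when C, Z*, rho and j are held fixed (a simplification: in a real metal rho
    also rises with T).
    """
    arrhenius = math.exp(delta_h_ev / BOLTZMANN_EV_PER_K * (1.0 / t_cool_k - 1.0 / t_hot_k))
    return arrhenius * (t_cool_k / t_hot_k)


if __name__ == "__main__":
    delta_t_c = 10.0        # assumed local temperature rise
    decay_length_cm = 1e-4  # assumed decay over about 1 micron
    print(f"gradient ~ {delta_t_c / decay_length_cm:.0e} C/cm")
    print(f"flux ratio ~ {flux_ratio_across_gradient(0.6, 373.15, 383.15):.2f}")
```

The flux at the hot point comes out roughly 1.6 times that at the cool point, a substantial divergence over a few microns.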
RMS current density must then be limited to about 2 ×
10⁶ A/cm² for lower level lines and about
half that for upper level lines. Unfortunately, the reliability of
metal lines in the presence of temperature gradients cannot be
accurately estimated. Temperature gradients can vary tremendously
throughout a real structure, depending on subtleties of the
geometry and on the use of the underlying silicon devices. The only
way to deal with these issues is to take a conservative approach
and forbid temperature gradients by limiting the RMS current
density to the levels suggested above.
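The RMS rule above can be applied as a simple design check. This minimal sketch uses the 2 × 10⁶ A/cm² lower-level limit and the roughly halved upper-level limit from the text. The function name and the line-geometry inputs are hypothetical.

```python
# RMS current-density limits quoted in the text (A/cm^2).
RMS_LIMIT_LOWER_LEVEL = 2e6
RMS_LIMIT_UPPER_LEVEL = 1e6  # "about half" the lower-level limit


def rms_density_ok(i_rms_a: float, width_um: float, thickness_um: float,
                   upper_level: bool) -> bool:
    """Check a line's RMS current density against the conservative limit."""
    area_cm2 = (width_um * 1e-4) * (thickness_um * 1e-4)
    j_rms = i_rms_a / area_cm2
    limit = RMS_LIMIT_UPPER_LEVEL if upper_level else RMS_LIMIT_LOWER_LEVEL
    return j_rms <= limit
```

For a 1 µm × 1 µm line, 15 mA RMS (1.5 × 10⁶ A/cm²) passes on a lower level but fails on an upper level.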
Microstructure and Electromigration: Line Width Effects
Al/Cu
One of the first applications of electromigration engineering to
solve reliability problems came about in 1970. At that time, thin
films were usually deposited by the high temperature evaporation of
metal films. Legend has it that when IBM was trying to solve the
electromigration problem, one evaporator was producing better
material than any other. It was a mystery. After weeks of study,
someone found out that the electron beam used to melt the Al used
for the conductors was misaligned. Instead of impacting directly
onto the Al charge placed in a Cu container for that purpose, the
e-beam was hitting the Cu and causing some of it to melt and be
deposited along with the Al. The resulting Al/Cu alloy proved to be
remarkably resistant to electromigration failure, increasing the
median time to failure by more than an order of magnitude. It was
determined that Cu slowed down the diffusion of aluminum in the
Al/Cu grain boundaries. After this effect was understood, it was
exploited.
This, however, did not eliminate electromigration failure; it
served as a band-aid until the technology caught up with the
capabilities of Al/Cu. Even so, the use of Cu was a great
breakthrough in electromigration technology, buying several years
of performance and making the high performance IC possible. Today
we live within the limitations of Cu in Al by making intelligent
compromises and choices. Searches for other alloys in a process
reminiscent of alchemists looking for the Philosopher's Stone have
not turned up anything that works better.
Sometimes you just get lucky!
Electromigration is a form of mass diffusion, where the driving
force is provided by the electron flow. Therefore, things that
affect diffusion will affect electromigration. Metals are composed
of atomic crystals where atoms are lined up very nearly perfectly
in only a few allowable configurations. The size of these crystals
("grains") is finite. Where the grains meet, they form a region of
disorder ("grain boundary"), and provide a pathway for easy
diffusion as compared to the nearly perfect metal lattices.
In the early days of ICs, the thin film conductors used in
manufacturing were relatively wide, fine grained, and composed of
many grains. These were referred to as polycrystalline. The grain
size was about the thickness of the film, generally about one
micron. Across the width of a typical conductor several microns
wide, many grain boundary pathways were available to accommodate
the electromigrating atoms. It came as no surprise that
electromigration lifetime increased with the grain size of the
films: the more grain boundaries present, the more
atoms that can be transported along them, and the earlier the
failure time.
As line widths became smaller, the grain size of the metal films
became larger. Conductor lines became comparable in width to the
grain size and took on a "bamboo" like appearance where most of the
grains spanned the line width, providing no continuous grain
boundary pathway in the direction of the current flow. When this
occurred, a peculiar effect was found: failure times were strongly
dependent on line width. Narrow lines at the same current density
became substantially more reliable than wider lines, as long as the
grain size was uniform.
The reason for this behavior was not hard to figure out. The
lack of easy grain boundary pathways meant that the atoms had to
take more arduous paths such as the lattice or various interfaces
in their journeys. The activation energy for failure was found to
be a function of line width, since the diffusion process changed.
What became even more interesting and important to reliability
engineers was that the precise arrangement and orientation of the
grains had a large effect on the lifetime of the conductor. In
fact, as the ratio of grain size to line width increased, the
reliability became poorer before it got better, and then got worse
again as lines entered sub-micron widths.
Today, we understand this behavior and can predict the
reliability from test data, grain structure, and particulars of the
metal deposition process. New effects, due to the presence of
refractory shunt layers and W plugs, have surfaced and have also
been explained well enough that they can be tamed. However, a
fundamental understanding of the process of solid state diffusion
and what affects it is essential in interpreting test results. For
this reason, conservative default values for parameters used in
relating electromigration test data to real circuits should be
employed until careful testing and data interpretation justify a
change.
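To illustrate why conservative defaults matter, the sketch below assumes Black's equation, t50 = A·J^(-n)·exp(Ea/kT), a widely used model for relating accelerated electromigration tests to use conditions that this section does not spell out. The function name `acceleration_factor` and every numerical value in the usage lines are hypothetical. The point is only to show how strongly the extrapolated field life depends on the assumed activation energy Ea and current exponent n.

```python
import math

# Boltzmann constant in eV/K
BOLTZMANN_EV = 8.617333262e-5


def acceleration_factor(j_test, j_use, temp_test_k, temp_use_k, n_exp, ea_ev):
    """Ratio t50(use) / t50(test), assuming Black's equation
    t50 = A * J**(-n_exp) * exp(Ea / (k * T)) with the same prefactor A."""
    current_term = (j_test / j_use) ** n_exp
    thermal_term = math.exp(
        (ea_ev / BOLTZMANN_EV) * (1.0 / temp_use_k - 1.0 / temp_test_k)
    )
    return current_term * thermal_term


# Hypothetical test: 2 MA/cm^2 at 523 K (250 C); use: 0.2 MA/cm^2 at 378 K (105 C).
af_low_ea = acceleration_factor(2.0, 0.2, 523.0, 378.0, n_exp=2.0, ea_ev=0.5)
af_high_ea = acceleration_factor(2.0, 0.2, 523.0, 378.0, n_exp=2.0, ea_ev=0.7)
print(f"Ea = 0.5 eV: AF = {af_low_ea:.3g}")
print(f"Ea = 0.7 eV: AF = {af_high_ea:.3g}")
```

Under these assumed conditions the two extrapolations differ by roughly a factor of five from a 0.2 eV change in Ea alone. Because the use temperature is below the test temperature, the lower Ea predicts the shorter field life, which is why it would be the conservative default until the data justify a higher value.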
The choice of test structures and test conditions is of
critical importance in extracting meaningful parameters for
interpreting the test data as they relate to actual chip
performance. The wrong test or the wrong test structure can produce
fatal results. The test structure must be designed to reflect the
process, and usually a single structure cannot do so.
Optimizing for and Ensuring Reliability
Failure Distribution
The distribution of electromigration failures has recently been
the subject of much discussion. Traditionally, the lognormal
distribution was used, where the logarithms of the failure times
are normally distributed. But this has conceptual and practical
problems, the most important of which is that the lognormal
distribution is not extendable. This means that given an ensemble
of n components and a lognormal failure distribution, if we make up
a new ensemble of combinations of the components in series so that
the weakest of these "links" produces failure, the resulting
distribution cannot be lognormal. Mathematically, the probability
of failure, P_f, for a chain of n links, given that the
probability of failure of a single link, P_f(1,t), is known, is:

$$P_f(n,t) = 1 - \left[1 - P_f(1,t)\right]^n \qquad (4)$$

If P_f(1,t) is lognormal, P_f(n,t) cannot also be lognormal
for n > 1. Therefore, the earlier practice of assuming that the
reliability of n components follows a lognormal distribution must be
incorrect.
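The claim that a weakest-link chain of lognormal links is not itself lognormal can be checked numerically with P_f(n,t) = 1 - [1 - P_f(1,t)]^n. If the chain were lognormal, Φ^-1(P_f) would be a straight line in ln t, so the σ implied between any two pairs of times would be the same. The minimal sketch below, with hypothetical helper names and an assumed single-link median of 100 h and σ = 0.5, computes that apparent σ in two adjacent time windows.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sigma 1)


def single_link_cdf(t, t50, sigma):
    """Lognormal P_f(1,t): ln(t) is normal with mean ln(t50) and std dev sigma."""
    return STD_NORMAL.cdf(math.log(t / t50) / sigma)


def chain_cdf(t, n, t50, sigma):
    """Weakest-link P_f(n,t) = 1 - [1 - P_f(1,t)]**n for n identical links.

    Written with log1p/expm1 so tiny single-link probabilities keep their precision."""
    return -math.expm1(n * math.log1p(-single_link_cdf(t, t50, sigma)))


def apparent_sigma(t_a, t_b, n, t50, sigma):
    """Lognormal sigma implied by the chain CDF between times t_a and t_b.

    If P_f(n,t) were lognormal, Phi^-1(P_f) would be linear in ln(t), and this
    value would be the same for every pair of times."""
    z_a = STD_NORMAL.inv_cdf(chain_cdf(t_a, n, t50, sigma))
    z_b = STD_NORMAL.inv_cdf(chain_cdf(t_b, n, t50, sigma))
    return (math.log(t_b) - math.log(t_a)) / (z_b - z_a)


# Hypothetical links: t50 = 100 h, sigma = 0.5; one link versus 1000 in series.
for n in (1, 1000):
    early = apparent_sigma(5.0, 10.0, n, 100.0, 0.5)
    later = apparent_sigma(10.0, 20.0, n, 100.0, 0.5)
    print(f"n = {n:4d}: apparent sigma {early:.3f} (5-10 h) vs {later:.3f} (10-20 h)")
```

For a single link both windows recover σ = 0.5; for the 1000-link chain the apparent σ changes from one window to the next, which is exactly what a lognormal distribution cannot do.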
We can estimate the value of P_f(1,t) from test
structures and treat the chip as consisting of n effective
failure elements. Determining n is not a trivial exercise,
however, because the number of failure elements in the test
structure must also be estimated. The good news is that once we have
defined what a failure element is, we can, in principle, determine
the probability of failure for each element. The probability of
failure of the chip can then be estimated more accurately than with
Equation 4 by substituting the individual failure probability of
each element:

$$P_f(\mathrm{chip},t) = 1 - \prod_{i=1}^{n}\left[1 - P_i(t)\right] \qquad (5)$$

where P_i is the probability of failure for each failure
element.
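The element-by-element product P_f(chip,t) = 1 - ∏[1 - P_i(t)] is short to evaluate, but a chip may contain millions of elements with probabilities near 1e-12, where a naive product loses precision. The minimal sketch below, with a hypothetical function name and assuming the elements fail independently, sums log(1 - P_i) instead.

```python
import math


def chip_failure_probability(element_probs):
    """P_f(chip,t) = 1 - prod_i [1 - P_i(t)] for independent elements in series.

    Sums log(1 - P_i) with log1p and converts back with expm1, so that millions
    of tiny probabilities are not lost to floating-point rounding."""
    log_survival = 0.0
    for p in element_probs:
        if p >= 1.0:
            return 1.0  # any element certain to fail makes the chip fail
        log_survival += math.log1p(-p)
    return -math.expm1(log_survival)


# Hypothetical mix: a few highly stressed elements plus many lightly loaded ones.
probs = [1e-4] * 10 + [1e-9] * 1_000_000
print(chip_failure_probability(probs))
```

With all P_i equal, the function reduces to the single-link chain formula 1 - [1 - P_f(1,t)]^n. In this hypothetical mix, ten stressed elements and a million lightly loaded ones contribute about equally to the chip failure probability.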
The challenge to IC designers is to ensure reliability while
squeezing as much performance out of the process as possible.
Unfortunately, the requirements for these two goals are
conflicting. Higher performance means higher currents in smaller
conductors, whereas reliability demands lower current densities.
In the past, the custom has been to generate design rules based
on "worst case" scenarios. In this strategy, current densities were
limited to a certain value assuming that all the lines on the chip
were to be used at this high current density. This was patently
silly: one need only calculate how much power a chip would
dissipate with every wire running at the electromigration limit.
It is often kilowatts. The limiting values were determined by
extrapolating the failure times, usually fitted to a lognormal
failure distribution, to some required level of reliability based
on the chip complexity. This approach was too confining, and
designers of today's ultra-high performance microprocessors have
begun to use a strategy known as "Reliability Budgeting."
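The worst-case procedure described above (extrapolate lognormal failure times to a reliability target set by chip complexity, then assume every line runs at the limit) can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular company: all lines are identical and lognormal with median life t50_ref at a reference current density j_ref (already at use temperature), and median life scales as J^(-n), a Black-type law assumed here. The function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def worst_case_j_limit(j_ref, t50_ref, sigma, n_exp, n_elements,
                       chip_fail_target, lifetime):
    """Largest current density allowed if every one of n_elements identical
    lines runs at that density and the chip may fail with probability no
    greater than chip_fail_target by `lifetime`.

    Assumes lognormal lines with median life t50_ref at j_ref and
    t50 proportional to J**(-n_exp)."""
    # Invert 1 - (1 - p)**n = target for the allowed per-line failure fraction p.
    per_line = -math.expm1(math.log1p(-chip_fail_target) / n_elements)
    # Standard normal quantile of p; strongly negative for large n_elements.
    z = NormalDist().inv_cdf(per_line)
    # Time at which a fraction p of lines have failed when run at j_ref.
    t_early_ref = t50_ref * math.exp(sigma * z)
    # Scale J so that this early-failure time equals the required lifetime.
    return j_ref * (t_early_ref / lifetime) ** (1.0 / n_exp)


# Hypothetical inputs: current density in MA/cm^2, times in hours.
for n in (1_000, 10_000_000):
    j_max = worst_case_j_limit(j_ref=1.0, t50_ref=1e6, sigma=0.5, n_exp=2.0,
                               n_elements=n, chip_fail_target=1e-3, lifetime=1e5)
    print(f"{n:>10d} lines -> J_max = {j_max:.3f} MA/cm^2")
```

The limit tightens as the assumed number of lines grows, because each line must then sit further out in the early-failure tail. This is the mechanism by which worst-case rules become too confining for complex chips.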
To perform reliability budgeting, we need to know how much
current is going through each element. In today's complex
microcircuits, this is a daunting task, but the payback is
significant. The allowable current density for critical circuit
paths can be increased substantially while maintaining reliability,
since the majority of circuit elements have little to no current
flowing through them and are thus effectively immortal. In addition,
if the elements with the highest P_i can be located in the circuit,
trouble spots can be eliminated and a more reliable circuit can be
designed.
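A minimal sketch of reliability budgeting follows, under assumptions that are not taken from this article: each element is lognormal with a common σ, median life scales as (j_ref/j)^n (a Black-type law), elements fail independently, and an element carrying no current is treated as immortal. The chip failure probability is then 1 - ∏[1 - P_i] over the actual per-element currents. The function names, the chip composition, and all numbers are hypothetical. The same routine is run a second time with every wire at the critical-path current density, mimicking the old worst-case assumption.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def element_failure_prob(j, lifetime, j_ref, t50_ref, sigma, n_exp):
    """Lognormal P_i at `lifetime` for one element at current density j.

    Assumes the median life scales as t50_ref * (j_ref / j)**n_exp; an element
    carrying no current is treated as immortal (P_i = 0)."""
    if j <= 0.0:
        return 0.0
    t50 = t50_ref * (j_ref / j) ** n_exp
    return STD_NORMAL.cdf(math.log(lifetime / t50) / sigma)


def budgeted_chip_failure(current_densities, lifetime, j_ref, t50_ref, sigma, n_exp):
    """Chip failure probability 1 - prod_i [1 - P_i] over the actual per-element currents."""
    log_survival = 0.0
    for j in current_densities:
        p = element_failure_prob(j, lifetime, j_ref, t50_ref, sigma, n_exp)
        if p >= 1.0:
            return 1.0
        log_survival += math.log1p(-p)
    return -math.expm1(log_survival)


# Hypothetical chip (MA/cm^2, hours): 1,000 critical wires at 1.0,
# 20,000 lightly loaded wires at 0.1, and 200,000 wires carrying no current.
params = dict(lifetime=1e5, j_ref=1.0, t50_ref=1e6, sigma=0.5, n_exp=2.0)
actual = [1.0] * 1_000 + [0.1] * 20_000 + [0.0] * 200_000
worst_case = [1.0] * len(actual)  # old assumption: every wire at the limit
print("budgeted  :", budgeted_chip_failure(actual, **params))
print("worst case:", budgeted_chip_failure(worst_case, **params))
```

Because the idle and lightly loaded wires contribute almost nothing, the budgeted failure probability is dominated by the critical paths. The same routine also ranks the P_i, which is one way the trouble spots mentioned above could be located.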
Great care must be taken to ensure that the information fed into
the calculation of Equation 5 is correct. If the failure
statistics are incorrect, or the input parameters such as lifetime
and current exponent are wrong, a disaster can unfold. However,
optimizing for performance and reliability can be done successfully
and, in fact, the successful design and manufacture of high
performance microprocessors has been possible only by employing
some form of reliability budgeting.
Summary
Electromigration has been with us since the early days of solid
state devices, even before ICs took center stage. Like an old
soldier, electromigration never dies, and unfortunately it does not
have the good taste to fade away. Whenever we "conquer"
electromigration, we enter new regimes where the demands of
increased performance require that interconnect be more and more
reliable under conditions where metallization is inherently less
reliable. The promise of developing future metallization schemes
that will erase the problem has so far eluded us and there is no
guarantee that the future holds a panacea. Copper may help a
little, but not nearly as much as was hoped, and it still only
buys a little time. Eventually the capabilities of Cu will be
seriously challenged, and this is assuming we can solve the
daunting processing problems that have confronted us over ten years
of development.
Recent advances have given us hope that although electromigration
will always exist and cause problems, we can control it such that
advanced microcircuits can still be designed with the reliability
we need. The use of reliability budgeting, if coupled with a
detailed knowledge of manufacturing process capabilities, can allow
advances without compromising long-term performance. This
complicated task can only be accomplished with the right tools and
talents.
Electromigration as a design issue will be with us until we
develop a room temperature superconductor with a critical current
density of millions of amps per square centimeter that is
compatible with semiconductor processing. Such a development is far
in the future, and we must exercise diligence in controlling the
beast and respect its potential.
Where is it written that life is to be easy?
About the Author
J.R. "Jim" Lloyd specializes in
electromigration and metallization reliability for chip and
packaging applications, reliability testing and analysis,
qualification plans, and electromigration failure modeling. His
industrial experience includes reliability engineering and R&D
positions at IBM and Digital Equipment Corporation. In addition, he
was a visiting scientist at Max-Planck-Institut in Stuttgart,
Germany. He has published more than 60 papers on semiconductor
materials science and reliability engineering, has been invited to
speak to audiences throughout the world, and has taught courses and
workshops at Stevens Institute of Technology, New York Polytechnic,
MRS, ASM, IBM, Digital Equipment, IRPS and ESREF (Europe). He holds
the Ph.D., M.S., and B.S. degrees in materials science and
engineering from Stevens Institute of Technology. He can be reached through email at
jrlloyd@vinfiz.net.