Microelectronics and Reliability
Pergamon Press 1973. Vol. 12, pp. 35--44.
Printed in Great Britain
IS THERE A RELIABLE SCREEN FOR VHF POWER TRANSISTORS?
J. INGRAM-COTTON, D. V. SULWAY and J. P. M. LEDUC
Communications Research Centre, Shirley Bay, P.O. Box 490, Station "A", Ottawa K1N 8T5, Canada

1. INTRODUCTION
During the recent manufacture of an r.f. transmitter by a Canadian company, the company was forced to reject the r.f. power devices chosen for the output stage because the devices delivered could not be operated within the manufacturers' ratings. The devices had been 100 per cent screened and burnt-in. However, examination of the fall-out figures of the screen showed inconsistencies between the two batches. This raised doubts as to the degree of confidence which could be placed in either batch and led to the development of a non-statistical analytical method of assuring confidence.
In this paper we examine the various procedures a reliability engineer can go through in order to obtain reliable devices, using the example of the purchase of a relatively small number (~400) of state-of-the-art r.f. power transistors by a small company. The procedures are considered from a cost-effectiveness viewpoint, where the equipment manufacturer has only limited resources for testing and analyzing devices.

2. PHILOSOPHY AND BACKGROUND OF SCREENING
Screening is the panacea for sorting out unreliable devices, and the magic figures of 168 hr at 125-150°C burn-in are calculated to assuage the fears of the most intransigent customer. But in the end, what does this give you, as the customer? What have you paid for? How much extra reliability have you gained? Could you get more confidence for more expenditure, or even the same for less expenditure? Some manufacturers' PR will
extol the virtues of their high-reliability devices, which are standard parts subjected to 168 hr or less burn-in at 125 or 150°C. After burn-in, a sample of the lot is taken and key electrical parameters are measured. The rest of the lot is given a functional test on a "go, no-go" basis. These parts are offered at a price 30-40 per cent above that of the commercial standard part. Devices screened to full JANTX or equivalent type specifications can be negotiated on the basis of the quantities involved and the complexity of the screen. Generally, a standard set-up charge of $300 is made irrespective of the number of devices required.
What then is the gain in using a manufacturer's version of "high-reliability" devices over the commercial part, or in using a full JANTX part over the manufacturer's "high-reliability" part? The customer is now entering the sphere of the "numbers game", and too much reliance is placed on statements such as "tested to 1 or 0.65 per cent AQL" or on accelerated life tests inferring true failure rates for a given confidence level. The glossy brochures in some cases present very confident predictions which are not always met in practice.
As an example, a typical sampling plan used by manufacturers for final test is a 1 per cent AQL, which assures a manufacturer that a lot with 1 per cent defective devices will be accepted at least 95 per cent of the time (or 19 times out of 20). Fig. 1(a) indicates that there is a 25 per cent probability of accepting a lot in which the defective content is as high as 5-10 per cent. Fig. 1(b) shows the probability of board or unit rework against the number of devices per board or unit for the two AQL levels of 0.65 and 1.0 per cent. Thus, ten devices in a transmitter could imply a 14.7 per cent probability of unit rework.
[FIG. 1(a). Risk associated with 1 per cent AQL sampled lots. Axis labels: probability of acceptance; lot size.]
Over the past few years the prices of semiconductors and integrated circuits have decreased radically. This price reduction has been met by cost reduction programs which have streamlined processes and removed some process screens not usually specified by a customer. As a result, the lot reject content has tended to increase.
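The risks described above can be explored numerically. The following is a minimal sketch, assuming a single sampling plan judged with a simple binomial model and devices that fail independently; the plan parameters (sample size 50, acceptance number 1) are hypothetical and are not taken from the paper. Because the assumptions behind Fig. 1 are not stated, this model need not reproduce the figures read from it.

```python
from math import comb


def acceptance_probability(sample_size, accept_number, defect_fraction):
    """Probability that a lot is accepted under a single sampling plan.

    Binomial model: the lot is accepted when the sample contains at most
    `accept_number` defective devices.
    """
    return sum(
        comb(sample_size, k)
        * defect_fraction ** k
        * (1.0 - defect_fraction) ** (sample_size - k)
        for k in range(accept_number + 1)
    )


def rework_probability(devices_per_unit, defect_fraction):
    """Probability that a unit contains at least one defective device."""
    return 1.0 - (1.0 - defect_fraction) ** devices_per_unit


if __name__ == "__main__":
    # Hypothetical plan, chosen only for illustration.
    n, c = 50, 1
    for p in (0.01, 0.02, 0.05, 0.10):
        print(f"defect fraction {p:.2f}: "
              f"P(accept) = {acceptance_probability(n, c, p):.3f}, "
              f"P(rework, 10 devices) = {rework_probability(10, p):.3f}")
```

Sweeping the defect fraction in this way traces out an operating characteristic curve for the chosen plan, which is the kind of information a customer needs before relying on an AQL statement.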
[FIG. 1(b). Probability of rework on units which have devices inspected to 0.65 per cent or 1.0 per cent AQL. Axis labels: number of devices per unit or board; probability of board rework.]
AQL is really a manufacturer's ally if the customer does not understand the risks that he is taking. The problem of obtaining high reliability devices becomes one of:
(a) Specification of screens and tests.
(b) Confidence and trust in a manufacturer or monitoring of his processing.
(c) 100 per cent testing.
What is the purpose of a screen? To produce a given number of devices which have a consistent, predictable failure rate at reasonable cost and which will meet the overall system reliability requirements.
One hundred per cent testing of critical parameters verifies the effectiveness of the screen or screens, provided that the right parameters have been selected in the first place. Testing prior to burn-in and after burn-in with serially numbered devices gives an indication of the stability of the parameters.
Testing to prove failure rates is not normally economic or feasible. For example, demonstrating a failure rate ≤0.001%/10^3 hr with 60 per cent confidence allows only one failure in 2 × 10^8 device hours or, for 99 per cent confidence, one failure in 6.6 × 10^8 device hours. The possible alternative is to carry out accelerated stress testing (Ref. ), but this is a subject on its own and cannot form part of this paper.
Examination of a curve of failure rate vs. time for a number of devices usually shows an initially high failure rate which drops off to an approximately constant, usually small value, until the time scale becomes comparable with the mean life of that batch, at which point the failure rate begins to increase again. The initial failures or "freak" failures can sometimes be traced to defects introduced during the bonding and encapsulation process. The final failures represent the appearance of a failure mechanism usually more fundamental than that causing the initial failure.
If the device has been in production for some years, then a degree of confidence has evolved in the device due to the amount of device hours that
have accumulated. However, there have been cases where the current production of well-established, reliable devices has shown a deterioration of reliability which sometimes borders on the catastrophic. Information on this drop in reliability only becomes generally known after independent testing has been carried out. During this period, tens of thousands of unreliable devices can reach equipment production lines, which can give rise to expensive rework.
Also, there is the question of large volume and small volume users. Large volume users are those where the number of devices required runs into the tens of thousands and is probably a continuing requirement, while small users may only require a few hundred devices. Where the device manufacturer is faced with a sizeable order, he is more ready to listen to requests for particular requirements, and set-up charges are insignificant compared to the cost of the devices. On the other hand, where the requirement is for a small number of devices for one specialized production run, the set-up charges for the screening specified can be a substantial part of the total device cost. Since the order is a relatively small one from the manufacturer's point of view, he is not likely to attach as much importance to sorting out problems which arise after the devices have arrived in the customer's plant.

3. THE IDEAL SCREEN
This is a hypothetical case from which we can examine the practical methods of screening. The ideal screen will:
(1) Remove all devices which would fail before the minimum required life.
(2) Not degrade the survivors in any way.
(3) Provide absolute assurance that all survivors will function in the various circuit configurations to the day after the warranty expires.
In order to provide the above information, it is necessary to go through a sequence of mechanical and electrical tests. However, the greater the number of tests, the greater the cost.
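The cost of testing can be made concrete with the failure-rate demonstration figures quoted in Section 2. The sketch below assumes a constant failure rate, so that the number of failures in a given number of device hours is Poisson distributed, and it assumes that one failure is allowed during the test; neither assumption is stated explicitly in the paper, and the function names are illustrative.

```python
import math


def poisson_cdf(max_failures, mean):
    """P(X <= max_failures) for a Poisson variable with the given mean."""
    return sum(
        math.exp(-mean) * mean ** k / math.factorial(k)
        for k in range(max_failures + 1)
    )


def required_device_hours(failure_rate_per_hour, confidence, max_failures=1):
    """Device hours needed to demonstrate a failure rate at a confidence level.

    Finds, by bisection, the expected number of failures m for which the
    chance of seeing `max_failures` or fewer equals 1 - confidence, then
    converts m to device hours at the target failure rate.
    """
    target = 1.0 - confidence
    low, high = 0.0, 1.0
    while poisson_cdf(max_failures, high) > target:
        high *= 2.0
    for _ in range(200):
        mid = 0.5 * (low + high)
        if poisson_cdf(max_failures, mid) > target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high) / failure_rate_per_hour


if __name__ == "__main__":
    # 0.001 per cent per 1000 hr expressed as a rate per device hour.
    rate = 0.001 / 100.0 / 1000.0
    for confidence in (0.60, 0.99):
        hours = required_device_hours(rate, confidence)
        print(f"{confidence:.0%} confidence: {hours:.2e} device hours")
```

Under these assumptions the sketch gives values close to the 2 × 10^8 and 6.6 × 10^8 device hours quoted in the text, which shows why direct demonstration of such failure rates is out of reach for a purchaser of a few hundred devices.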
Against this, it is possible to evaluate the effects of device failure in delivered equipment. Sometimes such failures can be catastrophic. In these cases the cost of the
devices used is allowed to rise (i.e. a more complex screen is specified). If the device failure mode results in a degradation of the equipment performance, this could be tolerated for some applications where repairs are possible. Thus, the three factors (device cost, maintainability and reliability) have to be weighed up and, hopefully, optimized for a given piece of equipment.
Intrinsically, a correctly rated semiconductor device will operate for ever (where "for ever" is defined as much greater than the time required for the chassis to fall apart). However, practical semiconductor devices do not operate for this length of time. This is due to (1) bad design and (2) the fact that they are fabricated and tested by human beings, who are highly prone to error.
The customer can go some way towards the concept of prior inspection by specifying a 100 per cent pre-cap visual inspection. This inspection, carried out immediately before the device is hermetically encapsulated, is designed to remove gross processing defects. If properly applied, this can greatly reduce the infant mortalities. Unfortunately such inspections are usually carried out at magnifications of ~×30 to ×70. At these magnifications, significant failure mechanisms can pass through. This is being remedied on aerospace ICs by specifying a scanning electron microscope examination of samples of the die from each wafer, and of random samples of the bonded devices.
The simpler, low magnification, visual inspection is not satisfactory, because the people performing the inspection are prone to error. Economic pressures are such that a certain number of devices must be examined on each shift and, as the inspectors become tired, faulty devices are passed as acceptable. Examples of this are quoted later in this paper. Thus, the ideal screen would recognize these human frailties and successfully remove the errant devices.
Ideally, this would involve examining the device construction and processing prior to purchase, and attempting to pick out weak points in the final device. Specific tests are then designed to excite these expected failure mechanisms in particular. Figure 2 shows the failure mechanisms which are excited by the common screening methods. The problem here is one of cost and time. A
detailed examination, either of the manufacturer's plant (assuming the manufacturer agrees) or of a sample of his current production, is expensive. Examination of the plant implies that the purchaser has personnel with the necessary background to undertake this work meaningfully. Analysis of current production samples implies either an expensive commitment of equipment and personnel on the part of the purchaser or the use of external agencies, with the resulting additional costs. Even then, there is a time lag between evaluation and purchase, during which the device manufacturer may change his process, thus negating the expensive work carried out.

4. THE REAL SCREEN
This may become a watered-down version of the ideal screen, where economics has thrust itself in as "cost-effectiveness". The device reliability engineer may specify a certain procedure, which the purchasing department will examine, calling on the engineer to justify the additional expense. Thus, screens should be examined to determine the failure mechanisms excited at each stage, and the cost of the various stages. It is unnecessary to have two steps which remove the same failure mechanism unless the second step also removes some other failure mechanisms. When the screening processes have been examined, a practical screen can be evolved which includes the necessary steps to remove the failure mechanisms to which the device is prone, using the least expensive screening methods (see Fig. 2).
Of the screening methods usually employed, burn-in has the greatest potential for exciting failure mechanisms. What do you specify as your requirement for burn-in? This raises the question of parameter drift measurements as opposed to "go, no-go" measurements, the former being far more expensive than the latter. "Go, no-go" measurements cannot take into account large changes in device parameters where the end-point is less than the limit.
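The difference between the two kinds of measurement can be sketched in a few lines. This is an illustrative comparison only: the leakage readings, the end-point limit and the allowed relative drift are hypothetical values, not figures from the procurement specification.

```python
def passes_go_no_go(end_value, limit):
    """Go/no-go: only the end-point reading is compared with the limit."""
    return end_value <= limit


def passes_drift_check(initial_value, end_value, max_relative_change):
    """Drift check: the change between the two readings of one serially
    numbered device is compared with an allowed relative excursion."""
    relative_change = abs(end_value - initial_value) / initial_value
    return relative_change <= max_relative_change


# Hypothetical leakage readings in amperes: (before burn-in, after burn-in).
DEVICES = {
    "stable": (1.0e-9, 1.1e-9),
    "drifting": (1.0e-10, 8.0e-10),
}
END_POINT_LIMIT = 1.0e-8   # hypothetical go/no-go limit, amperes
MAX_RELATIVE_CHANGE = 1.0  # hypothetical allowed excursion (100 per cent)

if __name__ == "__main__":
    for name, (before, after) in DEVICES.items():
        print(name,
              "go/no-go:", passes_go_no_go(after, END_POINT_LIMIT),
              "drift:", passes_drift_check(before, after, MAX_RELATIVE_CHANGE))
```

In this example the "drifting" device ends well below the end-point limit and so passes the go/no-go test, even though its reading has changed by a factor of eight; only the drift check, which needs both readings for each serial number, flags it.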
This means that if the limit could be suitably chosen, "go, no-go" testing could be made to give more meaningful results. For example, if the initial parameter values were tabulated and found to be within fairly narrow limits, then the end-point limit could be set such that large parameter drifts would
be rejected. This presupposes that the absolute value of the initial parameter spread is small, which may not be the case if the lot size is large. Thus, Δ parameter measurements should be specified and the ensuing cost absorbed.
However, there are possible snags to Δ parameter measurements. For example, ICBO limits could be set to a maximum excursion of <100 per cent but not to exceed a fixed absolute limit. Thus a device which changes from 10^-10 A to 1.8 × 10^-10 A is acceptable, while one in the same lot which changes from 10^-11 A to 2 × 10^-11 A is rejected! But these measurements are only taken at one value of VCB. Thus, no information on the shape of the ICBO vs. VCB curve is obtained. Such a curve can be used to give some information on the mechanism of the excess leakage current. Also, since the measurements are made at the beginning and end of the burn-in period, there is no real information on the behaviour of the parameter drift with time. Thus, an increase in ICBO with time which has reached a saturation value is more acceptable than a smaller increase over the burn-in time which will carry on increasing when the device is in service.
The question then arises as to the stress levels to be applied to the devices during burn-in. Burning-in a device at an elevated temperature with bias applied uses up a fraction of its useful working life. For example, if a failure mechanism is present which follows an Arrhenius law with an activation energy of 0.7 eV, then 192 hr at 200°C may be equivalent to approximately 10 yr at 60°C. The implication is that, having survived the equivalent of 10 yr at 60°C, the device will survive a further 10 yr at an operating temperature of 60°C, and that a device which has survived a screen of, for example, 384 hr at 200°C is twice as reliable as one that has survived a screen of only 192 hr at 200°C.
It should be borne in mind that we are considering two basic failures, so-called "initial" failures and long-term failures. Initial failures are the devices which fail in the early stages of burn-in, while long-term failures are those devices which have a failure mechanism present which has a long time-constant and will give rise to failure at a time much longer than the burn-in time. Ideally, the burn-in should be in two parts: an initial, relatively low-stress period for 100 per cent of the lot to remove the initial failures. Statistically meaningful samples of the survivors are then put on accelerated life-test at higher stress levels, and the mean time between failures of the lot is calculated.
[FIG. 2. Failure mechanisms excited by various screening methods. (Courtesy of Rome Air Development Centre.) Legible screening methods: high temperature; thermal shock. Legible failure-mechanism columns: herm. leak; ox. def.; drift; particles; die bond; lead bond; contamination; bulk; long leads; ext. lead; sec'y breakdown; metal def.]

5. PARTICULAR PROBLEMS OF POWER TRANSISTORS
Of necessity, power transistors must handle high currents while operating at a relatively high junction temperature. R.f. power devices usually operate at a relatively low voltage but, against this, they operate at frequencies where parasitic inductances and capacitances could give rise to spurious oscillations. The dominant failure mechanisms to be considered here are:
(1) Bond problems: presence of intermetallics.
(2) Metallization: electromigration, contact to the Si.
(3) Thermal runaway: non-uniform current sharing between fingers of the inter-digitated structure, and between the individual chips of a multi-chip device.
(4) Shorted junctions: etch pits can form in Si regions contacted by an Al film at a rate strongly influenced by temperature.(8)
These mechanisms occur in addition to those associated with mechanical problems such as hermeticity, loose conductive particles, contamination, etc.
The over-riding consideration of the reliability engineer who wishes to reduce the dominant failure mechanisms associated with r.f. power transistors is to reduce junction temperature for a given set of operating conditions. This consists of maximizing the heat flow from the chip to the case or stud (i.e. minimizing θjc) and suitably heat-sinking the case or stud to dissipate this heat flow to the surroundings.
A silicon chip is only capable of delivering a certain number of watts/cm^2 of active area. When a greater power output is required, the chip size increases. As the chip size increases, the chances of a chip being defective increase, so there is an argument in favour of connecting two or more chips in parallel rather than designing a larger chip. However, this raises the possibility of non-uniform performance between chips, and if the die bond of one has a large void, then this one will run hot and may degrade more rapidly.
The other problem particular to r.f. devices is
the fact that a particular circuit is designed around a particular device. This leads to a lack of interchangeability of devices and a dependence on sole sourcing. Thus the purchaser is completely dependent on the current production of a particular line of devices. The decision to use a particular r.f. device is usually based on the assessment of a small number of selected samples which are not representative of the production run which will be used in the finished equipment.

6. THE PRACTICAL SCREEN ADOPTED
Details of the screen used are outlined in Fig. 3. The pre-cap visual inspection used as the first step of the screen is designed to remove defects due to poor bonding techniques, gross metallization defects, damaged die or contamination on the die or header. The next step, a hermeticity test, checks on package construction and sealing. The procurement specification set a maximum value on θjc of 1.5°C/W, which is a monitor of the quality of the die bond, thus giving a measure of the ability to remove the heat generated at the collector-base junction. Following this, the d.c. electrical parameters were measured; the limits for these parameters are shown in Fig. 4. The survivors of this part of the screen then went to burn-in.
[FIG. 3. Practical screen adopted, showing the fall-out at each step for the two lots processed. Legible flow-chart entries: 100% screen; lot quantities 100 and 324; rejects; hermeticity (leak-rate limit in atm cc/sec); thermal resistance 1.5°C/W; electrical test; burn-in 48 hr, 27 W, Tc = 90°C; left in burn-in; equipment damage; reject, lost; electrical test and r.f. power output 75 W.]
The stress level adopted was 48 hr at a case temperature of 90°C while dissipating 27 W (Ic = 1 A at 27 V). These conditions do not represent any form of accelerated stress testing, but rather duplicate the d.c. conditions under which the devices would operate in the finished equipment. The r.f. power dissipated in the device in circuit is ~16 W peak with a 10:1 duty cycle and a 10 kHz pulse repetition frequency. This addition, in terms of average d.c. equivalent power, is quite small. However, this raises the question of testing an r.f. device under d.c. conditions, and of adequately simulating in-circuit conditions. Unfortunately, an r.f. operational screen with the attendant power drives, matched loads and monitoring equipment would give rise to very high set-up costs, which could not be justified for the small number of devices required. Also, the practical difficulty of finding a firm willing to carry out this screening would have led to an increase in the procurement delay.
Thus the purpose of the low-stress burn-in was to reduce the risk of initial failures. Also, since this was a state-of-the-art device, we were dependent to a great extent on the experience of the device manufacturer in selecting a suitable burn-in schedule. This stress level was considered to be capable of removing the initial failures without degrading the survivors appreciably.
After the 48 hr burn-in, the devices were given a d.c. electrical test (see Fig. 4), and a functional r.f. power output test at a frequency of 150 MHz which required a minimum power output of 75 W.
[FIG. 4. D.c. electrical tests made before and after burn-in. Columns: parameters; test limit. The parameter names are not legible; the test limits read: 35 V min at Ic = 200 mA; 65 V min at Ic = 200 mA; 4 V min at IB = 10 mA; 2.0 mA max at 30 V; 5.0 min at Ic = 3.0 A and VCE = 5.0 V (d.c. test).]

7. INTERPRETATION OF THE RESULTS OF THE PRACTICAL SCREEN
Figure 3 shows the fall-out of the two lots received at each step of the screen; the first batch of devices received consisted of 80 survivors of an initial batch of 100 devices. Only one of the 84 devices that reached the power output test failed this test. This could be dismissed as a freak failure, so that the survivors could be looked on as a good lot. However, the results of the second lot, i.e. 26 failures out of 62 on the power output test after burn-in, raised serious doubts as to the validity of the first lot. How could there be such a variation between lots? Given that this variation existed, how was the equipment manufacturer to obtain some degree of confidence in the survivors of each batch? Remembering that the burn-in had been aimed at removing "initial failures" only, a ~35 per cent failure rate on the second batch caused a great deal of consternation, particularly as 219 pieces were "still in burn-in".
At this stage the equipment manufacturer is faced with a difficult problem. What to do with the 114 "good" devices ready to be soldered into the equipment? Also, what to do about the 300 devices still to be processed and shipped? He is faced with two alternatives. The first is to take random samples of the devices in his possession, and perform an accelerated life-test to be able to predict a failure rate for the other devices in the lot.
The other approach is to ascertain the failure mechanism of those devices which failed the power output test, and to examine samples of the survivors to determine whether these devices could be subject to the same failure mechanism. Either approach involves the equipment manufacturer in extra cost, and during such investigations production of equipment would be frozen pending the results. The second approach also relies on the co-operation of the device manufacturer, at least to supply the failed devices, and at best to conduct a failure analysis and to communicate the results to the purchaser.
Another question which arises at this stage is whether or not the burn-in is damaging the devices. The device manufacturer, faced with the results of batch number 2, is obviously under some pressure to blame the poor yield on the burn-in conditions. However, examination of the manufacturer's recommended operating limits in this case shifts the emphasis back to the device manufacturer, since the burn-in level chosen is well within the safe operating limits. As further corroboration of the burn-in results, prototype engineering samples of these devices, which had not been subjected to burn-in, were showing failures due to loss of output power in bench tests of bread-board models. Thus the whole feasibility of using these particular devices in the circuit was called into doubt.
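The severity of the burn-in can be put in rough numerical terms from the conditions quoted in Section 6 (27 W at a case temperature of 90°C) and the procurement limit of 1.5°C/W on θjc. The sketch below assumes a steady-state thermal model and a device sitting exactly at the θjc limit, so the junction temperature it returns is an upper estimate for a device that meets the specification. It also assumes that the "10:1 duty cycle" means the r.f. pulse is on for one tenth of each period; the paper does not define the ratio, and the function names are illustrative.

```python
def junction_temperature(case_temp_c, power_w, theta_jc_c_per_w):
    """Steady-state junction temperature in degrees C.

    Tj = Tc + P * theta_jc, with theta_jc in degrees C per watt.
    """
    return case_temp_c + power_w * theta_jc_c_per_w


def average_pulsed_power(peak_power_w, period_to_on_time_ratio):
    """Average power of a rectangular pulse train.

    `period_to_on_time_ratio` is 10 for a 10:1 duty cycle, under the
    assumption stated in the lead-in.
    """
    return peak_power_w / period_to_on_time_ratio


if __name__ == "__main__":
    tj = junction_temperature(case_temp_c=90.0, power_w=27.0,
                              theta_jc_c_per_w=1.5)
    p_rf = average_pulsed_power(peak_power_w=16.0, period_to_on_time_ratio=10.0)
    print(f"Junction temperature during burn-in: about {tj:.1f} C")
    print(f"Average r.f. dissipation in circuit: about {p_rf:.1f} W")
```

On these assumptions the average r.f. contribution is small compared with the 27 W d.c. dissipation, which is in line with the remark in Section 6 that the addition is quite small.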

8. LIFE TEST VS. FAILURE ASSESSMENT
The method chosen for these particular devices was an independent assessment of the survivors of the burn-in and power output test, and an analysis of the prototype devices which gave problems in the bread-board circuit. Also, we were able to maintain a good liaison with the device manufacturer, who performed a failure analysis of some of the devices rejected after burn-in.
The reasons for not choosing an accelerated life-test were:
(1) No company was readily available with facilities for accelerated life testing of r.f. devices. (Note: Shipment of such devices to and from the U.S.A. could lead to delays comparable with the time required for testing.)
(2) Approximately one-half of the 144 devices available would be required in order to generate a statistically meaningful number of failures at, say, three stress levels. This would not have left enough devices for equipment completion, and would only apply to that particular lot and not to the subsequent devices.
A quality assessment approach to survivor reliability is not easily made quantitative. Figure 5 is an exploratory approach based on the limited experience that we have had to date. This figure describes the options open to the reliability engineer depending on the results obtained. Further investigation is still required to determine an optimum figure for the number "N", although this is quite likely to vary with the particular investigation being performed.

9. RESULTS OF THE FAILURE MECHANISM ASSESSMENT
In discussing these results, we need only consider two of the six samples examined. The two samples were one of the "good" devices and one of the bread-board failures. The results so obtained were strengthened by failure analysis carried out on the burn-in rejects by the device manufacturer. The two samples considered show that the same failure mechanisms applied both during burn-in and bread-board operation. Also, a cursory examination of the "good" device indicated that manufacturing defects were present which should not have passed the pre-cap visual inspection.
Figure 6 is an optical micrograph taken at ~×30 magnification of a survivor of batch 1. (Note: The 100 per cent pre-cap visual inspection of this batch was carried out at between ×30 and ×45.) Chip number 1 shows mechanical damage which cannot be attributed to burn-in or to removal of encapsulation. At this magnification there are indications of metallization changes which could be attributed to burn-in.
The failure mechanism assessment consisted mainly of a scanning electron microscope examination of the device surface after a low-power optical examination. Figure 7 illustrates the surface appearance of this device, which meets the minimum r.f. power requirements as specified, i.e. it is a "good" device.
FIG. 6. Optical micrograph of the two chips.
FIG. 7. SEM micrographs of the device (magnification label in figure: ×500). (a) Fused emitter resistor. (b) Improperly etched metallization. (c) and (d) Smeared and scratched metallization. (e) Over-alloying of the Al contacts due to the formation of a stable hot-spot. (f) Higher magnification of part of (e).