Physica A 185 (1992) 466-470 North-Holland

Adaptive optimization in neural networks

K.Y.M. Wong¹ and D. Sherrington

Department of Physics, University of Oxford, 1 Keble Road, Oxford OX1 3NP, UK

¹ Present address: Department of Physics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong.

We apply the principle of adaptation to optimize the performance of neural networks with (i) noisy retrieval and (ii) disruptive dilution.

1. Introduction

Learning in neural networks can be described as a search procedure in the space of the adjustable synaptic weights. A performance function of the information to be stored is prescribed, and the purpose of the search is to find the network configuration which maximizes it. Although early work [1] focused on performance functions for errorless inputs presented to the network, we are more often interested in situations that depend on the operating conditions after the learning stage has been completed. Networks operating best in an unperturbed environment may not do so in a disruptive one. Perturbations to the network may take the form of ambient noise present in the retrieval dynamics, or of disruptive cutting of the synaptic weights. In this paper we consider two examples of optimizing the performance of neural networks: (i) How should we optimize the final overlaps and basins of attraction in attractor neural networks with noisy retrieval dynamics? (ii) How should we optimize the robustness against disruptive cutting of the synaptic weights?

2. The principle of adaptation

Below we illustrate how the principle of adaptation enables us to find the optimal solution. In fact, this principle has a potentially much wider scope of application than these specific examples. Details pertaining to the above two questions are published in refs. [2, 3] respectively. Just as a living organism survives best in an environment to which it is adapted, a neural network is expected to perform best in an operating environment identical to its training environment. This notion has recently been formulated as the principle of adaptation [2]: the performance of a neural network in a given retrieval environment is optimal if the network is optimized in the same environment during the training stage.
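
As an operational illustration of this principle, the following sketch (in Python) trains a single binary perceptron on examples corrupted to the overlap expected during retrieval and compares it with one trained on noiseless patterns. The perceptron setup, the flip-noise model and all parameter values are illustrative assumptions for this sketch only; they are not the networks or performance functions analysed in refs. [2, 3].

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(patterns, overlap, rng):
    # Flip each bit with probability (1 - overlap)/2, so that the expected
    # overlap of the corrupted input with the clean pattern is `overlap`.
    flip = rng.random(patterns.shape) < (1.0 - overlap) / 2.0
    return np.where(flip, -patterns, patterns)

def train(patterns, targets, train_overlap, epochs=100, rng=rng):
    # Perceptron learning on noisy examples drawn at the training overlap:
    # the "training environment" of the adaptation principle.
    p, n = patterns.shape
    w = np.zeros(n)
    for _ in range(epochs):
        noisy = corrupt(patterns, train_overlap, rng)
        for mu in range(p):
            if np.sign(noisy[mu] @ w) != targets[mu]:
                w += targets[mu] * noisy[mu] / np.sqrt(n)
    return w

def retrieval_accuracy(w, patterns, targets, input_overlap, trials=200, rng=rng):
    # Fraction of correct outputs when the inputs are corrupted to the
    # overlap characterizing the retrieval (operating) environment.
    hits = 0.0
    for _ in range(trials):
        noisy = corrupt(patterns, input_overlap, rng)
        hits += np.mean(np.sign(noisy @ w) == targets)
    return hits / trials

n, p = 200, 40
patterns = rng.choice([-1.0, 1.0], size=(p, n))
targets = rng.choice([-1.0, 1.0], size=p)
operating_overlap = 0.7   # overlap expected in the retrieval environment

w_adapted = train(patterns, targets, train_overlap=operating_overlap)
w_noiseless = train(patterns, targets, train_overlap=1.0)

print("adapted  :", retrieval_accuracy(w_adapted, patterns, targets, operating_overlap))
print("noiseless:", retrieval_accuracy(w_noiseless, patterns, targets, operating_overlap))
```

The only point of the sketch is the structure of the procedure: the training ensemble is drawn at the operating overlap rather than from the clean patterns.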

3. Example 1: Noisy retrieval in dilute attractor networks

To optimize the attractor overlap, i.e. the asymptotic retrieval overlap with a nominated pattern, the principle of adaptation requires us to adjust the training overlap to be identical to the attractor overlap. This means that the performance function employed in the learning stage has to be the averaged output overlap when the input is chosen from an ensemble of examples, each having a training overlap with the clean patterns exactly equal to the attractor overlap. Because of this interdependence of the training and optimal attractor overlaps, they can only be determined self-consistently. Similarly, to maximize the size of the basin of attraction, the principle of adaptation requires us to adjust the training overlap to be identical to the overlap at the basin boundary; again, the two can only be determined self-consistently.

Graphically, retrieval in dilute attractor networks can be described by an iterative map f(m) between the input and output overlaps: if m(t) is the averaged overlap with a particular pattern at time t, then m(t+1) = f(m(t)) for parallel dynamics, and dm/dt = f(m(t)) - m(t) for random sequential dynamics [4]. The attractor overlaps and the basin boundaries are therefore given by the stable and unstable fixed points respectively. Let f_{m_t}(m) be the retrieval map of a network optimized at training overlap m_t. The principle of adaptation implies that this family of retrieval maps is bounded by its envelope, as shown in fig. 1. Thus the optimal attractor overlaps are obtained at the stable fixed points of the retrieval envelope, and the optimal boundary overlaps at its unstable fixed points. These fixed points give both the retrieval conditions to be optimized and, at the same time, the training conditions used in the optimization procedure.
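
The fixed-point structure can be made concrete numerically. The sketch below uses an illustrative single-pattern map of the form commonly written for extremely dilute networks, f(m) = ∫ dλ ρ(λ) erf(λm/√(2(1 - m² + T²))), with the aligned-field distribution ρ(λ) collapsed, purely for illustration, to a single value λ₀; the actual map, and hence its envelope, depends on the learning rule and on the training overlap. The attractor overlap is obtained by iterating the parallel dynamics and the basin boundary by bisection.

```python
from math import erf, sqrt

# Illustrative retrieval map at retrieval-noise temperature T, with the
# aligned-field distribution replaced by a single value LAMBDA (an
# assumption made only to keep the sketch self-contained).
LAMBDA, T = 1.3, 0.5

def f(m):
    # Output overlap produced by one parallel update at input overlap m.
    return erf(LAMBDA * m / sqrt(2.0 * (1.0 - m * m + T * T)))

def stable_fixed_point(m0=0.999, iters=2000):
    # Attractor overlap: iterate m(t+1) = f(m(t)) starting near the pattern.
    m = m0
    for _ in range(iters):
        m = f(m)
    return m

def unstable_fixed_point(iters=200):
    # Basin boundary: bisection on f(m) - m between m = 0 and the attractor,
    # assuming f(m) < m just above 0 and f(m) > m just below the attractor.
    lo, hi = 1e-6, stable_fixed_point() - 1e-6
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) - mid < 0.0:
            lo = mid   # still below the diagonal: the boundary lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

print("attractor overlap ~", round(stable_fixed_point(), 3))
print("basin boundary    ~", round(unstable_fixed_point(), 3))
```

In the adaptation scheme, the training overlap m_t would then be adjusted until it coincides with the fixed point determined in this way, which is the self-consistency referred to above.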


Fig. 1. A schematic diagram of the family of retrieval curves f_{m_t}(m) and their envelope. The fixed point of the envelope optimizes the network performance.


Fig. 2. The phase diagram in the space of storage level α and retrieval noise temperature T. The full and dotted lines show the storage capacities of the optimally adapted networks and the MSN respectively. More than one stable fixed point of the retrieval envelope is present in region R.

Fig. 2 shows the phase diagram in the space of storage level α and retrieval noise temperature T; α is the number of stored patterns per synapse, and T is the width of a Gaussian variable incorporated in the local field of the neurons to model the retrieval noise in the dynamics. The full line indicates the storage capacity of the network when it is optimally adapted. For comparison, the dotted line indicates the storage capacity of the so-called maximally stable network (referred to as MSN hereafter), which has the maximum storage capacity at T = 0. However, the increase in storage capacity from the MSN to the optimally adapted network becomes increasingly marked with increasing temperature. At T = 0.38 the phase boundary has a kink, and for higher temperatures the Hebbian network [4] has the best storage.


4. Example 2: Dilution robustness in feedforward networks

We consider three types of dilution:
(i) Annealed dilution: dilution is carried out as an integral part of the learning process, so that the diluted synapses can be placed at the positions most favourable for storing the information.
(ii) Random dilution: synapses are diluted randomly, without correlation with the stored information.
(iii) Clipped dilution: synapses with magnitudes weaker than some threshold are diluted.

Results for the output overlaps in the different cases are shown in fig. 3. Not surprisingly, the annealed diluted network has the highest output overlap (curve AD). In fact, for α < 2, there are output errors only when the fraction of undiluted synapses falls below a critical value. For random dilution, the MSN is most robust only at a low degree of dilution (curve RM) [5]. In general, consider a randomly diluted network with a fraction f of retained synapses. Its output overlap can be shown to be equal to that of an undiluted network fed with noisy input patterns having an input overlap of √f. Hence, using the principle of adaptation, the network with optimal robustness is obtained by using a training overlap of √f. Fig. 3 shows the behaviour of the network with optimal robustness both in the entire weight space (curve RO) and in the restricted weight space which stores the patterns correctly before dilution (curve RR). For clipped dilution, the MSN is likewise most robust only at a low degree of dilution (curve CM). At synaptic fraction f, the output overlap can be shown to be equivalent to that for an input overlap of √M_2, where M_n = ∫ dz e^{-z²/2} θ(z² - B²) z^n / √(2π) and the clipping threshold B is fixed by M_0 = f.
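
The two equivalences quoted above can be evaluated directly. The sketch below assumes unit-variance Gaussian synaptic weights (consistent with the definition of M_n) and computes, for a retained fraction f, the equivalent input overlap √f for random dilution and √M_2 for clipped dilution, with the clipping threshold B fixed by M_0 = f; the closed form used for M_2 follows from the definition by integration by parts.

```python
from math import erfc, exp, sqrt, pi

def retained_fraction(B):
    # M_0(B): probability that |z| > B for a standard Gaussian weight z.
    return erfc(B / sqrt(2.0))

def second_moment(B):
    # M_2(B) = integral of z^2 exp(-z^2/2)/sqrt(2 pi) over |z| > B,
    # evaluated by integration by parts.
    return retained_fraction(B) + sqrt(2.0 / pi) * B * exp(-B * B / 2.0)

def clip_threshold(f, iters=100):
    # Solve M_0(B) = f for the clipping threshold B by bisection
    # (retained_fraction decreases monotonically from 1 to 0).
    lo, hi = 0.0, 20.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if retained_fraction(mid) > f:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for f in (0.9, 0.5, 0.2):
    B = clip_threshold(f)
    print(f"f = {f:.1f}:  random -> {sqrt(f):.3f},  clipped -> {sqrt(second_moment(B)):.3f}")
```

For instance, with these definitions f = 0.5 gives √M_2 ≈ 0.96 against √f ≈ 0.71, consistent with clipping being much less disruptive than random dilution at the same retained fraction.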

Fig. 3. The output overlap, as a function of the retained synaptic fraction f, at α = 1.5 for the different types of dilution.


However, the principle of adaptation does not imply that the network with training overlap √M_2 is optimally robust against clipping. This is because clipped dilution is not a random process, but is strongly correlated with the information stored in the network. Nevertheless, the network with training overlap √M_2 (curve CO) does improve its robustness compared with the MSN.

That the curves AD and CM (or CO) are distinct also clarifies another question. Since it has been found that the annealed diluted network has the same distribution of synaptic weights as the clipped MSN [5], one is easily led to the conclusion that they are identical. However, we have demonstrated here that they are not.

References

[1] E. Gardner and B. Derrida, J. Phys. A 21 (1988) 271.
[2] K.Y.M. Wong and D. Sherrington, J. Phys. A 23 (1990) 4659.
[3] K.Y.M. Wong and M. Bouten, Europhys. Lett. 16 (1991) 525.
[4] B. Derrida, E. Gardner and A. Zippelius, Europhys. Lett. 4 (1987) 167.
[5] M. Bouten, A. Engel, A. Komoda and R. Serneels, J. Phys. A 23 (1990) 4643.