Progressive dimensionality reduction by transform for hyperspectral imagery

Progressive dimensionality reduction by transform for hyperspectral imagery

Pattern Recognition 44 (2011) 2760–2773 Contents lists available at ScienceDirect Pattern Recognition journal homepage: www.elsevier.com/locate/pr ...

2MB Sizes 0 Downloads 1 Views

Pattern Recognition 44 (2011) 2760–2773

Contents lists available at ScienceDirect

Pattern Recognition journal homepage: www.elsevier.com/locate/pr

Progressive dimensionality reduction by transform for hyperspectral imagery Chein-I Chang a,b,c,d,n, Haleh Safavi a a

Remote Sensing Signal and Image Processing Laboratory, Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, Baltimore, MD 21250, USA Department of Electrical Engineering, National Chung Hsing University Taichung, Taiwan, ROC c Department of Computer Science and Information Management, Providence University Taichung, Taiwan, ROC d Center for Quantitative Imaging in Medicine, Taichung Veterans General Hospital, Taichung, Taiwan, ROC b

a r t i c l e i n f o

a b s t r a c t

Article history: Received 31 March 2010 Received in revised form 29 October 2010 Accepted 30 March 2011 Available online 8 April 2011

This paper develops to a new concept, called progressive dimensionality reduction by transform (PDRT), which is particularly designed to perform data dimensionality reduction in terms of progressive information preservation. In order to materialize the PRDT a key issue is to prioritize information contained in each spectral-transformed component so that all the spectral transformed components will be ranked in accordance with their information priorities. In doing so, projection pursuit (PP)-based dimensionality reduction by transform (DRT) techniques are developed for this purpose where the Projection Index (PI) is used to define the direction of interestingness of a PP-transformed component, referred to as projection index component (PIC). The information contained in a PIC is then calculated by the PI and used as the priority score of this particular PIC. Such a resultant PDRT is called progressive dimensionality reduction by projection index-based projection pursuit (PDR-PIPP) which performs PDRT by retaining an appropriate set of PICs for information preservation according to their priorities. Two procedures are further developed to carry out PDR-PIPP in a forward or a backward manner, referred to forward PDR-PIPP (FPDR-PIPP) or backward PDRT (BPDR-PIPP), respectively, where FPDRPIPP can be considered as progressive band expansion by starting with a minimum number of PICs and adding new PICs progressively according to their reduced priorities as opposed to BPDRT which can be regarded progressive band reduction by beginning with a maximum number of PICs and removing PICs with least priorities progressively. Both procedures are terminated when a stopping rule is satisfied. The advantages of PDR-PIPP allow users to transmit, communicate, process and store data more efficiently and effectively in the sense of retaining data integrity progressively. & 2011 Elsevier Ltd. All rights reserved.

Keywords: Backward progressive dimensionality reduction by PI-PP (BPDR-PIPP) Dimensionality reduction by transform (DRT) Forward progressive dimensionality reduction by PI-PP (FPDR-PIPP) Progressive dimensionality reduction by projection index-based projection pursuit (PDR-PIPP) Progressive dimensionality reduction by transform (PDRT) Projection index-based projection pursuit (PIPP)

1. Introduction Due to very high spectral resolution provided by hyperspectral imaging sensors, it is anticipated that the redundancy resulting from inter-band correlation should be also very high and can be removed without loss of significant information. In doing so, two approaches are generally adopted to perform spectral compression, Band Selection (BS) and Dimensionality Reduction by Transform (DRT). Compared to the BS which selects a partial subset of full bands for information preservation while completely discarding information of un-selected bands, the DRT performs data compaction on the entire set of bands via a transform and then preserves information from the new transformed data representation. In many applications DRT is generally preferred to BS, for example, endmember extraction [1,2] and data compression [3,4]. In this paper, a new concept, called progressive dimensionality reduction by transform (PDRT), is

n

Corresponding author. E-mail address: [email protected] (C.-I Chang).

0031-3203/$ - see front matter & 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2011.03.030

introduced. It performs data dimensionality reduction via a transform progressively in terms of information preservation. The motivation of the PDRT arises from a need of processing vast amount of hyperspectral data in a more effective means in many applications, for example, satellite data processing and communication, data computational complexity, data archiving and transmission. The PDRT provides a feasible solution to meeting these demands. However, this is easier said than done. There are several issues needed to be addressed. The first and foremost is how to develop a credible DR transform that can compact the original data in a lower dimensional data space. A second issue is how to represent the original data in the 2D spatial-1D spectral data coordinate system in a dimensionality-reduced data space. Finally, a third and important issue is how to prioritize each of dimensions in the new data representation system so that all dimensions can be ranked by their priorities for PDRT. Here we use the commonly used DRT, principal components analysis (PCA) as an illustrate example to address these three issues. The PCA takes advantage of a data sample covariance matrix to de-correlate spectral correlation of the original data in

C.-I Chang, H. Safavi / Pattern Recognition 44 (2011) 2760–2773

spectral dimensions. It then transforms the original 3D data cube in a 2D spatial-1D spectral data coordinate system into a new data representation system formed by a set of principal components (PCs), each of which is specified by an eigenvector that is determined by a particular eigenvalue, i.e., a data variance. Finally, each PC is further prioritized by the magnitude of the eigenvalue that determines its associated eigenvector. In other words, each dimension in a PCA-transformed data space is actually a PC which is no longer a wavelength-specified spectral dimension. By virtue of the PCA the original data acquired by wavelength-based spectral dimensions can now be reduced to a small number of PCs specified by larger eigenvalues. The progressive DR is then carried out by adding or removing eigenvalues-prioritized PCs in a forward or a backward manner. While the PCA satisfies all the good and desired properties required by the PDRT as described above, other transforms do not, for example, the independent component analysis (ICA) [5]. There is a significant difference between PCA and ICA. First of all, the ICA does not have an equation similar to the characteristic polynomial equation obtained for the PCA from which the sample covariance matrix can be constructed to solve eigenvalues. In this case, it must rely on a numerical algorithm to find projection vectors which serve as the same purpose of eigenvectors for the PCA to produce ICs. In doing so, a general approach is to use random initial conditions for a designed numerical algorithm to converge to desired projection vectors. As a result of using different sets of random initial conditions, the generated components not only appear in a random order, but also are different even if they appear in the same order. Accordingly, the results produced by different runs using different sets of random initial conditions or by different users will be also different. This same issue is also encountered in the ISODATA (K-means) clustering method in [6]. Secondly, once ICs are generated, there is no guideline to rank ICs in a proper order in a similar fashion that the PCA uses eigenvalues to rank PCs. This is because there is no counterpart of eigenvalues used by the PCA in the ICA that can be used to prioritize the generated ICs. Finally, a key challenging issue is how to integrate the PCA and the ICA in a general setting which includes the PCA and ICA considered as special cases. Unless all these issues are resolved, the PDRT can be only applicable to the PCA. As a consequence, the potential of PDRT will be rather limited in many hyperspectral data exploitation applications where many targets of interest can be better and more effectively characterized by high-order statistics (HOS) than second order statistics. It has been shown in [7] that the PCA can only capture information characterized by second-order statistics and small targets could not be preserved in PCs but instead in independent components (ICs). This evidence was further confirmed by Ren et al. [8], where HOS was also able to detect subtle targets such as anomalies, small targets. Fortunately, such prioritization issue has been recently addressed for ICA in [5] and for HOS-based components in [7], where a concept of using priority scores to rank components was introduced to prioritize components according to the significance of the information contained in each component measured by a specific criterion. So, one goal of this paper is to develop an approach that can unify PCA, ICA and HOS in context of a more general framework for PDRT. Interestingly, projection pursuit (PP) developed by Friedman and Tukey [9] turns out to be a good candidate to serve this purpose. It was originally derived as a technique for exploratory analysis of multivariate data which makes use of so-called Projection Index (PI) as a criterion to specify a particular interesting direction. Such a PI-determined direction is then used as a projection vector that produces a component, referred to as Projection Index Component (PIC) which comprises of projections of all data sample vectors mapped onto this particular projection vector. The PP using PI to produce projection vectors is referred to

2761

as PIPP in this paper. So, if the PI is specified by data variance, the resulting data variance-PP becomes PCA. On the other hand, if the PI is specified by mutual information to measure statistical independence, the resulting mutual information-PP turns out to be ICA. Unfortunately, this approach suffers from a major issue that the PICs generated by PP are not consistent due to the use of random initial conditions to produce projection vectors. In order to cope with this problem, two approaches developed in [6] can be considered for component prioritization. One is referred to as PI-PRioritized PP (PI-PRPP) which utilizes the PI as a criterion to prioritize PICs produced by any arbitrary PIPP. In other words, due to the use of random initial conditions the PIPP-generated PICs generally appear in a random order. So, on the one end, if we run the same PIPP on the same data set at two different times, the two sets of resulting PICs will appear in different orders. On the other end, if two users run the same PIPP on the same data set, the resulting two sets of PICS will also appear in different orders as well. Therefore, one advantage of using a PI-PRPP is to allow users to rank and prioritize PICs regardless of what random initial condition is used. Despite that the PI-PRPP resolves the issue of random order in which PICs appear it does not necessarily imply that PIPP using two different sets of random initial conditions produces the PICs ranked in the same order even though the PICs at the same rank are very close in terms of their projection vectors. To further remedy this problem, another approach, referred to as Initialization-Driven PP (ID-PIPP) is to specify an appropriate set of initial conditions for PIPP. In this case, the PIPP uses the same initial condition all the time and thus, it always produces identical PICs in the same order. By means of the PI-PRPP and ID-PIPP the DR via PIPP can be accomplished by retaining a small number of components whose priorities are ranked by PI-PRPP via a specific prioritization criterion. After PICs are prioritized, PDR-PIPP which uses PIPP as a DR transform can be performed as band expansion in a forward manner or band reduction in a backward manner, referred to forward PDR-PIPP (FPDR-PIPP) or backward PDR-PIPP (BPDR-PIPP), respectively. The FPDR-PIPP starts with a minimum number of PICs, minPIC such as, minPIC ¼1 and then increases one PIC at a time by a step size, nsize, for example, nsize ¼1 progressively until it reaches a stopping rule which can be determined by a particular application. To the contrary, BPDR-PIPP begins with a maximum number of PICs, maxPIC such as, maxPIC ¼L which is a total number of spectral bands and then decreases one PIC at a time by a step size, nsize for example, nsize ¼1 progressively until it reaches a stopping rule which is also determined by a particular application. In other words, in order to carry out DR in a progressive manner, the PDR-PIPP first represents each PIC by its associated projection vector pointing to a particular interesting direction. It then prioritizes PICs according to priority scores calculated by a customdesigned criterion to measure the significance of the information contained in each of PICs. Finally, the PDR-PIPP is performed progressively in either a forward manner by adding more PICs so as to achieve band expansion or a backward manner by removing current PICs so as to achieve band reduction. In light of FPDR-PIPP or BPDR-PIPP the issue of determining number of PICs is essentially determined by various applications in which case there is no need to know exactly how many PICs needed to be retained.

2. Projection pursuit-based component analysis for dimensionality reduction Dimensionality reduction by transform (DRT) is an important preprocessing technique that represents multi-dimensional data in a lower dimensional data space without significant loss of desired data information. A common approach which has been

2762

C.-I Chang, H. Safavi / Pattern Recognition 44 (2011) 2760–2773

widely used in many applications such as data compression is the PCA which represents data to be processed in a new data space whose dimensions are specified by eigenvectors in descending order of eigenvalues. Another example is the ICA which represents data to be processed in a new data space whose dimensions are specified by a set of statistically independent components. This section extends PCA and ICA to a general setting via a Projection Index (PI)-based dimensionality reduction technique, referred to as PI-based Project Pursuit (PIPP). The term of ‘‘Projection Pursuit (PP)’’ was first coined by Friedman and Tukey [9] which was used as a technique for exploratory analysis of multivariate data. It designs a PI to explore projections of interestingness by projecting a high dimensional data set into a low dimensional data space while retaining the information of interest in the data. More specifically, assume that there are N data points fxn gN n1 each with dimensionality K and X ¼ ½x1 x2    xN  is a K  N data matrix and w is a K-dimensional column vector which serves as a desired direction. Then wTX represents an N-dimensional row vector that is the orthogonal projections of all sample data points mapped onto the direction w. Now if we let H(  ) is a function measuring the degree of the interestingness of the projection wTX for a fixed data matrix X, a Projection Index (PI) is a real-valued function of w, I(w): RK-R defined by IðwÞ ¼ HðwT XÞ:

ð1Þ fwj gJj ¼ 1 .

The PI can be easily extended to multiple directions, In this case, W ¼ ½w1 w2    wJ  is a K  J projection direction matrix and the corresponding projection index is also a realvalued function, I(W):RK  J-R is given by IðWÞ ¼ HðWT XÞ:

ð2Þ

The choice of the H(  ) in (1) and (2) is application-dependent. Its purpose is to reveal interesting structures within data sets such as clustering. However, finding an optimal projection matrix W in (2) is not a simple matter [8]. In this paper, we focus on PIs which are specified by statistics of high orders such as skewness, kurtosis, etc. [10]. Assume that the ith projection index-projected component can be described by a random variable zi with values taken by the gray level value of the nth pixel denoted by zin (z¼wTX). Since a random variable is completely determined by moments specified by statistics of all orders via a moment generation equation, it suffices to design PIPP with the PI specified by various statistics of all orders. In doing so a similar approach to that proposed in [8,10] can be also derived for PI specified by the kth-order orders of statistics (i.e., kth moment) by solving the following eigenproblem ðE½ri ðrTi wÞk2 rTi l IÞw ¼ 0: 0

ð3Þ

It should be noted that when k¼2, 3, 4 in (3) is then reduced to variance, skenwness and kurtosis, respectively. Due to unavailability of finding analytic solution to (3), a numerical algorithm described in the following is further developed by finding a sequence of projection vectors that will converge to the solution to (3). Projection-Index Projection Pursuit (PIPP) 1. Initially, assume that X ¼ ½r1 r2    rN  is data matrix and a PI is specified. 2. Find the first projection vector w1 by maximizing the PI. 3. Using the found w1 , generate the first projection image Z1 ¼ ðw1 ÞT X ¼ fz1i 9z1i ¼ ðw1 ÞT ri g which can be used to detect the first projection vector. 4. Apply the orthogonal subspace projector (OSP) specified by ? Pw ¼ Iw1 ðwT1 w1 Þ1 wT1 to the data set X to produce the first 1 ? OSP-projected data set denoted by X1, X1 ¼ Pw X. 1 5. Use the data set X1 and find the second projection vector w2 by maximizing the same PI again.

? 6. Apply Pw ¼ Iw2 ðwT2 w2 Þ1 wT2 to the data set X1 to produce the 2 ? second OSP-projected data set denoted by X2, X2 ¼ Pw X1 2 which can be used to produce the third projection vector w3 by maximizing the same PI again. Or equivalently, we define a ? matrix projection matrix W2 ¼[w1w2] and apply PW 2 ¼ ? IW2 ððW2 ÞT W2 Þ1 ðW2 ÞT to the data set X to obtain X2 ¼ PW 2 X.

7. Repeat the procedure of steps 5 and 6 over and over again to produce w3 ,. . .,wk until a stopping criterion is met. It should be noted that a stopping criterion can be either a predetermined number of projection vectors required to be generated or a pre-determined threshold for the difference between two consecutive projection vectors.

3. Projection Index-based prioritized PP According to the PIPP described in Section 2 a vector is randomly generated as an initial condition to produce each of its desired projection vectors that are used to generate PICs. As a consequence, a different randomly generated initial condition may converge to a different projection vector which also results in a different PIC. In other words, if the PIPP is performed in different times or by different users, the resulting final PICs will also be different due to the use of different sets of random vectors. In order to correct this problem, this section presents a PI-based Prioritized PP (PI-PRPP) which also uses a PI as a prioritization criterion to rank PIPP-generated PICs so that all PICs will be prioritized in accordance with the priorities measured by the PI. In this case, the PICs will be always ranked and prioritized by the PI in the same order regardless of what initial vectors are used to produce projection vectors. It should be noted that there is a major distinction between PIPP and PI-PRPP. While the PIPP uses a PI as a criterion to produce a desired projection vector for each of PICs, the PI-PRPP uses a PI to prioritize PIPP-generated PICs. Therefore, the PI used in both PP and PI-PRPP is not necessarily the same. In other words, the PI used to prioritize PICs can be different from the PI used to generate the PICs. As a matter of fact, on many occasions, different PIs can be used in applications. In what follows, we describe various criteria which use statistics beyond the second order and can be used to define a PI. Projection Index (PI)-based criteria 1. Sample mean of third order statistics: skewness for zj. PIskewness ðPICj Þ ¼ ½k3j 2 3

ð4Þ PKN

where k3j ¼ E½zj  ¼ ð1=KNÞ n ¼ 1 ðzjn Þ3 is the sample mean of the third order of statistics in the PICj. 2. Sample mean of fourth order statistics: kurtosis for zi. PIkurtosis ðPICj Þ ¼ ½k4j 2

ð5Þ

P 4 j 4 where k4j ¼ E½zj  ¼ ð1=KNÞ KN n ¼ 1 ðzn Þ is the sample mean of the fourth order of statistics in the PICj. 3. Sample mean of kth order statistics: kth moments for zj. PIk-moment ðPICj Þ ¼ ½kkj 2

ð6Þ

P k j k where kkj ¼ E½zj  ¼ ð1=KNÞ KN n ¼ 1 ðzn Þ is the sample mean of the kth moment of statistics in the PICj. 4. Negentropy: combination of third and fourth orders of statistics for zj. PInegentropy ðPICj Þ ¼ ð1=12Þ½k3j 2 þð1=48Þ½k4j 32

ð7Þ

C.-I Chang, H. Safavi / Pattern Recognition 44 (2011) 2760–2773

It should be note that (7) is taken from (5.35) in Hyvarinen et al. [5, p. 115], which is used to measure the negentropy by high-order statistics. 5. Entropy PIentropy ðPICj Þ ¼ 

XKN j¼1

pji log pj

ð8Þ

where pj ¼ ðpj1 ,pj2 ,. . .,pjKN ÞT is the probability distribution derived from the image histogram of PICi. 6. Information divergence (ID) XKN p logðpji =qi Þ ð9Þ PIID ðPICj Þ ¼ j ¼ 1 ji where pj ¼ ðpj1 ,pj2 ,. . .,pjKN ÞT is the probability distribution derived from the image histogram of PICi and qj ¼ ðqj1 ,qj2 ,. . .,qjKN ÞT is the Gaussian probability distribution with the mean and variance calculated from PICi..

4. Initialization-driven PIPP The PI-PRPP in Section 3 intended to remedy the issue that PICs can appear in a random order due to the use of randomly generated initial vectors. The PI-PRPP allows users to prioritize PICs according to the information significance measured by a specific PI. Despite the fact that the PICs ranked by PI-PRPP may appear in the same order independent of different sets of random initial conditions, they are not necessarily identical because the slight discrepancy in two corresponding PICs in the same appearing order may be caused by randomness introduced by their used initial conditions. Although such a variation may be minor compared to different appearing orders of PICs without prioritization, the inconsistency may still cause difficulty in data analysis. Therefore, this section further develops a new approach, called Initialization-Drive PP (ID-PIPP) which custom-designs an initialization algorithm to produce a specific set of initial conditions for PIPP so that the same initial condition is used whenever PIPP is implemented. Therefore, the ID-PIPP-generated PICs are always identical. When a particular initial algorithm, say A, is used to produce a specific initial set of vectors for the ID-PIPP to generate PICs, the resulting ID-PIPP is referred to as A-ID-PIPP. One good candidate algorithm that can be used for this purpose is the automatic target generation process (ATGP) previously developed in [11]. It makes use of an orthogonal subspace projector defined in [12] by PU? ¼ IUU# : #

ð10Þ T

1

T

where U ¼ ðU UÞ U is the pseudo-inverse of the U, repeatedly to find target pixel vectors of interest from the data without prior knowledge regardless of what types of pixels are these targets. Details of implementing the ATGP can be described in the following steps. Automatic target generation process (ATGP) (1) Initial condition: Let L be the total number of spectral bands. Select an initial target pixel vector of interest denoted by t0. In order to initialize the ATGP without knowing t0, we select a target pixel vector with the maximum length as the initial target t0, namely, t0 ¼ argfmaxr rT rg, which has the highest response, i.e. the brightest pixel vector in the image scene. Set n ¼1 and U0 ¼[t0]. (It is worth noting that this selection may not be necessarily the best selection. However, according to our experiments it was found that the brightest pixel vector was always

2763

extracted later on, provided that it was not selected as an initial target pixel vector in the initialization.) (2) At nth iteration, apply Pt?0 via (10) to all image pixels r in the image and find the nth target t n generated at the nth stage which has the maximum orthogonal projection as follows: n h io ? ? tn ¼ arg maxr ðP½U rÞT ðP½U rÞ n1 tn  n1 tn 

ð11Þ

where Un1 ¼ ½t1 t2    tn1  is the target matrix generated at the (n  1)st stage. (3) Stopping rule: If noL  1, let Un ¼ ½Un1 tn  ¼ ½t1 t2    tn  be the nth target matrix, go to step 2. Otherwise, continue. (4) At this stage, the ATGP is terminated. At this point, the target matrix is UL  1, which contains L  1 target pixel vectors as its column vectors, which do not include the initial target pixel vector t0. As a result of the ATGP, the final set of L target pixel vectors produced by the ATGP at step 4, SATGP ¼ ft0 ,t1 ,t2 ,    ,tL1 g ¼ ft0 g [ ft1 ,t2 ,    ,tL1 g will be used as an initial set of vectors to produce L PICs where each of the L target pixels in SATGP is used to generate a particular PIC. An ID-PIPP using the ATGP as its initialization algorithm is called ATGP-ID-PIPP.

5. Progressive dimensionality reduction by PIPP There are two key issues to implementing PDR, viz., how to interpret and represent dimensions after DR and how to prioritize DR-transformed dimensions. Unlike DR by band selection which retains original spectral bands, the dimensions resulting from DRT are actually transformed components, each of which contains specific information preserved by a transformation of all the spectral bands via an information measure. Accordingly, a transformed component is no longer a spectral band but rather a combination of weighted spectral bands by a transform. This section presents an approach to PDRT where the transformation used to perform DR is the PIPP in which case a transformed component is a PIC specified by a PI-based projection vector. The resulting PDRT is referred to as PDR-PIPP. In this case, prioritizing PICs is equivalent to prioritizing their associated projection vectors. By virtue of prioritization of PICs two dual processes of PDR-PIPP can be developed as follows. 5.1. Forward progressive dimensionality reduction by PIPP (FPDR-PIPP) One process of implementing the PDR-PIPP is called forward progressive dimensionality reduction by PIPP (FPDR-PIPP) which can be considered as progressive band expansion. It starts with any number of PICs, denoted by minPIC, and then gradually adds more PICs to increase information required by data processing. The PICs to be added are prioritized in a descending order of their priority scores which can be calculated by a custom-designed PI criterion such as those described by (4)–(9) in Section 3. Theoretically, the FPDR-PIPP can begin with the first PIC with the largest priority score and then add new PICs according to their decreasing priority scores until a certain stopping rule is reached when the procedure of FPRD-PIPP is terminated. Practically, this may not be necessary. According to [7] the virtual dimensionality (VD), a new concept recently developed in [15,16] can be used as a reasonable lower bound on the number of PICs to begin with by setting minPIC ¼VD. On the other hand, the number of PICs, nsize to

2764

C.-I Chang, H. Safavi / Pattern Recognition 44 (2011) 2760–2773

be added each time need not be limited to one, i.e., nsize ¼1 and can vary such as two or three, etc., i.e., nsize ¼ 2,3,. . .. FPDR-PIPP process

progressively in m times are

1. Initial conditions: nsize ¼d and L¼total number of spectral bands. 2. Determine the VD and set minpic ¼ VD. 3. Generate L prioritized PICs, denoted by fPICj gLj ¼ 1 by either PI-PRPP and ID-PIPP, both of which described in Sections 3 and 4, respectively, where the index j ¼ 1,2,. . .,L has been prioritized. 4. Start with fPICj gVD and progressively increase the number of 

where L-md¼VD if L mdrVD.

j ¼1

PICs by adding d new un-selected PICs to the current selected set of PICs with d highest priority scores at a time. That is, let m be the number of times in progression. Then the PICs selected progressively in m times are þ ðm1Þd þd þ md fPICj gVD ,fPICj gVD fPICj gVD ,. . .,fPICj gVD j ¼ 1 j ¼ 1 j ¼ 1 j ¼ 1

ð12Þ

where VD þmd¼L if VDþmdZL. Three comments are noteworthy. 1. The step 3 in the FPDR-PIPP by PRPP is actually carried out by two substeps. 3.1. Generate L PICs by PIPP, denoted by fPICj gLj ¼ 1 where PI is pre-determined for PP. 3.2. Prioritize PICs by a custom-designed PI, denoted by fPICj gLj ¼ 1 where the index j ¼ 1,2,. . .,L has been prioritized by the PI. 2. In order to implement ID-PIPP in the step 3 in the FPDR-PIPP, an initialization driven algorithm needs to be specified. A general recommendation is to use the ATGP for this purpose. 3. According to our extensive experiments based on various applications [17], the upper bound on the maximum number of PICs to be selected can be empirically set to twice VD, i.e., 2VD and nsize ¼1.

5.2. Backward progressive dimensionality reduction by PIPP (BPDRPIPP) As a complete opposite process of the FPDR-PIPP, the backward progressive dimensionality reduction by PIPP (BPDR-PIPP) presented in this section works in a reverse manner. It can be viewed as progressive band reduction which begins with a maximum number of PICs, denoted by maxPIC, and then gradually removes PICs from the selected set of PICs with the lowest priority scores until a stopping rule is met when the process is then terminated. Furthermore, the number of PICs, nsize to be removed each time need not be limited to one, i.e., nsize ¼1 and can vary such as two or three, etc., i.e., nsize ¼ 2,3,. . .. A detailed implementation of the BPDR-PIPP is described in the following. BPDR-PIPP process 1. Initial conditions: nsize ¼d, L¼total number of spectral bands and maxpic ¼L. 2. Determine the VD. 3. Generate L prioritized PICs, denoted by fPICj gLj ¼ 1 by either PIPRPP and ID-PIPP, both of which described in Sections 3 and 4, respectively, where the index j ¼ 1,2,. . .,L has been prioritized. 4. Start with fPICj gLj ¼ 1 and progressively decrease the number of PICs by removing d PICs from the current selected set of PICs with d lowest priority scores at a time. That is, let m be the number of times in progression. Then the PICs selected

fPICj gLj ¼ 1 ,fPICj gLd ,fPICj gLðm1Þd ,. . .,fPICj gVD j ¼ 1 j ¼ 1 j ¼ 1

ð13Þ

It should be noted that like the FPDR-PIPP the VD can be also used for the same purpose by setting a lower bound on the minimum number of PICs to be selected by the BPDR-PIPP which can be empirically set to VD, i.e., VD and nsize ¼1. There is a fundamental difference between the commonly used ‘‘sequential’’ process and ‘‘progressive’’ process. A progressive process is one using previous results to gradually improve or reduce performance depending upon applications. A sequential process is one carrying out data one after another in sequence. As a simple example for illustration, assume that there is an 8-bit image. We can represent this image by using progressive processing via the bit-plane coding by first representing the image with the most significant bit and then beginning to improve image quality progressively by adding more bits until the least bit is used to completely represent the image. So, in this case, there are 8 stages to represent images progressively and the previous images are part of subsequently generated images. This progressive process is terminated when it completes 8 progressive stages. On the other hand, we can also represent the image a sequential process with using 8-bit full pixel resolution pixel-by-pixel which begins with the first pixel using 8 bits, second pixel with 8 bits, etc., until it reaches the last pixel with 8 bits. So, this process is carried out pixel by pixel and line by line with 8-bit coding and completed until it reaches the last image pixel. So, in progressive processing the progressive nature is determined by 8 stages where the entire image is improved from low-to-high resolution. How many stages required to be completed is determined by how fine resolution is needed. In sequential processing the sequential nature is determined by the number of pixels to be processed. The entire image cannot be completed until all the image pixels are processed. The web image is represented by such a sequential image representation. Therefore, the idea of the proposed PDRT is quite different from that derived from sequential forward selection (SFS) and sequential backward selection (SBS) developed for feature selection in [13,14] where the latter must re-solve optimization problems each time once a feature is added to or removed from a given pre-selected set of features compared to the former which always adds new PICS to or removes existing PICs from the previously selected set of PICs in progression.

6. Real hyperspectral image experiments The proof-of-concept of PDRT requires applications to justify its effectiveness. For this reason a commonly used linear spectral unmixing in hyperspectral image classification is chosen for this purpose. 6.1. HYDICE data experiments A real image scene collected by HYperspectral Digital Imagery Collection Experiments (HYDICE) was used for experiments in this section to demonstrate the utility of the PIPP approaches proposed in this paper. The image scene to be studied for experiments is an image scene shown in Fig. 1(a), which has a size of 64  64 pixel vectors with 15 panels in the scene and the ground truth map in Fig. 1(b). It was acquired by 210 spectral bands with a spectral coverage from 0.4 to 2.5 mm. Low signal/high noise bands: bands 1–3 and bands 202–210;

C.-I Chang, H. Safavi / Pattern Recognition 44 (2011) 2760–2773

2765

p11, p12, p13 interferer

p311, p312, p32, p33

grass

p211, p22, p23 p221

trees

p411, p412, p42, p43

road

p511, p52, p53 p521 8000 7000 6000 5000 4000 3000 2000 1000 0

p1 p2 p3 p4 p5

0 20 40 60 80 100120140160180 Fig. 1. (a) A HYDICE panel scene with 9 signatures identified by prior knowledge via the ground truth given in (b) which contains 15 panels with ground truth map of spatial locations of the 15 panels. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.).

and water vapor absorption bands: bands 101–112 and bands 137–153 were removed. So, a total of 169 bands were used in experiments. The spatial resolution is 1.56 m and spectral resolution is 10 nm. Within the scene in Fig. 1(a), there is a large grass field background, and a forest on the left edge. Each element in this matrix is a square panel and denoted by pij with rows indexed by i and columns indexed by j¼1,2,3. For each row i ¼ 1,2,. . .,5, there are three panels pi1, pi2, pi3, painted by the same paint but with three different sizes. The sizes of the panels in the first, second and third columns are 3 m  3 m and 2 m  2 m and 1 m  1 m, respectively. Since the size of the panels in the third column is 1 m  1 m, they cannot be seen visually from Fig. 1(a) due to the fact that its size is less than the 1.56 m pixel resolution. For each column j¼1,2,3, the 5 panels, p1j, p2j, p3j, p4j, p5j have the same size but with five different paints. However, it should be noted that the panels in rows 2 and 3 were made by the same material with two different paints. Similarly, it is also the case for panels in rows 4 and 5. Nevertheless, they were still considered as different panels but our experiments will demonstrate that detecting panels in row 5 (row 4) may also have effect on detection of panels in row 2 (row 3). The 1.56 m-spatial resolution of the image scene suggests that most of the 15 panels are one pixel in size except that p21, p31, p41, p51 which are two-pixel panels, denoted by p211, p221, p311, p312, p411, p412, p511, p521. Fig. 1(b) shows the precise spatial locations of these 15 panels where red pixels (R pixels) are the panel center pixels and the pixels in yellow (Y pixels) are panel pixels mixed with the background. Fig. 1(c) plots the five panel spectral signatures pi for i ¼ 1,2,. . .,5 obtained by averaging R pixels in the 3 m  3 m and 2 m  2 m panels in row i in Fig. 1(b). It should be noted the R pixels in the 1 m  1 m panels are not included because they are not pure pixels, mainly due to that fact that the spatial resolution of the R pixels in the 1 m  1 m panels is 1 m smaller than the pixel resolution is 1.56 m. These panel signatures along with the R pixels in the 3 m  3 m and 2 m  2 m panels were used as required prior target knowledge for the following comparative studies. In order to perform PDR-PIPP on this image scene, we first estimate the range of feasible values of dimensionality needed to be retained after DR, q. In doing so, a new concept, called virtual

dimensionality (VD) [7,15,16] which was developed to estimate the number of spectrally distinct signatures was used to estimate the q. In this case, the VD-estimated value for the HYDICE scene in Fig. 1(a) was 9 with false alarm probability PF greater than or equal to 10  4. So, if we interpret a fact that a spectrally distinct signature can only be specified by one spectral dimension, this implies that it requires at least 9 dimensions, i.e., transformed components to accommodate 9 spectrally distinct signatures. In other words, the lower bound to the value of q can be set to 9. According to the ground truth as well as visual inspection, there are also at least 9 signatures present in the scene shown in Fig. 1(a). Additionally, since DR is a preprocessing technique it needs an application to evaluate its performance. In this case, the linear spectral unmxing (LSU) [15] is used for performance evaluation where the fully constrained least squares (FCLS) method in [15] is used as a supervised LSU technique where the signature matrix consists of the five panel signatures obtained from Fig. 1(a) plus the other four background signatures obtained by prior knowledge as the areas marked in Fig. 1(a) as interferer, tree, grass, road to make up 4 BKG signatures. These signatures represent 9 signatures with 9 estimated by the VD. This fact provides further evidence that VD is an effective estimation method to estimate the number of signatures used for LSU, but not necessarily represents accurate number of signatures. Nevertheless, the VD provides a good lower bound to the q. The PDRT is proposed to address this issue when the value of q is not known exactly, but can use the VD-estimated value to produce a feasible range of the q in which case an empirical range of the q would be in the range from VD¼9 to 2VD¼18. Therefore, in the following experiments the PDR-PIPP is implemented with q ranging from 5 to 18. where the value of q was back tracked to q¼5 which represents five panel signatures to see how changes are made when the value of q was smaller than VD¼ 9. To simplify experiments, only the PDR-ID-PIPP was used for experiments where the PIPP was implemented by the FastICA [18] with the ATGP used as the initialization algorithm. The resulting PDR-ID-PIPP is referred to as PDR-ATGP-FastICA where the PI is mutual information but has been shown in [5, Eq. (5.35), p. 115] that it can be approximated by combination of the 3rd and 4th

2766

C.-I Chang, H. Safavi / Pattern Recognition 44 (2011) 2760–2773

signatures in 8th and 9th ICs. Now using the prioritized 18 ICs in Fig. 2 we can form 14 reduced image cubes with their dimensionality ranging from 5 to 18 to be used as data cubes for FCLS to perform LSU. Fig. 3(a)–(n) shows the classification results produced by the FCLS operating on the image cubes formed by ICs from 5 to 18. For example, Figs. 3(a) and (n) are the FCLS classification results of 15 panels in five rows using the image cube formed by the first five ICs and the entire 18 ICs in Fig. 3, respectively.

order of statistics specified by (7). Then the PDR-ATGP-FastICA performed PDR via the ATGP-FastICA according to (12) with q¼5 growing to q¼18. Fig. 2 shows the first 18 Independent Components (ICs) produced by the ATGP-FastICA where the 18 ICs generated by the FastICA were prioritized by their initial projection vectors specified by the ATGP. As we can see from Fig. 2, all the panel pixels in five rows were separated in the first 6 ICs by the FastICA with the interferers separated in the 1st and 7th ICs, and background

IC1

IC6

IC11

IC2

IC7

IC12

IC16

IC3

IC4

IC5

IC8

IC9

IC10

IC13

IC14

IC15

IC17

IC18

Fig. 2. First 18 prioritized ICs produced by ATGP-Factice.

panels in row 1

panels in row 2

panels in row 3

panels in row 4

panels in row 5

panels in row 1

panels in row 2

panels in row 3

panels inrow 4

panels in row 5

Fig. 3. FCLS-classification results using image cubes obtained by PDR from q ¼5 to 18.

C.-I Chang, H. Safavi / Pattern Recognition 44 (2011) 2760–2773

2767

panels in row 1

panels in row 2

panels in row 3

panels in row 4

panels in row 5

panels in row 1

panels in row 2

panels in row 3

panels in row 4

panels in row 5

panels in row 1

panels in row 2

panels in row 3

panels in row 4

panels in row 5

panels in row 1

panels in row 2

panels in row 3

panels in row 4

panels in row 5

panels in row 1

panels in row 2

panels in row 3

panels in row 4

panels in row 5

panels in row 1

panels in row 2

panels in row 3

panels in row 4

panels in row 5

panels in row 1

panels in row 2

panels in row 3

panels in row 4

panels in row 5

Fig. 3. (continued)

The results in Fig. 3(a)–(n) reveal that q¼ 8 is the least number of dimensions to make FCLS work effectively if the FPDR-ATGP-FastICA was performed beginning with q¼5 and nsize ¼1. When q is less than 8, the FCLS-classification results were not good. To the contrary, as q increases from 8 to 18 the FCLS consistently performed well in terms of 15-panel classification and there was no visible difference among all the classification results. This suggested that q¼8 was the minimum number of ICs to be retained after DR, which is very close

to the value of 9 estimated by VD if the FPDR-ATGP-FastICA started with q¼9. On the other hand, if the BPDR-ATGP-FastICA was performed backward starting from 2VD¼18 with nsize ¼ 1 down to q¼5, the results in Fig. 3 also suggested that 8 would be the best minimum number of ICs required to be retained. To further confirm this finding, Table 1 tabulates the FCLS-quantization results for q¼8 and 9 where the results obtained for q¼9 are included in parentheses. As shown in Table 1, the results obtained for q¼8 and 9 by FCLS

2768

C.-I Chang, H. Safavi / Pattern Recognition 44 (2011) 2760–2773

panels in row 1

panels in row 2

panels in row 3

panels in row 4

panels in row 5

panels in row 1

panels in row 2

panels in row 3

panels in row 4

panels in row 5

panels in row 1

panels in row 2

panels in row 3

panels in row 4

panels in row 5

panels in row 1

panels in row 2

panels in row 3

panels in row 4

panels in row 5

panels in row 1

panels in row 2

panels in row 3

panels in row 4

panels in row 5

Fig. 3. (continued)

were very close and nearly identical. This implies that there is no additional gain by including one more dimension. 6.2. AVIRIS data experiments As another example, an Airborne Visible InfraRed Imaging Spectrometer (AVIRIS) image data was also used for experiments. It is the Purdue Indiana Indian Pine site and has 20 m-spatial resolution collected by 224 bands using 10 nm-spectral resolution in the range of 0.4 and 2.5 mm with size of 145  145 pixel vectors taken from an area of mixed agriculture and forestry in Northwestern Indiana, USA. The data set is available at website [19] and was recorded in June 1992 with 220 bands with water absorption bands, bands 104–108 and 150–162 removed and leaving only 202 bands. Fig. 4(a) shows 9 bands selected from the website and a USGS Quadrangle map of the test site provided in Fig. 4(b). According to the ground truth provided in Fig. 4(c) there are 17 classes in this image scene shown in Fig. 4(d) including the

background labeled by class 17 which includes a wide variety of targets such as highways, railroad, houses/buildings, vegetation that may not be of interest in agriculture applications. The number in a parenthesis after a class label in Fig. 4(d) is the total number of data samples in that particular class. The total number of data samples in the scene is 145  145¼ 21,025. Table 2 lists labels of each of 17 classes where the numeral in the parenthesis under each of 17 classes in Fig. 4(c) is the number of data samples in that particular class. This particular scene is probably one of most studied test sites in hyperspectral data exploitation. Due to its 20 m raw spatial resolution, most data sample vectors in this scene are heavily mixed in which case finding an appropriate set of endmembers to perform spectral unmixing presents a great challenge to hyperspectral image analysts. Accordingly, the maximum likelihood classifier (MLC) has been widely used for this particular scene. Here the MLC was also used for this purpose. Assume that Sj represents a training sample set selected for the jth class and [jSj

C.-I Chang, H. Safavi / Pattern Recognition 44 (2011) 2760–2773

2769

Table 1 FCLS quantification for ATGP-Factice-produced image cubes with q ¼8 (q¼ 9). Panels in row 1 1 0.6172 0.2413 0 0 0.0067 0 0 0 0.0126 0.0045 0 0 0.0258 0 0 0 0.0375 0

p11 p12 p13 p211 p212 p22 p23 p311 p312 p32 p33 p411 p412 p42 p43 p511 p512 p52 p53

(0.6174) (2413)

(0.071)

(0..013) (0.0047)

(0.258)

(0.0377)

Panels in row 2

Panels in row 3 0 0 0.0606 (0.0606) 1 1 0.9432 (0.944) 0.2862 (0.286) 0.0024 (0.0026) 0 0 0 0 0 0 0 0.005 (0.0054) 0 0 0.0256 (0.0255)

0 0.0382 (0.038) 0 0 0 0 0.0229 (0.0232) 0.9976 (0.9974) 1 0.663 (0.6627) 0.414 (0.4139) 0 0 0.0149 (0.149) 0.0036 (0.0037) 0 0 0.0388 (0.387) 0

is the entire training set. The mean and sample covariance matrix P of [jSj are calculated by l ¼ ð1=9[j Sj 9Þ r A [j Sj r and 1 X ðrlÞðrlÞT ð13Þ K¼ r A [j Sj 9[j Sj 9 where 9A9 is defined as the number of elements in the set A. Furthermore, let the mean of the jth class calculated from Sj be P given by lj ¼ ð1=9Sj 9Þ r A Sj r. Then for each data sample vector r, the MLC denoted by dMLC(r) used for the following experiments is defined by

dMLC ðrÞ ¼ j

ð14Þ

which assigns the data sample vector r to the class j* with the j* found by n o ð15Þ j ¼ arg minj ðrlj ÞT K1 ðrlj Þ It should be noted that when the MLC is implemented, there are three different ways to calculate the sample covariance matrix K in (15). The simplest one is the global sample covariance matrix which is calculated based on the entire data sample vectors. The second one is to calculate class sample covariance matrices for each of class by

Kj ¼

1 X ðrlj Þðrlj ÞT : r A Sj 9Sj 9

Panels in row 4

ð16Þ

The third one is the one defined in (13) which uses the entire training sample covariance matrix. The reason to use (13) instead of (16) is the fact that in some cases, Kj in (16) is ill ranked due to singularity when the training set Sj is too small compared to the total number of spectral bands. In general, such a case seldom occurs in multispectral imagery but does happen very often when it comes to hyperspectral imagery such as class 1, class 7, class 9 and class 16 in Fig. 4. So, (13) is the best compromise between the global sample covariance matrix and class sample covariance matrices (16). However, if the training samples are appropriately selected, the results using (13) are very similar to those obtained by the global sample covariance matrix. This is the case in our experiments because 50% of data sample vectors in each class were selected as training samples. In analogous to HYDICE experiments, the PDR-PIPP was implemented to produce PICs. The MLC defined by (14) and (15) was used to classify each class where a 50–50% cross validation was used for performance analysis in which case 50% of data sample

Panels in row 5 0 0 0.0432 (0.0432) 0 0 0 0.0072 (0.0072) 0 0 0.0259 (0.0257) 0.0121 (0.0118) 1 0.9873 (0.987) 0.8792 (0.8791) 0.2131 (0.2131) 0 0 0.022 (0.0219) 0

0 0 0 0 0 0.0058 0 0 0 0 0 0 0.0127 0 0 0.8617 1 0.9008 0.1605

(0.0068)

(0.013)

(0.8621) (0.9012) (0.1603)

vectors in each class was randomly selected to be used for training and the other half was used for testing. Only16 classes were used for classification since there is no ground truth provided about the background, class 17. The classification rate of each class by MLC is calculated by the number of correct classified data sample vectors in each class divided by total number of that class according to the ground truth. Fig. 5 plots the classification rates of 16 classes produced by two dual processes of PDR-PIPP, FPDR-ATGP-FastICA and BPDR-ATGP-FastICA with nsize ¼1 where PI/PI in the legends are referred to projection index used to produce PICs/projection index used to prioritize PICs. In order to see how the number of dimensions affects the classification performance, the number of prioritized dimensions was run from 1 to the entire number of bands, 202 bands. As shown in Fig. 5 it is very clear that different classes require different number of dimensions to achieve their best performance in classification. For example, the most easiest cases are class 5 and class 13 where only a small number of prioritized dimensions (less than 10) could achieve very high classification rates. To the contrary, the most difficult cases are class 7 and class, both of which have only very few data sample vectors, 26 for class 7 and 20 for class 9. This is mainly due to the fact that the sample size of these two classes is too small to constitute reliable statistics to be used by the MLC. This is particularly evidential when the number of used prioritized dimensions was increased and the performance was actually degraded. Other than these extreme cases there are two cases of interest. One is the cases that the classification rates never improved and saturated after a certain number of used prioritized dimensions, for example, classes 2, 5, 6, 8, 13, 14 and 16. Another case is that the classification kept improving as the number of used prioritized dimensions was increased such as classes 3, 11 and 12. The above experiments provide such evidence that each class does have its own difficulty in classification. Over the past years we have seen that many works published based on this particular data set have selected certain classes to perform supervised classification without offering compelling reasons. It is believed that our proposed PDR offers good answers. Additionally, there are also many other observations worth being discussed based on the results in Fig. 5. Finally, Fig. 5 also allows users to see how FPDR-PIPP or BPDR-PIPP performs if the prioritized dimensions are gradually added in a forward manner or removed in a backward manner, respectively. These advantages cannot be offered by the traditional DR.

2770

C.-I Chang, H. Safavi / Pattern Recognition 44 (2011) 2760–2773

Band 8

Band 16

Band 27

Band 70

Band 39

Band 86

Band 46

Band 136

3

Band 186

15

3 510 12 2

15 3

14 14

2

4 12 3

11

17

16 12

10

17

6 9

2

2

2 11 10

8

11

11 17

6 13

6

17

6

5 14

3 10

class 1 (54)

class 2 (1434)

class 7 (26)

class 13 (212)

class 3 (834)

class 8 (489)

class 9 (20)

class 14 (1294)

class 4 (234)

class 10 (968)

class 15 (380)

class 5 (497)

class 6 (747)

class 11 (2468)

class 12 (614)

class 16 (95)

class 17 (10659)

Fig. 4. AVIRIS image scene: Purdue Indiana Pine test site. (a) 9 bands selected from the Purdue Indiana Pine test site, (b) a USGS Quadrangle map of the test site, (c) Ground truth of Purdue Indiana Pine test site and (d) Ground truth of each of 17 classes.

Table 2 Labels of 17 classes. Class Class Class Class Class Class

1 2 3 4 5 6

Alfalfa Corn-no till Corn-min Corn Grass/pasture Grass/trees

Class Class Class Class Class Class

7 8 9 10 11 12

Grass/pasture-mowed Hay-windrowed Oats Soybeans-no till Soybeans-min Soybeans-clean

Class Class Class Class Class

13 14 15 16 17

Wheat Woods Bldg-grass-green-drives Stone-steel towers Background

C.-I Chang, H. Safavi / Pattern Recognition 44 (2011) 2760–2773

7. Conclusions Dimensionality reduction by transform (DRT) has been widely used for various applications in data processing. This paper presents a new approach to DRT, called progressive DRT (PDRT) which can perform DRT progressively according to priorities assigned to transformed components. The transform used for PDRT is called Projection Index-based Projection Pursuit (PIPP) and is particularly designed based on the concept of Projection Pursuit (PP) whose direction of interestingness is specified by a Projection Index (PI). The transformed components resulting from the PIPP is called projection index components (PICs). Within the framework of the PIPP the commonly used PCA and ICA can be considered as its special cases where the PICs are reduced to principal components (PCs) and independent components (ICs), respectively. Since the PIPP generally uses random initial conditions to generate PICs, the PIPP-generated PICs may appear in a random order and do not necessarily indicate that an earlier-generated PIC is more significant than a subsequently generated PIC. In order to remedy this problem, two prioritized versions of PIPP, referred to as PI-Prioritized PP (PI-PRPP) and Initialization-Driven PIPP (ID-PIPP), are further developed for this purpose. While the PI-PRPP uses a selected PI as a criterion to prioritize its generated PICs, the ID-PIPP takes advantage of an initialization-driven algorithm to produce an appropriate set of 100

initial conditions which prioritize each of PICs it generates. By virtue of these two prioritized PIPP algorithms implementing PDR-PIPP becomes feasible where two dual processes PDR-PIPP, referred to as Forward Progressive Dimensionality Reduction by PIPP (FPDR-PIPP) and Backward Progressive Dimensionality Reduction by PIPP (BPDR-PIPP) are proposed to implement the PDR-PIPP to increase or decrease number of PICs. None of these two processes was investigated in the past. In order to reduce computational complexity, the feasible range of PICs is further narrowed by using a recently developed concept, called Virtual Dimensionality (VD) which can be used to estimate a lower bound and an upper bound on the number of PICs used by the FPDR-PIPP and BPDR-PIPP, respectively. The experimental results conducted two different hyperspectral images demonstrate two applications of PDRT that cannot be accomplished by traditional fixed-size dimensionality DR. Due to its progressive nature in DR PDRT is able to illustrate the fact that each target signature in HYDICE data has its own discriminatory difficulty in spectral unmixing via HYDICE data and each class in the Purdue Indian Pine data has also its own different degree of difficulty in class classification. This evidence demonstrates that the proposed PDRT offers advantages over the commonly used fixed size dimensionality DR for hyperspectral image analysis in many aspects such as generalizability of traditional component analysis by PIPP, reduction of computational complexity due to the use of dimensionality prioritization, various

80

60

60

40

20

0 0

skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy 50 100 150 200 Number of Prioritized Dimensions

60 50 40 30 20 10 0 0

250

50 Classification Rate (%)

Classification Rate (%)

Classification Rate (%)

70

80

skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy 50 100 150 200 Number of Prioritized Dimensions

40 skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy

30 20 10 0 0

250

50 100 150 200 Number of Prioritized Dimensions

250

100

70

100

2771

20

skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy 50 100 150 200 Number of Prioritized Dimensions

50 40 30 20 10 0 0

250

Classification Rate (%)

40

0 0

skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy 50 100 150 200 Number of Prioritized Dimensions

80

60

20

100

100

100

80

80

80

60

40

20

0 0

skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy 50 100 150 200 Number of Prioritized Dimensions

250

60

40

20

0 0

skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy 50 100 150 200 Number of Prioritized Dimensions

250

skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy

40

0 0

250

Classification Rate (%)

Classification Rate (%)

Classification Rate (%)

60

Classification Rate (%)

Classification Rate (%)

60 80

50 100 150 200 Number of Prioritized Dimensions

250

60 skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy

40

20

0 0

50 100 150 200 Number of Prioritized Dimensions

Fig. 5. 16-class classification rates versus the number of PDR-PIPP-prioritized dimensions used by MLC.

250

2772

C.-I Chang, H. Safavi / Pattern Recognition 44 (2011) 2760–2773

20

70

skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy 50 100 150 200 Number of Prioritized Dimensions

40 skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy

30 20 10 0 0

250

Classification Rate (%)

Classification Rate (%)

60

40

80

50

80

0 0

50 100 150 200 Number of Prioritized Dimensions

60 50 40 30 20 10 0 0

250

100

100

100

80

80

80

60

40

20

0 0

skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy 50 100 150 200 Number of Prioritized Dimensions

60 skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy

40

20

0 0

250

Classification Rate (%)

Classification Rate (%)

60

Classification Rate (%)

Classification Rate (%)

100

50 100 150 200 Number of Prioritized Dimensions

250

skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy 50 100 150 200 Number of Prioritized Dimensions

250

60

40

20

0 0

skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy 50 100 150 200 Number of Prioritized Dimensions

250

Classification Rate (%)

100

80

60 skewness/skewness skewness/negentropy negentropy/skewness negentropy/negentropy PCA PCA/skewness PCA/Negentropy

40

20

0 0

50 100 150 200 Number of Prioritized Dimensions

250

Fig. 5. (continued)

levels of discrimination and classification of target signatures and classes through two dual processes of PDRT. References [1] J.W. Boardman, F.A. Kruse, R.O. Green, Mapping target signatures via partial unmixing of AVIRIS data, Summaries of JPL Airborne Earth Science Workshop, Pasadena, CA, 1995. [2] M.E. Winter, N-finder: an algorithm for fast autonomous spectral endmember determination in hyperspectral data, Image Spectrometry V, Proceedings of the SPIE 3753 (1999) 266–277. [3] G. Motta, F. Rizzo, J. Storer, Hyperspectral Data Compression, Springer-Verlag, 2006. [4] B. Ramakishna, A. Plaza, C.-I Chang, H. Ren, Q. Du, C.-C. Chang, Spectral/ SPATIAL hyperspectral image compression, in: G. Motta, F. Rizzo, J. Storer (Eds.), Hyperspectral Data Compression, Springer-Verlag, 2006, pp. 309–347. [5] A. Hyvarinen, J. Karhunen, E. Oja, Independent Component Analysis, John Wiley & Sons, 2001. [6] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, 1973. [7] J. Wang, C.-I Chang, Independent component analysis-based dimensionality reduction with applications in hyperspectral image analysis, IEEE Transactions on Geoscience and Remote Sensing 44 (6) (2006) 1586–1600. [8] H. Ren, Q. Du, J. Wang, C.-I Chang, J. Jensen, Automatic target recognition hyperspectral imagery using high order statistics, IEEE Transactions on Aerospace and Electronic Systems 42 (4) (2006) 1372–1385.

[9] J.H. Friedman, J.W. Tukey, A projection pursuit algorithm for exploratory data analysis, IEEE Transactions on Computers C-23 (9) (1974) 881–889. [10] S. Chu, H. Ren, C.-I Chang, High order statistics-based approaches to endmember extraction for hyperspectral imagery, SPIE 6966, 2008. [11] H. Ren, C.-I Chang, Automatic spectral target recognition in hyperspectral imagery, IEEE Transactions on Aerospace and Electronic Systems 39 (4) (2003) 1232–1249. [12] J. Harsanyi, C.-I Chang, Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach, IEEE Transactions on Geoscience and Remote Sensing 32 (4) (1994) 779–785. [13] P. Pudil, J. Novovicova, J. Kittler, Floating search methods in feature selection, Pattern Recognition Letters 15 (1994) 1119–1125. [14] S.B. Serpicoand, L. Bruzzone, A new search algorithm for feature selection in hyperspectral remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 39 (7) (2001) 1360–1367. [15] C.-I Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification, Kluwer Academic/Plenum Publishers, New York, 2003. [16] C.-I Chang, Q. Du, Estimation of number of spectrally distinct signal sources in hyperspectral imagery, IEEE Transactions on Geoscience and Remote Sensing 42 (3) (2004) 608–619. [17] H. Safavi, Applications of Projection Pursuit in Hyperspectral Image Analysis, Ph.D. Dissertation, Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, MD, May 2010. [18] A. Hyvarinen, E. Oja, A fast fixed-point for independent component analysis, Neural Computation 9 (7) (1997) 1483–1492. [19] /http://cobweb.ecn.purdue.edu/  biehl/MultiSpec/documentation.htmlS.

C.-I Chang, H. Safavi / Pattern Recognition 44 (2011) 2760–2773

2773

Chein-I Chang received his B.S. degree from Soochow University, Taipei, Taiwan, M.S. degree from the Institute of Mathematics at National Tsing Hua University, Hsinchu, Taiwan and M.A. degree from the State University of New York at Stony Brook, all in mathematics. He also received his M.S., M.S.E.E. degrees from the University of Illinois at Urbana-Champaign and Ph.D. degree in electrical engineering from the University of Maryland, College Park. Dr. Chang has been with the University of Maryland, Baltimore County (UMBC) since 1987 and is currently a professor in the Department of Computer Science and Electrical Engineering. He was a visiting research specialist in the Institute of Information Engineering at the National Cheng Kung University, Tainan, Taiwan, from 1994 to 1995. He received an National Research Council (NRC) senior research associateship award from 2002 to 2003 sponsored by the US Army Soldier and Biological Chemical Command, Edgewood Chemical and Biological Center, Aberdeen Proving Ground, Maryland. Additionally, Dr. Chang was a distinguished lecturer chair at the National Chung Hsing University sponsored by the Department of Education in Taiwan, ROC from 2005 to 2006 and is currently holding a chair professorship with the Environmental Restoration and Disaster Reduction Research Center and Department of Electrical Engineering, National Chung Hsing University, Taichung, Taiwan, ROC. He was also a keynote speaker for the 2008 International Symposium on Spectral Sensing Research (ISSSR) in 2008 and will be a plenary speaker for SPIE Opticsþ Applications, Remote Sensing Symposium, 2009. He has four patents and several pending on hyperspectral image processing. He was the guest editor of a special issue of the Journal of High Speed Networks on Telemedicine and Applications (April 2000) and co-guest editor of another special issue of the same journal on Broadband Multimedia Sensor Networks in Healthcare Applications, April 2007. He is also co-guest editor of a special issue on High Performance Computing of Hyperspectral Imaging for International Journal of High Performance Computing Applications, December 2007 and special issue on Signal Processing and System Design in Health Care Applications for EURASIP Journal on Advanced in Signal Processing, 2009. Dr. Chang has authored a book, Hyperspectral Imaging: Techniques for Spectral Detection and Classification published by Kluwer Academic Publishers in 2003 and edited two books, Recent Advances in Hyperspectral Signal and Image Processing, Trivandrum, Kerala: Research Signpost, Trasworld Research Network, India, 2006 and Hyperspectral Data Exploitation: Theory and Applications, John Wiley & Sons, 2007 and co-edited with A. Plaza a book on High Performance Computing in Remote Sensing, CRC Press, 2007. He is currently working on two books, Hyperspectral Imaging: Signal Processing Algorithm Design and Analysis, John Wiley & Sons, 2010 and Real Time Hyperspectral Image Processing: Algorithm Architecture and Implementation, Springer-Vela, 2012. Dr. Chang was an Associate Editor in the area of hyperspectral signal processing for IEEE Transaction on Geoscience and Remote Sensing 2001–2007. He is a Fellow of IEEE and SPIE and a member of Phi Kappa Phi and Beta Kappa Nu.