ISPRS Journal of Photogrammetry and Remote Sensing 97 (2014) 123–137

Semi-supervised classification for hyperspectral imagery based on spatial-spectral Label Propagation

Liguo Wang a,*, Siyuan Hao a, Qunming Wang b,*, Ying Wang a

a College of Information and Communications Engineering, Harbin Engineering University, Harbin, Heilongjiang 150001, China
b Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Kowloon, Hong Kong

Article history: Received 14 March 2014; received in revised form 18 August 2014; accepted 22 August 2014.

Keywords: Hyperspectral imagery; Semi-supervised classification; Spatial-spectral graph; Label Propagation; Adaptive method; Gabor filter

Abstract

Graph-based classification algorithms have gained increasing attention in semi-supervised classification. Nevertheless, the graph cannot fully represent the inherent spatial distribution of the data. In this paper, a new classification methodology based on spatial-spectral Label Propagation is proposed for semi-supervised classification of hyperspectral imagery. The spatial information is used in two ways: on the one hand, the spatial features extracted by a 2-D Gabor filter are stacked with the spectral features; on the other hand, the width of the Gaussian function used to construct the graph is determined with an adaptive method. Subsequently, unlabeled samples are selected from the spatial neighbors of the labeled samples, and the spatial graph is constructed based on spatial smoothness. Finally, labels are propagated from the labeled samples to the unlabeled samples with the spatial-spectral graph to update the training set of a basic classifier (e.g., the Support Vector Machine, SVM). Experiments on four hyperspectral datasets show that the proposed Spatial-Spectral Label Propagation based SVM (SS-LPSVM) can effectively represent the spatial information in the framework of semi-supervised learning, and consistently produces greater classification accuracy than the standard SVM, the Laplacian Support Vector Machine (LapSVM), the Transductive Support Vector Machine (TSVM) and the Spatial-Contextual Semi-Supervised Support Vector Machine (SCS3VM).

© 2014 Published by Elsevier B.V. on behalf of International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS).

1. Introduction

In remote sensing image classification, traditional supervised learning requires a number of labeled samples to train the classifier, but the collection of training data is generally laborious and expensive (Bovolo et al., 2010). For hyperspectral imagery, the Hughes phenomenon (Hughes, 1968) is caused by the imbalance between the high dimensionality of the data and the limited labeled training samples available in real analysis scenarios: the limited ground-truth samples are often not sufficient for a reliable estimate of the classifier parameters. Semi-supervised algorithms are an effective approach to overcome the problem of small labeled sample sets in high-dimensional data classification (Shahshahani and Landgrebe, 1994). This type of algorithm jointly exploits labeled and unlabeled samples, and has recently attracted an increasing amount of interest in remote sensing.

* Corresponding authors. E-mail addresses: [email protected] (L. Wang), [email protected] (S. Hao), [email protected] (Q. Wang).

Generally, semi-supervised learning can be divided into five groups: (1) Generative models (see, for instance, Krishnapuram et al. (2005) and Li et al. (2013a,b)), which use the labeled samples to construct a probabilistic model that predicts the labels of the unlabeled samples. (2) Self-training (Rosenberg et al., 2005), which uses the previous classification results to retrain the classifier iteratively. (3) Co-training (Johnson and Zhang, 2007), which trains two classifiers on two independent feature subsets of the labeled samples, and then lets each classifier select the unlabeled samples it predicts most reliably to retrain the other. (4) The Transductive Support Vector Machine (TSVM) (Joachims, 1999), which maximizes the margin over both the labeled and the unlabeled samples. (5) Graph-based methods (Blum and Chawla, 2001), which exploit labeled and unlabeled samples to construct a graph and minimize an energy function to predict the labels of the unlabeled samples. Although these algorithms can improve classification performance by using the distribution of the unlabeled samples, they suffer from some limitations (Bruzzone et al., 2006; Kim and Crawford, 2010).

http://dx.doi.org/10.1016/j.isprsjprs.2014.08.016

Specifically, generative models are built on strong assumptions (e.g., that the training samples follow a Gaussian or another specified distribution); self-training suffers largely from incorrectly assigned labels; co-training requires that the feature set can be divided into two independent subsets, which is not always the case in practice; TSVM may converge to a local minimum because of its non-convex loss function; and graph-based semi-supervised algorithms are computationally demanding and cannot assign a label to a new sample. These problems may limit their application in practical cases to some extent (Yang et al., 2014). Nevertheless, it is worth mentioning that graph-based semi-supervised algorithms can provide relatively high classification accuracy and have become an increasingly active topic in pattern recognition (Belkin and Niyogi, 2005; Camps Valls et al., 2007; Gomez Chova et al., 2008). In Belkin et al. (2006), the Laplacian Support Vector Machine (LapSVM) was proposed to estimate the marginal probability distribution with the Laplacian matrix. This method needs to calculate the inverse of an n × n matrix, where n is the size of the training set, which requires O(n^3) time and O(n^2) memory (Li et al., 2009). The training time was sharply reduced by the fast LapSVM strategy of Melacci and Belkin (2011), which uses a Preconditioned Conjugate Gradient coupled with an early stopping criterion. Regarding Label Propagation, Zhu et al. (2003) combined a Gaussian Random Field model with the harmonic function to effectively exploit unlabeled samples. Cheng et al. (2009) proposed a sparse decomposition, which can reflect the neighborhood structure of the samples, to construct the graph for Label Propagation. Karasuyama and Mamitsuka (2013) proposed a sparse formulation to obtain the optimal linear combination of multiple graphs under the Label Propagation setting. Label Propagation can also be applied to semi-supervised dimensionality reduction (Nie et al., 2011).

At the same time, a growing number of scholars have taken spatial information into account in hyperspectral imagery classification, using mathematical tools that extract signal correlations in the spatial-spectral domain. Morphological Profiles (MPs) were employed to extract spatial information and aid supervised learning in Palmason et al. (2005); MPs were improved by Benediktsson et al. (2005), where Extended Morphological Profiles (EMPs) were introduced to characterize contextual information, and Song et al. (2014) used the sparse representation classification framework to exploit the inherent low-dimensional structure of EMPs. Markov Random Fields and the parcels from a segmentation map have also been commonly used to extract spatial information (Geman and Geman, 1984; Richards and Jia, 2008; Li et al., 2012; Tarabalka et al., 2010a,b). Moreover, the Gabor filter (Gabor, 1946), which provides joint space-frequency and time-frequency resolution for signal analysis, has been successfully used in hyperspectral imagery: Shi and Healey (2003) employed Gabor filters to extract texture features at different scales and orientations and obtained higher classification accuracy when these features were considered; Bau et al. (2010) introduced a 3-D Gabor filter capable of capturing the energy of spectral/spatial data at different orientations and scales; and Shen and Jia (2011) presented a pixel-based hyperspectral classification approach using 3-D Gabor wavelets.
In addition, much effort has recently been directed at developing contextual classifiers (Camps Valls et al., 2006; Bruzzone and Persello, 2009; Dopido et al., 2013). For instance, Camps Valls et al. (2006) applied composite kernels to hyperspectral imagery classification. Kuo et al. (2010) proposed the Spatial-Contextual Semi-Supervised Support Vector Machine (SCS3VM) to incorporate spatial information into the optimization process of the standard SVM. Tarabalka et al. (2010a) used spatial information described by a Markov Random Field to update the classification results produced by a probabilistic SVM. Another perspective on contextual SVMs was proposed by Negri et al. (2014), which uses the contextual information to displace the separation

hyperplane defined by the traditional SVM. Niemeyer et al. (2014) integrated a Random Forest classifier into a Conditional Random Field framework to address the contextual classification of an airborne LiDAR point cloud.

Motivated by the recent interest in improving classification accuracy with spatial information, a novel semi-supervised classification methodology for hyperspectral imagery based on spatial-spectral Label Propagation is put forward in this paper. The main idea is to select unlabeled samples based on spatial information and to construct a spatial-spectral graph (in the framework of Label Propagation) for semi-supervised learning. Different from conventional graph-based semi-supervised algorithms, our methodology has three characteristics: (1) combining graph-based semi-supervised classification with a basic classifier (e.g., SVM) not only alleviates the computational complexity of the graph-based algorithm, but also makes it possible to classify new samples; (2) extracting spatial features with a 2-D Gabor filter and constructing a spatial graph based on spatial smoothness provide a significant increase in classification accuracy; (3) determining the width of the Gaussian function for the spatial-spectral graph with an adaptive method mitigates the effect of this parameter on the graph-based semi-supervised algorithm.

The main contributions of this paper are: (1) a new spatial-spectral graph that simultaneously considers the spectral and spatial information; (2) a new adaptive method to determine the width of the Gaussian function for the spatial-spectral graph; (3) experimental results of the proposed SS-LPSVM algorithm on four well-known hyperspectral datasets.

The rest of the paper is organized as follows. Section 2 briefly introduces the SVM and the Gabor filter, and then describes the proposed SS-LPSVM strategy in detail. Section 3 evaluates the effectiveness of the proposed algorithm on four hyperspectral datasets. Section 4 further discusses the experimental results and the proposed SS-LPSVM. Finally, conclusions are drawn in Section 5.

2. Methodology

2.1. The framework of SVM

Due to its ability to deal with non-linear problems and high-dimensional data (Vapnik, 1995), the SVM has been widely used in hyperspectral imagery classification. The goal of the SVM is to find the hyperplane with the largest distance to the nearest training samples of any class. The samples can be mapped into a higher-dimensional space through a non-linear mapping function (the kernel function) for better classification. Given a training set D = {(x_1, y_1), ..., (x_n, y_n)} with n training samples and labels y_i ∈ {+1, −1}, i = 1, ..., n, the optimization problem of the SVM is:

$$\min_{w}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i$$
$$\text{s.t.}\quad y_i\left(w^T\varphi(x_i) + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0,\ i = 1, 2, \ldots, n, \qquad (1)$$

where w and b are the weight vector and the bias of the decision function, ξ_i is the slack variable measuring the degree of misclassification of x_i, C is the penalty factor, and φ(·) denotes the non-linear mapping function. The Lagrange function of Eq. (1) is:

$$L(w, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i - \sum_{i=1}^{n}\alpha_i\left[y_i\left(w^T\varphi(x_i) + b\right) - 1 + \xi_i\right] - \sum_{i=1}^{n}\beta_i\xi_i, \qquad (2)$$

where α_i and β_i are Lagrange multipliers. Its dual form is:

$$\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i, x_j)$$
$$\text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C,\ i = 1, 2, \ldots, n, \qquad (3)$$

where K(x_i, x_j) = φ(x_i)^T φ(x_j), and x_i is a support vector when α_i > 0. The solution α* is obtained by solving Eq. (3), and w* = Σ_{i=1}^{n} α_i* y_i x_i. The decision function is:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n}\alpha_i^{*} y_i K(x, x_i) + b^{*}\right), \qquad (4)$$

where b* is a real constant obtained from the Kuhn–Tucker conditions:

$$b^{*} = -\frac{\max_{y_i = -1}\left[(w^{*})^T x_i\right] + \min_{y_i = 1}\left[(w^{*})^T x_i\right]}{2}. \qquad (5)$$

Practical classification problems are often multi-class, whereas the standard SVM is formulated for binary problems. The "one against one" and "one against rest" strategies can be adopted to address multi-class cases (Wang et al., 2005); we used the convenient "one against rest" strategy in this paper.
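As a concrete illustration of Eqs. (1)-(5), the following minimal Python sketch (ours, not the authors' code) trains a Gaussian-kernel SVM and reads back its decision values; scikit-learn, the toy data, and the parameter values C = 128 and σ = 0.9 (borrowed from Section 3.1) are assumptions for illustration only.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                      # 200 samples, 20 spectral features
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)    # toy labels in {+1, -1}

sigma = 0.9
clf = SVC(kernel="rbf", C=128, gamma=1.0 / (2 * sigma**2))  # gamma = 1/(2*sigma^2)
clf.fit(X, y)

# decision_function returns sum_i alpha_i* y_i K(x, x_i) + b* of Eq. (4);
# its sign is the predicted class, as in f(x)
scores = clf.decision_function(X[:5])
print(np.sign(scores), clf.predict(X[:5]))

For multi-class problems, the "one against rest" split used in this paper corresponds to, e.g., wrapping the binary classifier in scikit-learn's OneVsRestClassifier.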

2.2. Gabor filter-based spatial feature extraction

It is important to consider spatial information in hyperspectral classification. The Gabor transform agrees well with the human visual system and has been widely used to extract texture features (Shen and Bai, 2006; Shen et al., 2007). In this paper, we take both spectral and spatial information into account to improve classification performance. The kernel function of a 2-D Gabor filter is:

$$\psi(x', y'; f, \theta, \gamma, \sigma') = \exp\left(-\frac{x''^2 + \gamma^2 y''^2}{2\sigma'^2}\right)\cos\left(2\pi f x'' + \phi\right),$$
$$x'' = x'\cos\theta + y'\sin\theta,\qquad y'' = -x'\sin\theta + y'\cos\theta, \qquad (6)$$

where x' and y' are the coordinates of a pixel in the image, f and θ are the frequency and rotation of the sinusoidal plane wave, φ is the phase of the Gabor function, and σ' and γ are the radius and orientation angle of the Gaussian envelope. To extract spatial features from an image I(x', y'), a set of Gabor filters is required:

$$\psi_{p,q}(x', y') = \psi(x', y'; f_p, \theta_q, \gamma, \sigma'),\quad f_p = f_{\max}/\left(\sqrt{2}\right)^{p},\quad \theta_q = \frac{q\pi}{6},$$
$$p = 0, \ldots, P-1,\quad q = 0, \ldots, Q-1, \qquad (7)$$

where f_max is the maximum frequency of the sinusoidal plane wave, and P and Q are the numbers of frequencies and rotations, respectively. The Gabor representation of the hyperspectral image I(x', y') is obtained by convolving it with the family of Gabor filters ψ_{p,q}(x', y'):

$$G_{p,q}(x', y') = I(x', y') * \psi_{p,q}(x', y'), \qquad (8)$$

where "*" denotes the convolution operator. Fig. 1 presents a diagram summarizing the steps of the 2-D Gabor filter for a hyperspectral image.
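To make Eqs. (6)-(8) concrete, here is a small illustrative sketch (not the authors' implementation) that builds the P × Q Gabor bank of Eq. (7) and convolves it with one band; NumPy/SciPy, the kernel window size and the toy input are our assumptions.

import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(f, theta, gamma=1.0, sigma=1.0, phi=0.0, size=15):
    half = size // 2
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_r = xx * np.cos(theta) + yy * np.sin(theta)     # x'' of Eq. (6)
    y_r = -xx * np.sin(theta) + yy * np.cos(theta)    # y'' of Eq. (6)
    envelope = np.exp(-(x_r**2 + (gamma * y_r)**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * f * x_r + phi)

def gabor_features(band, f_max=0.25, P=10, Q=6):
    feats = []
    for p in range(P):
        for q in range(Q):
            f_p = f_max / (np.sqrt(2) ** p)           # frequency ladder of Eq. (7)
            theta_q = q * np.pi / Q                   # Q evenly spaced rotations
            k = gabor_kernel(f_p, theta_q)
            feats.append(convolve2d(band, k, mode="same"))  # Eq. (8)
    return np.stack(feats, axis=-1)

band = np.random.rand(64, 64)                         # stand-in for the first PC
print(gabor_features(band).shape)                     # (64, 64, 60)

With P = 10 and Q = 6 as in Section 3.1, each pixel receives 60 Gabor responses; how these are reduced to the b = 7 spatial features used later is not detailed in this sketch.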

2.3. The framework of Label Propagation

The main idea of graph-based semi-supervised classification is to construct a graph and assign class labels to the unlabeled samples by minimizing a defined energy function. Two conditions should be satisfied: (1) the loss function is minimal, i.e., the predicted labels of the labeled samples should be as similar to the existing ones as possible; (2) the smoothness function is minimal, i.e., two nearby samples are most likely to belong to the same class. Let the vector of labels be y = (y_l, y_u)^T = {y_1, y_2, ..., y_l, y_{l+1}, ..., y_{l+u}}, where y_l and y_u are the vectors of the l labeled and u unlabeled samples, respectively. The energy function of graph-based semi-supervised classification (Rohban and Rabiee, 2012) is:

$$\min_{f}\ \sum_{i \in \{1, \ldots, l\}} (f_i - y_i)^2 + \frac{1}{2}\sum_{i,j \in \{1, \ldots, l+u\}} W^w_{ij}\,(f_i - f_j)^2 = (f_l - y_l)^T (f_l - y_l) + \frac{1}{2} f^T \Delta f,\qquad \Delta = D - W^w. \qquad (9)$$

In (9), f = (f_l, f_u)^T is composed of f_l and f_u, the vectors of predicted class labels for the labeled and unlabeled sets, and Δ is the graph Laplacian. The K-Nearest Neighbor (KNN) method can be employed to construct the spectral graph W^w:

$$W^w_{ij} = \begin{cases} \exp\left(-\dfrac{\|x_i - x_j\|^2}{2\varepsilon^2}\right) & \text{if } x_j \in NB^w_k(x_i) \\ 0 & \text{otherwise} \end{cases},\qquad D_{ij} = \begin{cases} \sum_j W^w_{ij} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases},\qquad i, j \in \{1, 2, \ldots, l+u\}, \qquad (10)$$

where ε is the width parameter used to construct the graph, and NB^w_k(x_i) is the set of the k nearest neighbors of x_i under the Euclidean distance of the spectral features.

Label Propagation, a classical graph-based semi-supervised classification algorithm, propagates labels from the labeled samples to the unlabeled samples until all samples have been assigned labels. The propagation procedure is illustrated in Fig. 2, where light gray and dark gray nodes are labeled samples of different classes and hollow nodes are unlabeled samples. Labels are propagated from the labeled to the unlabeled samples according to the probabilities on the arrows. For example, the unlabeled sample "1" receives two probabilities propagated from the labeled samples "2" and "3", and its label is set to that of sample "2" because of the larger probability. The propagation stops when all samples have been assigned labels. P ∈ R^{(l+u)×(l+u)} is the Label Propagation probability matrix defined by P = D^{-1} W^w, where P_{ij} is the propagation probability from node i to node j. The matrix can be partitioned into four sub-matrices as

$$P = \begin{pmatrix} P_{ll} & P_{lu} \\ P_{ul} & P_{uu} \end{pmatrix},$$

and the corresponding predicted labels of the unlabeled samples under Label Propagation are:

$$f_u = (I - P_{uu})^{-1} P_{ul}\, y_l. \qquad (11)$$
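The following minimal sketch (again ours, written directly from Eqs. (10) and (11)) assembles the kNN Gaussian graph, the propagation matrix P = D^{-1} W^w, and the closed-form labels of the unlabeled block on a toy two-cluster problem; the symmetrization step, the fixed ε and the ±1 label encoding are illustrative choices.

import numpy as np

def propagate_labels(X, y_l, k=4, eps=1.0):
    n, l = X.shape[0], y_l.shape[0]                 # the first l rows of X are labeled
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]           # k nearest neighbors (self excluded)
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * eps ** 2))
    W = np.maximum(W, W.T)                          # keep the kNN graph symmetric
    P = W / W.sum(axis=1, keepdims=True)            # P = D^{-1} W^w
    P_ul, P_uu = P[l:, :l], P[l:, l:]
    f_u = np.linalg.solve(np.eye(n - l) - P_uu, P_ul @ y_l)   # Eq. (11)
    return np.sign(f_u)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2, 0.5, (5, 3)), rng.normal(-2, 0.5, (5, 3)),
               rng.normal(2, 0.5, (10, 3)), rng.normal(-2, 0.5, (10, 3))])
y_l = np.array([1] * 5 + [-1] * 5)                  # labels of the first 10 samples
print(propagate_labels(X, y_l))                     # predicted labels of the 20 unlabeled

Multi-class problems can be handled by propagating one indicator vector per class and taking the class with the largest score.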

2.4. Spatial-spectral graph-based Label Propagation

In hyperspectral data processing, most graph-based semi-supervised classification algorithms focus on spectral information


Fig. 1. Diagram of the 2-D Gabor filter for spatial features extraction.

Fig. 2. The procedure of Label Propagation.

without considering spatial information. There is an assumption on the spatial neighbors, called spatial smoothness (Yang et al., 2014), which holds when the spatial neighbors belong to the same class. In this paper, we construct the spatial graph based on spatial smoothness and propagate the labels using the spatial-spectral graph. The spatial graph W^s is defined as:

$$W^s_{ij} = \begin{cases} \exp\left(-\dfrac{\|x_i - x_j\|^2}{2\varepsilon^2}\right) & \text{if } x_j \in NB^s_d(x_i) \\ 0 & \text{otherwise} \end{cases}, \qquad (12)$$

where NB^s_d(x_i) is the set of spatial neighbors of x_i in a spatial neighborhood system of width d. To use the spatial graph effectively, we randomly select spatial neighbors of the labeled samples as unlabeled samples. The spectral graph W^w is derived from Eq. (10). The spatial-spectral graph is then obtained as:

$$W = \mu W^w + (1 - \mu) W^s, \qquad (13)$$

where μ ∈ [0, 1] controls the influences of the spatial and spectral graphs. The performance of graph-based semi-supervised classification depends on the quality of the graph which, as can be seen from Eqs. (10) and (12), is highly influenced by the parameter ε. An improper ε leads to low classification accuracy, and traditional algorithms estimate this parameter via repeated trials, which increases the computational cost. To mitigate the effect of this parameter, we use an adaptive method instead:

$$\varepsilon = (d_i d_j / 2)^{1/2}, \qquad (14)$$

where d_i = d(x_i, x_{iM}) is the Euclidean distance between x_i and its M-th spectral neighbor; d_i thus varies adaptively with the neighborhood distribution. The proposed SS-LPSVM algorithm can be summarized as follows:

Algorithm: SS-LPSVM

Inputs: the labeled training set D_l = {(x_1, y_1), ..., (x_l, y_l)}; the weight μ; the parameter M; the maximum number of spectral neighbors k; the width d of the spatial neighborhood system; the number of spectral dimensions a; the number of spatial dimensions b; the number n_u of spatial neighbors of D_l to select randomly.

1. Extraction of spectral features: extract spectral features x^w ∈ R^a using PCA.
2. Extraction of spatial features: extract spatial features x^s ∈ R^b with a 2-D Gabor filter from the first principal component; concatenate them, x = (x^w, x^s) ∈ R^{a+b}.
3. Selection of the unlabeled training set: extract the spatial neighbors U_c of D_l in the spatial neighborhood system of width d, and randomly select n_u unlabeled samples from U_c.
4. Estimation of the parameter: select M and estimate ε with the adaptive method of Eq. (14).
5. Construction of the spatial-spectral graph: construct the spectral graph W^w with the KNN method and the spatial graph W^s based on spatial smoothness; combine them with Eq. (13) and compute the probability matrix P.
6. Prediction of the labels of the unlabeled samples: predict the labels by Label Propagation using Eq. (11), yielding the unlabeled training set D_u = {(x_1, y_1), ..., (x_{n_u}, y_{n_u})}.
7. Classification: merge D_l and D_u, and train the SVM to predict the labels of the testing set.
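As a sketch of steps 4 and 5 above (not the authors' code), the fragment below computes the adaptive width of Eq. (14) and fuses the spectral and spatial graphs of Eqs. (10), (12) and (13); reading ε pairwise as ε_ij^2 = d_i d_j / 2 and using the Chebyshev distance for the d-wide spatial neighborhood system are our assumptions.

import numpy as np

def spatial_spectral_graph(X, coords, mu=0.01, M=7, k=4, d=24):
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(-1)                        # squared spectral distances
    dist = np.sqrt(d2)
    d_M = np.sort(dist, axis=1)[:, M]               # d_i: M-th spectral neighbor, Eq. (14)
    eps2 = d_M[:, None] * d_M[None, :] / 2.0        # eps_ij^2 = d_i * d_j / 2
    G = np.exp(-d2 / (2 * eps2))                    # Gaussian affinities, adaptive width
    n = X.shape[0]
    Ww = np.zeros((n, n))                           # spectral kNN graph, Eq. (10)
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]
        Ww[i, nbrs] = G[i, nbrs]
    cheb = np.abs(coords[:, None, :] - coords[None, :, :]).max(-1)
    Ws = np.where((cheb > 0) & (cheb <= d), G, 0.0)  # spatial graph, Eq. (12)
    return mu * Ww + (1 - mu) * Ws                  # spatial-spectral graph, Eq. (13)

X = np.random.rand(50, 27)                          # e.g., 20 spectral + 7 Gabor features
coords = np.stack(np.meshgrid(np.arange(10), np.arange(5)), -1).reshape(-1, 2)
W = spatial_spectral_graph(X, coords)
print(W.shape)

Row-normalizing W then gives the propagation matrix P used in step 5.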

3. Experiments

3.1. Experimental setup and assessment indices

In the experiments, the performance of all tested algorithms was quantitatively compared using the overall accuracy (OA, in percent) and the average accuracy (AA, in percent) (Congalton, 1991). The classification experiments were carried out in MATLAB 7.1 on a computer equipped with an Intel Core i7 processor at 3.40 GHz; each experiment was repeated 10 times, and the averaged results and standard deviations are reported to avoid biased estimation. In order to evaluate the performance of the proposed SS-LPSVM algorithm, it was tested together with the standard SVM, TSVM, a fast version of LapSVM and SCS3VM on four hyperspectral datasets, shown in Figs. 3-6 and described in Section 3.2. Let D_l = {(x_1, y_1), ..., (x_l, y_l)} be the labeled training set and l the number of labeled training samples. The Gaussian kernel was adopted for all algorithms. For the proposed SS-LPSVM, the numbers of spectral dimensions (a) and spatial dimensions (b) were set to 20 and 7; following Shen and Bai (2006), Shen et al. (2007) and the characteristics of hyperspectral imagery, the Gabor filter parameters were set to f_max = 0.25, P = 10, Q = 6, σ' = 1, φ = 0, γ = 1, which led to the most satisfactory classification accuracy. The standard SVM was selected as the benchmark supervised algorithm; the penalty factor C and the width σ of the Gaussian kernel were tuned via 10-fold cross-validation, which gave σ = 0.9 and C = 128. TSVM and LapSVM were selected as the benchmark semi-supervised algorithms. For TSVM, 10-fold cross-validation was used to estimate C and σ, which were updated after each iteration, and the Lagrange multiplier was λ = 1/(2C). With respect to LapSVM, γ_A and γ_I are the parameters that control the complexity of the function in the associated reproducing kernel Hilbert space and the intrinsic geometry of the marginal distribution, respectively; they were varied with a step of one decade in the range [10^-4, 10^4], and the Laplacian graph was calculated with 4 spectral neighbors. The recently proposed SCS3VM was selected as a representative spatial information-based semi-supervised algorithm; the 3-order neighborhood system was adopted, and s, which controls the influences of the spatial and spectral information, was set to 0.1.

3.2. Hyperspectral data description

The first dataset was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over the Indian Pines region in Northwestern Indiana in 1992. The imagery contains 220 spectral bands and has a spatial size of 145 × 145 pixels, each covering approximately 20 m by 20 m on the ground. It includes 16 mutually exclusive classes containing from 20 to 2468 samples. 20 spectral bands were removed due to noise and water absorption. The three-channel false-color composite and the reference land-cover are shown in Fig. 3.

The second dataset was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) during a flight campaign over the Engineering School at the University of Pavia. Water absorption bands were removed, reducing the original 115 bands to 103. Fig. 4 shows the false-color composite and the reference land-cover, which mainly comprises 9 classes (asphalt, meadows, trees, metal sheets, bare soil, bitumen, bricks, shadows and gravel).

The third dataset was collected by AVIRIS over Salinas Valley, Southern California, in 1998. It has a high spatial resolution of 3.7 m, a spatial size of 512 × 217 pixels and 206 spectral bands from 0.4 to 2.5 μm; the 20 water absorption bands were discarded. The area covers vegetables, bare soils and vineyard fields, and the ground truth contains 16 classes. The false-color composite and the reference land-cover map are shown in Fig. 5.

The last dataset covers two dense residential areas in the Pavia city center: one on a side of the river Ticino, the other an open area on the other side. It was collected by ROSIS with a spatial size of 1096 × 715 pixels. 13 spectral bands were removed due to noise, and the remaining 102 bands were retained for the test. Fig. 6 shows the false-color composite and the reference land-cover, which mainly comprises 9 classes (water, tiles, meadows, trees, bitumen, bare soil, asphalt, bricks and shadows).
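For clarity, the two assessment indices of Section 3.1 can be computed from a confusion matrix as in the short helper below (ours; the matrix values are invented for illustration).

import numpy as np

def oa_aa(conf):
    # conf[i, j] = number of samples of true class i predicted as class j
    conf = np.asarray(conf, dtype=float)
    oa = 100.0 * np.trace(conf) / conf.sum()
    per_class = np.diag(conf) / conf.sum(axis=1)    # recall of each class
    aa = 100.0 * per_class.mean()
    return oa, aa

C = np.array([[50, 2, 0],
              [5, 40, 5],
              [0, 3, 45]])
print(oa_aa(C))   # OA weighs all samples equally; AA weighs all classes equally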


Fig. 3. The AVIRIS data of Indian Pines. (a) False-color composition (bands 17, 27, and 50 for RGB), (b) Reference land-cover.



Fig. 4. The ROSIS data of University of Pavia. (a) False-color composition (bands 16, 27, and 45 for RGB), (b) Reference land-cover.


Fig. 5. The AVIRIS data of Salinas Valley. (a) False-color composition (bands 16, 27, and 145 for RGB), (b) Reference land-cover.

3.3. Experiment 1: The AVIRIS data of Indian Pines

In the first experiment, we evaluated the influence of the spatial graph on the classification performance of the proposed SS-LPSVM algorithm using the AVIRIS data of Indian Pines. Initially, a labeled training set D_l of 400 samples (25 per class) was used, the spatial neighbors U_c of the labeled training set were extracted, and 200 unlabeled samples were selected from U_c. The maximum number of spectral neighbors (k) and the width of the spatial neighborhood system (d) were set to 4 and 24, respectively. The result in Fig. 7(b) is visually more satisfactory than Fig. 7(a) and (c). Compared with the reference land-cover, most classes are misclassified into the Alfalfa class in Fig. 7(a). In Fig. 7(c), some samples of the corn-no till class are misclassified into the soybeans-min till class, and the regions of corn-min till and soybeans-min till are noisy. Statistically, an OA of 84.11% is achieved with μ = 0.01, whereas 79.37% is achieved with μ = 1: the accuracy decreases by 4.74% from μ = 0.01 to μ = 1. When relying entirely on the spatial graph (μ = 0), the result of SS-LPSVM is poor, as shown in Fig. 7(a). It can be observed that the spatial graph plays an important role in the proposed SS-LPSVM algorithm.

In addition, Fig. 8 presents the OA curves of SS-LPSVM in relation to the weight (μ) and the width of the spatial neighborhood system (d). We selected 400 labeled samples (25 per class) from the AVIRIS data of Indian Pines, with the maximum number of neighbors (k) set to 4; μ and d were tuned over {0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 1} and {4, 8, 16, 24}, respectively. Several observations can be made from Fig. 8: (1) Irrespective of d, the OA curve goes down as μ increases; for d = 24, the OA decreases from 84.11% to 79.37% as μ varies from 0.01 to 1, and the curve reaches its largest value around μ = 0.01. (2) The OA curve roughly rises as d increases, except for μ ∈ {0.01, 0.1, 0.4} at d = 4 and d = 8. This is because a larger d leads to a higher probability of nearby samples being in the training set; that is, the more nonzero terms in the spatial graph, the better the performance. However, d = 4 and d = 8 are both relatively small.

Fig. 9 presents the OA curve of SS-LPSVM in relation to the parameter M, which determines the width of the Gaussian function ε for the graph. We selected 25 samples per class from the AVIRIS data of Indian Pines as labeled samples, and 200 unlabeled samples from U_c. The weight μ, the maximum number of spectral neighbors (k) and the width of the spatial neighborhood system (d) were set to 0.01, 4 and 24, respectively. The parameter M was varied over {0, 1, 2, ..., 10}.



Fig. 6. The ROSIS data of Center of Pavia. (a) False-color composition (bands 16, 27, and 45 for RGB), (b) Reference land-cover.


Fig. 7. Influence of the spatial graph on the performance of SS-LPSVM for the AVIRIS data of Indian Pines. (a) OA = 53.79% with μ = 0, (b) OA = 84.11% with μ = 0.01, (c) OA = 79.37% with μ = 1.

In Fig. 9, the curve rises as M increases and finally stabilizes when M falls within the interval [7, 10], where the OA reaches 84.11%. We therefore fixed M to 7 in the following experiments.

A summary of the OA of SS-LPSVM and the other semi-supervised algorithms in relation to s (the number of labeled samples per class) is given in Table 1. We selected a labeled training set of 16s samples from the AVIRIS data of Indian Pines and 200 unlabeled samples from U_c. The weight μ, the parameter M, the maximum number of spectral neighbors (k) and the width of the spatial neighborhood system (d) were set to 0.01, 7, 4 and 24, respectively; s varied over {3, 5, 10, 15, 20, 25}, and the other parameters of the compared algorithms were the same as in Section 3.1. From Table 1 it is clear that TSVM, LapSVM and SCS3VM improve on the standard SVM, and their classification accuracy generally grows with s. TSVM achieves higher classification accuracy than SS-LPSVM when s ∈ {3, 5}. Moreover, the proposed SS-LPSVM produces consistently higher accuracy than the other semi-supervised algorithms for s ∈ {10, 15, 20, 25}; for example, at s = 15, the OA of SS-LPSVM is 20.18% greater than that of the standard SVM. The optimal OA of SS-LPSVM in the table reaches 84.11%.

Meanwhile, the AA and the individual class accuracies averaged over ten runs for the five algorithms are tabulated in Table 2. We set s = 25, with 10 labeled samples for the classes with small numbers of samples; the other parameters were set as above. As seen from Table 2, the proposed SS-LPSVM obtains the greatest AA, with gains of 16.35%, 15.13%, 12.70% and 9.32% over SVM, LapSVM, TSVM and SCS3VM, respectively. SS-LPSVM also produces greater accuracy than the other algorithms for most individual classes, specifically corn-no till, Bldg-Grass-Tree-Drives and soybeans-min till. For illustrative purposes, the classification maps are shown in Fig. 10. Compared with Fig. 10(f), the noise in Fig. 10(b)-(e) is more obvious, and Fig. 10(f) is the closest to the reference land-cover in Fig. 10(a). Some regions of corn-no till, corn-min till and bldg-grass-tree-drives are clearly misclassified by the standard SVM, TSVM, SCS3VM and LapSVM, but are correctly classified by the proposed SS-LPSVM.

Finally, in order to evaluate the computational complexity of SS-LPSVM, we report the computational time of each algorithm in Table 3.



Fig. 8. Influence of μ and d on the performance of SS-LPSVM for the AVIRIS data of Indian Pines.


Fig. 9. Influence of M on the performance of SS-LPSVM for the AVIRIS data of Indian Pines.

SS-LPSVM took 193.80 s, longer than the 0.39 s of LapSVM and the 30.14 s of TSVM. This is because SS-LPSVM spends additional time extracting the spatial information and classifying the newly generated unlabeled samples. However, SS-LPSVM saves 266.57 s in comparison with SCS3VM.

3.4. Experiment 2: The ROSIS data of University of Pavia

In the second experiment, the proposed SS-LPSVM algorithm was evaluated using the hyperspectral data covering the University of Pavia, with the parameters set to the same values as in Experiment 1. Fig. 11 shows the influence of the spatial graph on SS-LPSVM for this dataset. As can be seen, Fig. 11(b) is visually more accurate than Fig. 11(a) and (c), especially for the meadows class, and the greatest classification result is obtained with μ = 0.2. The OA increases from 73.86% to 86.02% when the spatial graph is considered. These results again confirm the importance of the spatial graph in SS-LPSVM.

Furthermore, Fig. 12 shows the OA curves of SS-LPSVM with respect to the weight (μ) and the width of the spatial neighborhood system (d) for the ROSIS data of University of Pavia. The curves rise as μ increases to 0.2 and afterwards decrease gradually, so the greatest classification accuracy is obtained around μ = 0.2. Moreover, the accuracy of SS-LPSVM increases gradually with d; at μ = 0.2, the OA increases by 0.58% as d grows from 4 to 24.

Table 4 shows the OA of the standard SVM, LapSVM, TSVM, SCS3VM and SS-LPSVM with different s for the ROSIS data of University of Pavia. Apart from some slight fluctuations, the accuracy of the proposed SS-LPSVM and of the benchmark algorithms generally increases with s. SS-LPSVM, LapSVM, TSVM and SCS3VM produce greater classification accuracy than the standard SVM at the same s, but the improvements of LapSVM, TSVM and SCS3VM are smaller than that of SS-LPSVM. For example, at s = 25, the OA of LapSVM, TSVM and SCS3VM increases by 0.87%, 1.95% and 1.44%, whereas the improvement of SS-LPSVM is more obvious (over 17%). In particular, for s = 3, the OA of SS-LPSVM improves by 21.02% in comparison with the standard SVM. SS-LPSVM provides consistently higher accuracy than LapSVM, TSVM and SCS3VM, and its OA rises by 18.35% as s increases from 3 to 25. These results reveal that the proposed SS-LPSVM produces high classification accuracy with a small number of labeled samples and outperforms the compared semi-supervised algorithms.

Table 5 reports the AA and the individual class accuracies of the different algorithms for the ROSIS data of University of Pavia. The proposed SS-LPSVM obtains the greatest AA as well as the greatest individual accuracy for most classes, although TSVM and LapSVM produce greater accuracy for the asphalt and bricks classes, respectively. The advantage of SS-LPSVM is also illustrated by the visual comparison of the classification maps in Fig. 13: focusing on the meadows class, the noise in Fig. 13(b)-(e) is more obvious, and Fig. 13(f) is more accurate.

3.5. Experiment 3: The AVIRIS data of Salinas Valley and ROSIS data of Center of Pavia

In the last experiment, we used the AVIRIS data of Salinas Valley and the ROSIS data of Center of Pavia to evaluate the proposed algorithm.

Table 1
Overall accuracy (%) of the five semi-supervised algorithms for the AVIRIS data of Indian Pines (average of 10 runs ± standard deviation; s denotes the number of labeled samples for each class).

Algorithm   s = 3          s = 5          s = 10         s = 15         s = 20         s = 25
SVM         42.02 ± 0.74   50.23 ± 1.74   55.56 ± 2.04   58.58 ± 0.80   62.93 ± 0.64   65.12 ± 0.63
LapSVM      48.78 ± 0.33   52.31 ± 0.67   56.36 ± 0.71   59.99 ± 0.65   64.13 ± 1.19   65.36 ± 0.62
TSVM        60.94 ± 0.71   62.57 ± 0.23   63.45 ± 0.17   65.42 ± 0.02   64.43 ± 0.20   67.68 ± 1.67
SCS3VM      45.15 ± 1.27   55.42 ± 0.35   60.86 ± 5.08   67.24 ± 0.47   68.34 ± 1.57   72.42 ± 1.21
SS-LPSVM    55.25 ± 3.35   56.95 ± 0.95   64.74 ± 0.39   78.76 ± 0.04   80.29 ± 0.80   84.11 ± 0.08


Table 2
Average accuracy and individual class accuracy (%) of the five semi-supervised algorithms for the AVIRIS data of Indian Pines (average of 10 runs ± standard deviation).

Class                    Train  Test   SVM             LapSVM          TSVM            SCS3VM          SS-LPSVM
Alfalfa                  25     29     95.93 ± 2.16    87.78 ± 3.81    92.96 ± 3.39    91.98 ± 3.15    98.89 ± 1.48
Corn-no till             25     1409   30.25 ± 7.38    37.85 ± 6.32    46.75 ± 3.61    52.67 ± 1.82    75.45 ± 1.87
Corn-min till            25     809    53.55 ± 5.15    63.50 ± 7.46    60.98 ± 6.77    68.15 ± 3.38    75.08 ± 8.15
Corn                     25     209    77.27 ± 7.46    80.60 ± 4.06    80.77 ± 3.32    78.63 ± 2.63    95.30 ± 2.73
Grass/pasture            25     472    88.21 ± 3.78    82.78 ± 7.83    88.93 ± 1.98    90.48 ± 2.67    93.40 ± 0.96
Grass/trees              25     722    90.01 ± 3.13    86.96 ± 2.14    89.42 ± 2.85    91.48 ± 1.43    95.72 ± 1.01
Grass/pasture-mowed      10     16     86.15 ± 3.92    91.54 ± 3.77    91.54 ± 4.49    89.74 ± 4.80    95.38 ± 1.54
Hay-windrowed            25     464    88.92 ± 2.48    88.02 ± 2.91    87.48 ± 3.58    87.46 ± 2.25    97.22 ± 0.80
Oats                     10     10     90.00 ± 10.49   89.00 ± 8.00    99.00 ± 2.00    95.00 ± 4.08    100 ± 0.00
Soybeans-no till         25     943    60.06 ± 4.40    71.49 ± 3.89    66.32 ± 5.10    71.01 ± 3.88    82.77 ± 2.59
Soybeans-min till        25     2443   29.38 ± 9.72    45.64 ± 7.25    40.30 ± 3.91    56.39 ± 6.03    65.37 ± 4.24
Soybeans-clean till      25     589    63.52 ± 7.59    50.55 ± 6.33    60.91 ± 7.16    73.40 ± 0.89    87.30 ± 3.06
Wheat                    25     187    98.40 ± 1.35    97.26 ± 1.50    97.64 ± 1.49    99.21 ± 0.44    99.62 ± 0.19
Woods                    25     1269   88.67 ± 3.19    85.41 ± 6.44    83.42 ± 2.87    92.30 ± 1.14    97.74 ± 1.50
Bldg-Grass-Tree-Drives   25     355    50.00 ± 3.57    48.16 ± 4.37    59.58 ± 2.74    62.81 ± 4.08    88.89 ± 3.00
Stone-steel towers       25     70     93.05 ± 4.69    96.42 ± 3.23    95.79 ± 2.58    95.09 ± 0.99    96.84 ± 2.58
AA                       –      –      73.96 ± 0.70    75.18 ± 0.91    77.61 ± 1.08    80.99 ± 0.13    90.31 ± 0.18


Fig. 10. Comparison of the semi-supervised algorithms for the AVIRIS data of Indian Pines. (a) Reference land-cover, (b) SVM, (c) LapSVM, (d) TSVM, (e) SCS3VM, (f) SS-LPSVM.

Table 3
Comparison of computational time for the AVIRIS data of Indian Pines.

Algorithm   LapSVM   TSVM    SVM      SS-LPSVM   SCS3VM
Time (s)    0.39     30.14   186.22   193.80     460.37

We selected labeled training sets of 16s and 9s samples (s samples per class) for the two datasets, together with 200 unlabeled samples, with s varied over {3, 5, 10, 15, 20, 25}. Tables 6 and 7 give the classification accuracy obtained by the different algorithms with different s for the two datasets. It can be seen from Table 6 that the proposed SS-LPSVM produces the most accurate classification results: its accuracy gain over the standard SVM varies from 7.67% to 14.74% in OA and is most obvious at s = 10. LapSVM improves the classification performance of the standard SVM for s ∈ {5, 10, 20, 25}. For illustrative purposes, Fig. 14 presents the classification maps for the AVIRIS data of Salinas Valley. We can observe clearly from Fig. 14 that the grapes_untrained and



Fig. 11. Influence of the spatial graph on the performance of SS-LPSVM for the ROSIS data of University of Pavia. (a) OA = 50.94% with μ = 0, (b) OA = 86.02% with μ = 0.2, (c) OA = 73.86% with μ = 1.

Due to the fact that the distribution of land-cover in this dataset is complex, the spatial neighbors are more likely to be background noise when the number of labeled samples is relatively small. Concerning LapSVM and TSVM, they improve the classification performance of the standard SVM in several cases, but the degree of improvement is sometimes not obvious. Fig. 15 illustrates the best classification maps obtained by the different algorithms for the ROSIS data of Center of Pavia. Examining the maps, Fig. 15(f) is more accurate than the other classification maps, especially for the bitumen and water classes. Some speckle artifacts exist in the regions of the tiles and bare soil classes in Fig. 15(b)-(d), and in Fig. 15(e) some samples of the water class are misclassified into the bitumen and trees classes.


4. Discussion 74 72 0.01

0.1

0.2

0.3

0.4

0.5

1

miu Fig. 12. Influence of l and d on the performance of SS-LPSVM for the ROSIS data of University of Pavia.

vineyard_untrained class are not classified satisfactorily by standard SVM, LapSVM, TSVM and SCS3VM. The proposed SS-LPSVM, however, can produce the most accurate classification results for these two regions. Moreover, the results in Table 7 show that SS-LPSVM and SCS3VM impove the accuracy of standard SVM significantly. SS-LPSVM can obtain higher results than that of SCS3VM. However, when s = 3, the greatest accuracy are achieved by SCS3VM. Due to

In this paper, we have proposed the Spatial-Spectral Label Propagation SVM (SS-LPSVM) that considers the spatial information to predict the labels of unlabeled samples for semi-supervised learning. The influences of the spatial graph and the parameters on the classification accuracy of SS-LPSVM were analyzed. The proposed SS-LPSVM algorithm was compared to standard SVM, LapSVM, TSVM and SCS3VM. The experimental results shown in Sections 3.3–3.5 indicate that the proposed SS-LPSVM algorithm is an effective strategy for semi-supervised classification of hyperspectral imagery. In the first experiment, we can obtain a general rank of the five semi-supervised classification algorithms in terms of OA in a descending order: SS-LPSVM, SCS3VM, TSVM, LapSVM and SVM. Although LapSVM, TSVM, SCS3VM can improve the accuracy of SVM, the degree of the improvement is smaller than that of SS-LPSVM. The reason for the advantages of the proposed

Table 4
Overall accuracy (%) of the five semi-supervised algorithms for the ROSIS data of University of Pavia (average of 10 runs ± standard deviation).

Algorithm   s = 3          s = 5          s = 10         s = 15         s = 20         s = 25
SVM         46.19 ± 0.65   53.73 ± 1.30   61.53 ± 1.14   60.43 ± 0.94   64.89 ± 1.14   68.01 ± 2.62
LapSVM      60.10 ± 2.03   65.72 ± 0.34   68.26 ± 2.20   68.34 ± 0.29   65.91 ± 0.45   68.88 ± 1.34
TSVM        61.84 ± 0.41   63.43 ± 1.22   63.73 ± 0.45   68.45 ± 1.07   73.72 ± 0.27   69.96 ± 1.39
SCS3VM      53.02 ± 1.21   56.76 ± 2.28   64.25 ± 0.40   66.87 ± 0.37   68.24 ± 1.18   69.45 ± 2.19
SS-LPSVM    67.21 ± 2.71   69.60 ± 2.30   75.88 ± 0.22   80.67 ± 1.21   78.41 ± 0.26   85.56 ± 0.09


Table 5
Average accuracy and individual class accuracy (%) of the five semi-supervised algorithms for the ROSIS data of University of Pavia (average of 10 runs ± standard deviation).

Class          Train  Test    SVM            LapSVM          TSVM           SCS3VM         SS-LPSVM
Asphalt        25     6606    73.56 ± 1.88   61.65 ± 3.26    82.59 ± 4.18   70.51 ± 3.92   73.60 ± 7.25
Meadows        25     18624   61.16 ± 6.79   47.43 ± 10.54   59.94 ± 3.56   64.13 ± 2.79   80.61 ± 2.34
Gravel         25     2074    75.61 ± 6.02   53.29 ± 8.13    71.35 ± 1.78   74.38 ± 3.53   82.16 ± 3.79
Trees          25     3039    91.58 ± 3.45   84.25 ± 4.56    89.59 ± 1.61   91.05 ± 2.22   94.38 ± 1.28
Metal sheets   25     1325    97.57 ± 2.65   97.62 ± 0.43    99.49 ± 0.14   99.18 ± 0.46   99.73 ± 0.11
Bare soil      25     5004    62.19 ± 7.37   69.38 ± 15.54   63.93 ± 2.07   68.97 ± 3.21   82.49 ± 5.41
Bitumen        25     1305    88.90 ± 0.58   87.46 ± 1.59    85.04 ± 4.96   85.11 ± 3.76   95.95 ± 1.27
Bricks         25     3657    57.40 ± 4.79   71.61 ± 7.92    70.32 ± 2.65   67.66 ± 3.97   70.84 ± 2.56
Shadows        25     922     99.44 ± 0.30   98.99 ± 0.21    96.66 ± 5.57   99.54 ± 0.20   99.70 ± 0.04
AA             –      –       78.60 ± 0.82   74.63 ± 0.72    79.88 ± 0.96   80.06 ± 0.53   86.61 ± 1.13


Fig. 13. Comparison of the semi-supervised algorithms for the ROSIS data of University of Pavia. (a) Reference land-cover, (b) SVM, (c) LapSVM, (d) TSVM, (e) SCS3VM, (f) SS-LPSVM.

Table 6
Overall accuracy (%) of the five semi-supervised algorithms for the AVIRIS data of Salinas Valley (average of 10 runs ± standard deviation).

Algorithm   s = 3          s = 5          s = 10         s = 15         s = 20         s = 25
SVM         72.71 ± 2.21   73.90 ± 1.91   75.62 ± 1.73   79.08 ± 1.45   77.89 ± 1.20   78.05 ± 1.49
LapSVM      70.46 ± 2.57   75.31 ± 2.31   76.34 ± 1.77   77.93 ± 2.42   79.40 ± 0.73   80.56 ± 1.33
TSVM        55.33 ± 1.90   60.43 ± 1.40   67.47 ± 1.05   69.12 ± 1.32   71.03 ± 1.78   71.83 ± 1.16
SCS3VM      71.41 ± 3.31   74.12 ± 2.44   78.49 ± 2.02   81.83 ± 0.93   81.22 ± 1.27   77.08 ± 0.80
SS-LPSVM    80.38 ± 1.90   86.79 ± 1.75   90.36 ± 1.35   90.86 ± 1.36   91.77 ± 0.96   92.11 ± 1.07


Table 7
Overall accuracy (%) of the five semi-supervised algorithms for the ROSIS data of Center of Pavia (average of 10 runs ± standard deviation).

Algorithm   s = 3          s = 5          s = 10         s = 15         s = 20         s = 25
SVM         66.05 ± 2.38   79.51 ± 0.64   89.33 ± 0.28   90.88 ± 1.72   92.79 ± 0.42   91.23 ± 0.83
LapSVM      87.10 ± 2.00   88.03 ± 0.80   89.91 ± 0.97   91.66 ± 0.40   92.09 ± 1.26   92.44 ± 0.63
TSVM        67.57 ± 1.67   68.00 ± 0.56   67.89 ± 0.69   72.94 ± 0.63   76.78 ± 1.40   77.85 ± 2.55
SCS3VM      89.46 ± 2.30   91.24 ± 0.70   91.72 ± 0.38   93.68 ± 1.00   93.79 ± 1.00   93.70 ± 0.02
SS-LPSVM    83.94 ± 1.35   91.68 ± 1.17   93.28 ± 0.70   95.76 ± 0.36   95.76 ± 0.56   95.75 ± 0.64


Fig. 14. Comparison of the semi-supervised algorithms for the AVIRIS data of Salinas Valley. (a) Reference land-cover, (b) SVM, (c) LapSVM, (d) TSVM, (e) SCS3VM, (f) SS-LPSVM.

algorithm over the classical semi-supervised algorithms (i.e., LapSVM and TSVM) is that the former utilizes the spatial information, while the latter rely only on spectral information and thus fail to deal with mixed pixels, which contain a variety of spectral information. Moreover, different from SCS3VM, the proposed algorithm avoids solving the optimization function of the standard SVM. As can be found from Tables 4, 6 and 7, this rank changes with the different datasets; SS-LPSVM, however, produces consistently greater accuracy than LapSVM, TSVM, SCS3VM and SVM. At the same time, paired t-tests (Dietterich, 1998; Demsar, 2006; Garcia and Herrera, 2008) at the 95% significance level were



Fig. 15. Comparison of the semi-supervised algorithms for the ROSIS data of Center of Pavia. (a) Reference land-cover, (b) SVM, (c) LapSVM, (d) TSVM, (e) SCS3VM, (f) SS-LPSVM.

carried out. For the original hypotheses, in which the OA of each benchmark method is compared with that of SS-LPSVM, the p-values are reported as p = p1/p2/p3 (in percent), corresponding to the hypotheses that the benchmark OA is equal to (p1), higher than or equal to (p2), and lower than or equal to (p3) the OA of SS-LPSVM, respectively. A p-value smaller than 5% casts doubt on the original hypothesis. The results in Table 8 reveal that SS-LPSVM achieves 4 wins, 0 ties and 0 losses when compared to SVM, LapSVM and TSVM, and 3 wins, 1 tie and 0 losses when compared to SCS3VM. The OA of SS-LPSVM is thus almost always significantly greater than that of the other four algorithms.

The proposed algorithm is sensitive to some parameters, namely μ, d and M. The parameters μ and d control the influence of the spatial graph; focusing on Figs. 8 and 12, it is concluded that the performance of the proposed SS-LPSVM algorithm is mainly influenced by the spatial graph. As for the parameter M of the adaptive method, the proposed SS-LPSVM algorithm tends to reach stable accuracy when M falls within the interval [7, 10] (see Fig. 9).

It is worth mentioning the stability of the classification results of the proposed SS-LPSVM algorithm. From Tables 1, 4, 6 and 7, the standard deviation of the OA of SS-LPSVM, which reflects the stability of the algorithm, can be obtained; it is shown in Table 9. The standard deviation at s = 3 is relatively greater than that at s = 25 for all four datasets, because a smaller s leads to a lower probability of nearby samples being in the training set. Meanwhile, due to the randomness of the experiments, the standard deviation of the OA is not strictly inversely proportional to s for each dataset.

As noted in Table 3, the proposed SS-LPSVM algorithm took around 194 s, whereas LapSVM and TSVM took less than 1 min.
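A hedged Python sketch of this testing procedure: paired t-tests over the 10 repeated runs with scipy.stats.ttest_rel (its alternative keyword requires SciPy 1.6+); the accuracy arrays are invented, and mapping p1/p2/p3 to the two- and one-sided alternatives below is our reading of the description above.

import numpy as np
from scipy import stats

# OA over 10 repeated runs (illustrative numbers only)
oa_bench = np.array([65.2, 64.8, 65.5, 65.0, 65.1, 64.9, 65.3, 65.2, 65.0, 65.4])
oa_sslp  = np.array([84.1, 84.0, 84.2, 84.1, 84.1, 84.0, 84.2, 84.1, 84.1, 84.2])

p1 = stats.ttest_rel(oa_bench, oa_sslp).pvalue                         # "equal to"
p2 = stats.ttest_rel(oa_bench, oa_sslp, alternative='less').pvalue     # doubt on "bench >= SS-LPSVM"
p3 = stats.ttest_rel(oa_bench, oa_sslp, alternative='greater').pvalue  # doubt on "bench <= SS-LPSVM"
print(100 * p1, 100 * p2, 100 * p3)   # in percent, as reported in Table 8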

Table 8
The p-values and win/tie/loss counts of SS-LPSVM versus the other algorithms based on OA at s = 25 (W, T, L denote win, tie, loss, respectively; p-values are given as p1/p2/p3 in percent).

Dataset               SVM                LapSVM             TSVM               SCS3VM              W/T/L
Indian Pines          0/0/100 (W)        0/0/100 (W)        0.4/0.2/99.8 (W)   0/0/100 (W)         4/0/0
University of Pavia   0.9/0.4/99.6 (W)   0.1/0.1/99.9 (W)   0.5/0.2/99.8 (W)   0.8/0.4/99.6 (W)    4/0/0
Salinas Valley        0/0/100 (W)        0/0/100 (W)        0/0/100 (W)        17.5/8.8/91.2 (T)   3/1/0
Center of Pavia       4.5/2.3/97.7 (W)   1.3/0.7/99.4 (W)   1.9/0.9/99.1 (W)   4.8/2.4/97.6 (W)    4/0/0
W/T/L                 4/0/0              4/0/0              4/0/0              3/1/0               –


Table 9
The standard deviation of OA (10 runs) of SS-LPSVM for the four datasets.

Dataset               s = 3   s = 5   s = 10   s = 15   s = 20   s = 25
Indian Pines          3.35    0.95    0.39     0.04     0.80     0.08
University of Pavia   2.71    2.30    0.22     1.21     0.26     0.09
Salinas Valley        1.90    1.75    1.35     1.36     0.96     1.07
Center of Pavia       1.35    1.17    0.70     0.36     0.56     0.64

Table 10
The computational time of the different parts of SS-LPSVM for the AVIRIS data of Indian Pines.

Part       Part 1   Part 2   Part 3   Part 4   Sum
Time (s)   0.21     15.39    1.14     177.06   193.80

The proposed SS-LPSVM algorithm can be divided into four parts: extraction of the spatial information by the 2-D Gabor filter, extraction of the spatial neighbors, Label Propagation, and classification, denoted here as Parts 1-4. The computing times of the four parts are given in Table 10. The relatively long running time is spent in Parts 2 and 4, which extract the spatial neighbors and process the additional unlabeled samples, respectively. Even so, as observed in Tables 1 and 2, the proposed SS-LPSVM algorithm produces greater accuracy than the other algorithms; the longer time is the cost of the improved accuracy.

Although the results obtained by the proposed SS-LPSVM are encouraging, further experiments with additional scenes and comparison methods should be conducted. Considering that the proposed algorithm is relatively computationally expensive, our further work will focus on developing computationally efficient schemes for SS-LPSVM. In addition, the uncertainty in mixed pixels limits the classification accuracy; another important research topic deserving future attention is to assign mixed pixels relatively correct labels and to combine other spatial information extraction strategies for semi-supervised classification.

5. Conclusion

In this paper, spatial features extracted by a 2-D Gabor filter were first stacked with spectral features; the width of the Gaussian function used to construct the graph was then determined with an adaptive method, and the unlabeled samples were randomly selected from the spatial neighbors of the labeled samples. At the same time, spatial and spectral graphs based on different rules were constructed, and, based on the combined spatial-spectral graph, labels were propagated from the labeled samples to the unlabeled samples. The SVM was then trained with the updated training set. Note that, apart from the SVM, the proposed methodology could be extended to other classifiers (e.g., the Maximum Likelihood classifier and the Multilayer Perceptron).

To evaluate the effectiveness of the proposed SS-LPSVM method, experiments were conducted to compare its results with those of the SVM, LapSVM, TSVM and SCS3VM on four hyperspectral datasets. It was found that the performance of the proposed method is influenced by the spatial graph, the parameters of the spatial-spectral graph, and the parameter of the adaptive method; it would be an interesting topic to automatically select the optimal parameters for the proposed spatial-spectral graph construction strategy. Moreover, the classification performance and stability of the proposed method are related to the size of the labeled sample set. Both visual and quantitative assessments of the experimental results show that the proposed SS-LPSVM method produces greater overall and individual class classification accuracy than the benchmark algorithms, and paired t-tests were conducted to further validate the superiority of SS-LPSVM on the different hyperspectral datasets. From a computational viewpoint, the time taken for the label allocation is relatively long in the proposed method. This paper provides an effective new option for semi-supervised learning; further research can be directed at developing more computationally efficient schemes for the proposed SS-LPSVM method and at new approaches to characterizing spatial information to enhance semi-supervised algorithms.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61275010), the Ph.D. Programs Foundation of the Ministry of Education of China (Grant No. 20132304110007), and the Fundamental Research Funds for the Central Universities (Grant No. HEUCFD1410).

References

Bau, T.C., Sarkar, S., Healey, G., 2010. Hyperspectral region classification using a three-dimensional Gabor filter bank. IEEE Trans. Geosci. Remote Sens. 48 (9), 3457–3464.
Belkin, M., Niyogi, P., 2005. Semi-supervised learning on manifolds. Mach. Learn. 56, 209–239.
Belkin, M., Niyogi, P., Sindhwani, V., 2006. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 7, 2399–2434.
Benediktsson, J.A., Palmason, J.A., Sveinsson, J., 2005. Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE Trans. Geosci. Remote Sens. 43 (3), 480–491.
Blum, A., Chawla, S., 2001. Learning from labeled and unlabeled data using graph mincuts. In: Proceedings of the Eighteenth International Conference on Machine Learning, San Francisco, CA, USA, pp. 19–26.
Bovolo, F., Bruzzone, L., Carlin, L., 2010. A novel technique for subpixel image classification based on support vector machine. IEEE Trans. Image Process. 19 (11), 2983–2999.
Bruzzone, L., Persello, C., 2009. A novel context-sensitive semisupervised SVM classifier robust to mislabeled training samples. IEEE Trans. Geosci. Remote Sens. 47 (7), 2142–2154.
Bruzzone, L., Chi, M., Marconcini, M., 2006. A novel transductive SVM for semisupervised classification of remote sensing images. IEEE Trans. Geosci. Remote Sens. 44 (11), 3363–3373.
Camps-Valls, G., Gomez-Chova, L., Munoz-Mari, J., Vila-Frances, J., Calpe-Maravilla, J., 2006. Composite kernels for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 3 (1), 93–97.
Camps-Valls, G., Bandos Marsheva, T., Zhou, D., 2007. Semi-supervised graph-based hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 45 (10), 3044–3054.
Cheng, H., Liu, Z.C., Yang, J., 2009. Sparsity induced similarity measure for label propagation. In: IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October, pp. 317–324.
Congalton, R.G., 1991. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens. Environ. 37 (1), 35–46.
Demsar, J., 2006. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30.
Dietterich, T., 1998. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 10 (7), 1895–1923.
Dopido, I., Li, J., Marpu, P.R., Plaza, A., Bioucas-Dias, J.M., Benediktsson, J.A., 2013. Semi-supervised self-learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 51 (7), 4032–4044.
Gabor, D., 1946. Theory of communication. Part 1: The analysis of information. J. Inst. Electr. Eng. Part III: Radio Commun. Eng. 93 (26), 429–441.
Garcia, S., Herrera, F., 2008. An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. J. Mach. Learn. Res. 9, 2677–2694.
Geman, S., Geman, D., 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6 (6), 721–741.
Gomez-Chova, L., Camps-Valls, G., Munoz-Mari, J., Calpe, J., 2008. Semisupervised image classification with Laplacian support vector machines. IEEE Geosci. Remote Sens. Lett. 5 (3), 336–340.
Hughes, G.F., 1968. On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory IT-14 (1), 55–63.
Joachims, T., 1999. Transductive inference for text classification using support vector machines. In: Proceedings of the Sixteenth International Conference on Machine Learning, San Francisco, CA, USA, pp. 200–209.

Johnson, R., Zhang, T., 2007. Two-view feature generation model for semi-supervised learning. In: The 24th International Conference on Machine Learning, Corvallis, Oregon, USA, 20–24 June, pp. 25–32.
Karasuyama, M., Mamitsuka, H., 2013. Multiple graph label propagation by sparse integration. IEEE Trans. Neural Networks Learn. Syst. 24 (12), 1999–2012.
Kim, W., Crawford, M.M., 2010. Adaptive classification for hyperspectral image data using manifold regularization kernel machines. IEEE Trans. Geosci. Remote Sens. 48 (11), 4110–4121.
Krishnapuram, B., Carin, L., Figueiredo, M., Hartemink, A., 2005. Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Trans. Pattern Anal. Mach. Intell. 27 (6), 957–968.
Kuo, B.C., Huang, C.S., Hung, C.C., Liu, Y.L., Chen, I.L., 2010. Spatial information based support vector machine for hyperspectral image classification. In: IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Honolulu, HI, USA, 25–30 July, pp. 832–835.
Li, Y., Kwok, J.T., Zhou, Z., 2009. Semi-supervised learning using label mean. In: The 26th Annual International Conference on Machine Learning, Montreal, Quebec, Canada, 14–18 June, pp. 633–640.
Li, J., Bioucas-Dias, J.M., Plaza, A., 2012. Spectral-spatial hyperspectral image segmentation using subspace multinomial logistic regression and Markov random fields. IEEE Trans. Geosci. Remote Sens. 50 (3), 809–823.
Li, J., Bioucas-Dias, J.M., Plaza, A., 2013a. Semi-supervised hyperspectral image classification using soft sparse multinomial logistic regression. IEEE Geosci. Remote Sens. Lett. 10 (2), 318–322.
Li, J., Bioucas-Dias, J.M., Plaza, A., 2013b. Spectral-spatial classification of hyperspectral data using loopy belief propagation and active learning. IEEE Trans. Geosci. Remote Sens. 51 (2), 844–856.
Melacci, S., Belkin, M., 2011. Laplacian support vector machines trained in the primal. J. Mach. Learn. Res. 12 (3), 1149–1184.
Negri, R.G., Dutra, L.V., Sant'Anna, S.J.S., 2014. An innovative support vector machine based method for contextual image classification. ISPRS J. Photogramm. Remote Sens. 87, 241–248.
Nie, F.P., Xu, D., Li, X.L., Xiang, S.M., 2011. Semisupervised dimensionality reduction and classification through virtual label regression. IEEE Trans. Syst. Man Cybern. B Cybern. 41 (3), 675–684.
Niemeyer, J., Rottensteiner, F., Soergel, U., 2014. Contextual classification of lidar data and building object detection in urban areas. ISPRS J. Photogramm. Remote Sens. 87, 152–165.
Palmason, J.A., Benediktsson, J.A., Sveinsson, J.R., Chanussot, J., 2005. Classification of hyperspectral data from urban areas using morphological preprocessing and independent component analysis. In: IEEE International Geoscience and Remote Sensing Symposium, 25–29 July, pp. 176–179.

Richards, J.A., Jia, X., 2008. Managing the spectral-spatial mix in context classification using Markov random fields. IEEE Geosci. Remote Sens. Lett. 5 (2), 311–314.
Rohban, M.H., Rabiee, H.R., 2012. Supervised neighborhood graph construction for semi-supervised classification. Pattern Recogn. 45 (4), 1363–1372.
Rosenberg, C., Hebert, M., Schneiderman, H., 2005. Semi-supervised self-training of object detection models. In: Proceedings of the Seventh IEEE Workshop on Applications of Computer Vision, Breckenridge, CO, USA, 5–7 January, pp. 29–36.
Shahshahani, B.M., Landgrebe, D., 1994. The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon. IEEE Trans. Geosci. Remote Sens. 32 (5), 1087–1095.
Shen, L., Bai, L., 2006. MutualBoost learning for selecting Gabor features for face recognition. Pattern Recogn. Lett. 27 (15), 1758–1767.
Shen, L., Jia, S., 2011. Three-dimensional Gabor wavelets for pixel-based hyperspectral imagery classification. IEEE Trans. Geosci. Remote Sens. 49 (12), 5039–5046.
Shen, L., Bai, L., Fairhurst, M., 2007. Gabor wavelets and general discriminant analysis for face identification and verification. Image Vis. Comput. 25 (5), 553–563.
Shi, M., Healey, G., 2003. Hyperspectral texture recognition using a multiscale opponent representation. IEEE Trans. Geosci. Remote Sens. 41 (5), 1090–1095.
Song, B., Li, J., Dalla Mura, M., Li, P., Plaza, A., Bioucas-Dias, J.M., Benediktsson, J.A., Chanussot, J., 2014. Remotely sensed image classification using sparse representations of morphological attribute profiles. IEEE Trans. Geosci. Remote Sens. 52 (8), 5122–5136.
Tarabalka, Y., Chanussot, J., Benediktsson, J.A., 2010a. Segmentation and classification of hyperspectral images using minimum spanning forest grown from automatically selected markers. IEEE Trans. Syst. Man Cybern. B Cybern. 40 (5), 1267–1279.
Tarabalka, Y., Fauvel, M., Chanussot, J., Benediktsson, J.A., 2010b. SVM- and MRF-based method for accurate classification of hyperspectral images. IEEE Geosci. Remote Sens. Lett. 7 (4), 736–740.
Vapnik, V.N., 1995. The Nature of Statistical Learning Theory. Springer, New York.
Wang, L., Zhang, Y., Gu, Y., 2005. The research of simplification of structure of multi-class classifier of support vector machine. J. Image Graph. 10 (5), 571–572.
Yang, L., Yang, S., Jin, P., Zhang, R., 2014. Semi-supervised hyperspectral image classification using spatio-spectral Laplacian support vector machine. IEEE Geosci. Remote Sens. Lett. 11 (3), 651–655.
Zhu, X., Ghahramani, Z., Lafferty, J., 2003. Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of the 20th International Conference on Machine Learning (ICML '03), Washington, DC, USA, 21–24 August, pp. 912–919.