A spectral-spatial kernel-based method for hyperspectral imagery classification

Accepted Manuscript

Li Li, Hongwei Ge, Jianqiang Gao

PII: S0273-1177(16)30632-9
DOI: http://dx.doi.org/10.1016/j.asr.2016.11.006
Reference: JASR 12963

To appear in: Advances in Space Research

Received Date: 15 October 2015
Revised Date: 29 October 2016
Accepted Date: 4 November 2016

Please cite this article as: Li, L., Ge, H., Gao, J., A spectral-spatial kernel-based method for hyperspectral imagery classification, Advances in Space Research (2016), doi: http://dx.doi.org/10.1016/j.asr.2016.11.006

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

A spectral-spatial kernel-based method for hyperspectral imagery classification

Li Li1*, Hongwei Ge1†, Jianqiang Gao2‡

1 Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China
2 College of Computer and Information, Hohai University, Nanjing 210098, China

Abstract

Spectral-based classification methods have gained increasing attention in hyperspectral imagery classification. Nevertheless, spectral information alone cannot fully represent the inherent spatial distribution of the imagery. In this paper, a spectral-spatial kernel-based method for hyperspectral imagery classification is proposed. First, the spatial feature is extracted by using area median filtering (AMF). Second, the result of the AMF is used to construct spatial feature patches according to different window sizes. Finally, using the kernel technique, the spectral feature and the spatial feature are jointly used for classification through a support vector machine (SVM) formulation. The proposed method is therefore called the spectral-spatial kernel-based support vector machine (SSF-SVM). To evaluate the proposed method, experiments are performed on three hyperspectral images. The experimental results show that an improvement is possible with the proposed technique in most of the real-world classification problems considered.

Keywords: Hyperspectral imagery classification (HIC); area median filtering (AMF); spectral-spatial kernel (SSK); support vector machine (SVM)

* E-mail: [email protected], [email protected] (Li Li)
† Corresponding author. E-mail: [email protected] (Hongwei Ge)
‡ Corresponding author. E-mail: [email protected], [email protected] (Jianqiang Gao)

1 Introduction

Hyperspectral imagery technology, as a new kind of earth observation technology, has been widely applied in many fields, because hyperspectral sensors can provide abundant spectral information with hundreds of spectrally continuous bands. At the same time, the large data volume and high redundancy pose great difficulties for image processing. Thus, traditional multispectral imagery classification methods [1-4] are not suitable for the requirements of planning and scheduling of the digital earth. Recently, numerous classifiers and techniques have been proposed in this regard; the classification techniques can be broadly divided into two classes: supervised and unsupervised techniques. Many pattern recognition approaches have been widely applied to hyperspectral imagery processing tasks, such as supervised classification [5-6], unsupervised classification [7], target identification [8], and change detection [9-10]. In hyperspectral remotely sensed data, some ground truth can be easily ascertained by inspection; therefore, supervised classification is more suitable, as it can train a classifier with some training data and then use it to classify the remaining data. It is worth noting that the support vector machine (SVM), as one of the research


focuses of machine learning, has attracted increasing attention in remote sensing [11-13]. SVM has inherent advantages such as less rigid requirements on prior knowledge and training data, fitness to high-dimensional data, and robustness to noise [14]. However, it is often difficult for a traditional SVM classifier to offer satisfactory performance in HIC. Actually, HIC suffers from the well-known Hughes phenomenon [15], which is caused by the imbalance between the high dimensionality of hyperspectral data and the limited labeled training samples available in real analysis scenarios. Moreover, hyperspectral imagery is usually short of training sets, because sample collection generally involves extensive and time-consuming fieldwork [16-17]. Hence, the limited ground truth samples are often not sufficient for reliable estimation of the classifier parameters. The performance of a semi-supervised classifier may be stronger than that of other classifiers, because semi-supervised techniques utilize a large number of unlabeled data. Thus, semi-supervised techniques can effectively address the hyperspectral imagery classification problem; examples include transductive SVMs [18], graph-based methods [19-20], and self-learning models [21]. In addition, semi-supervised techniques require less human effort in sample collection. There are many pixel-wise classification approaches; however, these methods use only the spectral information of each pixel and ignore the spatial information from its neighborhood. Refs. [12,22] have demonstrated that the classification performance on high-spatial-resolution images can be improved by combining spectral and spatial information. Hence, spectral-spatial classification techniques can be broadly divided into three classes, as follows: (a) Using filtering techniques to obtain spatial information. Fauvel et al. [12] demonstrated that morphological profiles can analyze remote sensing images. However, the


main limitation of morphological profiles is that they are not sufficient to model geometrical features other than the size of objects [23]. In order to solve this problem, Dalla Mura et al. [24-25] proposed morphological attribute profiles for HIC. (b) Structuring a composite kernel learning framework. Camps-Valls et al. [13] proposed a composite kernel learning framework to efficiently integrate spectral and contextual information. Although multiple kernel learning (MKL) [26] can be used to deal with the spectral and contextual information, most MKL techniques are not suitable for HIC because of their computational complexity. Tuia et al. [27] built an MKL model by introducing kernel alignment to efficiently choose a few important kernels for this problem. In [28], the authors proposed a representative multiple kernel learning algorithm to reduce the computational load. (c) Extracting spatial contextual information on a pixel-wise basis. Most pixel-wise methods proceed by refining the classification result with spatial contextual information. One way is Markov-random-field-based regularization [29]. Another is to fuse the two maps provided by pixel-wise classification and pre-segmentation [30]. In [31], the authors proposed to use spectral-spatial classifiers at the preliminary step of the marker selection procedure, each of them combining the results of a pixel-wise classification and a segmentation map. In [32], the authors proposed a hybrid method that combines the results of a pixel-wise support vector machine classification and the segmentation map obtained by partitional clustering using majority voting. In [33], the authors proposed to integrate spectral-spatial information for hyperspectral image classification and exploit the benefits of using spatial features for the kernel-based extreme learning machine (KELM) classifier. In that method, Gabor filtering and multihypothesis


(MH) prediction preprocessing are the two approaches employed for spatial feature extraction. In this paper, our main goal is to present an efficient classification approach based on a spectral-spatial kernel to improve the quality of HIC. The proposed classification method belongs to the first category. First, the spectral features are obtained by PCA. Second, the spatial features are obtained by using AMF with different window sizes. Third, the spectral features and spatial features are jointly used for the classification through an SVM formulation. The advantage of the proposed method is verified on three hyperspectral images, acquired by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) sensor over the Indian Pines region, the Reflective Optics System Imaging Spectrometer (ROSIS) sensor during a flight campaign over the engineering school at the University of Pavia, and the AVIRIS sensor over Salinas Valley, Southern California, respectively. For clarity, we refer to them as the Indian Pines, Pavia University, and AVIRIS data of Salinas Valley, respectively. The rest of this paper is organized as follows. Section 2 briefly describes the SVM and AMF, and then introduces the proposed SSF-SVM strategy. Section 3 demonstrates the experimental results. Finally, conclusions are drawn in Section 4.

2 Methodology

2.1 The framework of support vector machine (SVM)

In this subsection, we briefly review the SVM classifier. SVM has been widely applied in HIC because it can handle high-dimensional data and non-linear problems. The main idea of SVM is to construct a hyperplane as the decision surface in such a way that the margin of separation between positive and negative examples is maximized. That is to say, in order to complete


a better classification, the original samples need to be mapped into a higher-dimensional space via a non-linear mapping function. More precisely, SVM is an implementation of the principle of structural risk minimization. Let $D = \{(x_1, y_1), \cdots, (x_n, y_n)\}$ be a training set with $n$ training samples, where $y_i \in \{-1, +1\}$ $(i = 1, 2, \cdots, n)$ denotes the label. The optimization problem of SVM can be written as follows:

$$
\begin{aligned}
\min_{w} \quad & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i, \\
\text{s.t.} \quad & y_i(w^T \varphi(x_i) + b) \geq 1 - \xi_i, \\
& \xi_i \geq 0, \quad \forall i = 1, 2, \cdots, n,
\end{aligned}
\tag{1}
$$

where $w$ and $b$ are the weight vector and threshold of the decision function, $\xi_i$ is the slack variable, $C$ is the penalty factor, and $\varphi(\cdot)$ represents the non-linear mapping function. Then, the Lagrange function of eq. (1) can be calculated as:

$$
L(w, \alpha, b) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i - \sum_{i=1}^{n}\alpha_i\left[y_i(w^T \varphi(x_i) + b) - 1 + \xi_i\right] - \sum_{i=1}^{n}\beta_i\xi_i,
\tag{2}
$$

where $\alpha_i$ and $\beta_i$ are Lagrange multipliers, and the dual form of eq. (2) can be obtained as follows:

$$
\begin{aligned}
\max_{\alpha} \quad & \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i, x_j), \\
\text{s.t.} \quad & \sum_{i=1}^{n}\alpha_i y_i = 0, \\
& 0 \leq \alpha_i \leq C, \quad \forall i = 1, 2, \cdots, n,
\end{aligned}
\tag{3}
$$

where $K(x_i, x_j) = \varphi(x_i)^T\varphi(x_j)$, and $x_i$ is a support vector when $\alpha_i > 0$. Once $\alpha$ is obtained via eq. (3), $w = \sum_{i=1}^{n}\alpha_i y_i x_i$. Then, the decision function can be obtained as follows:

$$
f(x) = \mathrm{sgn}\left(\sum_{i=1}^{n}\alpha_i y_i K(x, x_i) + b\right),
\tag{4}
$$

where $b$ can be obtained by using the KKT conditions:

$$
b = -\frac{\max_{y_i=1}[(w^*)^T x_i] + \min_{y_i=-1}[(w^*)^T x_i]}{2}.
\tag{5}
$$

In fact, multi-class classification problems are ubiquitous in practice, but the original SVM handles only binary classification. Therefore, many strategies, such as "one vs. one" and "one vs. rest", have been proposed to solve the multi-class problem. In this paper, we adopt the "one vs. one" strategy.
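As a concrete sketch of this setup (assuming scikit-learn is available; the synthetic data and parameter values here are illustrative, not those of the paper), an RBF-kernel SVM with the "one vs. one" strategy can be exercised as follows:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Three synthetic "classes" of 20-band spectra, separated by their mean level.
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(30, 20)) for m in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 30)

# RBF-kernel SVM; with decision_function_shape="ovo", scikit-learn exposes
# the 3*(3-1)/2 = 3 pairwise ("one vs. one") decision values per sample,
# and prediction is by majority voting over the pairwise decisions.
clf = SVC(kernel="rbf", C=10.0, gamma="scale", decision_function_shape="ovo")
clf.fit(X, y)

print(clf.decision_function(X[:1]).shape)   # (1, 3) pairwise decision values
print(clf.predict(X[:1]))
```

Scikit-learn trains the pairwise binary SVMs internally, so the one-vs-one machinery of this subsection comes for free once the kernel is chosen.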

2.2 Area median filtering (AMF)-based spatial features extraction

In this subsection, a key issue is how to incorporate spatial information in HIC, so that the classification performance can be improved by taking into account both spectral and spatial information. In fact, the AMF cannot be used directly on multispectral or hyperspectral remotely sensed images because of the lack of an ordering relation. In addition, for HIC analysis, Chang et al. [34] demonstrated that the redundancy from inter-band correlation is very high, and that the data structure in the spectral dimension can be reduced without a significant loss of the information useful for subsequent processing. Hence, in order to extract the spatial information by using AMF, dimensionality reduction must be considered first. In the proposed method, the AMF is computed on the $d$ principal components to extract the neighborhood of each pixel; the neighborhood mask is then applied on each band of the data. The spatial information is extracted as follows. Let $I = (I_i)^T \in \mathbb{R}^{d \times (m*n)}$ be a hyperspectral image with $d$ principal components, where $I_i \in \mathbb{R}^{1 \times (m*n)}$ $(i = 1, \cdots, d)$ is a row vector. The spatial feature representations for the $d$ principal components are constructed as follows:


First, each row vector $I_i$ of the $d$ principal components is converted to a matrix of size $m \times n$ (i.e., $\bar{I}_i \in \mathbb{R}^{m \times n}$). Then, the spatial feature representation for $\bar{I}_i$ is constructed by eq. (6):

$$
v_i = \mathrm{median}(\bar{I}_i),
\tag{6}
$$

where $v_i \in \mathbb{R}^{m \times n}$ and $\dim(v_i) = \dim(\bar{I}_i) = m * n$. The median operation is carried out with different window sizes ($M = 3, 5$).

Second, each matrix $v_i$ is converted into a one-dimensional row vector by stacking its rows, and these row vectors are collected to represent the spatial features of the input image $I$:

$$
V_0 = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_d \end{pmatrix},
\tag{7}
$$

where $V_0 \in \mathbb{R}^{d \times (m*n)}$ denotes the spatial features of the input image $I$.

Fig.1 summarizes the steps of extracting spatial features from a hyperspectral image. The process is as follows. Suppose the input image is $I$, of size $m \times n$ per band. The reduced-dimension data are obtained by using PCA, from which a reconstructed image (of size $m \times n$) is obtained. Finally, the spatial features of the reconstructed image are obtained via the AMF operation with different window sizes ($M = 3, 5$).

Figure 1. Diagram of the AMF for spatial features extraction.

We treat the hyperspectral imagery as an area, so the AMF operation is reasonable. By contrast, adaptive median filters may lose necessary information, because the adaptation can damage the spatial information; the relevant references [35-37] have shown this point. Based on two types of image models corrupted by impulse noise, two adaptive median filtering algorithms were proposed, viz. the ranked-order based adaptive median filter (RAMF) and the impulse size based adaptive median filter (SAMF). The RAMF is superior to the nonlinear mean Lp filter in removing positive and negative impulses while simultaneously preserving sharpness, and the SAMF is superior to the earlier adaptive scheme because it is simpler, with better performance in removing high-density impulsive noise as well as non-impulsive noise and in preserving fine details. In our proposed AMF operation, the median operation is very important for handling noisy pixels during image classification. Of course, de-noising preprocessing is useful, but when we encounter mixed pixels during image classification we first consider their decomposition; that is to say, the principal components of the mixed pixels are what matter. So the PCA operation and the AMF operation are the two main aspects. In addition, there are many techniques to tackle noise and mixed pixels; if there is severe noise or there are many mixed pixels during image classification, our method may fail. Finally, we mainly focus on the feature extraction of hyperspectral images.
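The two construction steps of eqs. (6)-(7) can be sketched as follows (a minimal illustration assuming NumPy and SciPy; `amf_spatial_features` is a hypothetical helper name, and `scipy.ndimage.median_filter` stands in for the AMF's sliding median window):

```python
import numpy as np
from scipy.ndimage import median_filter

def amf_spatial_features(pc_images, window=3):
    """Median-filter each principal-component image (eq. 6) and stack the
    flattened results row-wise into the spatial feature matrix V0 (eq. 7).

    pc_images : array of shape (d, m, n) -- d principal-component images.
    Returns V0 of shape (d, m*n)."""
    d, m, n = pc_images.shape
    V0 = np.empty((d, m * n))
    for i in range(d):
        # eq. (6): v_i = median(I_i) with an M x M window (M = 3 or 5)
        v_i = median_filter(pc_images[i], size=window, mode="nearest")
        V0[i] = v_i.ravel()          # stack the rows into one row vector
    return V0

# Toy example: 4 principal-component images of size 10 x 12.
rng = np.random.default_rng(1)
pcs = rng.normal(size=(4, 10, 12))
V0 = amf_spatial_features(pcs, window=3)
print(V0.shape)   # (4, 120)
```

The `mode="nearest"` border handling is one reasonable choice for image edges; the paper does not specify how borders are treated.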

2.3 Spectral-spatial AMF-based support vector machine (SSF-SVM)

This subsection is dedicated to the definition of a spectral-spatial AMF-based SVM classifier. The classical setting is used to learn the SVM in the dual formulation. In order to use an SVM, one has to define a kernel function between samples. For $n$-valued pixels:

$$
k: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}.
\tag{8}
$$

One classical and effective kernel is the Gaussian radial basis function kernel:

$$
k_\sigma(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right),
\tag{9}
$$

where the norm is the Euclidean norm and $\sigma \in \mathbb{R}^+$ tunes the variance of the Gaussian kernel. Here $x$ represents a pixel vector whose components contain the spectral information provided by the original hyperspectral image. However, many classification methods focus on the spectral feature without considering the spatial feature in HIC processing; so far, there are few references [11,12,22,32] on this point. Therefore, a spectral-spatial kernel-based classification method is proposed, in which the spectral and spatial features are jointly used for the classification through an SVM formulation. The details are as follows:

(a) Extract the spectral feature by using PCA.

Step 1: For the original data matrix $X$, we can obtain the standardized data $Z$ according to


eq. (10):

$$
z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{\frac{1}{n-1}\sum_{k=1}^{n}(x_{kj} - \bar{x}_j)^2}},
\tag{10}
$$

where $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$, $i = 1, \cdots, n$, $j = 1, \cdots, p$, $k = 1, \cdots, n$, $X = (x_{ij}) \in \mathbb{R}^{n \times p}$, and $Z = (z_{ij}) \in \mathbb{R}^{n \times p}$.

Step 2: We can obtain the correlation matrix $R$ according to eq. (11):

$$
r_{jk} = \frac{\sum_{i=1}^{n}(z_{ij} - \bar{z}_j)(z_{ik} - \bar{z}_k)}{\sqrt{\sum_{i=1}^{n}(z_{ij} - \bar{z}_j)^2}\sqrt{\sum_{i=1}^{n}(z_{ik} - \bar{z}_k)^2}},
\tag{11}
$$

where $\bar{z}_j = \frac{1}{n}\sum_{i=1}^{n} z_{ij}$, $j = 1, \cdots, p$, $k = 1, \cdots, p$, $i$ denotes the sample index, and $R = (r_{jk}) \in \mathbb{R}^{p \times p}$.

Step 3: We can obtain the eigenvalues and eigenvectors of $R$ by solving eq. (12):

$$
|R - \lambda I_p| = 0,
\tag{12}
$$

where $\lambda_i$ $(i = 1, \cdots, p)$ denotes the eigenvalues.

Step 4: We can obtain $d$ principal components according to eq. (13):

$$
\frac{\sum_{i=1}^{d}\lambda_i}{\sum_{i=1}^{p}\lambda_i} \geq 0.9.
\tag{13}
$$
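Steps 1-4 can be sketched in code as follows (a hypothetical NumPy helper; `correlation_pca` and the toy data are illustrative, with the 0.9 energy threshold taken from eq. (13)):

```python
import numpy as np

def correlation_pca(X, energy=0.9):
    """Standardize (eq. 10), form the correlation matrix (eq. 11),
    eigendecompose it (eq. 12), and keep the d leading components whose
    eigenvalues capture >= `energy` of the total eigenvalue sum (eq. 13)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)      # eq. (10)
    R = np.corrcoef(Z, rowvar=False)                       # eq. (11)
    lam, U = np.linalg.eigh(R)                             # eq. (12)
    order = np.argsort(lam)[::-1]                          # largest eigenvalues first
    lam, U = lam[order], U[:, order]
    d = int(np.searchsorted(np.cumsum(lam) / lam.sum(), energy) + 1)  # eq. (13)
    return Z, U[:, :d], lam, d

# Toy data: 200 samples of a 6-band signal with strong inter-band correlation.
rng = np.random.default_rng(2)
base = rng.normal(size=(200, 2))
X = base @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(200, 6))
Z, Ud, lam, d = correlation_pca(X)
print(d)   # a small d: the signal lives in roughly 2 dimensions
```

Since the signal is rank-2 plus small noise, almost all of the eigenvalue mass concentrates in the first two components, so the 90% criterion selects a small $d$.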

Under the constraint of eq. (13), we get the $d$ eigenvectors $(u_1, \cdots, u_d)$ corresponding to the $d$ largest eigenvalues of eq. (12). The spectral feature vector is formed as follows:

$$
U_0 = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_d \end{pmatrix}.
\tag{14}
$$

(b) The spectral-spatial feature is obtained as follows:

$$
\Phi = [\mu U_0; (1-\mu)V_0] = \begin{pmatrix} \mu u_1 \\ \mu u_2 \\ \vdots \\ \mu u_d \\ (1-\mu)v_1 \\ (1-\mu)v_2 \\ \vdots \\ (1-\mu)v_d \end{pmatrix},
\tag{15}
$$

where $\mu \in [0, 1]$ controls the relative influence of the spectral and spatial information. The spectral-spatial kernel can then be obtained from eqs. (9) and (15) (see eq. (16)):

$$
k_\sigma(\Phi, z) = \exp\left(-\frac{\|\Phi - z\|^2}{2\sigma^2}\right).
\tag{16}
$$

Finally, the new decision function can be obtained via eq. (17):

$$
f(z) = \mathrm{sgn}\left(\sum_{i=1}^{n}\alpha_i y_i k_\sigma(\Phi_i, z) + b\right).
\tag{17}
$$

The classification flow diagram of the proposed SSF-SVM algorithm is shown in Fig.2.

Figure 2. The classification flow of the proposed SSF-SVM algorithm.
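The overall flow, weighted stacking per eq. (15) followed by a Gaussian-kernel SVM per eqs. (16)-(17), can be sketched as follows (assuming scikit-learn; the toy data and the `ssf_features` helper are illustrative, not the paper's actual pipeline):

```python
import numpy as np
from sklearn.svm import SVC

def ssf_features(U0, V0, mu=0.8):
    """eq. (15): weighted stack of the spectral (U0) and spatial (V0)
    per-pixel features; U0 and V0 have shape (d, n_pixels).
    Returns one row per pixel, ready for the SVM."""
    return np.vstack([mu * U0, (1.0 - mu) * V0]).T

# Hypothetical toy scene: two classes of 5-component pixels.
rng = np.random.default_rng(3)
n = 200
U0 = np.hstack([rng.normal(0.0, 0.3, (5, n)), rng.normal(1.0, 0.3, (5, n))])
V0 = U0 + 0.1 * rng.normal(size=U0.shape)   # stand-in for the AMF output
y = np.repeat([0, 1], n)

X = ssf_features(U0, V0, mu=0.8)
# Gaussian kernel on the stacked features realizes eqs. (16)-(17).
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(X.shape)   # (400, 10): 2d components per pixel
```

Because the stacked vector already carries the weights $\mu$ and $1-\mu$, the standard RBF kernel on it is exactly the spectral-spatial kernel of eq. (16).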

3 Experiments and analysis

3.1 Experimental setup and assessment indices

In this subsection, in order to demonstrate the effectiveness of the proposed SSF-SVM method for HIC, the overall accuracy (OA) and the Kappa value were used as quantitative criteria in our experiments. The Indian Pines, Pavia University, and AVIRIS data of Salinas Valley hyperspectral databases (see [31-33]), shown in Figs.3-5 and introduced in Section 3.2, were used in the experimental part of the study to evaluate the performance of the proposed SSF-SVM algorithm with different window sizes.
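The two assessment indices can be computed from the confusion matrix as follows (a small NumPy sketch; `oa_and_kappa` is a hypothetical helper, not code from the paper):

```python
import numpy as np

def oa_and_kappa(y_true, y_pred):
    """Overall accuracy and Cohen's kappa from the confusion matrix."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    k = len(classes)
    cm = np.zeros((k, k))
    for t, p in zip(y_true, y_pred):
        cm[np.searchsorted(classes, t), np.searchsorted(classes, p)] += 1
    n = cm.sum()
    oa = np.trace(cm) / n                        # observed agreement
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n**2  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, kappa

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])
oa, kappa = oa_and_kappa(y_true, y_pred)
print(round(oa, 3), round(kappa, 3))   # 0.833 0.75
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside OA for class-imbalanced scenes such as Pavia University.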

3.2 Hyperspectral data description

The Indian Pines dataset was gathered by the AVIRIS sensor over the Indian Pines test site in North-western Indiana in 1992, and consists of 145 × 145 pixels and 220 spectral reflectance bands in the wavelength range 0.4-2.5 × 10^-6 meters. This scene is a subset of a larger one. The available ground truth is designated into 16 classes, which are not all mutually exclusive. In the experiment, 20 spectral bands were removed because of water absorption phenomena


Figure 3. The AVIRIS data of Indian Pines. (a) False-color composition (bands 17, 27, and 50 for RGB), (b) Reference land-cover.

Figure 4. The ROSIS data of Pavia University. (a) False-color composition (bands 16, 27, and 45 for RGB), (b) Reference land-cover.

Figure 5. The AVIRIS data of Salinas-A. (a) False-color composition (bands 16, 27, and 145 for RGB), (b) Reference land-cover.

and noise. Finally, this sub-dataset of the Indian Pines scene was used in our experiment. The original three-band synthetic false-color image and the reference land-cover image are shown in Fig.3. The Pavia University dataset was gathered by the ROSIS sensor over the engineering school at the University of Pavia. It is 610 × 340 pixels, and the spatial resolution is 1.3 m per pixel. 12 spectral bands were removed due to noise, and the remaining 103 spectral bands were processed. 9 classes of interest are considered: trees, asphalt, bitumen, gravel, metal sheets, shadows, bricks, meadows, and soil. The original three-band synthetic false-color image and the reference land-cover image are shown in Fig.4. The AVIRIS dataset of Salinas Valley was gathered by the AVIRIS sensor over Salinas Valley, California, and is characterized by high spatial resolution (3.7-meter pixels). It is 512 × 217


pixels, and the spatial resolution is 3.7 m per pixel, with 206 spectral bands from 0.4 to 2.5 µm. In our experiment, a small subscene of the Salinas image, denoted Salinas-A, is used. It comprises 86 × 83 pixels located within the same scene and includes six classes. Fig.5 shows the synthetic false-color image and the reference land-cover of the Salinas-A scene.

3.3 Parameter selection

In the proposed SSF-SVM algorithm, three parameters need to be set manually. The first is the window size M (M ∈ {3, 5}) used in the AMF operation; the second is the weight parameter µ (µ ∈ [0, 1], which controls the influences of spectral and spatial information); and the third is the spectral/spatial dimension d. To statistically examine the effect of the different parameters on the final image classification performance, the OA and Kappa value of each approach are estimated by the standard five-fold cross-validation methodology.
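A five-fold cross-validated search over (M, µ, d) can be sketched as follows (assuming scikit-learn; `build_features` is a placeholder that here ignores M and µ — the real pipeline would run the PCA and AMF steps of Section 2 for each parameter combination):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Hypothetical stand-in for the real feature pipeline: the actual method
# would run correlation-matrix PCA and AMF per (M, mu, d); here we only
# truncate to d columns for illustration.
def build_features(X_raw, M, mu, d):
    return X_raw[:, :d]

rng = np.random.default_rng(4)
X_raw = np.vstack([rng.normal(0.0, 0.3, (40, 50)),
                   rng.normal(1.0, 0.3, (40, 50))])
y = np.repeat([0, 1], 40)

best = None
for M in (3, 5):                       # AMF window size
    for mu in (0.2, 0.5, 0.8):         # spectral/spatial weight
        for d in (20, 40):             # reduced spectral dimension
            X = build_features(X_raw, M, mu, d)
            # standard five-fold cross-validation for each setting
            score = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
            if best is None or score > best[0]:
                best = (score, M, mu, d)
print(best)
```

The grid values above mirror the ranges examined in the experiments (M ∈ {3, 5}, µ ∈ [0.1, 0.9], d around 20-50), but the data are synthetic.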

3.4 The AVIRIS data of Indian Pines

In order to illustrate the validity and feasibility of the proposed SSF-SVM method, we carried out an experiment on the sub-AVIRIS Indian Pines data set to investigate the influence of the spatial feature on the classification performance. The available training and test sets for sub-Indian Pines are given in Table 1, and we assess the classification performance by computing the OA and the Kappa value on the available reference data. Tables 2-4 report the classification scores and Kappa values achieved by the SSF-SVM method with different M, µ, and d for the sub-AVIRIS Indian Pines data set, where the OA results are displayed for different parameters µ. The best OA and Kappa results in each row are marked in bold.

Table 1. Information classes and training-test samples for the sub-Indian Pines data set.

No  Name                          Train  Test
1   Corn                          100    137
2   Grass-trees                   100    630
3   Grass-pasture-mowed           10     18
4   Hay-windrowed                 100    378
5   Wheat                         100    105
6   Woods                         100    1165
7   Buildings-Grass-Trees-Drives  100    286
8   Stone-Steel-Towers            83     10

Table 2. OA (%) and Kappa results for sub-AVIRIS Indian Pines data set using SSF-SVM algorithm with d = 25.

µ            0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
M=3  OA      84.30   85.47   87.51   88.44   88.54   88.79   89.33   89.69   89.94
     Kappa   0.7918  0.8073  0.8336  0.8460  0.8476  0.8511  0.8578  0.8624  0.8655
M=5  OA      84.94   86.55   86.76   86.65   87.40   88.47   89.54   89.69   89.79
     Kappa   0.8010  0.8214  0.8245  0.8230  0.8325  0.8467  0.8607  0.8623  0.8637

As shown in Fig.6, the OA results increase at first for d ∈ [20, 40] with the SSF-SVM (5 × 5) method. This indicates that the proposed SSF-SVM approach can significantly improve classification accuracy in comparison with the SVM. In order to show the classification results in more detail, Table 2 lists the OA and Kappa

Table 3. OA (%) and Kappa results for sub-AVIRIS Indian Pines data set using SSF-SVM algorithm with d = 40.

µ            0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
M=3  OA      85.72   86.19   86.69   86.76   87.94   89.11   89.54   89.90   89.83
     Kappa   0.8115  0.8174  0.8237  0.8245  0.8394  0.8549  0.8604  0.8648  0.8638
M=5  OA      88.44   88.54   89.08   89.40   88.94   89.33   89.61   90.26   90.08
     Kappa   0.8463  0.8475  0.8546  0.8588  0.8529  0.8579  0.8614  0.8695  0.8670

Table 4. OA (%) and Kappa results for sub-AVIRIS Indian Pines data set using SSF-SVM algorithm with d = 45.

µ            0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
M=3  OA      85.55   81.48   86.55   87.97   88.62   89.04   89.11   90.04   89.97
     Kappa   0.8096  0.7616  0.8220  0.8398  0.8485  0.8540  0.8549  0.8667  0.8657
M=5  OA      88.58   88.94   89.26   89.51   89.19   89.26   89.29   89.65   90.04
     Kappa   0.8485  0.8529  0.8571  0.8604  0.8562  0.8571  0.8574  0.8617  0.8666

Figure 6. Overall classification accuracies obtained for the AVIRIS Indian Pines data set using SSF-SVM and SVM methods (x-axis: number of reduced spectral vectors; curves: SVM, SSF-SVM (3x3), SSF-SVM (5x5)).

results for different µ obtained by the proposed SSF-SVM approach with d = 25, and Tables 3 and 4 show the results with d = 40 and d = 45, respectively. In the tables, M = 3 (or M = 5) indicates that the AMF uses a 3×3 (or 5×5) window run across the hyperspectral imagery from left to right and top to bottom. The OA value rises slowly for µ ∈ [0.1, 0.5], whether M = 3 or M = 5. It is noticeable that, first, by including both spectral and spatial information, the classification results are greatly improved in all cases; second, the OA results with M = 5 are superior to those with M = 3 when d = 40 and d = 45. For example, with d = 40 and µ = 0.8, the classification accuracy is 89.90% with M = 3, as shown in Table 3, while the classification accuracy is 90.26% with M = 5. Therefore, the AMF plays an important role in the classification. Fig.7 shows the classification maps obtained by the SSF-SVM and SVM approaches for the sub-AVIRIS Indian Pines data set. It is noticeable that the classification maps obtained by the SSF-SVM methods perform better in the homogeneous areas, especially for Corn, Grass-trees, and Woods. Based on the above analysis, the optimal choice of M and µ is still an open question, so the classification rates with respect to M and µ are shown in Fig.8. As shown in Fig.8, as M (the window) and µ increase, the accuracy increases, because spatial information is introduced into the training process. However, when µ reaches 0.9 there is a peak in the accuracy curve; as µ continues to increase, the accuracy begins to fall off slowly.


Figure 7. Classification maps for all the methods with the sub-AVIRIS Indian Pines data set. (a) Reference land-cover, (b) Classification map using the SVM, (c) Classification map using the proposed SSF-SVM algorithm with M = 3, µ = 0.8, and d = 45, (d) Classification map using the proposed method with M = 5, µ = 0.8, and d = 40.



Figure 8. Classification rates with respect to parameter M and µ for the AVIRIS Indian Pines data set using SSF-SVM method.

3.5 The ROSIS data of Pavia University

In the second experiment, the proposed SSF-SVM method was evaluated using the hyperspectral data of Pavia University. The available training and test sets for Pavia University are given in Table 5. Tables 6-8 summarize the accuracies produced by the proposed method (SSF-SVM with M = 3 and SSF-SVM with M = 5). According to Tables 6-8, with d = 20 the accuracy of the proposed SSF-SVM algorithm generally increases as µ increases, while with d = 30 and d = 50 the accuracy fluctuates only slightly as µ increases. The best OA and Kappa results are given in Table 7, with d = 30 and µ = 0.5. In addition, the spatial information plays an increasingly important role in classification as M increases; see Table 8. Fig.9 shows the classification maps. It can be seen that the proposed SSF-SVM algorithm outperforms the SVM, providing smoother classification maps. In order to show the classification results in more detail, Fig.10 shows the OA curves of the SSF-SVM and SVM methods with respect to the number of reduced spectral vectors (viz. d)

Table 5. Information classes and training-test samples for the Pavia University data set.

No  Name          Train  Test
1   Asphalt       548    6304
2   Meadows       540    18146
3   Gravel        392    1815
4   Trees         524    2912
5   Metal sheets  265    1113
6   Bare soil     532    4572
7   Bitumen       375    981
8   Bricks        514    3364
9   Shadow        231    795

Table 6. OA (%) and Kappa results for Pavia University data set using SSF-SVM algorithm with d = 20.

µ            0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
M=3  OA      79.72   79.52   80.23   79.52   80.30   80.32   80.39   80.40   80.52
     Kappa   0.7437  0.7412  0.7495  0.7412  0.7502  0.7505  0.7512  0.7514  0.7528
M=5  OA      79.72   79.52   80.23   80.22   80.30   80.32   80.39   80.41   80.52
     Kappa   0.7437  0.7412  0.7495  0.7494  0.7502  0.7505  0.7512  0.7514  0.7528

for the ROSIS data of the University of Pavia. The proposed SSF-SVM algorithm achieves more competitive results than the SVM, as shown in Fig.10. Similarly, the optimal choice of M and µ is examined in Fig.11. From Fig.11, we find that the overall accuracies fluctuate with M and µ. However, the OA with


Table 7. OA (%) and Kappa results for Pavia University data set using SSF-SVM algorithm with d = 30.

µ            0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
M=3  OA      80.56   80.04   79.51   80.04   80.90   79.51   80.85   80.04   80.87
     Kappa   0.7534  0.7474  0.7405  0.7474  0.7573  0.7405  0.7568  0.7434  0.7570
M=5  OA      80.56   80.04   80.04   80.04   80.90   80.05   80.85   80.85   80.87
     Kappa   0.7534  0.7474  0.7474  0.7474  0.7573  0.7475  0.7568  0.7568  0.7570

Table 8. OA (%) and Kappa results for Pavia University data set using SSF-SVM algorithm with d = 50.

µ            0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
M=3  OA      79.92   79.58   79.58   79.58   79.47   79.15   79.47   79.58   79.43
     Kappa   0.7461  0.7421  0.7421  0.7421  0.7405  0.7364  0.7404  0.7420  0.7401
M=5  OA      79.92   79.59   79.58   79.59   79.58   79.47   79.48   79.58   79.44
     Kappa   0.7461  0.7422  0.7421  0.7422  0.7422  0.7404  0.7405  0.7420  0.7402

µ ≥ 0.5 is higher than the OA with µ < 0.5. In addition, it is noticeable that the largest range of OA in Fig.11 is less than 1.5 percentage points; that is to say, the OA fluctuates only slightly with M and µ.

3.6 The AVIRIS data of the subscene of Salinas Valley

In the last experiment, we used the AVIRIS data of a subscene of Salinas Valley (which we call Salinas-A) to evaluate the proposed SSF-SVM algorithm. The available training and test sets for the AVIRIS data of the subscene of Salinas Valley are given in Table 9. Tables 10-12 show the


Figure 9. Classification maps for all the methods with the ROSIS data of University of Pavia. (a) Reference land-cover, (b) Classification map using the SVM, (c) Classification map using the proposed SSF-SVM algorithm with M = 3, µ = 0.5, and d = 30, (d) Classification map using the proposed SSF-SVM algorithm with M = 5, µ = 0.5, and d = 30.



Figure 10. Overall classification accuracies obtained for the ROSIS data of University of Pavia using SSF-SVM and SVM methods.


Figure 11. Classification rates with respect to parameter M and µ for the ROSIS data of University of Pavia using SSF-SVM method.


classification accuracies and Kappa values achieved by the proposed SSF-SVM algorithm with different M, µ and d for the AVIRIS data of subscene of Salinas Valley.

Table 9. Information classes and training-test samples for the AVIRIS data of subscene of Salinas Valley.

No  Name                        Train  Test
1   Brocoli-green-weeds-1       100    291
2   Corn-senesced-green-weeds   100    1243
3   Lettuce-romaine-4wk         100    516
4   Lettuce-romaine-5wk         100    1425
5   Lettuce-romaine-6wk         100    574
6   Lettuce-romaine-7wk         100    699

Table 10. OA (%) and Kappa results for the AVIRIS data of subscene of Salinas Valley using SSF-SVM algorithm with d = 20.

µ            0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
M=3  OA      90.25   89.45   97.41   97.70   96.63   87.36   92.02   84.69   87.30
     Kappa   0.8756  0.8655  0.9671  0.9707  0.9574  0.8411  0.8994  0.8086  0.8408
M=5  OA      83.30   88.35   84.90   98.06   97.22   88.65   92.44   84.94   87.34
     Kappa   0.7865  0.8515  0.8078  0.9754  0.9648  0.8571  0.9047  0.8119  0.8414

It can be seen from Tables 10-12 that the parameter µ plays an important role in classification for the AVIRIS data of subscene of Salinas Valley, and the proposed SSF-SVM algorithm produces its most accurate classification results, OA = 97.70% (M = 3) and OA = 98.06% (M = 5), with d = 20 and µ = 0.4. In addition, when the

Table 11. OA (%) and Kappa results for the AVIRIS data of subscene of Salinas Valley using SSF-SVM algorithm with d = 30.

µ            0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
M=3  OA      97.22   95.35   97.56   97.14   97.26   89.15   93.13   85.36   88.40
     Kappa   0.9647  0.9408  0.9690  0.9637  0.9653  0.8632  0.9134  0.8172  0.8546
M=5  OA      96.29   96.40   97.26   97.60   88.10   91.95   94.31   85.30   88.52
     Kappa   0.9529  0.9543  0.9652  0.9696  0.8498  0.8985  0.9282  0.8162  0.8561

Table 12. OA (%) and Kappa results for the AVIRIS data of subscene of Salinas Valley using SSF-SVM algorithm with d = 40.

µ            0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
M=3  OA      96.21   96.55   82.14   82.29   91.79   84.75   85.11   85.85   88.84
     Kappa   0.9518  0.9561  0.7794  0.7811  0.8968  0.8118  0.8150  0.8233  0.8601
M=5  OA      96.76   96.74   96.97   82.81   85.05   97.62   95.89   86.12   88.92
     Kappa   0.9588  0.9586  0.9615  0.7884  0.8155  0.9698  0.9481  0.8264  0.8611


value of parameter µ does not exceed 0.5, the proposed SSF-SVM approach produces the best classification results. That is to say, the spatial features play an important role in classification for HIC.
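The weighting role of µ can be made concrete with a composite-kernel sketch. The code below assumes K = µ·K_spec + (1 − µ)·K_spat built from Gaussian RBF kernels, so that a smaller µ gives more weight to the spatial kernel; this weighting convention and the RBF choice are our assumptions, not necessarily the authors' exact formulation.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def composite_kernel(spec_a, spec_b, spat_a, spat_b, mu=0.4):
    """Weighted spectral-spatial kernel: K = mu*K_spec + (1-mu)*K_spat.

    mu on the spectral term is an assumption; under it, a smaller mu
    emphasizes the spatial kernel.
    """
    return (mu * rbf_kernel(spec_a, spec_b)
            + (1.0 - mu) * rbf_kernel(spat_a, spat_b))
```

The resulting Gram matrix remains a valid kernel (a convex combination of kernels) and can be handed to any SVM solver that accepts precomputed kernels, e.g. scikit-learn's SVC(kernel="precomputed").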

Figure 12. Classification maps for all the methods with the AVIRIS data of subscene of Salinas Valley. (a) Reference land-cover, (b) Classification map using the SVM, (c) Classification map using the proposed method SSF-SVM with M = 3, µ = 0.4, and d = 20, (d) Classification map using the proposed method with M = 5, µ = 0.4, and d = 20.

Fig.12 shows the classification maps. We can observe clearly from Fig.12 that the Corn-senesced-green-weeds, Lettuce-romaine-4wk, and Lettuce-romaine-5wk classes are not classified satisfactorily by the SVM. The proposed SSF-SVM method, however, can produce


[Figure 13 plot: overall accuracy (%) versus the number of reduced spectral vectors (20-50), with curves for SVM, SSF-SVM (3x3), and SSF-SVM (5x5).]

Figure 13. Overall classification accuracies obtained for the AVIRIS data of subscene of Salinas Valley using SSF-SVM and SVM methods.


Figure 14. Classification rates with respect to parameter M and µ for the AVIRIS data of subscene of Salinas Valley using SSF-SVM method.


the most accurate classification results for these regions. Fig.13 shows the OA curves of the proposed SSF-SVM and SVM approaches with respect to the number of reduced spectral vectors (viz. d) for the AVIRIS data of subscene of Salinas Valley. As Fig.13 shows, the proposed SSF-SVM method achieves more competitive results than the SVM. The optimal choice of M (viz. window size) and µ, illustrated in Fig.14, raises the same issue discussed in experiments 1 and 2. As shown in Fig.14, the accuracy increases as µ decreases when M is large, which underlines the importance of spatial information.

From the above analysis, it is obvious that the proposed SSF-SVM method outperforms the original SVM method. The use of spatial information in the classifier, by incorporating the spatial features and performing the AMF operation, significantly reduces noise in the original classification image. The classification maps obtained by the SSF-SVM approach differ from those of the SVM in Figs.7(b, c, d), 9(b, c, d), and 12(b, c, d). This is explained by the fact that the image obtained by the PCA and AMF operations has already had most of the noise in the original data removed and more spatial information added. These results show that the proposed SSF-SVM classification algorithm, using the PCA and AMF operations, leads to improved classification accuracies and more homogeneous objects in the resulting classification maps when compared with the SVM classification method. The approach is particularly suitable for the classification of large spatial structures in the image. In this paper, the classification accuracies of the proposed SSF-SVM method are higher than those of the SVM method. In particular, when we compare the obtained results with the SVM classification results, the proposed method leads to significantly higher global accuracies and


to higher class-specific accuracies for most of the classes.
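The PCA-plus-filtering pipeline discussed above can be sketched as follows. The sketch substitutes a plain M x M median filter for the AMF step and flattens M x M patches of the filtered principal components into per-pixel spatial feature vectors; all function names and details are illustrative, a simplified stand-in rather than the authors' implementation.

```python
import numpy as np

def pca_reduce(cube, d):
    """Project an H x W x B hyperspectral cube onto its top-d principal components."""
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(float)
    Xc = X - X.mean(axis=0)
    # Eigendecomposition of the B x B band covariance matrix (eigh: ascending order).
    _, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = vecs[:, ::-1][:, :d]  # eigenvectors sorted by eigenvalue, descending
    return (Xc @ top).reshape(H, W, d)

def median_filter_band(band, M):
    """M x M median filter with edge replication (simplified AMF stand-in)."""
    r = M // 2
    padded = np.pad(band, r, mode="edge")
    H, W = band.shape
    out = np.empty_like(band)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.median(padded[i:i + M, j:j + M])
    return out

def spatial_features(cube, d=30, M=3):
    """Per-pixel spatial features: flattened M x M patches of filtered PCs."""
    pcs = pca_reduce(cube, d)
    filtered = np.stack([median_filter_band(pcs[:, :, k], M) for k in range(d)], axis=-1)
    r = M // 2
    padded = np.pad(filtered, ((r, r), (r, r), (0, 0)), mode="edge")
    H, W, _ = filtered.shape
    feats = np.empty((H * W, M * M * d))
    for i in range(H):
        for j in range(W):
            feats[i * W + j] = padded[i:i + M, j:j + M, :].ravel()
    return feats
```

The median filtering explains the noise suppression observed in the classification maps: isolated misclassified pixels inside a homogeneous region are replaced by the local majority of the surrounding window before the kernel is built.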

4 Conclusion

In this paper, an efficient spectral-spatial kernel-based method was proposed for hyperspectral imagery classification. The spatial feature was extracted by using area median filtering (AMF). The result of the AMF was then used to construct spatial feature patches according to different window sizes. Finally, using the kernel technique, the spectral and spatial features were jointly used for classification through a support vector machine (SVM) formulation. To evaluate the effectiveness of the proposed SSF-SVM method, experiments were conducted comparing its results with the standard SVM on three hyperspectral images. The performance of the proposed approach was found to be influenced by the window patch size (viz. M) and by µ. At the same time, the proposed SSF-SVM method achieved better classification, and it can provide excellent performance for hyperspectral imagery classification.

Acknowledgments

The authors would like to thank the anonymous referees and the editor for their valuable opinions. This work was supported by the Graduate Innovation Foundation of Jiangsu Province under Grant Nos. KYLX16 0781 and CXZZ13 0239, the 111 Project under Grant No. B12018, and PAPD of Jiangsu Higher Education Institutions.
