Cross-task extreme learning machine for breast cancer image classification with deep convolutional features

Pin Wang a,∗, Qi Song a, Yongming Li a, Shanshan Lv a, Jiaxin Wang a, Linyu Li a, HeHua Zhang b

a College of Communication Engineering, Chongqing University, Chongqing, 400030, PR China
b Institute of Surgery Research, Daping Hospital, Army Medical University (Third Military Medical University), Chongqing, 400038, PR China

Article info

Article history: Received 25 June 2019; Received in revised form 14 October 2019; Accepted 16 November 2019

Keywords: Breast cancer histopathology images; Uninvolved images; Convolutional neural networks; Double deep transfer learning; Interactive cross-task extreme learning machine

Abstract

Automatic classification of breast histopathology images plays a key role in computer-aided breast cancer diagnosis. However, feature-based classification methods rely on accurate cell segmentation and feature extraction. Due to overlapping cells, dust, impurities and uneven irradiation, accurate segmentation and efficient feature extraction remain challenging. To overcome these difficulties and the limited number of breast histopathology images, this paper proposes a hybrid structure that combines double deep transfer learning (D2TL) and an interactive cross-task extreme learning machine (ICELM), exploiting the feature extraction and representation ability of CNNs and the classification robustness of ELM. First, high-level features are extracted using deep transfer learning and double-step deep transfer learning. Then, the high-level feature sets are jointly used as regularization terms in the interactive cross-task extreme learning machine to further improve classification performance. The proposed method was tested on 134 breast cancer histopathology images. Results show that the method achieves remarkable classification accuracy (96.67%, 96.96%, 98.18%). These results suggest that the proposed method is promising as an efficient tool for breast cancer classification in clinical settings. © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Breast cancer is the most frequently diagnosed cancer, with high morbidity and mortality among women [1]. According to the International Agency for Research on Cancer (WHO), breast cancer accounts for 25.2% of cancers among women, ranking first. Early diagnosis and treatment allow breast cancer to be treated effectively [2], and the morbidity and mortality are then expected to decrease significantly. Histopathology examination remains the gold standard for cancer detection. Nevertheless, manual classification of breast cancer relies on the professional background and rich experience of pathologists, which is costly and may cause misdiagnosis. To improve clinicians' accuracy, computer-aided diagnosis (CAD) has been developed as an effective approach to diagnosing breast cancer.

Previously, the most common CAD methods for breast cancer screening have been feature-based methods built on prior knowledge and expert guidance.

∗ Corresponding author. E-mail address: [email protected] (P. Wang). https://doi.org/10.1016/j.bspc.2019.101789 1746-8094/© 2019 Elsevier Ltd. All rights reserved.

The feature-based methods comprise three steps: breast cancer cell segmentation, feature extraction and classification. These methods first separate the cell nuclei by segmentation, then extract features from the region of interest and perform feature selection, and finally classify the images based on the features. Accurate cell segmentation is crucial to the performance of these methods [3–5], and meaningful, representative features are the key to breast cancer cell classification [6–8]. Methods such as K-means [9], Rotation Forest [10] and neural networks [11] have been applied to the classification of histopathology images. However, accurate segmentation remains a challenge due to overlapping cells, dust, impurities and uneven irradiation, and the extracted hand-crafted features cannot fully describe and represent the cell nuclei.

Compared to conventional machine learning methods for feature extraction, a Convolutional Neural Network (CNN) can automatically learn and discover discriminative and representative information from raw data [12]. CNN models have been widely used in medical image classification and have yielded satisfactory classification performance [13–15]. Large datasets are crucial to the high performance of CNNs; however, the amount of breast cancer histopathology images is limited. Transfer learning based on CNNs has been developed to deal with the problem of limited histopathology images [16–19].


Fig. 1. The overall diagram of the proposed D2TL ICELM method. The convolution layer, activation layer, batch normalization layer and pool layer are embedded. The output y denotes the prediction label (1 or 0) for breast cancer histopathology images.

However, some problems remain when classifying breast histopathology images with a single deep transfer learning step. The fully connected layer of a CNN is likely to be over-trained, degrading its generalization performance, when it is trained on limited breast cancer images. Moreover, traditional transfer learning usually uses natural images that are not related to the target images. To deal with these problems, double-step transfer learning (DSTL), which takes the similarity between the source domain and the target domain into consideration, is proposed. ELM is adopted because of its classification robustness. To further improve the classification performance, an interactive cross-task extreme learning machine (ICELM), which jointly utilizes the feature sets from double-step transfer learning and single transfer learning, is proposed. Moreover, in the proposed ICELM, both the source loss and the target loss are taken into account for classification.

In this paper, a hybrid structure that includes double deep transfer learning (D2TL) and an interactive cross-task extreme learning machine (ICELM) is proposed for the classification of breast histopathology images. This method jointly utilizes the feature representation ability of CNNs and the classification robustness of ELM [20,21]. Firstly, transfer learning (TL) and double-step transfer learning (DSTL) of a CNN are used to extract two deep hierarchical feature representations for the same breast cancer target data. After high-level feature extraction, these two feature sets are applied as regularization terms in the interactive cross-task extreme learning machine, which makes full use of the extracted discriminative

features to improve the stability and classification accuracy of the proposed method.

Our contributions are summarized as follows. 1) A new hybrid structure including double deep transfer learning (D2TL) and an interactive cross-task extreme learning machine (ICELM) is proposed for the classification of breast histopathology images. 2) CNN models with transfer learning and double-step transfer learning are designed for feature extraction from breast cancer cell images. 3) An interactive cross-task ELM is designed to improve the classification accuracy based on the designed loss functions with deep convolutional features.

2. Methods

2.1. Notations

In this paper, the subscripts 'S' and 'T' represent the source and target domains, respectively. The transfer learning feature (TL feature) denotes the feature set generated by the CNN-based transfer learning algorithm; $X_{TL\_F} = [X_{TL\_F}^1, \ldots, X_{TL\_F}^N] \in \mathbb{R}^{d \times N}$ represents the TL feature. The double-step transfer learning feature (DSTL feature) denotes the feature set generated by the CNN-based double-step transfer learning algorithm; $X_{DSTL\_F} = [X_{DSTL\_F}^1, \ldots, X_{DSTL\_F}^N] \in \mathbb{R}^{d \times N}$ represents the DSTL feature.


2.2. Proposed double deep transfer learning with interactive cross-task extreme learning machine (D2TL ICELM)

The overall approach is designed in a feature learning and cross-task joint classification manner. The proposed method includes a deep hierarchical convolutional feature extraction stage and an interactive cross-task ELM classification stage. In the feature extraction stage, we use a deep residual network, an improved convolutional neural network, as the deep model. In detail, high-level features are first extracted using transfer learning and double-step transfer learning. Transfer learning uses the breast cancer target data to fine-tune a CNN model pre-trained on the ImageNet dataset [22] and extracts the high-level feature representations of the fully connected layer. The proposed double-step transfer learning first transfers the parameters pre-trained on ImageNet to a network for the BreaKHis dataset [23], named the transition source dataset. Then, the breast cancer target data are used to fine-tune the pre-trained ImageNet-BreaKHis CNN model. Finally, the high-level feature sets are jointly used as regularization terms in the interactive cross-task extreme learning machine to further improve classification performance. The overall structure of the proposed method is shown in Fig. 1.

2.3. Double deep transfer learning of CNN

Double deep transfer learning (D2TL) is designed as deep models with residual modules, based on deep residual networks and transfer learning theory.

2.3.1. Deep residual networks

Deep residual networks have shown impressive representation ability in recent large-scale natural image classification tasks [24,25]. In deep learning, deeper networks are expected to extract deeper hierarchical and more abstract features. Residual modules are effective and converge well when training on limited image data, and they mitigate gradient vanishing as the network depth increases. A residual module consists of a convolution branch and an identity mapping. As shown in Fig. 1, a deep residual network is composed of many residual modules, which are mainly convolution layers with other auxiliary layers. The schematic diagram of the residual module is shown in Fig. 2; batch normalization and pooling layers are embedded. Note that only the first residual module contains a pooling layer; the other residual modules consist of convolution layers and batch normalization layers. In this paper, we focus on the backward propagation of the loss function. In the backward propagation of deep residual networks, if identity mappings are optimal, the solvers may simply drive the weights toward identity mappings. This effectively avoids gradient vanishing when the gradients propagate backward during training. Each residual unit can be computed as:





$$x_{i+1} = x_i + F(x_i, \theta_i) = x_i + \theta_i^T x_i \tag{1}$$

Fig. 2. Residual module structure diagram.

where $x_i$ denotes the input feature to the i-th residual module and $\theta_i$ contains the weights and biases between hidden layers. Minimizing the loss function is a crucial step in deep learning. Most deep convolutional neural networks are trained with maximum likelihood [25]. The loss function is defined as:





 



$$J(x, y, \theta) = -\mathbb{E}_{x, y}\, \log P(y \mid x; \theta) \tag{2}$$

where $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{d \times N}$ is the input data of N samples and $y_i \in \{0, 1\}$ for binary classification. The hypothesis function gives the probability of $x_i$ belonging to class 0 or 1, and its negative logarithm measures the error between the predicted and true values. The hypothesis function $h_\theta(x_i)$ is:

$$h_\theta(x_i) = \begin{cases} p(y_i = 1 \mid x_i; \theta) = \log h(x_i) = \log \dfrac{1}{1 + e^{-\theta^T x_i}} \\[6pt] p(y_i = 0 \mid x_i; \theta) = \log\left(1 - h(x_i)\right) = \log \dfrac{e^{-\theta^T x_i}}{1 + e^{-\theta^T x_i}} \end{cases} \tag{3}$$

Maximizing the likelihood is equivalent to minimizing the negative log-likelihood. Based on Eq. (2), the loss function used in this paper is defined as:





$$J(x, y, \theta) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log\left(h(x_i)\right) + (1 - y_i) \log\left(1 - h(x_i)\right) \right] \tag{4}$$

In the deep residual network, the input $x_i$ is replaced by the residual unit described in Eq. (1), so the loss function becomes:





$$J(x, y, \theta) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \frac{1}{1 + e^{-(\theta^T x_i + x_i)}} + (1 - y_i) \log \frac{e^{-(\theta^T x_i + x_i)}}{1 + e^{-(\theta^T x_i + x_i)}} \right] \tag{5}$$

Fig. 3. The settings of the source data and target data in ICELM. The black lines denote setting 1 and the red lines denote setting 2.
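For illustration only, the following is a minimal Keras sketch of an identity residual unit of the form of Eq. (1) trained with the binary cross-entropy loss of Eq. (4); the layer sizes and the tiny network around it are assumptions for the sketch, not the exact ResNet50 configuration used in this paper.

```python
# Minimal sketch of an identity residual unit (Eq. (1)) with the binary
# cross-entropy loss of Eq. (4); layer sizes are illustrative assumptions.
from keras import layers, models

def residual_unit(x, filters):
    # Convolution branch F(x_i, theta_i)
    f = layers.Conv2D(filters, 3, padding="same")(x)
    f = layers.BatchNormalization()(f)
    f = layers.Activation("relu")(f)
    f = layers.Conv2D(filters, 3, padding="same")(f)
    f = layers.BatchNormalization()(f)
    # Identity mapping: x_{i+1} = x_i + F(x_i, theta_i)
    return layers.Activation("relu")(layers.Add()([x, f]))

inputs = layers.Input(shape=(224, 224, 3))
x = layers.Conv2D(64, 7, strides=2, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(3, strides=2, padding="same")(x)  # pooling only in the first module
x = residual_unit(x, 64)
x = residual_unit(x, 64)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)         # p(y = 1 | x)

model = models.Model(inputs, outputs)
# Binary cross-entropy is the negative log-likelihood of Eq. (4).
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```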


2.3.2. Transfer learning and double-step transfer learning

Double deep transfer learning (D2TL) is composed of transfer learning and double-step transfer learning. Transfer learning fine-tunes CNN models pre-trained on a large-scale dataset (e.g., ImageNet) [26–28], so that new training tasks can be completed on limited training data. According to existing research, transfer learning models are advantageous and yield promising results when the annotated data are limited. Double-step transfer learning (DSTL) is an improvement of transfer learning (TL). The source data in TL and DSTL are the ImageNet dataset and the ImageNet-BreaKHis dataset, respectively, while the target data in both TL and DSTL are the same breast cancer histopathology images. The TL model is first pre-trained on ImageNet and then fine-tuned on the breast cancer histopathology images. The DSTL model first applies a transfer learning step in which ImageNet is the source data and BreaKHis is the transition source data: the model pre-trained on ImageNet is fine-tuned on BreaKHis. Then, this pre-trained TL model is fine-tuned a second time on the breast cancer histopathology images to yield better classification accuracy. Mathematically, the loss functions of TL and DSTL can be derived from deep learning theory and Eq. (5) above.





$$J(x, y, \theta_{TL}) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \frac{1}{1 + e^{-(\theta_{TL}^T x_i + x_i)}} + (1 - y_i) \log \frac{e^{-(\theta_{TL}^T x_i + x_i)}}{1 + e^{-(\theta_{TL}^T x_i + x_i)}} \right] \tag{6}$$

$$\text{s.t.} \quad \theta_{TL} = \arg\min J(x_j, y_j, \theta_{ImageNet}), \qquad \theta_{ImageNet} = \arg\min J(x_i, y_i, \theta)$$

where $\theta_{ImageNet}$ is the optimal parameter of the initially trained CNN model, which provides the initial model parameters for the transfer learning step of TL and for step one of DSTL, and $(x_i, y_i)$ denotes the data from the ImageNet dataset. $\theta_{TL}$ is the optimal parameter of the transfer learning step of TL and of transfer learning step one of DSTL. In TL, $(x_j, y_j)$ denotes our breast cancer histopathology images; in DSTL, $(x_j, y_j)$ denotes the data of the BreaKHis dataset. Finally, the optimal parameter of DSTL is calculated as:

$$\theta_{DSTL} = \arg\min J(x, y, \theta_{TL}) \tag{7}$$

where $\theta_{DSTL}$ denotes the optimal parameter of transfer learning step two of DSTL, $\theta_{TL}$ denotes the optimal parameter pre-trained in the previous transfer learning step one of DSTL, and $(x, y)$ denotes our breast cancer histopathology images. The complete DSTL procedure is summarized in Algorithm 1.

Algorithm 1: DSTL Algorithm.

Based on the D2TL method, the TL feature set and the DSTL feature set can be obtained.
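As an illustration of this two-step procedure, a minimal Keras sketch is given below; the dummy data arrays, optimizer settings, batch sizes and epoch counts are placeholders and not the authors' exact training configuration.

```python
# Illustrative two-step (DSTL) fine-tuning sketch; data, epochs and optimizer
# settings are hypothetical placeholders, not the authors' exact setup.
import numpy as np
from keras.applications import ResNet50
from keras.layers import Dense
from keras.models import Model

# Hypothetical placeholder arrays; replace with the real BreaKHis and target images.
x_breakhis = np.random.rand(8, 224, 224, 3); y_breakhis = np.random.randint(0, 2, 8)
x_target   = np.random.rand(8, 224, 224, 3); y_target   = np.random.randint(0, 2, 8)

# Step 0: ResNet50 backbone initialized with ImageNet pre-trained parameters.
base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(224, 224, 3), pooling="avg")
out = Dense(1, activation="sigmoid")(base.output)
model = Model(base.input, out)
model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])

# Step 1: fine-tune on the transition source dataset (BreaKHis, normal vs. malignant).
model.fit(x_breakhis, y_breakhis, batch_size=4, epochs=1)

# Step 2: double fine-tune the ImageNet-BreaKHis model on the target
# breast cancer histopathology images.
model.fit(x_target, y_target, batch_size=4, epochs=1)

# The DSTL feature set is taken from the layer preceding the classifier.
feature_extractor = Model(model.input, model.layers[-2].output)
dstl_features = feature_extractor.predict(x_target)
```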

2.4. Interactive cross-task extreme learning machine

2.4.1. Extreme learning machine

ELM is a rapid learning algorithm for single-hidden-layer feedforward neural networks (SLFNs) that minimizes the training error and obtains the minimum weight norm, giving good generalization performance and high running speed [29,30]. The only free parameter learned by the algorithm is the weight matrix β between the hidden layer and the output layer; based on this parameter, the ELM model can be trained. Unlike traditional SLFN algorithms, ELM randomly generates the input weights w and the hidden layer biases b. Once the number of hidden neurons and an infinitely differentiable activation function are chosen, the output weights β can be computed. The ELM learning objective is to obtain the output weights β, where

$$H\beta = T \;\Rightarrow\; \beta = H^{+} T \tag{8}$$

H is the hidden layer output matrix and $H^{+}$ is the Moore-Penrose generalized inverse of H. To obtain $H^{+}$, ELM can be formulated as follows. The hidden layer output matrix H with L hidden neurons is computed as

$$H = \begin{pmatrix} h(w_1^T x_1 + b_1) & h(w_2^T x_1 + b_2) & \cdots & h(w_L^T x_1 + b_L) \\ \vdots & \vdots & \ddots & \vdots \\ h(w_1^T x_N + b_1) & h(w_2^T x_N + b_2) & \cdots & h(w_L^T x_N + b_L) \end{pmatrix}_{N \times L} \tag{9}$$


where $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{d \times N}$ is the input data of N samples, $T = [t_1, t_2, \ldots, t_N] \in \mathbb{R}^{N \times c}$ is the label matrix, $W = [w_1, w_2, \ldots, w_L]$ is the input weight matrix, $B = [b_1, b_2, \ldots, b_L] \in \mathbb{R}^{L}$ contains the hidden layer biases, and $h(\cdot)$ is the activation function. With the hidden layer output matrix H, ELM can be formulated as follows:

$$\min_{\beta \in \mathbb{R}^{L \times c}} \; \frac{1}{2}\|\beta\|^2 + \frac{C}{2} \sum_{i=1}^{N} \|\xi_i\|^2 \qquad \text{s.t.} \; h(x_i)\beta = t_i - \xi_i, \; i = 1, \ldots, N \;\Leftrightarrow\; H\beta = T - \xi \tag{10}$$

where $\xi = [\xi_1, \ldots, \xi_N]$ denotes the prediction error matrix on the training data and C is a penalty constant on the training errors. The solution is

$$\beta^{*} = H^{+}T = \begin{cases} \left(H^T H + \dfrac{I_{L \times L}}{C}\right)^{-1} H^T T & \text{if } N \geq L \\[8pt] H^T \left(H H^T + \dfrac{I_{N \times N}}{C}\right)^{-1} T & \text{if } N < L \end{cases} \tag{11}$$

where $I_{L \times L}$ denotes the identity matrix of size L.
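As an illustration of Eqs. (9)–(11), a minimal NumPy sketch of a regularized ELM is given below; the sigmoid activation, hidden-layer size, penalty constant and toy data are assumptions for the sketch only.

```python
# Minimal regularized ELM sketch following Eqs. (9)-(11); all settings are illustrative.
import numpy as np

def elm_train(X, T, L=500, C=1.0, seed=0):
    """X: (N, d) inputs, T: (N, c) one-hot labels. Returns (W, b, beta)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W = rng.standard_normal((d, L))          # random input weights
    b = rng.standard_normal(L)               # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden output matrix, Eq. (9)
    if N >= L:                               # Eq. (11), N >= L case
        beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ T)
    else:                                    # Eq. (11), N < L case
        beta = H.T @ np.linalg.solve(H @ H.T + np.eye(N) / C, T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                          # class = argmax over columns

# Toy usage on random features (placeholders for the TL/DSTL feature sets).
X = np.random.rand(100, 64)
T = np.eye(2)[np.random.randint(0, 2, 100)]
W, b, beta = elm_train(X, T)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
```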

2.4.2. Interactive cross-task ELM algorithm

In this paper, we propose an interactive cross-task ELM algorithm based on the TL feature and the proposed DSTL feature. Standard ELM solves the output-weight problem of a single task with only one prediction error matrix. The interactive cross-task method (ICELM) solves the problem with the prediction error matrices of two interacting tasks, named the source data and the target data (target task). The proposed ICELM inherits the advantages of ELM and further makes effective use of the two kinds of high-level features. In the proposed D2TL method, the transition source domain data (BreaKHis), which are similar to the target data, are added to help solve the problem. In particular, the source domain data (the TL feature or the DSTL feature) is another representation of the target data (the DSTL feature or the TL feature) that assists in solving the learning problem. These two data sets interact to generate two prediction error matrices, which are set as the prediction error matrices of the source domain and the target domain, respectively. For the same prediction task we therefore obtain two prediction results, which jointly vote on the final result to improve the stability and accuracy of the algorithm. The details are as follows.

After the D2TL method, we obtain the TL feature set and the DSTL feature set. These are two representations of the same target data in TL and DSTL: two different feature sets that represent the same prediction task. This is the key point that makes the proposed ICELM algorithm possible. In traditional transfer learning theory, the source data and the target data cannot be the same data [31]; the TL feature set and the DSTL feature set obviously satisfy this condition. Therefore, the TL feature set and the DSTL feature set can each serve as the source data or the target data. The settings of the source data and target data are shown in Fig. 3. In setting 1, the TL feature is the source data and the DSTL feature is the target data; in setting 2, the DSTL feature is the source data and the TL feature is the target data. According to these two settings, the generalized formulation of the proposed ICELM is shown in Fig. 4. The most critical issue in Fig. 4 is the minimization problem marked with the red dotted line. Traditional ELM solves a single-task learning problem.

Fig. 4. The flow chart of the proposed ICELM method.

In this paper, a united prediction method based on ELM is used to solve the minimization problem [32–34]:

$$\min_{\beta \in \mathbb{R}^{L \times c}} \; \frac{1}{2}\|\beta\|^2 + \lambda \cdot E_S(\beta) + \gamma \cdot E_T(\beta) \tag{12}$$

where the first term is the regularization term, the second term is the prediction error on the source domain, and the third term is the prediction error on the target domain. Minimization problem 1 can be written as:

$$\min_{\beta_1 \in \mathbb{R}^{L \times c}} \; \frac{1}{2}\|\beta_1\|^2 + \lambda \cdot E_S^{TL}(\beta_1) + \gamma \cdot E_T^{DSTL}(\beta_1) \tag{13}$$

where $E_S^{TL}(\beta_1)$ is the prediction error of the TL feature as the source data and $E_T^{DSTL}(\beta_1)$ is the prediction error of the DSTL feature as the target data, corresponding to setting 1. Minimization problem 2 can be written as:

$$\min_{\beta_2 \in \mathbb{R}^{L \times c}} \; \frac{1}{2}\|\beta_2\|^2 + \lambda \cdot E_S^{DSTL}(\beta_2) + \gamma \cdot E_T^{TL}(\beta_2) \tag{14}$$

where $E_S^{DSTL}(\beta_2)$ is the prediction error of the DSTL feature as the source data and $E_T^{TL}(\beta_2)$ is the prediction error of the TL feature as the target data, corresponding to setting 2. According to ELM theory, minimization problem 1 can be solved with the following model:

$$\begin{aligned} \min_{\beta_1 \in \mathbb{R}^{L \times c}} \; & \frac{1}{2}\|\beta_1\|^2 + \frac{\lambda}{2} \sum_{i=1}^{N_S} \|\xi_{S_{TL}}^i\|^2 + \frac{\gamma}{2} \sum_{j=1}^{N_T} \|\xi_{T_{DSTL}}^j\|^2 \\ \text{s.t.} \; & \xi_{S_{TL}}^i = h(x_{S_{TL}}^i)\beta_1 - y_{S_{TL}}^i, \quad i = 1, \ldots, N_S \\ & \xi_{T_{DSTL}}^j = h(x_{T_{DSTL}}^j)\beta_1 - y_{T_{DSTL}}^j, \quad j = 1, \ldots, N_T \end{aligned} \tag{15}$$

where $\beta_1$ denotes the output weights between the hidden layer and the output layer corresponding to setting 1, $\xi_{S_{TL}}^i$ denotes the prediction error of the TL feature as the source training data, $h(x_{S_{TL}}^i)\beta_1$ is the prediction of the TL feature as the source data, $y_{S_{TL}}^i \in Y_S = [y_S^1, y_S^2, \ldots, y_S^{N_S}] \in \mathbb{R}^{N_S \times C}$ is the label of the TL feature as the source data, $h(x_{T_{DSTL}}^j)\beta_1$ is the prediction of the DSTL feature as the target data, $y_{T_{DSTL}}^j \in Y_T = [y_T^1, y_T^2, \ldots, y_T^{N_T}] \in \mathbb{R}^{N_T \times C}$ is the label of the DSTL feature as the target data, C is the number of classes, and $\lambda$ and $\gamma$ are the regularization coefficients.

$$\begin{aligned} \min_{\beta_2 \in \mathbb{R}^{L \times c}} \; & \frac{1}{2}\|\beta_2\|^2 + \frac{\lambda}{2} \sum_{i=1}^{N_S} \|\xi_{S_{DSTL}}^i\|^2 + \frac{\gamma}{2} \sum_{j=1}^{N_T} \|\xi_{T_{TL}}^j\|^2 \\ \text{s.t.} \; & \xi_{S_{DSTL}}^i = h(x_{S_{DSTL}}^i)\beta_2 - y_{S_{DSTL}}^i, \quad i = 1, \ldots, N_S \\ & \xi_{T_{TL}}^j = h(x_{T_{TL}}^j)\beta_2 - y_{T_{TL}}^j, \quad j = 1, \ldots, N_T \end{aligned} \tag{16}$$

where $\beta_2$ denotes the output weights between the hidden layer and the output layer corresponding to setting 2; the other parameters are analogous to those above but correspond to setting 2. The proposed ICELM can be solved with a solver similar to that of ELM, which yields $\beta_1$ and $\beta_2$, based on the solutions of traditional regularized ELM and of cross-task ELM with regularization. When $N_S > L$, the problem is an over-determined least squares problem and the classifier $\beta$ can be obtained as



$$\beta = \left(I_{L \times L} + \lambda \cdot H_S^T H_S + \gamma \cdot H_T^T H_T\right)^{-1} \left(\lambda \cdot H_S^T Y_S + \gamma \cdot H_T^T Y_T\right) \tag{17}$$
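A NumPy sketch of the cross-task solution of Eq. (17) for the over-determined case, with the two settings of Fig. 3 and the joint prediction of Eq. (19), could look as follows; the shared random hidden layer, the λ and γ values, the toy features and the pairing of representations at prediction time are illustrative assumptions.

```python
# Illustrative ICELM sketch: cross-task output weights (Eq. (17), N_S >= L case)
# for the two settings of Fig. 3, combined as in Eq. (19). All values are placeholders.
import numpy as np

def hidden(X, W, b):
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))   # shared random hidden layer

def icelm_beta(H_s, Y_s, H_t, Y_t, lam=1.0, gam=1.0):
    L = H_s.shape[1]
    A = np.eye(L) + lam * H_s.T @ H_s + gam * H_t.T @ H_t
    return np.linalg.solve(A, lam * H_s.T @ Y_s + gam * H_t.T @ Y_t)

rng = np.random.default_rng(0)
d, L = 64, 200
X_tl, X_dstl = rng.random((100, d)), rng.random((100, d))  # TL / DSTL features (placeholders)
Y = np.eye(2)[rng.integers(0, 2, 100)]                      # same labels for both representations
W, b = rng.standard_normal((d, L)), rng.standard_normal(L)

H_tl, H_dstl = hidden(X_tl, W, b), hidden(X_dstl, W, b)
beta1 = icelm_beta(H_tl, Y, H_dstl, Y)   # setting 1: TL as source, DSTL as target
beta2 = icelm_beta(H_dstl, Y, H_tl, Y)   # setting 2: DSTL as source, TL as target

# Joint prediction (Eq. (19)): take the larger response of the two settings;
# which representation feeds each beta is an assumption of this sketch.
scores = np.maximum(hidden(X_tl, W, b) @ beta1, hidden(X_dstl, W, b) @ beta2)
labels = scores.argmax(axis=1)
```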

When $N_S < L$, the problem is an under-determined least squares problem, and the classifier $\beta$ can be obtained as

$$\begin{aligned} \beta = \; & H_S^T \left(H_S H_T^T R_1^{-1} H_T H_S^T - R_2\right)^{-1} \left(H_S H_T^T R_1^{-1} Y_T - Y_S\right) \\ & + H_T^T \left[R_1^{-1} Y_T - R_1^{-1} H_T H_S^T \left(H_S H_T^T R_1^{-1} H_T H_S^T - R_2\right)^{-1} \left(H_S H_T^T R_1^{-1} Y_T - Y_S\right)\right] \end{aligned} \tag{18}$$

where $H_S$ denotes the hidden layer matrix of the source data, $H_T$ the hidden layer matrix of the target data, $Y_S$ the labels of the source data, $Y_T$ the labels of the target data, $R_1 = H_T H_T^T + I$ and $R_2 = H_S H_S^T + I$. After solving the two problems we obtain $\beta_1$ and $\beta_2$, which represent the output weights of setting 1 and setting 2, respectively. Note that they are learned from two different feature sets that represent the same target data. The predicted output for a new observation z is computed as

$$\hat{y}(z) = \max\left\{h(z)\beta_1,\; h(z)\beta_2\right\} \tag{19}$$

The pseudocode of the proposed ICELM can be seen in Algorithm 2.

Algorithm 2: ICELM

3. Experiments and results

3.1. Data set: acquisition and categorization of cell images

The breast cancer histopathology images were captured by a digital camera mounted on a microscope. The sections were stained with hematoxylin and eosin (H&E) to make the nuclei and cytoplasm visible, and the captured RGB images were compressed with JPEG 2000. A total of 134 histopathology images were included, categorized as 87 normal cell images (normal cells from normal patients), 24 uninvolved cell images (cytopathologically normal cells from malignant patients) and 23 malignant cell images (malignant cells from malignant patients). The classification is organized into three groups: normal VS uninvolved, normal VS malignant, and normal VS malignant plus uninvolved. We used a randomly chosen 80% of the data as the training set and held out the remaining 20%. Representative histopathology images of each class are shown in Fig. 5.

The breast cancer histopathology image dataset BreaKHis is a challenging large-scale dataset that includes 7909 images and eight sub-classes of breast cancer [23]. To match the classification tasks of our target breast cancer images, we merged the eight sub-classes of this public dataset into two classes, normal and malignant, so its classification task is normal VS malignant. We finally chose 1125 cell images at the same magnification as our target breast cancer images. As with the target images, 80% of the BreaKHis data were chosen as the training set and the remaining 20% were held out.
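For illustration, the grouping into the three binary tasks and the repeated 80/20 hold-out splits described above could be organized as follows; the placeholder label arrays and the scikit-learn split are an assumption for the sketch, not the authors' exact pipeline.

```python
# Hypothetical sketch of the three binary tasks and ten random 80/20 splits;
# the arrays are placeholders for the 87/24/23 image categories.
import numpy as np
from sklearn.model_selection import train_test_split

labels = np.array(["normal"] * 87 + ["uninvolved"] * 24 + ["malignant"] * 23)
images = np.arange(len(labels))          # placeholder for the image data

tasks = {
    "normal_vs_uninvolved":        ["uninvolved"],
    "normal_vs_malignant":         ["malignant"],
    "normal_vs_malignant+uninv":   ["malignant", "uninvolved"],
}

for name, positives in tasks.items():
    mask = np.isin(labels, ["normal"] + positives)
    y = np.isin(labels[mask], positives).astype(int)   # 1 = abnormal, 0 = normal
    for seed in range(10):                              # ten random train/test splits
        x_tr, x_te, y_tr, y_te = train_test_split(
            images[mask], y, test_size=0.2, stratify=y, random_state=seed)
        # ... train D2TL / ICELM on (x_tr, y_tr) and evaluate on the 20% hold-out ...
```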


Fig. 5. Representative histopathology images of each class: (a) normal; (b) uninvolved; (c) malignant.

Table 1. Classification accuracy, sensitivity, specificity and their standard deviations over 10 randomly generated train/test splits for the different methods (R, TL, DSTL), classifiers (ResNet, SVM, ELM) and the proposed D2TL ICELM.

Type | Method | Acc (%) | Sen (%) | Spe (%)
Normal VS Malignant | R-ResNet | 78.64 ± 2.98 | 58.57 ± 9.44 | 85.07 ± 7.42
 | R-SVM [20] | 89.09 ± 0.26 | 54.57 ± 6.77 | 99.44 ± 0.03
 | R-ELM [21] | 91.82 ± 0.24 | 74.00 ± 2.44 | 97.06 ± 0.09
 | TL-ResNet | 93.64 ± 0.21 | 96.00 ± 0.64 | 92.74 ± 0.53
 | TL-SVM [35] | 94.55 ± 0.12 | 76.00 ± 2.24 | 1.0 ± 0
 | TL-ELM | 96.36 ± 0.07 | 84.00 ± 1.44 | 1.0 ± 0
 | DSTL-ResNet | 95.91 ± 0.18 | 92.00 ± 1.76 | 97.06 ± 0.23
 | DSTL-SVM | 97.27 ± 0.09 | 90.00 ± 1.8 | 1.0 ± 0
 | DSTL-ELM | 96.82 ± 0.13 | 88.00 ± 1.76 | 99.41 ± 0.03
 | D2TL ICELM | 98.18 ± 0.05 | 92.00 ± 0.96 | 1.0 ± 0
Normal VS Uninvolved | R-ResNet | 79.13 ± 4.08 | 90.00 ± 9.00 | 77.54 ± 5.38
 | R-SVM [20] | 96.52 ± 0.07 | 90.00 ± 9.00 | 97.54 ± 0.06
 | R-ELM [21] | 94.34 ± 0.76 | 98.00 ± 0.36 | 94.28 ± 0.94
 | TL-ResNet | 93.91 ± 0.23 | 82.00 ± 13.16 | 96.19 ± 0.04
 | TL-SVM [35] | 96.96 ± 0.08 | 93.00 ± 2.41 | 97.62 ± 0.06
 | TL-ELM | 95.22 ± 0.13 | 88.00 ± 9.00 | 96.11 ± 0.04
 | DSTL-ResNet | 95.65 ± 0.08 | 88.89 ± 4.32 | 96.19 ± 0.04
 | DSTL-SVM | 96.09 ± 0.20 | 88.00 ± 9.00 | 98.97 ± 0.04
 | DSTL-ELM | 96.09 ± 0.05 | 96.00 ± 1.44 | 96.67 ± 0.05
 | D2TL ICELM | 96.96 ± 0.08 | 94.00 ± 1.64 | 97.58 ± 0.06
Normal VS Malignant + Uninvolved | R-ResNet | 81.48 ± 2.55 | 92.72 ± 1.29 | 72.56 ± 2.21
 | R-SVM [20] | 92.22 ± 0.09 | 91.30 ± 0.44 | 94.50 ± 0.27
 | R-ELM [21] | 87.04 ± 0.36 | 86.75 ± 0.68 | 87.00 ± 1.07
 | TL-ResNet | 96.30 ± 0.03 | 90.39 ± 0.19 | 1.0 ± 0
 | TL-SVM | 95.93 ± 0.01 | 90.91 ± 0.17 | 99.50 ± 0.02
 | TL-ELM [35] | 95.93 ± 0.04 | 91.82 ± 0.24 | 98.87 ± 0.05
 | DSTL-ResNet | 94.81 ± 0.14 | 90.00 ± 0.74 | 98.13 ± 0.08
 | DSTL-SVM | 96.50 ± 0.02 | 90.91 ± 0.17 | 1.0 ± 0
 | DSTL-ELM | 96.57 ± 0.07 | 92.73 ± 0.30 | 99.38 ± 0.04
 | D2TL ICELM | 96.67 ± 0.04 | 91.82 ± 0.24 | 1.0 ± 0

3.2. Experiment environment

of the model, the accuracy, sensitivity, and specificity are calculated.

The experimental environment includes software and hardware. For the software, all methods were implemented in Python; the convolutional neural networks were based on Keras 2.0.8 and TensorFlow 1.6.0, and the system ran under Ubuntu 16.04. For the hardware, the system ran on an Intel(R) Core(TM) i7-8700 @ 3.2 GHz CPU and a GeForce GTX 1080 Ti 11 GB GPU.

Accuracy = (TP + TN)/(TP + FP + TN + FN)

3.3. Results 3.3.1. Evaluation methods We evaluate the breast cancer cell images classification performance using ten random train/test splits to train the D2 TL model. This can eliminate the impact of manual division of train/test sets and get convincing results. To assess the classification performance

Sensitivity = TP/(TP + FN), Specificity = TN/(TN + FP), where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.

3.3.2. Classification results

In this paper, the proposed method jointly classifies the breast cancer histopathology images. For convolutional neural network training, several experiments are designed as follows: the breast cancer histopathology images were trained on a ResNet50 model with randomly initialized parameters (named R), a fine-tuned ResNet50 model with parameters pre-trained on ImageNet (named TL), and a double-step fine-tuned ResNet50 model with parameters pre-trained on ImageNet and BreaKHis (named DSTL). For the comparison classifiers, the classification results using the ResNet networks of R, TL


Fig. 6. The classification accuracy results of all methods for different classification tasks: (a)normal VS malignant; (b) normal VS uninvolved; (c) normal VS malignant + uninvolved.

and DSTL are directly applied to the histopathology breast cancer cell images. For the general classifiers, the extracted TL and DSTL features were trained and tested using SVM (named R-SVM, TL-SVM, DSTL-SVM) and ELM (named R-ELM, TL-ELM, DSTL-ELM) on the histopathology breast cancer cell images. Finally, the extracted features were trained and tested using the proposed ICELM based on double deep transfer learning (named D2TL ICELM) to jointly improve the classification accuracy. For DSTL, the source data are ImageNet and BreaKHis, used in the first and second transfer learning steps, respectively; the classification accuracy of the first transfer learning step on the BreaKHis dataset is 97%. The experimental results of the comparison classifiers are reported in Table 1; for better visualization, Fig. 6 shows the classification accuracy of all methods. As shown in Table 1, the proposed D2TL ICELM method achieves the highest classification accuracy (96.67%, 96.96%, 98.18%) among

all the methods. In detail, compared to the ResNet50 network with randomly initialized parameters (78.64%, 79.13%, 81.48%), significant improvements in classification accuracy are obtained when the features extracted from the fully connected layer of ResNet50 are classified by SVM or ELM (SVM: 89.09%, 96.52%, 92.22%; ELM: 91.82%, 94.34%, 87.04%). This shows that the limited histopathology images cannot be effectively distinguished by the deep convolutional neural network alone. Combining the high-level features extracted from deep models with the strong classification properties of the traditional classifiers (SVM or ELM) makes full use of the advantages of both. Moreover, this joint approach achieves better and more stable classification performance with lower standard deviations. From Table 1, we can also observe that the TL method clearly outperforms the R method with merely randomly initialized parameters. For the task normal VS malignant, the classification accuracy increased from 78.64%, 89.09% and 91.82% to 93.64%, 94.55% and 96.36% using


ResNet50, SVM and ELM, respectively. The other two tasks also show improved classification accuracy. Transfer learning thus shows great advantages in limited histopathology image classification through fine-tuning of pre-trained parameters. Furthermore, the proposed DSTL method mostly outperforms the TL method, showing higher classification accuracy and stability. This demonstrates that using a transition source dataset similar to the target data in double-step transfer learning yields better classification performance.

4. Discussion

From the above analysis, the DSTL method shows higher classification accuracy and stability than the TL method, demonstrating that the double-step operation with the transition dataset BreaKHis can alleviate the over-training and model degradation problems caused by the limited dataset in single transfer learning. The proposed interactive cross-task extreme learning machine (ICELM) jointly utilizes the TL feature set and the DSTL feature set, and both the source loss and the target loss are taken into consideration. Compared to the DSTL method, the proposed ICELM shows better classification performance. Moreover, the proposed D2TL ICELM achieves the highest classification accuracy, which demonstrates that the joint utilization of DSTL and ICELM is effective. For future work, since feature extraction and classification are currently separate stages, they should be integrated more coherently to reduce computational complexity. Another direction is to improve the CNN model to reduce computation and training time.

5. Conclusion

In this paper, we propose a novel breast cancer image classification method based on double deep transfer learning and an interactive cross-task ELM. The proposed double-step transfer learning is achieved by fine-tuning the pre-trained ImageNet-BreaKHis model on the target breast cancer histopathology images. Then, the high-level features from the fully connected layers of TL and DSTL are extracted for the same target data (breast cancer histopathology images). To achieve better classification performance and stability, these two high-level feature sets are jointly utilized in the proposed interactive cross-task ELM. The experimental results demonstrate that the proposed D2TL ICELM outperforms all compared methods and classifiers. We demonstrate the promise of this technique for improving breast cancer diagnosis through the extraction of high-level features and the interactive classification method, especially its potential for detecting malignancy in cytopathologically normal-appearing cell images.

Acknowledgements This research is funded by National Natural Science Foundation of China NSFC (No: 61771080, 61571069), the Fundamental and Advanced Research Project of Chongqing (cstc2018jcyjAX0779), the Fundamental Research Funds for the Central Universities (2019CDQYTX019, 2019CDCGTX306), the Open Project Program of the National Laboratory of Pattern Recognition (NLPR)(201800011).

Declaration of Competing Interest None declared.


References [1] S. Mcguire, World Cancer Report 2014. Geneva, Switzerland: World Health Organization, International Agency for Research on Cancer, WHO Press, 2015, Adv. Nutr. 7 (2) (2016) 418. [2] R. Sivaramakrishna, R. Gordon, Detection of breast cancer at a smaller size can reduce the likelihood of metastatic spread: a quantitative analysis, Acad. Radiol. 4 (1) (1997) 8–12, http://dx.doi.org/10.1016/S1076-6332(97)80154-7. [3] A. Mouelhi, M. Sayadi, F. Fnaiech, et al., Automatic image segmentation of nuclear stained breast tissue sections using color active contour model and an improved watershed method, Biomed. Signal Process. Control 8 (5) (2013) 421–436, http://dx.doi.org/10.1016/j.bspc.2013.04.003. [4] S. Hussain, Q. Chun, M.R. Asif, et al., Active contours for image segmentation using complex domain-based approach, IET Image Process. 10 (2) (2016) 121–129, http://dx.doi.org/10.1049/iet-ipr.2014.0730. [5] P. Filipczuk, M. Kowal, A. Obuchowicz, Fuzzy clustering and adaptive thresholding based segmentation method for breast cancer diagnosis, Comput. Recognit. Syst. 95 (4) (2011) 613–622, http://dx.doi.org/10.1007/ 978-3-642-20320-6 64. [6] T. Wan, X. Liu, J. Chen, et al., Wavelet-based statistical features for distinguishing mitotic and non-mitotic cells in breast cancer histopathology, in: IEEE International Conference on Image Processing, IEEE, 2015, pp. 2290–2294, http://dx.doi.org/10.1109/ICIP.2014.7025464. [7] P. Wang, X. Hu, Y. Li, et al., Automatic cell nuclei segmentation and classification of breast cancer histopathology images, Signal Process. 122 (9–10) (2016) 1–13, http://dx.doi.org/10.1016/j.sigpro.2015.11.011. [8] Mustafa Zuhaer AL-Dabagh, Firas H. AL-Mukhtar, Breast cancer diagnostic system based on MR images using KPCA-Wavelet transform and support vector machine, Int. J. Adv. Eng. Res. Sci. 4 (3) (2017) 258–263, http://dx.doi. org/10.22161/ijaers.4.3.41 (ISSN : 2349-6495(P) | 2456-1908(O)). [9] H. Xu, M. Mandal, Epidermis segmentation in skin histopathological images based on thickness measurement and k-means algorithm, Int. J. Image Video Process. 2015 (1) (2015) 1–14, http://dx.doi.org/10.1186/s13640-015-0076-3. ´ Abdulhamit Subasi, Breast cancer diagnosis using GA feature [10] Emina Aliˇckovic, selection and Rotation Forest, Neural Comput. Appl. 28 (4) (2017) 753–763, http://dx.doi.org/10.1007/s00521-015-2103-9. [11] M.L. Huang, Y.H. Hung, W.Y. Chen, Neural network classifier with entropy based feature selection on breast cancer diagnosis, J. Med. Syst. 34 (5) (2010) 865, http://dx.doi.org/10.1007/s10916-009-9301-x. [12] D. Mishkin, N. Sergievskiy, J. Matas, Systematic evaluation of CNN advances on the ImageNet, Comput. Vis. Image Underst. 116 (2017) 11–19, http://dx. doi.org/10.1016/j.cviu.2017.05.007. [13] F.A. Spanhol, L.S. Oliveira, C. Petitjean, et al., Breast cancer histopathological image classification using Convolutional Neural Networks, Proc. Int. Conf. Neural Netw. (2016) 2560–2567, http://dx.doi.org/10.1109/IJCNN.2016. 7727519. [14] A. Rakhlin, A. Shvets, V. Iglovikov, et al., Deep convolutional neural networks for breast Cancer histology image analysis, International Conference Image Analysis and Recognition (2018) 737–744, http://dx.doi.org/10.1007/978-3319-93000-8 83. [15] P. Kisilev, E. Sason, E. Barkan, et al., Medical image description using multi-task-loss CNN[C], International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis International Workshop on Deep Learning in Medical Image Analysis (2016) 121–129. [16] H.C. 
Shin, H.R. Roth, M. Gao, et al., Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning, IEEE Trans. Med. Imaging 35 (5) (2016) 1285–1298, http:// dx.doi.org/10.1109/TMI.2016.2528162. [17] M. Oquab, L. Bottou, I. Laptev, et al., Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks[C]// Computer Vision and Pattern Recognition, 2014, pp. 1717–1724, http://dx.doi.org/10. 1109/CVPR.2014.222. [18] H. Ravishankar, P. Sudhakar, R. Venkataramani, et al., Understanding the Mechanisms of Deep Transfer Learning for Medical Images[C]// International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis, 2016, pp. 188–196, http://dx.doi.org/10.1007/978-3-319-46976-8 20. [19] Z. Han, B. Wei, Y. Zheng, et al., Breast cancer multi-classification from histopathological images with structured deep learning model, Sci. Rep. 7 (1) (2017) 4172, http://dx.doi.org/10.1038/s41598-017-04075-z. [20] X. Sun, J. Park, K. Kang, et al., Novel hybrid CNN-SVM model for recognition of functional magnetic resonance images, in: IEEE International Conference on Systems, Man and Cybernetics (SMC), IEEE, 2017. [21] M. Duan, K. Li, K. Lia, An Ensemble CNN2ELM for Age Estimation, Ieee Trans. Inf. Forensics Secur. (2017) 99, http://dx.doi.org/10.1109/TIFS.2017.2766583. [22] J. Deng, W. Dong, R. Socher, et al., ImageNet: a large-scale hierarchical image database, IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009 (2009) 248–255, http://dx.doi.org/10.1109/CVPR.2009.5206848. [23] F.A. Spanhol, L.S. Oliveira, C. Petitjean, et al., A dataset for breast cancer histopathological image classification, IEEE Trans. Biomed. Eng. 63 (7) (2016) 1455–1462, http://dx.doi.org/10.1109/TBME.2015.2496264. [24] K. He, X. Zhang, S. Ren, et al., Deep Residual Learning for Image Recognition, 2015, pp. 770–778, http://dx.doi.org/10.1109/CVPR.2016.90. [25] S. Pouyanfar, S.C. Chen, M.L. Shyu, An efficient deep residual-inception network for multimedia classification, IEEE International Conference on

Multimedia and Expo (2017) 373–378, http://dx.doi.org/10.1109/ICME.2017.8019447.
[26] S.J. Pan, Q. Yang, A survey on transfer learning, IEEE Trans. Knowl. Data Eng. 22 (10) (2010) 1345–1359, http://dx.doi.org/10.1109/TKDE.2009.191.
[27] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, Comput. Sci. (2014), arXiv:1409.1556v6.
[28] J.R. Hershey, J.L. Roux, F. Weninger, Deep unfolding: model-based inspiration of novel deep architectures, Comput. Sci. (2014), arXiv:1409.2574v4.
[29] G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (1) (2006) 489–501, http://dx.doi.org/10.1016/j.neucom.2005.12.126.
[30] G.B. Huang, H. Zhou, X. Ding, et al., Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. B Cybern. 42 (2) (2012) 513–529, http://dx.doi.org/10.1109/tsmcb.2011.2168604.
[31] W. Dai, Y. Chen, G.R. Xue, et al., Translated learning: transfer learning across different feature spaces, in: International Conference on Neural Information Processing Systems, Curran Associates Inc, 2008, pp. 353–360.

[32] L. Zhang, Z. He, Y. Liu, Deep object recognition across domains based on adaptive extreme learning machine, Neurocomputing 239 (2017) 194–203, http://dx.doi.org/10.1016/j.neucom.2017.02.016.
[33] Y. Liu, L. Zhang, P. Deng, et al., Common subspace learning via cross-domain extreme learning machine, Cognit. Comput. 9 (3) (2017) 1–9, http://dx.doi.org/10.1007/s12559-017-9473-5.
[34] L. Zhang, et al., Domain adaptation extreme learning machines for drift compensation in E-nose systems, IEEE Trans. Instrum. Meas. 64 (7) (2015) 1790–1801, http://dx.doi.org/10.1109/tim.2014.2367775.
[35] N. Antropova, B. Huynh, M. Giger, SU-D-207B-06: predicting breast cancer malignancy on DCE-MRI data using pre-trained convolutional neural networks, Med. Phys. 43 (6) (2016) 3349–3350, http://dx.doi.org/10.1118/1.4955674.