Predicting patient survival after liver transplantation using evolutionary multi-objective artificial neural networks


ARTICLE IN PRESS Artificial Intelligence in Medicine xxx (2013) xxx–xxx


Artificial Intelligence in Medicine journal homepage: www.elsevier.com/locate/aiim

Predicting patient survival after liver transplantation using evolutionary multi-objective artificial neural networks

Manuel Cruz-Ramírez a,∗, César Hervás-Martínez a, Juan Carlos Fernández a, Javier Briceño b,c, Manuel de la Mata b,c

a Department of Computer Science and Numerical Analysis, University of Córdoba, Rabanales Campus, Albert Einstein Building 3rd Floor, 14071 Córdoba, Spain
b Liver Transplantation Unit, University Hospital Reina Sofía, Avenue Menéndez Pidal, 14004 Córdoba, Spain
c CIBERehd, Spain

A R T I C L E   I N F O

Article history:
Received 14 September 2011
Received in revised form 4 February 2013
Accepted 5 February 2013

Keywords:
Rule-based decision making
Multi-objective evolutionary algorithm
Radial basis function neural networks
Liver transplantation
Organ allocation

A B S T R A C T

Objective: The optimal allocation of organs in liver transplantation is a problem that can be resolved using machine-learning techniques. Classical methods of allocation assigned an organ to the first patient on the waiting list without taking into account the characteristics of the donor and/or recipient. In this study, characteristics of the donor, recipient and transplant organ were used to determine graft survival. We utilised a dataset of liver transplants collected by eleven Spanish hospitals that provides data on the survival of patients three months after their operations.

Methods and material: To address the problem of organ allocation, the memetic Pareto evolutionary non-dominated sorting genetic algorithm 2 (MPENSGA2), a multi-objective evolutionary algorithm, was used to train radial basis function neural networks, with accuracy and minimum sensitivity as the measures used to evaluate model performance. The neural network models obtained from the Pareto fronts were used to develop a rule-based system that helps medical experts allocate organs.

Results: The models obtained with the MPENSGA2 algorithm generally yielded competitive results for all performance metrics considered in this work, namely the correct classification rate (C), minimum sensitivity (MS), area under the receiver operating characteristic curve (AUC), root mean squared error (RMSE) and Cohen's kappa (Kappa). In general, the multi-objective evolutionary algorithm demonstrated better performance than the mono-objective algorithm, especially with regard to the MS extreme of the Pareto front, which yielded the best values of MS (48.98) and AUC (0.5659). The rule-based system efficiently complements the current allocation system (model for end-stage liver disease, MELD) based on the principles of efficiency and equity; this complementary effect occurred in 55% of the cases used in the simulation. The proposed rule-based system minimises the prediction probability error produced by two sets of models (one formed by models guided by one objective, entropy, and the other composed of models guided by the other objective, MS), such that it maximises the probability of success in liver transplants, with success defined as graft survival three months post-transplant.

Conclusion: The proposed rule-based system is objective, because it does not involve medical experts (an expert's decision may be biased by several factors, such as his/her state of mind or familiarity with the patient). This system is a useful tool that aids medical experts in the allocation of organs; however, the final allocation decision must be made by an expert.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Liver transplantation is an accepted treatment for patients with end-stage chronic liver disease, but it is strongly limited by the scarce availability of suitable liver donors. The imbalance between supply and demand unfortunately results in many waiting-list deaths. Several efforts have been made to expand the donor pool and to better prioritise recipients on waiting lists. Examples of these efforts are the use of extended criteria donors (donors with extreme values of age, days in the intensive care unit (ICU), inotrope usage, body mass index (BMI) and cold ischemia time) and the adoption of the model for end-stage liver disease (MELD) score [1].

∗ Corresponding author. Tel.: +34 957 218 349; fax: +34 957 218 630. E-mail address: [email protected] (M. Cruz-Ramírez).

0933-3657/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.artmed.2013.02.004

Please cite this article in press as: Cruz-Ramírez M, et al. Predicting patient survival after liver transplantation using evolutionary multi-objective artificial neural networks. Artif Intell Med (2013), http://dx.doi.org/10.1016/j.artmed.2013.02.004


In recent years, more relaxed criteria have been used for donors, with an accompanying increased risk of recipient and/or graft loss compared with the risk associated with the use of livers from non-extended criteria donors [2]. These risks should be carefully analysed, because the combination of several of these marginal factors can result in graft loss [3]. From this point of view, Child [4] proposed the Child–Turcotte–Pugh (CTP) score for assessing the severity of a patient's liver disease. Feng [5] proposed a donor risk index (DRI), with the aim of establishing the quantitative risk associated with various combinations of donor characteristics. The MELD score model, which is based on the sickest-first principle, is the cornerstone of the current allocation policy and has been widely validated [6]. Nevertheless, these methods do not predict mortality after transplantation well. Rana et al. [7] devised a scoring system (SOFT) that predicts recipient survival three months following liver transplantation, which can complement the MELD-predicted waiting-list mortality rates. These methods consider only the characteristics of the donors or only the characteristics of the recipients; they do not jointly consider the characteristics of the donor, the recipient and the transplant organ. Predicting the three-month outcome of donor–recipient pairs (D–R pairs) can be posed as a classification problem on an imbalanced dataset with two possible outcomes: the survival class, which is the most frequent class (approximately 90% of patterns), and the non-survival class, which is infrequent. For two-class problems, one of the most commonly used methods in biomedicine is logistic regression. Systems based on logistic regression can adequately classify the majority class (favourable events), but their predictive ability for the minority class (adverse events) is poor [8].
The CTP, DRI, MELD score and SOFT score are all based on logistic regression analysis, which assumes a linear relationship between the characteristics of the D–R pairs and the log-odds of survival after three months. Donor and graft acceptance (considering organ shortage and pool expansion), prioritisation of candidates (based on waiting-list mortality), and the allocation policy (combining the principles of equity, efficiency and fairness) depict a complex scenario that is not easily modelled. A large number of variables can be considered in a given clinical decision regarding donor and organ acceptance, allocation and donor–recipient matching. The risk of subjectivity and the likelihood of making an erroneous decision cannot be underestimated. Artificial intelligence tools for the decision-making process in liver transplantation can therefore be useful, despite the inherent complexity of the problem. The logistic regression models above were developed to estimate the risk of death by considering the underlying disease and the urgency of a transplant for a recipient patient, assuming that all donor livers carry the same risk of failure. However, this does not always hold; specifically, it has been shown in recent years that the risk of graft failure, and even of patient death after transplantation, differs among recipients. While some patients may "tolerate" and overcome the initially poor functioning of a compromised donor organ (for example, one received from an extended criteria donor), others may not. Increasing awareness of the diversity in donor organ quality has stimulated the debate regarding the matching of specific recipient and donor factors to avoid not only futility but also personal and institutional differences in organ acceptance.
The insufficient supply of deceased-donor livers for transplantation has motivated the expansion of acceptance criteria; the additional organs that are available due to the extension of these criteria are known as “marginal” and “expanded criteria” livers. This policy of aggressive liver utilisation has motivated the derivation of a donor risk index that is a quantitative, objective, and continuous metric of liver quality based on factors that are known or knowable at the time of an organ offer. The aim of this study was to develop a liver allocation system based on donor and recipient matching. There are numerous

motivations for developing this system: (1) current selection/allocation systems are based on the risk of waiting-list patient death and do not recognise distinctions in donor organ quality; (2) efforts to increase the number of organ donations are likely to result in a relatively high proportion of extended criteria donors; (3) matching donors and recipients may offer the prospect of predicting outcomes at the time when a specific donor liver is allocated to a specific recipient; (4) differences in local acceptance rates and policies may be diminished; and (5) overall outcome and efficacy may improve. This liver allocation system was developed using artificial intelligence methods that offer significant advantages over conventional statistical techniques that are limited by several hypotheses associated with the distributions of predictor variables and the relationships that may exist between them. In this study, we used artificial neural network models (ANNs) trained by a multiobjective evolutionary algorithm (MOEA) [9]. The use of ANNs in biomedicine as an alternative to other classification methods has been very common in the last two decades. As a result, ANNs have been used to detect tumours in the small bowel [10], to predict graft survival for heart-lung and thoracic transplantation patients [11,12] and to diagnose cytomegalovirus disease [13]. With the ANN models obtained from the Pareto fronts built by the MOEA, a rule-based system was developed to help medical experts make decisions about liver transplants. This system determines the best match between different D–R pairs, with the aim of maintaining graft survival for three months after the transplant. The goal of this study was to develop a rule-based system for allocating donors to recipients, using all the ANN models extracted from the extremes of the Pareto fronts obtained by the memetic Pareto evolutionary non-dominated sorting genetic algorithm 2 (MPENSGA2). 
In this MOEA, a local optimisation process is used to improve the prediction of individuals in the population during the evolutionary process.

The paper is organised as follows: Section 2 presents a brief description of the materials used, Section 3 describes the MPENSGA2 method, Section 4 describes the experimental design and presents the results obtained, and the conclusions and future research are outlined in Section 5.

2. Materials

2.1. Evolutionary artificial neural networks

ANNs [14] have been an object of renewed interest among researchers in statistics and computer science owing to the significant results obtained in a wide range of classification and pattern recognition problems. Research in neural classification has established that neural networks are a promising alternative to various conventional classification methods [15]. Evolutionary computation (EC) is a subfield of artificial intelligence that addresses combinatorial optimisation problems. EC uses iterative progress, such as growth or development in a population, which is then selected in a guided random search until the desired end is achieved. Such processes are often inspired by biological evolutionary mechanisms. In EC, two main operators form the basis of evolutionary systems: recombination (the generation of a new individual from parents) and mutation (the modification of an individual in the population). EC has been widely used in recent years to evolve neural network architectures and weights. These evolutionary artificial neural networks (EANNs) have many applications [16,17]. EANNs provide a more successful method for optimising network performance and architecture simultaneously. A major advantage of the evolutionary approach over traditional learning algorithms such as the back-propagation algorithm (BP) is the ability to escape a


local optimum. Further advantages include robustness and an ability to adapt to changing environments. In the literature, research into EANNs has usually taken one of three approaches: evolving the weights of the network, evolving the architecture, or evolving both simultaneously [18]. The major disadvantage of the EANN approach is that it is computationally expensive, and the evolutionary approach is therefore usually slow. Hybrid techniques have been used to speed up the slow convergence of the evolutionary approach by augmenting evolutionary algorithms with a local search technique (i.e., a memetic approach) such as BP [19], iRprop+ [20] or the Levenberg–Marquardt algorithm [21].

2.2. Radial basis function

The hidden neurons in an ANN have an associated transfer function, which generates the output of the hidden neurons from the inputs. Examples of transfer functions include linear, sigmoidal, product and radial basis functions. The transfer function used by the MOEA in this paper is the radial basis function (RBF) [22]. Let the number of nodes in the input layer, the hidden layer and the output layer be $K$, $M$ and $J$, respectively. For any sample $\mathbf{x} = [x_1, x_2, \ldots, x_K]$, the output of the RBF neural network is $f(\mathbf{x}) = [f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_J(\mathbf{x})]$ (in our case there are only two classes, so $f(\mathbf{x}) = [f_1(\mathbf{x}), f_2(\mathbf{x})]$). The model of an RBF neural network can be described by the following equation:

$$f_1(\mathbf{x}) = \beta_{01} + \sum_{i=1}^{M} \beta_{i1} \cdot \phi_i(\mathbf{x}, \mathbf{w}_i) \quad \text{and} \quad f_2(\mathbf{x}) = 0,$$

where $\phi_i(\mathbf{x}, \mathbf{w}_i)$ is an RBF non-linear mapping from the input layer to the hidden layer, $\boldsymbol{\beta}_1 = [\beta_{11}, \beta_{21}, \ldots, \beta_{M1}]$ is the vector of connection weights between the hidden layer and the output layer, and $\beta_{01}$ is the bias value for the first class. $f_2(\mathbf{x})$ is equal to zero because we use one output neuron fewer than the number of classes in the problem, owing to the probabilistic output defined below. The radial basis function $\phi_i(\mathbf{x}, \mathbf{w}_i)$ can be defined as

$$\phi_i(\mathbf{x}, \mathbf{w}_i) = e^{-\|\mathbf{x} - \mathbf{w}_i\|^2 / r_i^2},$$

where $\mathbf{w}_i = \{w_{i1}, w_{i2}, \ldots, w_{in}\}$ are the centres of the RBFs, $r_i$ is a scalar parameter defining the radius of the $i$th neuron, $\|\cdot\|$ is the Euclidean distance and $\mathbf{x}$ is the input pattern. The output layer is interpreted from the point of view of probability through the use of the softmax activation function [23], which is given by

$$g_j(\mathbf{x}, \boldsymbol{\theta}) = \frac{\exp f_j(\mathbf{x}, \boldsymbol{\theta})}{\sum_{l=1}^{2} \exp f_l(\mathbf{x}, \boldsymbol{\theta})}, \quad \text{for } j = 1, 2,$$

where $f_j(\mathbf{x}, \boldsymbol{\theta})$ is the output of the $j$th output neuron for pattern $\mathbf{x}$, and $g_j(\mathbf{x}, \boldsymbol{\theta})$ is the probability that pattern $\mathbf{x}$ belongs to the $j$th class. In our case,

$$g_1(\mathbf{x}, \boldsymbol{\theta}) = \frac{\exp f_1(\mathbf{x}, \boldsymbol{\theta})}{\sum_{l=1}^{2} \exp f_l(\mathbf{x}, \boldsymbol{\theta})} \quad \text{and} \quad g_2(\mathbf{x}, \boldsymbol{\theta}) = 1 - g_1(\mathbf{x}, \boldsymbol{\theta}).$$
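To make the model concrete, the RBF forward pass and the softmax mapping described above can be sketched in plain Python. This is an illustrative sketch only, not the authors' implementation; the function names are ours.

```python
import math

def rbf_outputs(x, centers, radii, beta, beta0):
    """Outputs [f1(x), f2(x)] of the two-class RBF network.

    centers: list of RBF centres w_i (each a list of K coordinates)
    radii:   list of radii r_i
    beta:    hidden-to-output weights beta_i1
    beta0:   bias beta_01
    """
    f1 = beta0
    for w, r, b in zip(centers, radii, beta):
        # phi_i(x, w_i) = exp(-||x - w_i||^2 / r_i^2)
        sq_dist = sum((xk - wk) ** 2 for xk, wk in zip(x, w))
        f1 += b * math.exp(-sq_dist / r ** 2)
    return [f1, 0.0]  # f2(x) = 0: one output neuron fewer than classes

def softmax(f):
    """g_j = exp(f_j) / sum_l exp(f_l)."""
    m = max(f)
    e = [math.exp(v - m) for v in f]  # shifted for numerical stability
    s = sum(e)
    return [v / s for v in e]
```

Because $f_2(\mathbf{x})$ is fixed at zero, the softmax output reduces to a logistic function of $f_1(\mathbf{x})$, and $g_2 = 1 - g_1$ holds by construction.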

2.3. Multi-objective evolutionary algorithms

MOEAs, or Pareto-based evolutionary algorithms [9], try to satisfy two or more opposing objective functions simultaneously. These algorithms have to provide a well-distributed, non-dominated front and maintain diversity (in the objective space) to explore the fitness landscape, although it is difficult to define the appropriate quality of a Pareto front. MOEAs have been used in several biomedical applications, such as RNA sequence alignment [24], radiation therapy [25] and temperature estimation of therapeutic instrumentation [26]. These techniques present an uncountable set of solutions that, when evaluated, produce vectors whose components represent a trade-off in objective space. A decision-maker then implicitly chooses an acceptable solution (or solutions) by selecting one or more of these vectors [27,28]. The use of ANNs together with evolutionary Pareto-based algorithms is known as multi-objective evolutionary artificial neural network analysis [29]. This technique is used to solve classification tasks with several competing objectives and is able to find multiple solutions in a single execution [30,31]. For this reason, we used MOEAs with ANNs in this study to classify the D–R pairs according to graft survival after three months.

2.4. Accuracy and minimum sensitivity

This section presents two measures to evaluate a classifier: the correct classification rate, or accuracy (C), and minimum sensitivity (MS). The machine-learning community has traditionally used C as the default measure to evaluate a classifier. However, C cannot capture all of the different behavioural aspects of two different classifiers in multiclass classification or in unbalanced problems, where one or more classes have a small number of patterns compared to the other classes. This is our case, because the majority class has 890 patterns while the other has 113. For these problems, two performance measures are considered. The first is the traditionally used C, the fraction of patterns correctly classified:

$$C = \frac{1}{N} \sum_{j=1}^{Q} n_{jj},$$

where $Q$ is the number of classes, $N$ is the number of patterns in training or testing and $n_{jj}$ is the number of patterns from the $j$th class that are correctly classified. The second is MS, the lowest percentage of examples correctly predicted as belonging to each class with respect to the total number of examples in the corresponding class: $MS = \min\{S_i\}$, $i = 1, \ldots, Q$, where $S_i$ is the sensitivity of class $i$. The pair (MS, C) expresses two features associated with a classifier: global performance (C) and the rate of the worst classified class (MS). The selection of MS as a complementary measure of C can be justified by considering that C is the weighted average of the sensitivities of each class. One point in (MS, C) space dominates another if it is above and to the right, i.e. it has both greater C and greater MS. Let C and MS be associated with a classifier $g$; then $MS \le C \le 1 - (1 - MS)p^*$, where $p^*$ is the minimum of the estimated prior probabilities:

$$p^* = \frac{\text{Number of minority class patterns}}{\text{Number of total patterns}}.$$
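The two measures just defined can be computed from predictions as follows; this is a minimal illustrative sketch (the function name is ours, not from the paper).

```python
def accuracy_and_min_sensitivity(y_true, y_pred, classes):
    """C = fraction of correctly classified patterns;
    MS = minimum per-class sensitivity S_i."""
    per_class_total = {c: 0 for c in classes}
    per_class_hits = {c: 0 for c in classes}
    for t, p in zip(y_true, y_pred):
        per_class_total[t] += 1
        if t == p:
            per_class_hits[t] += 1  # n_jj: correct patterns of class t
    C = sum(per_class_hits.values()) / len(y_true)
    MS = min(per_class_hits[c] / per_class_total[c] for c in classes)
    return C, MS
```

For example, with true labels [0, 0, 0, 1] and predictions [0, 0, 1, 1], C = 3/4 while MS = 2/3 (the sensitivity of the majority class), illustrating how MS exposes per-class behaviour that C averages away.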

A priori, it might seem that the MS and C objectives are positively correlated; while this may be true for small values of MS and C, it is not so for values close to 1 on both MS and C, where the objectives are competitive and conflicting. This fact justifies the use of a MOEA in this research. For a more detailed description of these measures, please see [32,33].

2.5. Other metrics used

The use of receiver operating characteristic (ROC) curves is a common technique for comparing the performance of two or more binary classifiers [34] and is especially common in medical decision-making. This technique is often used to determine whether one classifier is better than another with respect to the minority class. An ROC curve depicts the relative trade-offs between the benefits and costs of a classifier, and is a graphic metric that enables different classifiers to be compared visually. The area under an ROC curve (AUC) is used to make numerical comparisons, as well as to perform probabilistic analyses. The root mean squared error (RMSE) indicates the absolute fit of the model to the data (i.e., how close the observed values are to the model's predicted values). The RMSE is a good measure of how accurately the model predicts the response, and it is a very important fit criterion if the main purpose of the model is prediction [14]. Finally, Cohen's kappa (Kappa) measures the proportion of agreement, corrected for chance, between the real classes of the problem and the classes predicted by the classifier [35]. In this study, we used these metrics to compare different classifiers, as their purpose is similar to that of the objective functions used by the MOEA.

2.6. Rule-based systems

In our daily lives, many complex situations are governed by deterministic rules, such as traffic control systems, medical decisions, security systems and bank transactions. Rule-based systems [36,37] automate problem solving and provide a means of capturing and refining human expertise. Experts tend to express their knowledge in terms of a set of situation-action rules. These rules, which are the foundations of any rule-based system, consist of two parts: the condition (situation, antecedent) and the conclusion (action, consequent), i.e.:

IF (conditions) THEN (actions)

The basic function of a rule-based system is to provide a solution for the given problem, starting from an initial state (in our case, the initial state is a set of D–R pairs). According to the characteristics of the pairs in this set, the rules are applied appropriately to reach the final state (i.e., assigning the organ to a specific recipient).
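The IF-THEN mechanism above can be sketched as a tiny rule engine. Note that the rules and thresholds below are purely hypothetical illustrations; the actual rules of the proposed system are those described in Section 4.4.

```python
# Each rule is an (IF condition, THEN action) pair applied to a D-R pair,
# here represented as a dict with a hypothetical predicted-survival field.
RULES = [
    (lambda pair: pair["prob_survival"] >= 0.9,
     "allocate: high predicted three-month graft survival"),
    (lambda pair: pair["prob_survival"] >= 0.5,
     "review: borderline prediction, defer to the medical expert"),
    (lambda pair: True,  # default rule: always fires last
     "reject match: low predicted graft survival"),
]

def apply_rules(pair):
    """Fire the first rule whose condition holds for this D-R pair."""
    for condition, action in RULES:
        if condition(pair):  # IF (conditions) THEN (actions)
            return action
```

Rules are tried in order, so more specific conditions must precede the catch-all default; this first-match strategy is one common conflict-resolution choice in rule-based systems.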
The rule-based system developed is described in Section 4.4.

3. Methods

This section describes the MOEA used for training ANNs. The algorithm, called MPENSGA2, is based on the NSGA2 algorithm developed by Deb et al. [38].

3.1. Objectives for the proposed MOEA

One of the objectives (or non-cooperative metrics) used in this study, explained in Section 2.4, is the discontinuous C metric. Such discontinuity is particularly evident in datasets with few patterns. This metric makes convergence difficult in neural network optimisation; therefore, instead of C, we consider the first objective to be the continuous function given by the cross-entropy, E, associated with C:

$$E(g, \boldsymbol{\theta}) = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_n \log g(\mathbf{x}_n, \boldsymbol{\theta}) + (1 - y_n) \log\left(1 - g(\mathbf{x}_n, \boldsymbol{\theta})\right) \right].$$

The advantage of using the error function E instead of C is that E is continuous, which makes the convergence more robust. Then, as a first objective, we propose a strictly decreasing transformation of $E(g, \boldsymbol{\theta})$ as the fitness measure to maximise:

$$A_1(g, \boldsymbol{\theta}) = \frac{1}{1 + E(g, \boldsymbol{\theta})}, \quad 0 < A_1(g, \boldsymbol{\theta}) \le 1,$$

where $g$ is the multivalued function $g(\mathbf{x}, \boldsymbol{\theta}) = (g_1(\mathbf{x}, \boldsymbol{\theta}_1), g_2(\mathbf{x}, \boldsymbol{\theta}_2))$.

The second objective to maximise is the MS of the classifier, so the second fitness function is $A_2(g) = MS(g)$.

3.2. MPENSGA2 algorithm

The steps of the MPENSGA2 (hereafter referred to as M2) algorithm are summarised graphically in Fig. 1. The M2 algorithm begins with the random generation of N individuals (ANNs). The weights of the links are established randomly within a certain interval ([−2, 2] for the input layer-hidden layer weights and [−10, 10] for the hidden layer-output layer weights; the ranges of these intervals were established experimentally). When the initial population is generated, it is evaluated on both objective functions, that is, A1 and A2 as defined in Section 3.1. Once the individuals have been evaluated, the population is sorted according to the Pareto dominance concept [39], assigning a fitness value to each solution equal to its non-domination level. The non-dominated individuals are the parents from which new individuals are generated. From these non-dominated individuals, one is selected by binary tournament (two randomly selected individuals are compared and the better one is chosen). To create a new child, mutation operators are applied to the selected parent. There are five mutation operators, four structural mutations and one parametric mutation, and the probability of choosing a type of mutator and applying it to an individual is 1/5. Parametric mutation adds Gaussian noise to each of the weights of the parent's links. Structural mutation introduces diversity into the population, leading to different locations in the search space. Specifically, the operators used are "add/delete neurons" and "add/delete connections". For further details about these mutations, the reader can consult [40]. The child created is added to an offspring population. This process is repeated until the offspring population has a size of N. Then, the offspring population is evaluated on both objective functions and is combined with the parent population. The resulting population is sorted according to the Pareto dominance concept, the N best individuals are selected as parents for the next generation, and the process starts again.

A local search procedure is applied when the parent and offspring populations are combined in M2. Only the individuals from the first Pareto front (obtained from the non-dominated sort) of this combined population are optimised by iRprop+ [20], considerably reducing the computational cost, since the local procedure is not applied to the whole mutated offspring population. After the optimisation process, the fitness value of each individual is updated with regard to the approximation error. The iRprop+ algorithm is applied at the beginning, in the middle and at the end of the evolutionary process; that is, only three local search procedures are carried out in total. This procedure can be seen in Fig. 2. A much more detailed description of the M2 algorithm can be found in [41,42].
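The two objective functions of Section 3.1 can be sketched as follows; this is an illustrative sketch (function names are ours), with `y` the binary targets and `g` the predicted class-1 probabilities.

```python
import math

def cross_entropy(y, g):
    """E = -(1/N) * sum_n [ y_n log g_n + (1 - y_n) log(1 - g_n) ]."""
    n = len(y)
    return -sum(yn * math.log(gn) + (1 - yn) * math.log(1 - gn)
                for yn, gn in zip(y, g)) / n

def fitness_A1(y, g):
    """A1 = 1 / (1 + E): a strictly decreasing transformation of E,
    bounded in (0, 1], so maximising A1 minimises the cross-entropy."""
    return 1.0 / (1.0 + cross_entropy(y, g))
```

The second fitness function, A2, is simply the minimum sensitivity MS of Section 2.4 evaluated on the classifier's crisp predictions, so no new code is needed for it.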

4. Results and discussion

4.1. Dataset description

A multi-centric retrospective analysis of eleven Spanish liver transplantation units was conducted, based on all of the consecutive liver transplants performed between January 1, 2007, and December 31, 2008. We included all transplant recipients 18 years of age or older. Recipient and donor characteristics were reported at the time of transplant. Patients undergoing partial, split or living donor liver transplantation and patients undergoing combined or multi-visceral transplants were excluded from the study.


Fig. 1. Framework for MPENSGA2 (M2) algorithm.

All patients were followed from the date of transplant until death, graft loss or the completion of three months after their liver transplant. The liver transplantation units were distributed throughout Spain. A total of 16 recipient characteristics, 20 donor characteristics and 3 transplant factors (listed in Section 4.4.1) were reported for each donor–recipient pair. The end-point variable for ANN modelling was three-month graft mortality. A total of 1031 liver transplants were initially included, and the follow-up period was fulfilled in 1003 liver transplants. A total of 28 cases were excluded due to an absence of graft survival data. The losses were well distributed among the participating institutions.

4.2. Experimental design

To determine whether the models obtained by the evolutionary algorithm are efficient, the RBF neural network models are trained


Fig. 2. Local optimisation procedure.

with a subset of the dataset (training set) and are tested with the rest of the dataset (generalisation set). For this purpose, a stratified holdout cross-validation procedure is applied. We built 30 different training sets consisting of 75% of the randomly chosen D–R pairs and 30 different generalisation sets with the remaining 25% of the D–R pairs in each case. During the creation of these sets, the 75/25% proportion used for the creation of training-generalisation patterns was maintained within each of the participating liver transplantation units. In addition, the 75/25% proportion was maintained between the patterns of the survival class and the non-survival class. In the experiments, the population size for M2 was established as 100 individuals (ANNs). The mutation probability for each operator is 1/5 because there are five different types of mutation operators (four structural and one parametric). Before processing, each of the input variables was scaled to the interval [−2, 2] to prevent some inputs from overshadowing others. In addition, each categorical variable was transformed into as many binary variables as it has categories. Table 1 shows the following features of the dataset: the total number of patterns, the number of patterns in the training and testing sets, the number of input variables, the total number of instances per class and the p* value (as defined in Section 2.4). During the experiment, RBF models are trained using the fitness functions A1(g, θ) (based on E, see Section 3.1) and A2(g) (MS) as objective functions; for validation, however, we use C and MS, because C is a metric commonly used to measure the performance of a classifier. Once the Pareto front is built, two methods are considered to obtain the best neural network model from the front. These methods are called M2-E and M2-MS.
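The class-stratified 75/25 holdout described above can be sketched as follows. This is an illustrative sketch (names are ours); for brevity it stratifies by class label only, whereas the paper additionally preserves the 75/25 proportion within each transplantation unit.

```python
import random

def stratified_holdout(labels, train_frac=0.75, seed=0):
    """Split pattern indices into train/test sets, keeping the
    train_frac proportion within every class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train += idx[:cut]   # 75% of this class
        test += idx[cut:]    # remaining 25%
    return train, test
```

Repeating this split 30 times with different seeds, and running the stochastic algorithm three times per split, reproduces the 30 x 3 = 90-run design described below.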
These methods provide models that can be compared with other classification methods found in the literature. The process followed in these methods is the following: once the first Pareto front is calculated using training set patterns, the best individual belonging to the Pareto front based on the E measure (E Individual (EI)), using the fitness function A1 (g, ), is chosen for M2-E, and the best individual in terms of MS (MS Individual (MSI)) is selected for M2-MS. Therefore, we obtain an individual EI_G = (C_G^EI, MS_G^EI) and an individual MSI_G = (C_G^MSI, MS_G^MSI). This process is repeated three times for each holdout (30 × 3 = 90 runs in total). In this way, the experimental design takes into account 30 different dataset designs, and for each of them, a stochastic algorithm is run three times. Estimations are then carried out using the averages and standard deviations of the individuals EI_G = (C_G^EI, MS_G^EI) and MSI_G = (C_G^MSI, MS_G^MSI). The first expression is the average obtained by taking E into account as the primary objective, and the second is obtained by taking MS into account as the primary objective. The opposite extremes of the Pareto front are thereby utilised in each of the executions. In Fig. 3, the process is shown graphically.

Additionally, the individual with the highest separation index (SI) [43] of each Pareto front is selected. The SI is calculated for each of the individuals in the front, and the individual with the highest value is selected. The SI value is calculated as follows: first, the positive ideal solution and the negative ideal solution are defined. The positive ideal solution maximises the benefit criteria and minimises the cost criteria; the negative ideal solution maximises the cost criteria and minimises the benefit criteria. In our problem, the positive ideal solution is the point (1, 1) in (MS, C) space, and the negative ideal solution is the point (0, 0). Then, for each individual in the Pareto front (its representation in (MS, C) space), the Euclidean distance to the two ideal solutions is calculated: d+ denotes the distance to the positive ideal solution and d− the distance to the negative ideal solution. For example, Fig. 4 shows the distance to the positive ideal solution of the point C and the distance to the negative ideal solution of the point D. Finally, the SI is calculated according to the following expression:

SI = d− / (d+ + d−).
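The selection of the M2-SI individual from the Pareto front can be sketched as follows (a minimal illustration, assuming each individual is represented by its (MS, C) point scaled to [0, 1]):

```python
import math

def separation_index(ms, c):
    """Closeness to the positive ideal (1, 1) relative to the negative
    ideal (0, 0) in (MS, C) space: SI = d- / (d+ + d-)."""
    d_pos = math.hypot(1.0 - ms, 1.0 - c)  # distance to positive ideal
    d_neg = math.hypot(ms, c)              # distance to negative ideal
    return d_neg / (d_pos + d_neg)

def select_si_individual(front):
    """Return the (MS, C) point of the front with the highest SI."""
    return max(front, key=lambda p: separation_index(*p))

# Toy Pareto front: two extremes and an intermediate trade-off point.
front = [(0.10, 0.90), (0.60, 0.80), (0.90, 0.30)]
best = select_si_individual(front)  # selects the intermediate point
```

On this toy front, the intermediate point (0.60, 0.80) obtains the highest SI, in line with the tendency of M2-SI to pick individuals away from the two extremes.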

As a last step, the average over all the individuals selected in this way is calculated (one for each run). This process will be called M2-SI.

The M2 algorithm with RBF neural networks was compared with different logistic regression models and with other artificial intelligence methods. We used WEKA [44] to obtain the results of these logistic regression and artificial intelligence methods. Additionally, the M2 algorithm was compared with a mono-objective evolutionary algorithm guided by E and by MS (in different runs). The methods used for comparison were:

• MultiLogistic (MLogistic): an algorithm, based on the work by le Cessie and van Houwelingen [45], used to build a multinomial logistic regression model with a ridge estimator to

Table 1
Characteristics for the dataset.

#Patterns  #Training patterns  #Test patterns  #Input variables  #Classes  #Patterns per class  p*
1003       751                 252             64                2         (890,113)            0.1126

Please cite this article in press as: Cruz-Ramírez M, et al. Predicting patient survival after liver transplantation using evolutionary multi-objective artificial neural networks. Artif Intell Med (2013), http://dx.doi.org/10.1016/j.artmed.2013.02.004


Fig. 3. Achieving statistical results.

guard against overfitting by penalising large coefficients. A quasi-Newton method of local optimisation based on the gradient is used to find the coefficient vector.
• SimpleLogistic (SLogistic): based on applying the LogitBoost algorithm with simple regression functions and determining the optimum number of iterations by five-fold cross-validation. The data are equally split five times into training and test pools, and LogitBoost is run on every training set for a given maximum number of iterations. The classification error on the respective test set is logged. Afterwards, LogitBoost is run again on all the data using the number of iterations that gave the smallest error on the test set, averaged over the five folds. Further details regarding the algorithm have been previously published [46].

• C4.5: the C4.5 classification tree inducer [47] is run with the standard options; the confidence threshold for pruning is 0.25, and the minimum number of instances per leaf is 2. For pruning, both subtree replacement and subtree raising are considered.
• Logistic model tree (LMT): an algorithm used to address classification tasks that combines a tree structure and logistic regression models, resulting in a single tree. Another advantage of this method is that, by using logistic regression, explicit class probability estimates are produced rather than just classifications [46].
• Support vector machine (SVM): a common supervised learning method that generates input–output mapping functions from a set of training data. SVMs belong to a family of generalised linear models that reach a classification or regression decision based on the value of a linear combination of features; they are also said to belong to the class of kernel methods [48].
• Mono-E and Mono-MS: these two methods are the same evolutionary algorithm guided by different fitness functions; specifically, Mono-E is guided by E, and Mono-MS is guided by MS. The evolutionary algorithm used is described in the first part of [49]. Because this algorithm is stochastic, each of the 30 holdouts is used three times (30 × 3 = 90 runs in total).

4.3. Experimental results

Fig. 4. Example of distances to the positive ideal solution and negative ideal solution.

Table 2 presents the values of the mean and standard deviation (SD) in generalisation for C, MS, AUC, RMSE and Kappa for all runs of the experiments performed. The C, AUC, RMSE and Kappa measures represent the four most commonly used metrics and correspond to the threshold metric, probability metric, rank metric and agreement metric, respectively [50], while the MS measure, defined by us, is associated with good classification ability in all classes. From a descriptive point of view, the best mean CG value is obtained by the SLogistic and LMT methods, followed by the SVM method. For MSG , the best and second best mean values are obtained by the M2-MS method (48.22) and the Mono-MS method (47.57), respectively; these values are much higher than those


Table 2
Statistical results for different methods in generalisation.

Method     CG (%) Mean ± SD   MSG (%) Mean ± SD   AUCG Mean ± SD     RMSEG Mean ± SD    KappaG Mean ± SD
MLogistic  87.64 ± 0.73       2.87 ± 2.57         0.5079 ± 0.0128    0.3299 ± 0.0079    0.0251 ± 0.0405
SLogistic  88.45 ± 0.00       0.11 ± 0.63         0.5005 ± 0.0027    0.3234 ± 0.0031    0.0017 ± 0.0092
C4.5       86.84 ± 1.46       4.94 ± 4.31         0.5124 ± 0.0203    0.3407 ± 0.0163    0.0362 ± 0.0588
LMT        88.45 ± 0.00       0.11 ± 0.63         0.5005 ± 0.0027    0.3234 ± 0.0031    0.0017 ± 0.0092
SVM        88.35 ± 17.14      0.00 ± 0.00         0.4995 ± 0.0010    0.3413 ± 0.0025    −0.0018 ± 0.0034
Mono-E     88.27 ± 0.22       0.92 ± 1.06         0.5618 ± 0.0342    0.3207 ± 0.0048    0.0101 ± 0.0158
Mono-MS    58.73 ± 2.31       47.57 ± 5.56        0.5608 ± 0.0423    0.4981 ± 0.0214    0.0498 ± 0.0406
M2-E       87.80 ± 0.56       3.83 ± 3.28         0.5653 ± 0.0509    0.3274 ± 0.0081    0.0400 ± 0.0501
M2-MS      59.37 ± 2.12       48.98 ± 5.18        0.5659 ± 0.0339    0.4952 ± 0.0134    0.0502 ± 0.0234
M2-SI      76.53 ± 4.65       24.08 ± 8.76        0.5518 ± 0.0362    0.4141 ± 0.0278    0.0647 ± 0.0330

The best result is in bold face and the second best result is in italics.

obtained by other methods. Analysing the AUCG metric, the best performance is achieved by the M2-MS and M2-E methods. For the RMSEG measure, the Mono-E method yields the best mean value, followed by the SLogistic and LMT methods. The best and second best mean values of KappaG are achieved by the M2-SI method and the M2-MS method, respectively.

As observed, overall, no single method is superior to the others. However, the methods that obtain the best CG values can be discarded because they yield MSG and KappaG values close to zero (or exactly zero) and AUCG values close to (or below) 0.50. An MSG value of zero means that the method does not classify any pattern into one class (in this problem, the non-survival class); such classifiers are known as trivial classifiers. A KappaG value of zero means that there is no correlation between the real and predicted classes, which implies that knowledge of the real class values does not provide any information about the values predicted by the classifiers. AUCG values very close to 0.50 indicate that a classifier performs no better than random guessing, and values below 0.50 indicate worse-than-random performance. The other two methods from the literature (MLogistic and SVM) share the same disadvantages. The M2-SI method provides a trade-off between the M2-E and M2-MS methods; this occurs because, in general, the M2-SI method selects an individual situated in an intermediate position of the Pareto front.

To analyse the performance of the proposed multi-objective method compared with that of a mono-objective evolutionary algorithm, statistical tests were applied for each metric (CG, MSG, AUCG, RMSEG and KappaG). Two sets of tests were performed: the first on the methods guided by E together with the M2-SI method, and the second on the methods guided by MS together with the M2-SI method.
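The pairwise win/draw/loss comparison used in these tests can be sketched as follows (an illustrative re-implementation based on the normal approximation to the Wilcoxon signed-rank statistic, not the exact routine used by statistical packages):

```python
import math

def wilcoxon_p(x, y):
    """Two-sided p-value of the Wilcoxon signed-rank test for paired
    samples x, y (normal approximation; adequate for 90 runs per method)."""
    diffs = [b - a for a, b in zip(x, y) if b != a]  # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # average the ranks of tied absolute differences
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[j]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mu = n * (n + 1) / 4.0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mu) / sigma
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

def win_draw_loss(scores, alpha=0.05):
    """Tally [wins, draws, losses] per method from per-run scores
    {name: [runs]}, assuming larger scores are better."""
    tally = {m: [0, 0, 0] for m in scores}
    names = list(scores)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            p = wilcoxon_p(scores[a], scores[b])
            mean_a = sum(scores[a]) / len(scores[a])
            mean_b = sum(scores[b]) / len(scores[b])
            if p >= alpha or mean_a == mean_b:
                tally[a][1] += 1
                tally[b][1] += 1
            elif mean_a > mean_b:
                tally[a][0] += 1
                tally[b][2] += 1
            else:
                tally[b][0] += 1
                tally[a][2] += 1
    return tally
```

For metrics where smaller is better (RMSEG), the mean comparison would simply be reversed.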
The M2-SI method was included in both sets because it is a trade-off between the M2-E and M2-MS methods. A Kolmogorov–Smirnov test (KS-test) at a significance level of α = 0.05 was used to evaluate whether the different performance metrics for all of the methods followed a normal distribution. Based on the results of the KS-test, a normal distribution could not be assumed in any of the cases. Consequently, the statistical analysis concluded by applying the Wilcoxon signed-rank test to all pairs of algorithms; the results are shown in Table 3. For each method, these results include the number of algorithms that the method statistically outperformed (wins), the number of draws (non-significant differences) and the number of losses (the number of algorithms that outperformed the method). For the methods guided by E and the M2-SI method, M2-E is superior in five cases (one for each metric) and appears to be the most reliable method, as the Mono-E method is superior only in terms of CCRG and RMSEG (precision measures that ignore the minority class) and the M2-SI method is superior only in terms of MSG and KappaG (good values for the minority class but not for the overall result). The results obtained using the methods guided by MS show

that the M2-MS and Mono-MS methods exhibit similar behaviour (for all metrics). The M2-SI method is superior with respect to CCRG and RMSEG and inferior with respect to MSG and KappaG. The behaviour of the M2-SI method (good results for MSG and KappaG compared with the methods guided by E, and good values for CCRG and RMSEG compared with the methods guided by MS) is due to a trade-off between the extreme methods. The problem with this method is that, in general, it never yields the best value for any metric.

4.4. Proposed rule-based system

Using the two sets composed of 90 ANN models obtained by the M2 algorithm (one formed by models guided by E (set-E) and the other composed of models guided by MS (set-MS)), we designed a simple rule-based system [51]. In our case, the errors obtained with the two sets are used as input to the rule-based system to determine which of the D–R pairs should be assigned the organ. An additional value of this rule-based system is the possibility of considering the recipient with the highest MELD score when the system cannot identify a significantly best match. The rule-based system consists of five rules and a preparation phase. In the preparation phase, the errors of each D–R pair are calculated for both sets. The error produced by each model is calculated as follows: if the predicted probability is between 0 and 0.5, the error is calculated with respect to zero (the value associated with the non-survival class); otherwise, it is calculated with respect to one (the value associated with the survival class). The equation used to calculate the error is:



error = p,          for 0 ≤ p ≤ 0.5,
error = |p − 1|,    for 0.5 < p ≤ 1,

where p is the predicted probability for a given D–R pair. When the errors for each set of 90 models have been calculated, the mean and SD are obtained for each D–R pair. The best D–R pair is the pair with the lowest mean error.

Table 3
Number of wins (W), draws (D) and losses (L) when comparing the different methods using the Wilcoxon's signed-rank test with α = 0.05.

Method    CG (%) W/D/L   MSG (%) W/D/L   AUCG W/D/L   RMSEG W/D/L   KappaG W/D/L
Methods guided by E
Mono-E    2/0/0          0/0/2           0/2/0        2/0/0         0/0/2
M2-E      1/0/1          1/0/1           1/1/0        1/0/1         1/0/1
M2-SI     0/0/2          2/0/0           0/1/1        0/0/2         2/0/0
Methods guided by MS
Mono-MS   0/1/1          1/1/0           0/2/0        0/1/1         0/2/0
M2-MS     0/1/1          1/1/0           1/1/0        0/1/1         0/1/1
M2-SI     2/0/0          0/0/2           0/1/1        2/0/0         1/1/0

The system then examines whether there are significant differences between the best D–R pair and


Table 4
Recipients from dataset used for examples (recipient characteristics).

        M   A   S  D  BMI    MD  AH  DM  E  HC  PT  WT   MO  TI  HS  UAS
Rec1    23  64  1  0  32.00  0   0   0   6  0   0   71   27  0   1   0
Rec2    22  31  0  0  26.00  0   0   0   3  0   0   87   19  0   0   0
Rec3    22  62  0  0  36.73  0   0   0   0  1   0   166  25  0   0   0
Rec4    21  45  0  0  32.60  0   0   0   1  0   0   148  27  0   1   0
Rec5    20  57  0  0  30.00  0   1   1   1  1   0   270  22  0   0   0
Rec6    26  63  1  0  21.63  2   1   0   6  0   0   35   28  0   0   0
Rec7    25  31  1  0  23.73  0   0   0   0  0   0   324  25  0   0   0
Rec8    25  22  1  0  19.68  2   0   0   6  0   0   166  30  0   0   1
Rec9    25  49  0  0  29.00  1   1   0   1  0   0   5    24  0   0   0
Rec10   24  61  0  0  23.05  0   0   0   1  1   0   81   24  0   0   0
Rec11   39  55  0  0  26.00  0   0   1   0  1   0   167  35  0   0   0
Rec12   34  50  1  0  27.00  2   0   0   6  0   0   205  34  0   0   1
Rec13   32  45  0  0  23.00  1   0   0   0  0   0   3    26  0   0   1
Rec14   29  60  1  0  28.00  1   0   0   6  0   0   1    29  0   1   0
Rec15   28  58  0  1  32.00  0   0   0   0  0   0   230  19  0   1   0
Rec16   27  44  0  0  30.46  0   1   0   4  0   0   350  28  0   0   0
Rec17   27  43  0  1  29.94  0   0   0   0  0   1   343  29  0   1   1
Rec18   27  53  1  0  36.36  2   0   0   6  0   0   156  29  0   0   0
Rec19   27  56  0  0  23.31  0   0   1   6  1   2   72   28  0   0   0
Rec20   27  51  0  0  20.00  2   0   0   3  0   0   3    27  0   0   0

Abbreviations: M: MELD score at listing; A: age; S: sex; D: dialysis at transplant; BMI: body mass index; MD: main diagnosis; AH: arterial hypertension; DM: diabetes mellitus; E: etiology; HC: hepatocellular carcinoma; PT: portal thrombosis; WT: waiting list time; MO: MELD score at the operation; TI: transjugular intrahepatic portosystemic shunt; HS: hepatorenal syndrome; UAS: upper abdominal surgery.

the other four D–R pairs. First, Friedman's test is applied to determine whether there are significant differences between the errors obtained by the best D–R pair and those of the other D–R pairs (at a significance level of α = 0.05). When there are significant differences, Bonferroni–Dunn's test is used to determine which D–R pairs are significantly different from the best D–R pair (the control pair). This test considers the qualities of two D–R pairs to be significantly different if their mean errors differ by at least the critical difference (CD):

CD = q · sqrt( K(K + 1) / (6D) ),

where K is the number of D–R pairs, D is the number of models in the set and the q value can be computed as suggested in [52]. Once the preparation phase is complete, the following system of rules is applied:

1. IF a D–R pair has a significantly lower error in set-E and IF it has a significantly lower error in set-MS, THEN that D–R pair is chosen.
2. IF a D–R pair has a significantly lower error in set-E and IF there is no significant difference from the other D–R pairs with respect to the error in set-MS, THEN that D–R pair is chosen.
3. IF a D–R pair is not significantly different from the other D–R pairs with respect to the error in set-E and IF it has a significantly lower error in set-MS, THEN that D–R pair is chosen.
4. IF a D–R pair is not significantly different from the other D–R pairs with respect to the error in set-E and IF there is no significant difference from the other D–R pairs with respect to the error in set-MS, THEN the D–R pair that contains the recipient with the highest MELD score is chosen.
5. IF a D–R pair has a significantly lower error in set-E and IF another D–R pair has a significantly lower error in set-MS, THEN the D–R pair that contains the recipient with the highest MELD score is chosen.
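The preparation phase and the five rules can be sketched as follows (a minimal illustration, not the authors' implementation: the tabulated q value is passed in as a parameter, and `best_e`/`best_ms` are hypothetical inputs standing for the index of the pair found significantly best in set-E/set-MS, or None when the Friedman/Bonferroni–Dunn comparison finds no such pair):

```python
import math

def pair_error(p):
    """Error of a predicted survival probability p: distance to 0
    (non-survival) when p <= 0.5, distance to 1 (survival) otherwise."""
    return p if p <= 0.5 else abs(p - 1.0)

def critical_difference(q, k, d):
    """Bonferroni-Dunn critical difference for k D-R pairs and d models."""
    return q * math.sqrt(k * (k + 1) / (6.0 * d))

def allocate(best_e, best_ms, meld):
    """Apply rules 1-5. meld[i] is the MELD score of recipient i."""
    highest = max(range(len(meld)), key=lambda i: meld[i])
    if best_e is not None and best_ms is not None:
        # Rule 1: both sets single out the same pair; rule 5: they disagree.
        return best_e if best_e == best_ms else highest
    if best_e is not None:        # Rule 2: only set-E discriminates.
        return best_e
    if best_ms is not None:       # Rule 3: only set-MS discriminates.
        return best_ms
    return highest                # Rule 4: neither set discriminates.

# Five candidate recipients with MELD scores 23, 22, 22, 21, 20 (example 1).
meld = [23, 22, 22, 21, 20]
```

For instance, `allocate(None, None, meld)` falls through to rule 4 and returns the recipient with MELD 23, mirroring the MELD-based fallback described above.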

4.4.1. Examples
To better understand the new method and its application in allocating extended criteria donor livers, four different groups of five randomly selected recipients were chosen from the dataset, with MELD values between 20 and 39 points. The characteristics of the recipients are presented in Table 4 (recipients 1–20). The following groups were tested: recipients with MELD values between 20 and 23, recipients with values between 24 and 26, recipients with values between 28 and 39 and recipients with the same MELD value (in

Table 5
Donors from dataset used for examples (donor characteristics).

        A   S  BMI    CE  DM  AH  ICU  Hy  In  Cr   NA   AST  ALT  TB   AHB  HCV  MU  AB0  CIT  I/R
Don1    81  1  31.25  1   0   0   5    0   1   1.0  141  43   43   0.3  1    0    0   0    1    1
Don2    79  1  29.52  1   0   0   8    0   1   1.0  159  24   26   0.3  1    0    1   0    0    1
Don3    78  1  41.12  1   0   0   8    0   0   0.9  159  42   30   0.3  1    0    1   0    1    1
Don4    76  1  31.11  1   1   1   1    0   1   0.6  138  40   15   0.8  1    0    0   0    1    1
Don5    67  0  31.14  1   0   1   30   0   1   1.1  158  363  300  2.6  1    0    1   0    0    2
Don6    35  0  31.14  0   0   0   6    0   1   0.8  150  105  80   0.7  1    0    1   0    2    1
Don7    42  0  29.39  1   0   1   13   1   1   4.4  156  82   66   4.2  1    0    1   0    2    2
Don8    83  0  31.25  1   0   1   2    0   1   0.8  149  53   97   0.7  1    0    0   0    1    1
Don9    68  1  31.11  1   0   1   9    0   1   0.6  149  27   20   0.3  0    0    1   0    1    1
Don10   85  1  27.06  1   0   0   0    0   0   1.0  143  1    9    1.2  1    0    0   0    2    2

Abbreviations: A: age; S: sex; BMI: body mass index; CE: cause of exitus; DM: diabetes mellitus; AH: arterial hypertension; ICU: hospitalisation length in intensive care unit; Hy: hypotension episodes > 1 h < 60 mmHg; In: high inotropic drug use; Cr: creatinine plasma level; NA: sodium plasma level; AST: aspartate transaminase level; ALT: alanine aminotransferase plasma level; TB: total bilirubin; AHB: hepatitis B (core Ab positive); HCV: hepatitis C (positive serology); MU: multi-organ harvesting; AB0: AB0 incompatible transplant; CIT: cold ischemia time; I/R: grade of ischemia-reperfusion injury.


Table 6
Using the rule-based system: examples 1 and 2. For each recipient–donor pair (Rec1–Rec10, MELD in parentheses, against Don1–Don10), the table reports the mean ± SD errors of the set-E (M2-E) models and of the set-MS (M2-MS) models, together with the rule-based system's decision for each donor. In example 1 (recipients 1–5, MELD 20–23), the decisions were Rec4 for six donors and Rec1 for four donors; in example 2 (recipients 6–10, MELD 24–26), Rec6 was chosen for all ten donors. The best error is in bold face and errors with no significant differences are in italics.


Table 7
Using the rule-based system: examples 3 and 4. For each recipient–donor pair (Rec11–Rec20, MELD in parentheses, against Don1–Don10), the table reports the mean ± SD errors of the set-E (M2-E) models and of the set-MS (M2-MS) models, together with the rule-based system's decision for each donor. In example 3 (recipients 11–15, MELD 28–39), the decisions were Rec13 for seven donors, Rec11 for two donors and Rec12 for one donor; in example 4 (recipients 16–20, all with MELD 27), the decisions were Rec16 for six donors and Rec17 for four donors. The best error is in bold face and errors with no significant differences are in italics.


this case, a value of 27). The responses of the recipients in these situations were tested with 10 potential extended criteria donors (i.e., donors possessing at least two of the following restrictions: age > 75 years; hospitalisation length in the ICU > 4 days; high inotropic drug use = 1; BMI > 30; and cold ischemia time = 2 (>12 h)). These donors were selected from the dataset, and their characteristics are shown in Table 5.

4.4.2. Example 1: recipients with MELD values between 20 and 23
The first part of Table 6 presents the mean errors obtained in set-E and in set-MS, together with the recipient to whom the rule-based system allocates each donor graft. In this example, the recipients have MELD values between 20 and 23. It is notable that recipient 4, who had a MELD value of 21, is assigned the donor graft six times; the first recipient (MELD 23) is assigned the donor graft four times; and the second, third and fifth recipients are never assigned the donor graft. For recipients with these MELD values, the organ is generally allocated to those with higher MELD values. Therefore, under similar conditions, our system behaves similarly to the current MELD-based system.

4.4.3. Example 2: recipients with MELD values between 24 and 26
In this second example (second part of Table 6), five recipients were selected with MELD values between 24 and 26. All ten organs were assigned to recipient 6 (MELD 26), and no organs were assigned to recipients 7, 8, 9 and 10. In this situation, our system respects the criteria of the MELD-based system.

4.4.4. Example 3: recipients with MELD values between 28 and 39
In the third example (first part of Table 7), the recipients selected have MELD values between 28 and 39 (a MELD value of 39 is very high).
In this example, seven organs were allocated to recipient 13 (MELD 32), two organs were allocated to recipient 11 (MELD 39) and one organ was allocated to recipient 12 (MELD 34).

4.4.5. Example 4: recipients with the same MELD value (MELD = 27)
In the last example (second part of Table 7), we selected five recipients with the same MELD value of 27. The criterion used to sort these recipients was their waiting-list time. Six of the ten organs were allocated to recipient 16, and the other four were allocated to recipient 17. In many cases, there was an allocation tie between multiple recipients, and the organs were allocated to those with higher MELD values (or, as in this case, when all patients have the same MELD value, to the recipient who has spent the longest time on the waiting list). Assigning the organ to the recipient with the longest wait time preserves the principles of equity, utility and efficiency, because otherwise a patient with an acceptable MELD value could stay on the waiting list indefinitely.

5. Conclusions

In this study, a MOEA was designed to determine the survival of a patient after liver transplantation. In our experiment, the survival time was set at three months after the operation. With this MOEA, two sets of ANN models are obtained. One set is formed by ANN models guided by E (models with the best E value in the training phase) and optimises the probability of graft survival; the other set is composed of ANN models guided by MS (models with the best MS value) and minimises the probability of graft failure. To combine the errors provided by these two sets, a rule-based system was designed. This rule-based system obtains the most favourable donor–recipient pair from the top five recipients on the waiting list, maintaining the desired principles of equity and efficiency.

From the simulation, it can be inferred that the performance of the proposed system is similar to that of the MELD method in the following percentages of cases: 40% of the time when MELD values are between 20 and 23; 100% of the time when MELD values are between 24 and 26; 20% of the time when MELD values are between 28 and 39; and 60% of the time when the recipients have the same MELD value (in this case, 27). Therefore, the proposed system complements the commonly used MELD method. In general, the proposed system behaves like the MELD 55% of the time. This result suggests that our system complements the current system governing the allocation of organs from donors to recipients by taking into account the characteristics of the donors, the recipients and the transplant organ.

It would be interesting to reduce the number of input characteristics in future experiments. Decreasing the complexity of the ANN models would facilitate their interpretation. Moreover, there would be a reduction in the time and cost needed to determine the necessary attribute values for donors and recipients. Another future objective is the implementation of rule-based systems in clinical practice to study their effectiveness with respect to the current allocation system (MELD).

Acknowledgments

This work was partially subsidised by the Spanish Inter-Ministerial Commission of Science and Technology under Project TIN2011-22794, the European Regional Development Fund, and the "Junta de Andalucía" (Spain) under Project P08-TIC-3745. M. Cruz-Ramírez's research has been subsidised by the FPU Predoctoral Program (Spanish Ministry of Education and Science), grant reference AP2009-0487. We would like to thank the Editor and the Reviewers for their helpful suggestions and Astellas Pharma Company for their partial support.

References

[1] Kamath P, Wiesner R, Malinchoc M, Kremers W, Therneau T, Kosberg C, et al. A model to predict survival in patients with end-stage liver disease. Hepatology 2001;33(2):464–70.
[2] Busuttil RW, Tanaka K. The utility of marginal donors in liver transplantation. Liver Transplantation 2003;9(7):651–63.
[3] Briceño J, Solorzano G, Pera C. A proposal for scoring marginal liver grafts. Transplant International 2000;13:S249–52.
[4] Child C, Turcotte J. Surgery and portal hypertension. In: Child C, editor. The liver and portal hypertension. Philadelphia: Saunders/Elsevier; 1964. p. 50–64.
[5] Feng S, Goodrich NP, Bragg-Gresham JL, Dykstra DM, Punch JD, DebRoy MA, et al. Characteristics associated with liver graft failure: the concept of a donor risk index. American Journal of Transplantation 2006;6(4):783–90.
[6] Kamath P, Kim W. The Model for End-stage Liver Disease (MELD). Hepatology 2007;45(3):797–805.
[7] Rana A, Hardy MA, Halazun KJ, Woodland DC, Ratner LE, Samstein B, et al. Survival outcomes following liver transplantation (SOFT) score: a novel method to predict patient survival following liver transplantation. American Journal of Transplantation 2008;8(12):2537–46.
[8] Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. Journal of Biomedical Informatics 2002;35(5–6):352–9.
[9] Coello CAC, Lamont GB, Veldhuizen DAV. Evolutionary algorithms for solving multi-objective problems (genetic and evolutionary computation). 2nd edition. New York: Springer-Verlag; 2007.
[10] Li B, Meng M-H, Lau J. Computer-aided small bowel tumor detection for capsule endoscopy. Artificial Intelligence in Medicine 2011;52(1):11–6.
[11] Oztekin A, Delen D, Kong Z. Predicting the graft survival for heart–lung transplantation patients: an integrated data mining methodology. International Journal of Medical Informatics 2009;78(12):e84–96.
[12] Delen D, Oztekin A, Kong Z. A machine learning-based approach to prognostic analysis of thoracic transplantations. Artificial Intelligence in Medicine 2010;49(1):33–42.
[13] Sheppard D, McPhee D, Darke C, Shrethra B, Moore R, Jurewitz A, et al. Predicting cytomegalovirus disease after renal transplantation: an artificial neural network approach. International Journal of Medical Informatics 1999;54(1):55–76.

Please cite this article in press as: Cruz-Ramírez M, et al. Predicting patient survival after liver transplantation using evolutionary multi-objective artificial neural networks. Artif Intell Med (2013), http://dx.doi.org/10.1016/j.artmed.2013.02.004


[14] Bishop C. Neural networks for pattern recognition. 1st edition. New York, NY, USA: Oxford University Press, Inc.; 1995.
[15] Zhang G. Neural networks for classification: a survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 2000;30(4):451–62.
[16] Abbass H. An evolutionary artificial neural networks approach for breast cancer diagnosis. Artificial Intelligence in Medicine 2002;25(3):265–81.
[17] Grossi E, Mancini A, Buscema M. International experience on the use of artificial neural networks in gastroenterology. Digestive and Liver Disease 2007;39(3):278–85.
[18] Yao X. Evolving artificial neural networks. Proceedings of the IEEE 1999;87(9):1423–47.
[19] Moscato P, Cotta C. A gentle introduction to memetic algorithms. In: Handbook of metaheuristics, vol. 57. Boston, MA: Springer US; 2003. p. 105–44.
[20] Igel C, Hüsken M. Empirical evaluation of the improved Rprop learning algorithms. Neurocomputing 2003;50(6):105–23.
[21] Levenberg K. A method for the solution of certain non-linear problems in least squares. The Quarterly of Applied Mathematics 1944;2(2):164–8.
[22] Bishop C. Improving the generalization properties of radial basis function neural networks. Neural Computation 1991;3(4):579–81.
[23] Durbin R, Rumelhart D. Product units: a computationally powerful and biologically plausible extension to backpropagation networks. Neural Computation 1989;1(1):133–42.
[24] Taneda A. Multi-objective pairwise RNA sequence alignment. Bioinformatics 2010;26(19):2383–90.
[25] Yu Y, Zhang J, Cheng G, Schell M, Okunieff P. Multi-objective optimization in radiotherapy: applications to stereotactic radiosurgery and prostate brachytherapy. Artificial Intelligence in Medicine 2000;19(1):39–51.
[26] Teixeira C, Graça Ruano M, Ruano A, Pereira W. Neuro-genetic non-invasive temperature estimation: intensity and spatial prediction. Artificial Intelligence in Medicine 2008;43(2):127–39.
[27] Pardo-Montero J, Fenwick J. An approach to multiobjective optimization of rotational therapy. II. Pareto optimal surfaces and linear combinations of modulated blocked arcs for a prostate geometry. Medical Physics 2010;37(6):2606–16.
[28] Craft D, Monz M. Simultaneous navigation of multiple Pareto surfaces, with an application to multicriteria IMRT planning with multiple beam angle configurations. Medical Physics 2010;37(2):736–41.
[29] Jin Y, Sendhoff B. Pareto multiobjective machine learning: an overview and case studies. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 2008;38(3):397–415.
[30] Abbass H. Speeding up backpropagation using multiobjective evolutionary algorithms. Neural Computation 2003;15:2705–26.
[31] Silva V, Fleming P, Sugimoto J, Yokoyama R. Multiobjective optimization using variable complexity modelling for control system design. Applied Soft Computing 2008;8(1):392–401.
[32] Fernández JC, Hervás C, Martínez FJ, Gutiérrez PA, Cruz M. Memetic Pareto differential evolution for designing artificial neural networks in multiclassification problems using cross-entropy versus sensitivity. In: Corchado E, Wu X, Oja E, Herrero A, Baruque B, editors. Proceedings of the 4th international conference, HAIS 2009, vol. 5572. Berlin, Heidelberg: Springer-Verlag; 2009. p. 433–41.


[33] Cruz-Ramírez M, Sánchez-Monedero J, Fernández-Navarro F, Fernández J, Hervás-Martínez C. Memetic Pareto differential evolutionary artificial neural networks to determine growth multi-classes in predictive microbiology. Evolutionary Intelligence 2010;3(3–4):187–99.
[34] Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters 2006;27:861–74.
[35] Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 1960;20(1):37–46.
[36] Ligeza A. Logical foundations for rule-based systems. Secaucus, NJ, USA: Springer-Verlag New York, Inc.; 2006.
[37] Chiarugi F, Colantonio S, Emmanouilidou D, Martinelli M, Moroni D, Salvetti O. Decision support in heart failure through processing of electro- and echocardiograms. Artificial Intelligence in Medicine 2010;50(2):95–104.
[38] Deb K, Pratab A, Agarwal S, Meyarivan T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 2002;6(2):182–97.
[39] Pareto V. Cours d'économie politique. Genève: Droz; 1896.
[40] Martínez-Estudillo AC, Hervás-Martínez C, Martínez-Estudillo FJ, García-Pedrajas N. Hybridization of evolutionary algorithms and local search by means of a clustering method. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 2006;36(3):534–45.
[41] Fernández JC, Martínez F, Hervás C, Gutiérrez PA. Sensitivity versus accuracy in multi-class problems using memetic Pareto evolutionary neural networks. IEEE Transactions on Neural Networks 2010;21(5):750–70.
[42] Fernández J, Hervás C, Martínez-Estudillo F, Gutiérrez P. Memetic Pareto evolutionary artificial neural networks to determine growth/no-growth in predictive microbiology. Applied Soft Computing 2011;11:534–50.
[43] Hwang C, Yoon K. Multiple attribute decision making: methods and applications: a state-of-the-art survey. Berlin, New York: Springer-Verlag; 1981.
[44] Witten I, Frank E. Data mining: practical machine learning tools and techniques. 2nd edition. Morgan Kaufmann Series in Data Management Systems. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 2005.
[45] le Cessie S, van Houwelingen J. Ridge estimators in logistic regression. Applied Statistics 1992;41(1):191–201.
[46] Landwehr N, Hall M, Frank E. Logistic model trees. Machine Learning 2005;59:161–205.
[47] Quinlan R. C4.5: programs for machine learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 1993.
[48] Cristianini N, Shawe-Taylor J. An introduction to support vector machines and other kernel-based learning methods. 1st edition. Cambridge, UK: Cambridge University Press; 2000.
[49] Gutiérrez PA, Hervás-Martínez C, Martínez-Estudillo FJ, Carbonero M. A two-stage evolutionary algorithm based on sensitivity and accuracy for multi-class problems. Information Sciences 2012;197:20–37.
[50] Caruana R, Niculescu-Mizil A. Data mining in metric space: an empirical analysis of supervised learning performance criteria. In: Proceedings of the 10th international conference on knowledge discovery and data mining. 2004. p. 69–78.
[51] Peña-Reyes C, Sipper M. Evolutionary computation in medicine: an overview. Artificial Intelligence in Medicine 2000;19(1):1–23.
[52] Hochberg Y, Tamhane A. Multiple comparison procedures. New York, NY, USA: John Wiley & Sons, Inc.; 1987.
