Modeling consideration sets and brand choice using artificial neural networks

Modeling consideration sets and brand choice using artificial neural networks

European Journal of Operational Research 154 (2004) 206–217 www.elsevier.com/locate/dsw Computing, Artificial Intelligence and Information Technology ...

374KB Sizes 0 Downloads 29 Views

European Journal of Operational Research 154 (2004) 206–217 www.elsevier.com/locate/dsw

Computing, Artificial Intelligence and Information Technology

Modeling consideration sets and brand choice using artificial neural networks €rn Vroomen Bjo

a,*

, Philip Hans Franses b, Erjen van Nierop

c,1

a

Erasmus Research Institute of Management, Erasmus University Rotterdam, P.O. Box 1738, NL-3000 DR, Rotterdam, The Netherlands b Econometric Institute and Department of Marketing and Organization, Erasmus University Rotterdam, P.O. Box 1738, NL-3000 DR, Rotterdam, The Netherlands c Tinbergen Institute, Erasmus University Rotterdam, P.O. Box 1738, NL-3000 DR, Rotterdam, The Netherlands Received 16 January 2001; accepted 12 September 2002

Abstract Brand choice can be viewed as a two-step process. Households first construct a consideration set, which does not necessarily include all available brands, and then make a final choice from this set. In this paper we put forward an econometric model for this two-step process, where we take into account that consideration sets usually are not observed. Our model is an artificial neural network, where the consideration set corresponds with the hidden layer of the network. We discuss representation, parameter estimation and inference. We illustrate our model for the choice between six detergent brands and show that the model improves upon one-step models, in terms of fit and out-of-sample forecasting. Ó 2002 Elsevier B.V. All rights reserved. Keywords: Consideration set; Brand choice; Artificial neural network

1. Introduction The increasing amount of household-specific scanner data allows marketing researchers to give better explanations and make better predictions of consumer behavior. An important topic in marketing is modeling brand choice. Numerous * Corresponding author. Tel.: +31-10-4088947; fax: +31-104089162. E-mail address: [email protected] (B. Vroomen). 1 Current address: Graduate School of Industrial Administration, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213-3890, USA.

models have been developed and occasionally used in practice. Many of these brand choice models are based on a multinomial logit model, MNL (McFadden, 1973; Guadagni and Little, 1983; Lattin and Bucklin, 1989; among others) or a multinomial probit model (Hausman and Wise, 1978; Daganzo, 1979). A key property of these models is that households are assumed to consider the full set of available brands on each purchase occasion. However, this assumption may not be realistic as households may first reduce the number of brands to a smaller set, before making a final choice. This intermediate set, called the consideration set, thus contains a

0377-2217/$ - see front matter Ó 2002 Elsevier B.V. All rights reserved. doi:10.1016/S0377-2217(02)00673-2

B. Vroomen et al. / European Journal of Operational Research 154 (2004) 206–217

subset of available brands that a household considers to buy. The concept of consideration sets implies that the process of making a decision concerning brand choice is a two-step process. In the first step there is a reduction of the set of available brands into a consideration set. In the second step the final choice is made from this reduced set. It is important to note here that consideration sets are usually not observed. Hence, somehow one has to estimate these from observed purchase data. In this paper, we present an econometric model for the above discussed two-stage process of brand choice, that is, the model incorporates the formation of consideration sets. Interestingly, the model we propose for brand choice turns out to have the structure of an artificial neural network (ANN). ANNs are often used in economics and business applications. ANNs are used to describe and forecast many important variables. An important feature of ANNs is that they can be fitted to a wide range of data patterns, see also Kuan and White (1994) and Swanson and White (1995). For example, they have been used to predict bankruptcy (Zhang et al., 1999) and to highlight structural changes in time series data (Franses and Draisma, 1997; Franses and van Dijk, 2000). We refer to Vellido et al. (1999) for a survey of applications of ANNs in business. Although ANNs are valuable tools for empirical modeling, a well-known drawback of ANNs concerns the interpretation of the model parameters. Applications of ANNs for marketing problems can be found in Hruschka (1993) and Agrawal and Schorling (1996), among others. More specific, applications of ANNs for brand choice modeling can be found in Hu et al. (1999), Heimel et al. (1998) and, more recently, Hruschka et al. (2002). West et al. (1997) compare an ANN with discriminant analysis and logistic regression. They conclude that an ANN can outperform the two statistical techniques on predicting consumer choice when the underlying choice rule is known and can give better out-of-sample forecasts when the choice rule is not known. Dasgupta et al. (1994) and Kumar et al. (1995) compare the same two statistical techniques with an ANN.

207

The novelty of this paper is that we address the interpretation aspects of an ANN by demonstrating that such a model can naturally arise from a two-step brand choice model involving consideration sets. This is achieved by assuming that the consideration set corresponds with the hidden layer of the network. We illustrate our model for the choice between six liquid detergent brands. The outline of this paper is as follows. In Section 2, we present a further but brief elaboration of the concept of consideration sets. In Section 3 we propose our model, discuss how it can be evaluated and how the parameters can be estimated. In Section 4, we illustrate our model. Finally, we conclude our paper with a discussion.

2. Consideration sets The assumption that households can reduce the total number of alternative brands into a consideration set before making a final choice has recently raised much interest in theoretical and empirical marketing research. The theory of consideration sets assumes that consumers reduce the number of alternatives from an overall set (or universal set) into smaller sets. The consideration set is the final set before the purchase decision, containing only those products the household is considering to choose. We refer to Roberts and Lattin (1991, 1997), Shocker et al. (1991) and Horowitz and Louviere (1995) for detailed discussions on the theory of consideration sets and for extensive surveys of the relevant literature. Numerous (theoretical) models can be developed to describe this brand choice process incorporating consideration sets, due to the different ways households can construct these sets. For example, the formation of the set can be memory-based or stimulus-based. Furthermore, the consideration set can be assumed constant or time-varying over subsequent purchase occasions. A discussion of these issues can be found in Manrai and Andrews (1998). From a theoretical point of view the assumption that the brand choice process is a multi-level process is rather appealing. From a practical point of view however, the concept involving consideration

208

B. Vroomen et al. / European Journal of Operational Research 154 (2004) 206–217

sets is a little less appealing, as in many situations one cannot observe these sets. Only in cases of experimental surveys, the market researcher may obtain information on the various sets as constructed by households. Surveys have the disadvantage that they are expensive, not widely available and it is not straightforward to forecast the consideration sets based on surveys. If one only has scanner data, containing individual-specific and longitudinal information on purchases, one has no direct information on the composition of the consideration set. By treating the consideration set as a stochastic variable, which is somewhere between all available brands and the selected brand, one can try to estimate this set. For example, Chiang et al. (1999) use scanner data to simulate probabilities that a certain consideration set is formed. Based on these probabilities, the final choice can be described. Although consideration sets cannot be observed directly, empirical arguments can be found that support this concept. Furthermore, results from an extensive comparison study in Manrai and Andrews (1998) indicate that two-stage models tend to give better forecasts then a one-stage MNL model. Although the two-stage models may perform better, Manrai and Andrews (1998) argue that this does not automatically mean that the set formation process is described correctly. Moreover, Horowitz and Louviere (1995) indicate that, in case the final choice model is not properly specified, the use of a consideration set yields no extra information.

Sections 3.3 and 3.4, we elaborate on model interpretation and on the method used for parameter estimation. 3.1. Graphical representation During the two stages of a decision making process a household needs to form an attitude towards a brand in order to determine whether that particular brand is to be considered or even purchased. We assume that in the first stage, that is, the formation of the consideration set, a utility measure is determined for each brand and that the brands that exceed a certain threshold are incorporated in the consideration set. Similar approaches can be found in Fotheringham (1988), Andrews and Srinivasan (1995) and Bronnenberg and Vanhonacker (1996). We further assume that these utilities are based on two types of characteristics. First, there are household-specific characteristics such as demographic and social factors, like the size of the household or the income level. Second, there are brand characteristics like price, promotion, and advertising. In the second stage, conditional on the determined consideration set, a brand is chosen. This decision is based on the available characteristics of the brands in the consideration set. The utility formation, resulting in consideration set formation, is illustrated by the highlighted part in Fig. 1. We denote by CSj , a 1=0 node that brand

3. Consideration sets, brand choice and a neural network As discussed in the previous section, a two-stage model for brand choice might be more appropriate than a one-stage model. In this section we present a two-stage model, which is based on the decision making process of households and which assumes consideration set formation. In Sections 3.1 and 3.2, we indicate that this model has strong similarities with an ANN. This is because we assume that the unobserved hidden layer in an ANN corresponds with such a consideration set. In

Fig. 1. Formation of the consideration set for two brands based on household-specific (xi ) and brand-specific (yp;j ) characteristics, including a brand-specific constant.

B. Vroomen et al. / European Journal of Operational Research 154 (2004) 206–217

j, j ¼ 1; 2; . . . ; J , is included in a consideration set of a certain household (1) or not (0). For each brand j, there are p ¼ 1; 2; . . . ; P possible characteristics yp;j , which can be used to determine the value of CSj . Also for each brand j, there are i ¼ 1; 2; . . . ; I household-specific characteristics xi , that are able to influence consideration set formation. The utility formation procedure stated above implies that, we assume that during the phase of consideration set formation, a household does not make a comparison between brands, which is an assumption usually made in the literature, see for example Chiang et al. (1999). A brand is considered when a function of its own characteristics exceeds a certain threshold. In the choice stage, a comparison is made between the considered brands. This results in that the brand characteristics yp;j , like price and promotion, only influence brand j, while the variables xi can have an effect on each brand. In Fig. 1 we graphically display these assumptions for two brands. The second stage in brand choice is illustrated by the highlighted part in Fig. 2, again for two brands. The final choice for brand j, j ¼ 1; 2; . . . ; J denoted as the binary variable FCj , is determined by the outcome of the consideration set formation stage (CSj ) and by the choice-specific characteristics zq , q ¼ 1; 2; . . . ; Q, like the observed price at the purchase occasion. We assume that both CSj and zq have an effect on FCj , for all j. As depicted in Figs. 1 and 2, the two stages together constitute the decision making process.

209

Interestingly, the structure of this process appears to have the familiar structure of a feed-forward ANN. In network terminology, the resulting network consists of three layers. Because we aim to describe a brand choice process including a consideration set, we in fact assume that the second layer, what is called the hidden layer, corresponds with the consideration set. Household-specific and brand-specific characteristics are used as input (the first layer) while the output layer (the third layer) indicates which brand is chosen. A special property of our network is that it is not fully connected, that is, some effects are restricted to zero. Additionally, the output layer is allowed to benefit from external data. To make the connection between the model in Figs. 1 and 2 and an ANN more specific, we relax the assumption that the hidden layer contains nodes that give a value of 1 when the brand is considered and a value of 0 when not. Instead, we transform the utility of each brand into a probability that the brand is considered. Nodes in the output layer indicate the choice made from all available brands. Both the hidden and output layer have as many nodes as there are brands. We further impose the restriction that the probability of considering brand j, that is, CSj , has a positive influence on the probability of choosing brand j, that is, FCj . Additional to a flexible value of CSj , we also assume that nodes in the output layer give the probability that the corresponding brand is chosen. Finally, we assume that the brand with the highest probability FCj is chosen.

3.2. Representation

Fig. 2. Final choice between two brands, based on the consideration set and extra information, including a brand-specific constant.

In this section we transform the graphical model in Figs. 1 and 2 into an econometric model. When one is familiar with the underlying structure of an ANN, it is not difficult to see that the (overall) structure in Figs. 1 and 2 already suggests such a model, see also Kuan and White (1994). Each of the J brands in the hidden (second) layer gets represented by, what in network terminology is called, a sigmoidal node CSj . This node determines the probability that the corresponding brand is considered according to

210

CSj ¼ F a0;j þ

B. Vroomen et al. / European Journal of Operational Research 154 (2004) 206–217 I X i¼1

j ¼ 1; 2; . . . ; J :

ai;j xi þ

P X

! cp;j yp;j ;

p¼1

ð1Þ

We take a weighted sum for each of the J brands of household-specific variables (xi ) and brandspecific variables (yp;j ). The weights are denoted by ai;j and cp;j , respectively. Further, a constant a0;j is added. This weighted sum is scaled to the ½0; 1interval using the logistic function, F ðvÞ ¼ ð1 þ 1 expðvÞÞ . The second stage, as illustrated by the highlighted part in Fig. 2, can be interpreted as a classification problem where the classes are mutually exclusive. It is now desired that the nodes in the output layer give the choice probabilities for the J brands, and hence that the output of these nodes should lie in the ½0; 1-interval and that they should sum to unity. To achieve this, the probability of final choice (FCj ) is determined using P P expðb0;j þ Jk¼1 bk;j CSk þ Qq¼1 dq;j zq Þ FCj ¼ PJ ; PJ PQ l¼1 expðb0;l þ k¼1 bk;l CSk þ q¼1 dq;l zq Þ j ¼ 1; 2; . . . ; J ;

ð2Þ

which is of course the familiar MNL model. In network terminology, (2) is a generalization of the logistic sigmoid activation function, also known as the normalized exponential or softmax activation function. Upon using (2), the nodes in the output layer can be interpreted as probabilities conditional on the output of the hidden nodes, that is, the probability of choice is determined conditional on the probability of consideration Bishop (1995). For each brand, a weighted sum is determined based on the outputs of the consideration layer (CSj ) and external data (zq ) where the weights are denoted by bk;j and dq;j , respectively. Again, a constant b0;j is added. This sum is scaled to the ½0; 1-interval by dividing it with the sum over all brands. Note that the external data zq may include information already used while determining the probabilities of consideration. In fact, no adverse consequences are expected from including the same variables for both the determination of the consideration set and the final choice, see also Andrews and Manrai (1998).

Combining (1) and (2) gives our two-stage model for describing brand choice. The model first determines for each brand the probability that it is considered, based on household-specific and brand-specific characteristics. Subsequently, based on these probabilities and on extra information, the finally selected brand is determined. The model contains J 2 þ J ðI þ P þ Q þ 2Þ parameters. For example, if J ¼ 4, I ¼ 2, P ¼ 3 and Q ¼ 1, the model has 48 parameters. It should be noted here, that even though the MNL-part of the model contains unidentified parameters, the estimation routine discussed below does not suffer from this and can be used to estimate all parameters. A potential drawback of the model is that the number of parameters is quadratic in J . Therefore, an increase in the number of brands leads to a rapid increase of parameters. 3.3. Interpretation ANNs differ from many econometric models in the sense that they, in general, do not make assumptions about the properties of the disturbance term. Hence, standard tests on model specification (based on this disturbance term) cannot be performed. Furthermore, testing hypothesis on parameter restrictions is not straightforward. It is well known that ANN parameters, that is, the weights in (1) and (2), cannot be dealt with as in a linear regression model (see, among others, Bishop (1995), Franses and Draisma (1997)). The main reason for this is that the model has many more parameters than there are observable variables. Despite this limitation of ANNs, the impact of variables on the outcome of the model can be understood by looking at the changes in the outcomes due to the changes in the values of the variables. To illustrate the influence of variables, one can perform a graphical sensitivity analysis. The variables are fixed at their average value and only the value of one variable is allowed to vary. Changes in the outcome of the nodes in the hidden layer and the output layer can give an idea of the influence of that particular variable on the probabilities of consideration and choice, respectively. In addition to this graphical method, sensitivity criteria can be

B. Vroomen et al. / European Journal of Operational Research 154 (2004) 206–217

determined to display the effect of variables on the outcome of the model. Examples P of these criteria are the average derivative (1=n i ðoyi =oxi Þ) P or average elasticity (1=n i ðoyi =oxi Þðxi =yi Þ), see Zapranis and Refenes (1999). These criteria are based on the first derivative and one should bear in mind that they are not constant but are a function of one or more variables due to the highly nonlinear structure of the model. Due to this dependency, the effect measured can differ between the criteria used. Another example of a sensitivity criterium can be found in Hruschka (1993). Although these two methods can give valuable insights, no clear objective method is available to determine the significance of the effects. 3.4. Estimation The model proposed in (1) and (2) is what is called a feedforward network. The network is unidirectional with a MNL like structure for the output layer. The latter specification deviates from the commonly used feedforward networks. To train the network, that is, to estimate the parameters (weights on the connecting arcs) we use a heuristic learning algorithm. We use the commonly used back-propagation algorithm, which is based on gradient descent. An observation (also called training example) is propagated through the layers in forward direction, resulting in a output from the layers. As the actual outcome of that particular observation is known, the prediction error made by the output layer can be determined. These errors are propagated in backward direction through the network resulting in error values for the nodes in the hidden layer. All parameters are adjusted according to these error values obtained from the hidden and output layer. Progress of this estimation process is measured by an overall error function. Usually this function has the form of the sum of the squared differences between target and network output. However, estimating the parameters in an ANN with a sum-ofsquared error function assumes a priori that the target data are generated from a smooth deterministic function with added Gaussian noise. In our case, the Gaussian model does not provide a good description, as the target (actual) data are

211

binary variables, see Bishop (1995). Furthermore, as we use the MNL specification for the output layer, see (2), the appropriate error function to minimize is E¼

N X J X n¼1

tjn lnFCnj ;

ð3Þ

j¼1

which is the negative log-likehood function of the MNL model. N in (3) is the number of observations and J denotes the number of brands in the output layer. FCj denotes the output of node j in the output layer and tj denotes the target (actual) output for node j. The back-propagation algorithm starts with setting the weights, wij , at a small value (wij 2 ½0:1; 0:1). After completing the forward step of the algorithm, the errors made in the network are determined. Weights are then adjusted according to the update rule Dwi;j ¼ g

oE ; owij

ð4Þ

where g 2 ð0; 1 is a learning constant. The gradient used for updating the parameters in our model is derived in Appendix A. Estimation of the parameters in an ANN with the back-propagation algorithm can take thousands of iterations. As stopping criteria one may use the number of iterations or a threshold for the error function. As back-propagation is a heuristic, it is possible that the algorithm ends in a local optimum. To overcome this problem, one may add a momentum to the weight update rule which yields Dwni;j ¼ g

oE ðn1Þ þ aDwi;j ; owij

ð5Þ

where a fraction (a) of the weight update of the previous iteration is added to the original update rule. Furthermore, one may estimate the parameters several times and select the one with the lowest value for the error function. One should note that both methods may help with respect to convergence, but they do not guarantee that the global optimum will be found. Another point of concern is the problem of overfitting the data. To avoid this, two samples are

212

B. Vroomen et al. / European Journal of Operational Research 154 (2004) 206–217

used during the estimation procedure. The first sample is used to estimate the parameters. The second sample is used to determine the progress of the estimation process. One may also use K-fold cross validation to avoid overfitting (see for an example Hruschka et al. (2002). We refer to Mitchell (1997) and Bishop (1995) for a discussion as well as extensions and shortcomings of the back-propagation algorithm.

4. Illustration To illustrate the empirical usefulness of our model for a two-stage brand choice process, we demonstrate it on scanner data for six liquid detergent brands. These data are also used in Chintagunta and Prasad (1998) and can be freely downloaded. 4.1. Data Our data consists of scanner data with information on six liquid detergent brands, that is, Tide, Wisk, Eraplus, Surf, Solo and All 2. The data contain 3055 observations of liquid detergent purchases, concerning 400 households. Each observation consists of four household-specific variables. These include the volume the household purchased on the previous purchase occasion, the expenditures on non-detergents, the size of the household and the inter-purchase time, see also Chintagunta and Prasad (1998) for a discussion of these data. Furthermore, three variables are available for each brand per observation. These are the price of a brand (cents/oz.), a 1=0 dummy variable for feature promotion, and a 1=0 dummy for display promotion. An average household consists of three members. The average inter-purchase time is 80 days and expenditures on non-detergents are $34.69. Average prices for the six brands range from 3.9 to 6.1 cents/oz.

2

This is the A.C. Nielsen household scanner panel data on purchases of liquid laundry detergents in the Sioux Falls, South Dakota, market.

There are 398 incomplete observations, and these are eliminated from the sample. Further, we introduce for each brand a variable that indicates if the brand has been purchased recently. We use, for comparison reasons, the specification of Bronnenberg and Vanhonacker (1996) instead of the more common Guadagni-Little loyalty variable (Guadagni and Little, 1983. This ÔrecencyÕ variable is determined according to Xjth ¼ ð1  djth ÞkXt1 þ djth , where djth is an indicator function if household h bought brand j at time t  1, with k 2 ð0; 1Þ a smoothing parameter. If brand j has never been bought on a previous occasion, this variable will have the value 0 and else a value in the range of 0 and 1, according to the recency of the purchase. As a result, the first recorded purchase of each household is lost. Hence, in total we have 3055  398  400 ¼ 2257 observations. Due to the different scales of the variables, data transformations seem appropriate to facilitate parameter estimation. All non-binary variables are scaled to the ½0; 1-interval. Finally, to avoid overfitting and for forecasting purposes, the data sample is divided into three parts. The first sample is used for estimation of the parameters. The second sample is used as a validation sample to monitor the progress of the estimation process and the third sample is used for out-of-sample forecast evaluation. The sets contain 1127, 552, 578 observations respectively. 4.2. Estimation results As there are six brands to choose from, the hidden and output layer consist of six nodes. The variables mentioned above are used in our model. To determine the recency variable we tried several values for k and it turns out that a value 0.75 yields the best results. Each household-specific and brand-specific variable is represented by a node in the input layer, which thus consists of 4 þ 4 6 ¼ 28 nodes. The values of the output layer depend on the outputs of the hidden layer as well as on the prices of the brands, which serve as extra information for making the final choice. As mentioned earlier an issue of concern is the possibility of ending in a local minimum. To overcome this, we add a momentum of 0.6 to the

B. Vroomen et al. / European Journal of Operational Research 154 (2004) 206–217

update rule of the parameters, see (5). Furthermore, as the back-propagation algorithm gives no guarantee that a global optimum has been attained, the parameters of the model are estimated several times. Each estimation process consists of 1000 iterations of the back-propagation algorithm with a learning constant g of 0.05. After these 1000 iterations, the error value obtained with (3) is approximately constant, thereby indicating that an optimum has been attained. The estimated parameters, which result in the lowest error value for the validation sample, are selected for evaluating the impact of the variables and for forecasting. We use the software package Ox (see Doornik, 1999) to estimate the parameters. The performance of the model is examined by considering the results on forecasting for each of the three samples. As performance measure, we use the hit rate, that is, the percentage of times the estimated selected brand matches the actually chosen brand. We also computed other criteria like BIC, but these gave the same qualitative outcomes. Results of our consideration set artificial neural network (CSANN) are compared with three other models. The first is the CSM model taken from Bronnenberg and Vanhonacker (1996), which has a similar structure as CSANN and is also a twostage model. This CSM model is related to the model proposed in Fotheringham (1988). The second model is a one-stage model and is an ANN, which is fully connected, has no parameter restrictions, has no MNL-like structure for the output layer and it also assumes that the output layer does not get external data. Finally, the third model is a one-stage MNL model. Results of the four models are shown in Table 1. Based on these results, the performance of our model appears to be rather good. Looking at the hit rate our model performs well in-sample and it outperforms the other models on out-of-sample forecasting. However, the log likelihood values indicate that the CSM-model performs better on out-of-sample forecasting. In general the performance of the CSM model and the CSANN model are similar, and are both better than the one-stage models. Clearly, these two-stage models benefit from the incorporation of the consideration set formation. The findings from Table 1 are in line

213

Table 1 Results of the proposed ANN in comparison with other models Two-stage Sample

CSANN

One-stage CSMa

ANN

MNL

b

Hit rate Estimation (%)c Validation (%) Forecast (%)

84.5 72.9 73.9

80.7 74.6 73.7

Log likelihood d Estimation Validation Forecast #Parameters

)555.4 )476.9 )468.2 132

)660.4 )413.1 )443.7 12

a

86.7 74.8 70.8 – – – 168 CSj expðfj ðvÞÞ

Choice-set formation model: FCj ¼ PJ

b

l¼1

82.7 74.6 71.6 )585.2 )483.7 )546.0 78 .

CSl expðfl ðvÞÞ

Percentage of correct predicted purchases. The parameters of the CSANN and ANN are estimated for the estimation sample, using the validation sample for progress evaluation. The parameters of the MNL and CSM model are estimated for the estimation sample only. d Log likelihood values cannot be obtained for the ANN model due to the different model specification. c

with the results of Manrai and Andrews (1998), that is, two-stage models tend to give better outof-sample forecasts then one-stage models. Of course, the CSANN contains many more parameters than the MNL model, but from an interpretation point of view, we believe it is more appropriate. 4.3. Graphical interpretation We use a graphical method to investigate the importance of variables on outcomes of the model. These graphs are easy to interpret. Due to the large number of explanatory variables, there are many ways to visualize the effects of variables. For example, one can investigate the competitive structure between brands or the influence of promotion activities. In this section, we will only give a few examples. Tide and Wisk are the two largest brands in our data set. It is now interesting to see whether and how the probability that Wisk is chosen reacts on a price change of Tide. To determine the possible influence of the price of Tide on Wisk, all variables are fixed at their average value and the price of Tide is varied over the ½0; 1-interval. The effect of the price change is shown in Fig. 3, which shows

214

B. Vroomen et al. / European Journal of Operational Research 154 (2004) 206–217 1

1

TIDE WISK

.8 .7 .6 .5 .4 .3

.8 .7 .6 .5 .4 .3 .2

.2

.1

.1

0

.1

.2

.3

.4

.5

.6

.7

.8

.9

1

Fig. 3. Change in the probability that Tide and Wisk are chosen due to price changes of Tide.

the probabilities for Tide and for Wisk that they are chosen for various price levels of Tide. Clearly, Wisk is in this Ôaverage scenarioÕ not a key competitor of Tide. The probability that Wisk is purchased increases, but not in an explosive way as one would expect of a main competitor. It is also interesting to see that the probability that Tide is chosen stays long at a high level before rapidly dropping towards a value of zero. This suggests that there is some threshold for loyalty. The effect of promotion activities, that is, a feature or a display for Tide can be determined by switching these variables on and off during a price increase of Tide. Figs. 4 and 5 display the effect on 1

Benchmark Feature Display Recency

.9 .8 .7 .6 .5 .4 .3 .2 .1

0

.1

.2

.3

.4

.5

0

.1

.2

.3

.4

.5

.6

.7

.8

.9

1

Price Tide

Price Tide

Probability Tide is considered (CS)

Benchmark Feature Display Recency

.9

Probability Tide is choosen (FC)

Probability Tide is choosen (FC)

.9

.6

.7

.8

.9

1

Price Tide

Fig. 4. The estimated probability that Tide is considered due to changes in price of Tide, when combined with a feature, display and previous purchase.

Fig. 5. The estimated probability that Tide is purchased due to changes in price of Tide, when combined with a feature, display and previous purchase.

the node for Tide in the hidden layer, that is, the probability of consideration, and the effect on the node for Tide in the output layer, that is, the probability of choice, respectively. The benchmark case concerns no promotion and no purchase of Tide on a previous occasion. Featuring has a clear positive effect on the probability of considering Tide, as the corresponding line starts at a higher level. A display causes a similar but much smaller effect on the probability of considering Tide. The effect of these two promotion variables on the final choice is similar to their effect for the hidden layer, as shown in Fig. 5. Again, a feature has a clear stronger effect than a display. Activating one of the two promotion variables causes the curve to shift to the right, resulting in an increasing willingness to accept a higher price by the (average) household. The effect of a display is so small that it seems that this effect, for the data sample used, is not relevant. The recency variable that indicates whether Tide has been purchased on a previous occasion can be interpreted as a loyalty variable. When this variable is relevant, there should be a positive effect on consideration and on final choice. The effect of this variable on the probability of consideration is indeed large in comparison with the benchmark situation as can be seen from Fig. 4. The effect on the probability that Tide is finally selected, which is shown in Fig. 5, is also positive when the curve

B. Vroomen et al. / European Journal of Operational Research 154 (2004) 206–217

5. Discussion

Probability Tide is choosen (FC)

1 NDEXP HHSIZE TIME VOL

.9 .8 .7 .6 .5 .4 .3 .2 .1

0

.1

.2

.3

.4

.5

.6

.7

.8

215

.9

1

Value household-specific characteristic

Fig. 6. Changes in the probability that Tide is chosen due to changes in household-specific variables.

shifts to the right. Hence, if one has purchased Tide on the previous occasion, one is more likely to choose again for Tide in the current occasion. The impact of the household variables can be determined in a similar way as in the previous two illustrative examples. Again all variables are fixed at their average value, though now each household-specific variable varies over the ½0; 1-interval. Fig. 6 shows the influence of changes of these variables on the probability of choosing Tide. Clearly, their influence on this probability is nonlinear but of a minimal magnitude, indicating that the household characteristics are not really of importance. It can be expected that a variable like income (if it would have been available) would have a higher impact on brand choice. The above three illustrations give an indication of the possibilities of our two-stage model. Other illustrations can easily be generated in order to provide information that can be valuable to a marketing researcher. In fact, we only used the average values of the brand and household characteristics for our analysis. Another type of analysis could cluster households according to, for example, household size and examine the influence of different scenarios on each cluster. In sum, the model turns out to be a good forecasting tool that can also be used to graphically analyze the effects of changes in important variables on unobserved consideration set probabilities.

Considering the decision making process (mainly brand choice) as a two-stage process seems to be more appropriate than imposing that a brand is selected in one step. To describe these two stages, we proposed an econometric model, which appeared to be an ANN. The first stage, that is, the reduction of available brands into unobserved consideration sets, was transformed into a stage where for each brand the probability of consideration is determined. This resulted in the interpretation of the hidden layer as the consideration set. Based on the outcome of the first stage, the probability was determined for each brand that it would be chosen. We have shown that an ANN can be used for modeling brand choice, taking consideration set formation into account. Our results show that the model performs reasonably good, in comparison with another two-stage model as well as with one-stage models. Furthermore, our illustration confirms earlier findings in the literature that two-stage models tend to give better outof-sample forecasts than one-stage models. Related to the literature on ANNs, our results show that imposing a problem-specific structure on an ANN can improve its interpretation and its performance.

Acknowledgements We thank the editor Jyrki Wallenius, four anonymous reviewers, Bas Donkers, Dennis Fok and Richard Paap for their valuable comments and suggestions.

Appendix A. Derivation of the gradient This appendix deals with the derivation of the gradient of the proposed two-stage model. First we restate, in a slightly different form, the error function used for the back-propagation algorithm. We use the error-function for each observation n, that is,

216

B. Vroomen et al. / European Journal of Operational Research 154 (2004) 206–217

En ¼ 

J X

ðA:1Þ

tj lnFCj :

The partial derivatives of (1) are determined following:

j¼1

This simplification is made for convenience and has no effect on the understanding of the derivation of the gradient. The gradient of the proposed model has the following form: ! oEn oEn oEn oEn : ðA:2Þ rE n ¼ ; ; ; oai;j obk;j ocp;j odq;j First, we will derive oEn =obk;j and oEn =odq;j . oEn oEn oinputj ¼ ; obk;j oinputj obk;j

ðA:3Þ

where inputj stands for the input of node j, that is, the numerator in (2). X oEn oFCk oEn ¼ ; ðA:4Þ oinputj k2IðjÞ oFCk oinputj where IðjÞ are the nodes in the hidden layer connected to node j in the output layer and that serve as input to node j.

ðA:11Þ

where inputHj is the input of node j in the hidden layer. X oEn oEn oinputOk ¼ ; ðA:12Þ oinputHj k2OðjÞ oinputOk oinputHj X oEn oinputOk oCSj ¼ ðFCk  tk Þ ; oinputHj k2OðjÞ oCSj oinputHj ðA:13Þ where inputOk is the input of node k in the output layer and OðjÞ the nodes in the output layer for which node j in the hidden layer serves as input. X oEn ¼ ðFCk  tk Þbk;j CSj ð1  CSj Þ: oinputHj k2OðjÞ ðA:14Þ Using the results of (A.11)–(A.14) yields

oEn tk ¼ ; oFCk FCk

ðA:5Þ

oFCk ¼ FCk I½k¼j  FCk FCj ; oinputj

ðA:6Þ

where I½k¼j is an indicator function that equals 1 if k equals j. oEn ¼ FCj  tj : oinputj

ðA:7Þ

Using the results of (A.3)–(A.6) yields oEn ¼ ðFCj  tj ÞCSk ; obk;j

oEn oEn oinputHj ¼ ; oai;j oinputHj oai;j

k ¼ 1; . . . ; J :

ðA:8Þ

The partial derivative for the constants in (2) becomes:

X oEn ¼ xi CSj ð1  CSj Þ bk;j ðFCk  tk Þ: oai;j k2OðjÞ

ðA:15Þ

The partial derivative for the constants in (1) becomes: X oEn ¼ CSj ð1  CSj Þ bk;j ðFCk  tk Þ: oa0;j k2OðjÞ

ðA:16Þ

In a similar way as for ai;j the derivatives for cp;j can be determined and become: X oEn ¼ yp;j CSj ð1  CSj Þ bk;j ðFCk  tk Þ: ocp;j k2OðjÞ ðA:17Þ

n

oE ¼ FCj  tj : ob0;j

ðA:9Þ

In a similar way as for bk;j the derivatives for dq;j can be determined and become: oEn ¼ ðFCj  tj Þzq : odq;j

ðA:10Þ

References Agrawal, D., Schorling, C., 1996. Market share forecasting: An empirical comparison of artificial neural networks and multinomial logit model. Journal of Retailing 72 (4), 383– 407.

B. Vroomen et al. / European Journal of Operational Research 154 (2004) 206–217 Andrews, R.L., Manrai, A.K., 1998. Simulation experiments in choice simplification: The effects of task and context on forecasting performance. Journal of Marketing Research 35, 198–209. Andrews, R.L., Srinivasan, T.C., 1995. Studying consideration effects in empirical choice models using scanner panel data. Journal of Marketing Research 32 (1), 30–41. Bishop, C.M., 1995. Neural Networks for Pattern Recognition. Oxford University Press, Oxford. Bronnenberg, B.J., Vanhonacker, W.R., 1996. Limited choice sets, local price response and implied measures of price competition. Journal of Marketing Research 33, 163–173. Chiang, J., Chib, S., Narasimhan, C., 1999. Markov chain monte carlo and models of consideration set and parameter heterogeneity. Journal of Econometrics 89, 23–248. Chintagunta, P.K., Prasad, A.R., 1998. An empirical investigation of the dynamic ‘‘mcfadden model’’ of purchase timing and brand choice: Implications for market structure. Journal of Business and Economic Statistics 16, 2–11. Daganzo, C., 1979. Multinomial Probit: The Theory and its Applications to Demand Forecasting. Academic Press, New York. Dasgupta, C.G., Dispensa, G.S., Ghose, S., 1994. Comparing the predictive performance of a neural network model with some traditional market response models. International Journal of Forecasting 10, 235–244. Doornik, J.A., 1999. Object-Oriented Matrix Programming Using Ox, third ed. Timberlake Consultants Press and Oxford, London, Available from: . Fotheringham, A.S., 1988. Consumer store choice and choice set definition. Marketing Science 7 (3), 299–310. Franses, P.H., Draisma, G., 1997. Recognizing changing seasonal patterns using artificial neural networks. Journal of Econometrics 81, 273–280. Franses, P.H., van Dijk, D., 2000. Nonlinear Time Series Models in Empirical Finance. Cambridge University Press, Cambridge. Guadagni, P.M., Little, J.D.C., 1983. A logit model of brand choice calibrated on scanner data. Marketing Science 2, 203–238. Hausman, J., Wise, D., 1978. A conditional probit model for qualitative choice: Discrete decisions recognizing interdependence and heterogenous preferences. Econometrica 45, 319–339. Heimel, J.P., Hruschka, H., Natter, M., Taudes, A., 1998. Konnexionistische kaufakt- und markenwahlmodelle. Zeitschrift f€ ur Betriebswirtschaftliche Forschung 50, 596–613. Horowitz, J.L., Louviere, J.J., 1995. What is the role of consideration sets in choice modelling? International Journal of Research in Marketing 12, 39–54. Hruschka, H., 1993. Determining market response functions by neural network modeling: A comparison to econometric techniques. European Journal of Operational Research 66, 27–35.

217

Hruschka, H., Fettes, W., Probst, M., Mies, C., 2002. A flexible brand choice model based on neural net methodology. A comparison to the linear utility multinomial logit model and its latent class extension. OR Spectrum 24, 127–143. Hu, M.Y., Shanker, M., Hung, M.S., 1999. Estimation of posterior probabilities of consumer situational choices with neural network classifiers. International Journal of Research in Marketing 16, 307–317. Kuan, C.M., White, H., 1994. Artificial neural networks: An econometric perspective (with discussion). Econometric Reviews 13, 1–91. Kumar, A., Rao, V.R., Soni, H., 1995. An empirical comparison of neural network and logistic regression models. Marketing Letters 6, 251–263. Lattin, J.M., Bucklin, E., 1989. Reference effects of price and promotion on brand choice behavior. Journal of Marketing Research 26, 299–310. Manrai, A.K., Andrews, R.L., 1998. Two-stage discrete choice models for scanner panel data: An assessment of process and assumptions. European Journal of Operational Research 111, 193–215. McFadden, D., 1973. Conditional logit analysis of qualitative choice behavior. In: Zarembka, P. (Ed.), Frontiers in Econometrics. Academic Press, New York, pp. 105–142. Mitchell, T.M., 1997. Machine Learning. McGraw-Hill, Singapore. Roberts, J.H., Lattin, J.M., 1991. Development and testing of a model of consideration set composition. Journal of Marketing Research 28, 429–440. Roberts, J.H., Lattin, J.M., 1997. Consideration: Review of research and prospects for future insights. Journal of Marketing Research 34, 406–410. Shocker, A.D., Ben-Akiva, M., Boccara, B., Nedungadi, P., 1991. Consideration set influences on consumer decisionmaking and choice: Issues, models and suggestions. Marketing Letters 2, 181–197. Swanson, N.R., White, H., 1995. A model-selection approach to assessing the information in the term structure using linear models and artificial neural networks. Journal of Business and Economic Statistics 13, 265–275. Vellido, A., Lisboa, P.J.G., Vaughan, J., 1999. Neural networks in business: A survey of applications (1992–1998). Expert Systems with Applications 17, 51–70. West, P.M., Brockett, P.L., Golden, L.L., 1997. A comparative analysis of neural networks and statistical methods for predicting consumer choice. Marketing Science 16 (4), 370– 391. Zapranis, A., Refenes, A., 1999. Principles of neural model indentification, selection and adequacy. With applications to financial econometrics perspectives in neural computing. Springer-Verlag, London. Zhang, G., Hu, M.Y., Patuwo, B.E., Indro, D.C., 1999. Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis. European Journal of Operational Research 116, 16–32.