Reinforcement online learning for emotion prediction by using physiological signals


Accepted Manuscript

Reinforcement Online Learning for Emotion Prediction by Using Physiological Signals
Weifeng Liu, Lianbo Zhang, Dapeng Tao, Jun Cheng

PII: S0167-8655(17)30200-3
DOI: 10.1016/j.patrec.2017.06.004
Reference: PATREC 6843

To appear in: Pattern Recognition Letters

Received date: 4 March 2017
Revised date: 21 April 2017
Accepted date: 8 June 2017

Please cite this article as: Weifeng Liu, Lianbo Zhang, Dapeng Tao, Jun Cheng, Reinforcement Online Learning for Emotion Prediction by Using Physiological Signals, Pattern Recognition Letters (2017), doi: 10.1016/j.patrec.2017.06.004

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Pattern Recognition Letters journal homepage: www.elsevier.com

Reinforcement Online Learning for Emotion Prediction by Using Physiological Signals

Weifeng Liu a,∗∗, Lianbo Zhang a, Dapeng Tao b, Jun Cheng c

a China University of Petroleum (East China), Qingdao 266580, China
b Yunnan University, Kunming 650091, China
c Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

ABSTRACT

Physiological signals generated by human internal organs objectively reflect real-time variations in human emotion and can be used to monitor the state of the body. With the growing availability of large amounts of physiological signal data, emotion analysis using physiological signals is attracting increasing attention, and many methods based on electroencephalogram (EEG) or peripheral physiological signals have been reported. Although prominent online learning methods can predict the emotional state from time-varying physiological signals, they do not consider the reward of the current operation in each iteration. To tackle this problem, we propose a reinforcement online learning (ROL) method for real-time emotion state prediction that exploits the reward to modify the predictor during the online training iterations. In each iteration, we evaluate the reward and then select specific instances for predictor learning. This yields both a significant reduction in learning time and strong prediction performance. We apply reinforcement online learning to least squares (LS) and support vector regression (SVR) for emotion prediction. Extensive experiments are conducted on an artificial dataset and a real-world physiological signal dataset (DEAP), and the experimental results validate the effectiveness of the proposed method.
© 2017 Elsevier Ltd. All rights reserved.


1. Introduction


As a psycho-physiological process, emotion is a direct reflection of a person's conscious or unconscious perception of an object or situation. It plays an important role in people's interaction with each other. Recently, there has been increasing interest in the development of human emotion-state recognition techniques [11, 5, 24] in human activities. The function of these techniques is improving [19, 32]. In addition, the target of affective computing is to detect emotions during human-computer interaction and to synthesize emotional responses.

Generally, human emotion analysis can be categorized into several subtopics based on the features that are applied for human affective recognition. The first category is facial expression and voice [8, 9, 25, 4, 34, 33]. These techniques [6, 7] allow researchers to detect emotions from recorded images or videos. Zheng [34] investigated multi-view facial images for facial expression recognition. He also developed a novel group sparse reduced-rank regression (GSRRR) model for regression purposes. Some robust techniques can still work even when the subjects' facial images are hidden [26]. Jia et al. [16] built a macro-to-micro transformation model to enhance micro-expression features by transferring macro-expression learning to micro-expressions. Ben et al. [1] proposed a tensor subspace analysis algorithm based on maximum margin projection for micro-expression recognition.

∗∗ Corresponding author: e-mail: [email protected] (Weifeng Liu)

The second kind of approach focuses on body movements or human gestures [19, 2, 20]. It usually utilizes a small set of body movements such as gestures, walking, and waving. Karg et al. [19] summarized reports on the recognition and generation of movements that convey affective expressions. Kleinsmith et al. [20] surveyed work on affective body expression recognition. They also used body expressions as an input modality for automatic emotion recognition. However, a common movement notation system still needs to be built, which could be used to facilitate work on affect-expressive movements and to recognize the emotions conveyed by the action that a person is performing [2].

Physiological-signal-based emotion analysis is another kind of approach. It allows researchers to directly assess the inner state of a user, which makes it a crucial element in a machine's interaction with humans. An emotional model can be characterized by two dimensions (valence, arousal). Valence ranges from negative to positive, while arousal ranges from passive to active. Fig. 1 illustrates the valence-arousal dimensional model. A variety of emotion recognition methods and systems based on physiological signals, especially EEG, have been studied [22, 36, 17, 37]. Zheng et al. [36] systematically compared different classification methods for EEG-based emotion recognition. Jenke et al. [14] comprehensively reviewed a set of feature extraction methods for EEG signal analysis. Work on EEG-based emotion recognition does not generally agree on which features are most appropriate, and only a few works compare different features with each other. Soleymani et al. [31] detected emotions continuously from EEG signals and facial expressions. The dataset they used consists of viewers' responses to a set of emotional videos.

Fig. 1. Valence-arousal dimensional model (axes: valence from Negative to Positive, arousal from Passive to Active; labelled emotions: Excited, Happy, Content, Calm, Depressed, Sad, Afraid, Angry)

Although many methods have been developed for EEG-based emotion recognition, it is essential to select an appropriate representation for varying physiological signals. This challenge is well suited to online learning, which is a well-established learning paradigm for making a sequence of predictions given knowledge of previous prediction tasks [29]. Furthermore, online learning is particularly important in situations where it is computationally infeasible to train over the entire dataset. Recently, online learning has been successfully applied to sequential emotion prediction with promising performance, because it updates the predictor for future data at each step, as opposed to batch learning methods, which generate the best predictor on the entire training set at once. However, how to explore and exploit the reward of the current operation in each iteration is still an open question.

In this paper, we exploit the reward as a critic element to measure the performance of the predictor after each iteration. Under the supervised learning condition, the reward is defined as feedback that is associated with the prediction result when the agent outputs a token related to the state it is in. The proposed method draws inspiration from reinforcement learning. However, in comparison with traditional reinforcement learning, the state in our case is fixed and is characterized as the feature space. The main contributions of this paper are as follows:

• We develop a reinforcement online learning method.
• The proposed method exploits model performance during each iteration.
• The performance of the proposed method is investigated on an artificial dataset and a physiological-signal-based dataset (DEAP).

The rest of this paper is organized as follows. Section 2 presents related work on online learning and reinforcement learning. Section 3 describes the proposed method and the details of the procedure. Section 4 is devoted to the experimental settings and the discussion of the experimental results, followed by the conclusion in Section 5.

2. Related Work

2.1. Reinforcement Learning

In a standard reinforcement learning setting [18, 23, 27], an agent interacts with an environment over a number of discrete time steps. According to the state $s_t$ that the agent receives, it selects an action $a_t$ from an action set $A$ by using its policy $\pi$, where $\pi$ denotes a mapping from state $s_t$ to action $a_t$. After that, the agent gets the next state $s_{t+1}$ and a scalar reward $r_t$. This process continues until the agent reaches a terminal state, after which the process restarts. In particular, the return

$$R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}$$

is the total accumulated return from time step $t$, and $\gamma \in (0, 1]$ is a constant that determines the relative value of delayed versus immediate rewards. The goal of the agent is to maximize the expected return from each state $s_t$. The action value $Q^{\pi}(s, a) = E[R_t \mid s_t = s, a]$ is the expected return for selecting action $a$ in state $s$ and following policy $\pi$. The optimal value function $Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a)$ gives the maximum action value for state $s$ and action $a$ achievable by any policy. Similarly, the value of state $s$ under policy $\pi$ is defined as $V^{\pi}(s) = E[R_t \mid s_t = s]$ and is simply the expected return under policy $\pi$ from state $s$.
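As an illustration of the return defined above, the following minimal sketch computes $R_t$ for a finite episode, assuming the rewards after the terminal state are zero. The function names `discounted_return` and `all_returns` are hypothetical and chosen only for this example.

```python
def discounted_return(rewards, gamma):
    """Return R_t = sum_k gamma**k * r_{t+k} for a finite reward list starting at time t."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    total = 0.0
    # Work backwards using the recursion R_t = r_t + gamma * R_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def all_returns(rewards, gamma):
    """Return [R_0, R_1, ..., R_{n-1}] for one finite episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

With $\gamma = 1$ the return is the plain sum of rewards; smaller values of $\gamma$ weight immediate rewards more heavily than delayed ones.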

2.2. Online Learning

Online learning methods observe instances in a sequential manner, dealing with sequence learning problems and fitting new data constantly. Many online learning algorithms [3, 28, 13, 12] and their variants have been proposed because they deal with stream data automatically, accumulating experience over time. They can also use such knowledge to facilitate future learning and decision-making processes.

The online Passive-Aggressive algorithm [3] sets the new weight vector $w_{t+1}$ to be the solution of the following constrained optimization problem:

$$w_{t+1} = \arg\min_{w \in \mathbb{R}^d} \frac{1}{2} \|w - w_t\|^2 \quad \text{s.t.} \quad L(w; (x_t, y_t)) = 0.$$

If the loss is defined by the hinge-loss function

$$L(w; (x, y)) = \begin{cases} 0, & y (w \cdot x) \ge 1 \\ 1 - y (w \cdot x), & \text{otherwise,} \end{cases}$$

then the resulting algorithm is passive whenever the hinge loss is zero, that is, $w_{t+1} = w_t$. The optimization problem has a simple closed-form solution:

$$w_{t+1} = w_t + \tau_t y_t x_t, \qquad \tau_t = \frac{L_t}{\|x_t\|^2},$$

where $L_t$ is the hinge loss suffered on $(x_t, y_t)$ with the current weights $w_t$.

Shalev-Shwartz and Tewari [28] described two stochastic methods for the $\ell_1$-regularized problem

$$\min_{w \in \mathbb{R}^d} \frac{1}{T} \sum_{t=1}^{T} L(\langle w, x_t \rangle, y_t) + \lambda \|w\|_1,$$

which is equivalent to

$$\min_{w \in \mathbb{R}^{2d}} R(w) = \min_{w \in \mathbb{R}^{2d}} \frac{1}{T} \sum_{t=1}^{T} L(\langle w, \hat{x}_t \rangle, y_t) + \lambda \sum_{j=1}^{2d} w_j,$$

where $\hat{x}_t = [x_t; -x_t]$. They then used the derivative of $R(w)$ with respect to the $j$th element of $w$, $g_j = (\nabla R(w))_j$, to perform the update while maintaining the constraint $w_j \ge 0$.

Table 1. List of important notations

Notation          Description
$L(x, y, w)$      Loss function
$p_{data}$        The data distribution
$w_t$             Weight vector at time $t$
$T$               Number of examples
$T'$              Minibatch size
$\nabla_w L$      Gradient of $L$ with respect to $w$
$g$               Gradient
$\mathbb{R}^d$    The set of real numbers with $d$ dimensions
$\eta$            Learning rate
$\hat{Y}$         The estimate of $Y$
$\sigma$          Threshold
$s_t$             State at time $t$
$\pi$             Policy
$V^{\pi}$         State value under policy $\pi$
$Q^{\pi}(s, a)$   Action value
$R_t$             Total accumulated return from $t$
$\gamma$          Discount rate
$r_t$             Immediate return at $t$
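The closed-form Passive-Aggressive update of Section 2.2 can be sketched in a few lines. This is a minimal illustration for binary labels $y \in \{-1, 1\}$ using plain Python lists; the names `hinge_loss` and `pa_update` are assumptions made for this example and do not come from [3].

```python
def hinge_loss(w, x, y):
    """Hinge loss max(0, 1 - y * (w . x)) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, 1.0 - margin)


def pa_update(w, x, y):
    """One Passive-Aggressive step: w + tau * y * x with tau = loss / ||x||^2."""
    loss = hinge_loss(w, x, y)
    sq_norm = sum(xi * xi for xi in x)
    # Passive case: zero loss (or a zero input vector) leaves the weights unchanged.
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)
    # Aggressive case: move just far enough to bring the hinge loss on (x, y) to zero.
    tau = loss / sq_norm
    return [wi + tau * y * xi for wi, xi in zip(w, x)]
```

After an aggressive step, the example that triggered it sits exactly on the margin, so a second call with the same example is passive.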

3. Method Overview

The cost function in machine learning can often be decomposed as a sum over training examples of a per-example loss function. For instance, the negative conditional log-likelihood of the training data can be written as

$$J(w) = E_{x, y \sim p_{data}} L(x, y, w) = \frac{1}{T} \sum_{t=1}^{T} L(x_t, y_t, w), \tag{1}$$

where $L$ is the per-example loss $L(x, y, w) = -\log p(y \mid x; w)$. For these additive cost functions, gradient descent requires computing

$$\nabla_w J(w) = \frac{1}{T} \sum_{t=1}^{T} \nabla_w L(x_t, y_t, w). \tag{2}$$

For convenience, Table 1 briefly describes the important notations used in this paper. The insight of stochastic gradient descent is that the gradient is an expectation. This expectation can be estimated by using a small set of samples. In particular, on each step of the algorithm, we can sample a minibatch $B$ of examples drawn uniformly from the training set. The minibatch size $T'$ is typically chosen to be a relatively small number of examples, ranging from 1 to a few hundred. Crucially, $T'$ remains unchanged as the training set size $T$ grows. When $T' = 1$, we obtain the online learning model. The estimate of the gradient is formed as

$$g = \frac{1}{T'} \sum_{t=1}^{T'} \nabla_w L(x_t, y_t, w) \tag{3}$$

using examples from the minibatch $B$. The stochastic gradient descent algorithm then follows the estimated gradient downhill:

$$w \leftarrow w - \eta g.$$

The method we propose works for general optimization problems, but here we describe it in terms of a classification problem. We are given $T$ labelled examples $(x_1, y_1), \ldots, (x_T, y_T)$, where $x_t \in \mathbb{R}^d$ and $y_t \in \{-1, 1\}$. We assume that $\|x_t\| \le 1$ for all $t$. In a linear classification problem, the goal is to find a hyperplane through the origin that largely separates the data labelled 1 from those labelled −1. The most popular method of training such a linear classifier on labelled data is to solve a regularized convex optimization problem

$$w^{*} = \arg\min_{w \in \mathbb{R}^d} \frac{\lambda}{2} \|w\|^2 + \frac{1}{T} \sum_{t=1}^{T} L(x_t, y_t, w). \tag{4}$$

Here $w$ is the normal vector to the separating hyperplane, and $L$ is a convex loss function. Popular choices for $L$ in the machine learning literature are the least squares loss $L(x, y, w) = \frac{1}{2} \|w^T x - y\|^2$, which leads to least squares regression, and the hinge loss $L(x, y, w) = \max(0, 1 - y w^T x)$, which leads to support vector machines (SVMs). We solve the regularized problem by using stochastic gradient descent (SGD), and at step $t$ update the iterate as

$$w_{t+1} = w_t - \eta_t (\lambda w_t + \nabla_w L(x_t, y_t, w_t)). \tag{5}$$

Here $\eta_t$ is the learning rate, and the (sub)gradient $\nabla_w L(x_t, y_t, w_t)$ is computed on the basis of a single example $(x_t, y_t)$.

In the problem of learning to predict a sequence, $Y = (y_1, \ldots, y_T)$ is the ground truth, given an input $X = (x_1, \ldots, x_T)$, and $\hat{Y} = (\hat{y}_1, \ldots, \hat{y}_T)$ is the estimate of $Y$. The dataset is split into input-output pairs $(X, Y)$, with both a training set and a testing set accessible. The trained predictor $h$ is evaluated by computing the average task-specific score $R(Y, \hat{Y})$ on the testing set, where $\hat{Y} = h(X)$ is the prediction result.

Given sequence data, we need to build a model that clearly defines the RL states, performs an action at each state, and provides feedback for each action. Under the supervised condition, we define each example in the training set as a state; in particular, for a time series, all features of the sample at time $t$ make up the state:

$$s_t = x_t.$$
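The single-example update of Eq. (5) can be sketched as follows for the two losses named above. This is a minimal illustration with plain Python lists; the helper names (`dot`, `ls_grad`, `hinge_subgrad`, `sgd_step`) and the constant step size are assumptions made for the example, not the authors' implementation.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def ls_grad(w, x, y):
    """Gradient of the least squares loss 0.5 * (w . x - y)**2 with respect to w."""
    err = dot(w, x) - y
    return [err * xi for xi in x]


def hinge_subgrad(w, x, y):
    """A subgradient of the hinge loss max(0, 1 - y * (w . x)) with respect to w."""
    if y * dot(w, x) < 1.0:
        return [-y * xi for xi in x]
    return [0.0] * len(x)


def sgd_step(w, x, y, grad_fn, eta, lam):
    """One step of Eq. (5): w - eta * (lam * w + grad L(x, y, w))."""
    g = grad_fn(w, x, y)
    return [wi - eta * (lam * wi + gi) for wi, gi in zip(w, g)]
```

Passing `ls_grad` gives an online least squares regressor, while `hinge_subgrad` gives a linear SVM-style update; with `lam > 0` the weights shrink at every step even when the loss gradient is zero.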

At each state, the agent chooses an action to perform according to the current state. The agent then gives a prediction $\hat{y}_t$ using $s_t$. Because the ground truth is available as part of the environment, the environment compares $\hat{y}_t$ with $y_t$ and returns a reward as feedback. The reward can be expressed in various forms. For example, we use the Euclidean distance as the reward metric; given a threshold, the reward function is

$$r = \begin{cases} 1, & \|\hat{y}_t - y_t\|_2 < \sigma \\ 0, & \text{otherwise,} \end{cases} \tag{6}$$

where $\sigma$ is the threshold determining the value of the reward. As shown in Fig. 2, if the reward of the current state $s_t$ is 1, the predicted value can be used to represent the true value. The model then remains unchanged for the prediction of the next state $s_{t+1}$, until the agent receives a reward of zero. If the reward is zero, the policy needs to be modified to adapt to the new condition, so the current state $s_t$ is used to train the model to improve performance. As the learning process continues, the agent becomes increasingly likely to receive a reward of 1 in each iteration. Algorithm 1 describes the details of our reinforcement online learning (ROL) method for prediction.

Algorithm 1 ROL Method for Sequence Prediction
Input: A set of sequential data X = (x_1, x_2, ..., x_T), ground truth Y = (y_1, y_2, ..., y_T)
Output: Prediction model under ROL conditions
Initialization: policy π, prediction model h, reward function r
for each step t in the sequence do
    a ← action given by π for s_t
    Perform action a: ŷ_t = h(s_t)
    Get reward r and reach next state s_{t+1}
    if r = 0 then
        Online training using state s_t
    end if
    s_t ← s_{t+1}
end for

4. Experiments

To evaluate the effectiveness of the proposed ROL method, we apply a linear SVR regressor [10] and a least squares (LS) regressor [15, 35] to physiological-signal-based emotion state prediction, on a workstation with an Intel Xeon E5-2670 2.6GHz CPU (28 cores), 128GB RAM and a 64-bit Ubuntu 14.04 system. We conduct extensive experiments on an artificial dataset and the DEAP dataset [21].

The DEAP dataset [21] is a multimodal dataset created for research on human affective states. The electroencephalogram (EEG) and peripheral physiological signals of 32 participants were recorded as each watched 40 one-minute music video highlights. The participants performed a self-assessment of their levels of arousal, valence, like/dislike, dominance and familiarity. In the DEAP dataset, 32 EEG channels (Fp1, AF3, F3, F7, FC5, FC1, C3, T7, CP5, CP1, P3, P7, PO3, O1, Oz, Pz, Fp2, AF4, Fz, F4, F8, FC6, FC2, Cz, C4, T8, CP6, CP2, P4, P8, PO4, O2) and 8 peripheral physiological signal channels (hEOG, vEOG, zEMG, tEMG, GSR, respiration belt, plethysmograph, temperature) were used for data recording, which gives a more comprehensive picture of physiological activity. The electrode placement of DEAP follows the 10-20 system [30] shown in Fig. 3, where the highlighted electrodes were used during the DEAP recording.

Fig. 3. Electrode placement of 10-20 system [30]

In this paper, we use the preprocessed DEAP dataset. The preprocessed data were downsampled from the original 512Hz to 128Hz, and artifacts were removed. This yields a 3-dimensional array of shape (40 × 40 × 8064), that is, videos × channels × samples, for each participant. Each sequence has 8064 examples. In the DEAP experiments, we apply the proposed method to one channel at a time to evaluate the performance on different channels. When predicting the value of a channel at time $t$, the values of the other channels at time $t$ are treated as features. After each prediction, we record the learning time and the prediction result. The experimental results are evaluated using learning time and mean square error (MSE), which is defined as

$$MSE = \frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2,$$

where $y_t$ is the ground truth and $\hat{y}_t$ denotes the predicted value. The closer the MSE is to 0, the better the algorithm performs.
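The procedure of Algorithm 1, the reward of Eq. (6) and the MSE criterion can be sketched together as follows. This is a minimal illustration with a least squares predictor and scalar targets, so the Euclidean distance in Eq. (6) reduces to an absolute difference. The function names (`reward`, `rol_least_squares`, `mse`), the constant step size `eta` and the default parameter values are assumptions made for this example, not the authors' implementation or experimental settings.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def reward(y_hat, y, sigma):
    """Eq. (6) for scalar targets: 1 if |y_hat - y| < sigma, otherwise 0."""
    return 1 if abs(y_hat - y) < sigma else 0


def rol_least_squares(xs, ys, sigma, eta=0.05, lam=0.0):
    """Run the ROL loop with a linear least squares predictor.

    Returns the final weights, the per-step predictions and the number of
    online training steps that were actually performed.
    """
    w = [0.0] * len(xs[0])
    preds, n_updates = [], 0
    for x, y in zip(xs, ys):
        y_hat = dot(w, x)  # action: predict from the state s_t = x_t
        preds.append(y_hat)
        if reward(y_hat, y, sigma) == 0:
            # Reward 0: train online on the current state (SGD step on the LS loss).
            err = y_hat - y
            w = [wi - eta * (lam * wi + err * xi) for wi, xi in zip(w, x)]
            n_updates += 1
        # Reward 1: keep the model unchanged and move on to the next state.
    return w, preds, n_updates


def mse(ys, preds):
    """Mean square error between ground truth and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
```

With `sigma = 0` the reward is never 1, so every example triggers an update and the loop reduces to plain online least squares; a larger `sigma` skips more examples and therefore performs fewer training steps.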


Fig. 2. Framework of reinforcement online learning (diagram labels: Input data, Fit; New data, determined, No change; New data, undetermined, Refit). First, a model is learned to fit the input data using an online model such as SVR or LS. When a new example arrives, it may be predicted correctly. If so, the model is kept unchanged until a prediction becomes uncertain. That example is then added to the training set to improve performance.

Fig. 4. Experimental results in terms of (a) time and (b) MSE for LS, LSRL, SVR, SVRRL on the artificial dataset

We compared the selected models under the ROL method (LSRL, SVRRL) with the traditional methods (LS, SVR). The artificial dataset we created has 10k examples with 100 features. The results on the artificial dataset are shown in Fig. 4. The learning time of the models under ROL (LSRL, SVRRL) is less than that of the traditional models (LS, SVR). As the learning process continues, the advantage in time becomes larger. Nevertheless, compared with the common methods, ROL still achieves equivalent performance.

For the DEAP dataset, Fig. 5 and Fig. 6 illustrate the learning time and MSE of selected sequences on electrode channels in the area of angry and afraid (Fig. 1). Each subfigure corresponds to one channel. The x-coordinate is the number of iterations. From Fig. 5, we can see that LSRL and SVRRL significantly reduce learning time compared with LS and SVR. The proposed methods learn much faster than the two standard prediction methods (Fig. 5). This is because the RL agents skip some instances during the learning process. Moreover, only instances that can already be predicted accurately are ignored, which means that they would contribute much less to model improvement. As a result, LSRL and SVRRL perform as well as the traditional methods (LS, SVR), as shown in Fig. 6.

Boxplots in Fig. 7 and Fig. 8 further show the time and MSE on selected videos. Each subfigure corresponds to one video. The x-coordinate shows the models used in the experiments. From these two figures, we can see that learning under ROL (LSRL, SVRRL) uses less time but achieves equivalent effectiveness compared with the traditional methods (LS, SVR).

Fig. 5. Experimental results in terms of learning time for LS, LSRL, SVR, SVRRL on DEAP

Fig. 6. Experimental results in terms of MSE for LS, LSRL, SVR, SVRRL on DEAP

Fig. 7. Boxplot in terms of Time for Different Videos on DEAP

Fig. 8. Boxplot in terms of MSE for Different Videos on DEAP

5. Conclusion

Physiological-signal-based emotion recognition has attracted intensive attention, and many methods for emotion recognition have achieved promising performance. The proposed model can be more effective with sequential data, and its performance improves by utilizing more informative instances. Thus, it has an advantage in learning time and can perform as well as standard prediction models in less time. Experiments on an artificial dataset and the DEAP dataset confirm the effectiveness of the proposed method.

6. Acknowledgments

This study was supported by the National Natural Science Foundation of China under Grants 61671480 and 61572486, and by the Fundamental Research Funds for the Central Universities, China University of Petroleum (East China), under Grant 14CX02203A.

References

[1] Ben, X., Zhang, P., Yan, R., Yang, M., Ge, G., 2016. Gait recognition and micro-expression recognition based on maximum margin projection with tensor representation. Neural Computing and Applications 27, 2629–2646.

[2] Cai, X., Wang, C., Xiao, B., Chen, X., Zhou, J., et al., 2013. Regularized latent least square regression for cross pose face recognition, in: IJCAI.
[3] Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., Singer, Y., 2006. Online passive-aggressive algorithms. Journal of Machine Learning Research 7, 551–585.
[4] Dahl, A., Sherlock, B.R., Campos, J.J., Theunissen, F.E., 2014. Mothers' tone of voice depends on the nature of infants' transgressions. Emotion 14, 651.
[5] Dhall, A., Goecke, R., Joshi, J., Wagner, M., Gedeon, T., 2013. Emotion recognition in the wild challenge 2013, in: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, ACM. pp. 509–516.
[6] Ding, C., Choi, J., Tao, D., Davis, L.S., 2016. Multi-directional multi-level dual-cross patterns for robust face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 518–531.
[7] Ding, C., Tao, D., 2015. Robust face recognition via multimodal deep face representation. IEEE Transactions on Multimedia 17, 2049–2058.
[8] Ding, C., Tao, D., 2016. A comprehensive survey on pose-invariant face recognition. ACM Transactions on Intelligent Systems and Technology (TIST) 7, 37.
[9] Ding, C., Xu, C., Tao, D., 2015. Multi-task pose-invariant face recognition. IEEE Transactions on Image Processing 24, 980–993.
[10] Drucker, H., Burges, C.J., Kaufman, L., Smola, A., Vapnik, V., et al., 1997. Support vector regression machines. Advances in Neural Information Processing Systems 9, 155–161.
[11] Fragopanagos, N., Taylor, J.G., 2005. Emotion recognition in human–computer interaction. Neural Networks 18, 389–405.
[12] Hall, E.C., Willett, R.M., 2015. Online convex optimization in dynamic environments. IEEE Journal of Selected Topics in Signal Processing 9, 647–662.
[13] Hazan, E., et al., 2016. Introduction to online convex optimization. Foundations and Trends® in Optimization 2, 157–325.
[14] Jenke, R., Peer, A., Buss, M., 2014. Feature extraction and selection for emotion recognition from EEG. IEEE Transactions on Affective Computing 5, 327–339.
[15] Ji, Y., Idrissi, K., 2012. Automatic facial expression recognition based on spatiotemporal descriptors. Pattern Recognition Letters 33, 1373–1380.
[16] Jia, X., Ben, X., Yuan, H., Kpalma, K., Meng, W., 2017. Macro-to-micro transformation model for micro-expression recognition. Journal of Computational Science.
[17] Jirayucharoensak, S., Pan-Ngum, S., Israsena, P., 2014. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. The Scientific World Journal 2014.
[18] Kaelbling, L.P., Littman, M.L., Moore, A.W., 1996. Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4, 237–285.
[19] Karg, M., Samadani, A.A., Gorbet, R., Kühnlenz, K., Hoey, J., Kulić, D., 2013. Body movements for affective expression: A survey of automatic recognition and generation. IEEE Transactions on Affective Computing 4, 341–359.
[20] Kleinsmith, A., Bianchi-Berthouze, N., 2013. Affective body expression perception and recognition: A survey. IEEE Transactions on Affective Computing 4, 15–33.
[21] Koelstra, S., Muhl, C., Soleymani, M., Lee, J.S., Yazdani, A., Ebrahimi, T., Pun, T., Nijholt, A., Patras, I., 2012. DEAP: A database for emotion analysis; using physiological signals. IEEE Transactions on Affective Computing 3, 18–31.
[22] Miranda, L., Vieira, T., Martínez, D., Lewiner, T., Vieira, A.W., Campos, M.F., 2014. Online gesture recognition from pose kernel learning and decision forests. Pattern Recognition Letters 39, 65–73.
[23] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al., 2015. Human-level control through deep reinforcement learning. Nature 518, 529–533.
[24] Picard, R.W., Picard, R., 1997. Affective computing. volume 252. MIT Press, Cambridge.
[25] Rahulamathavan, Y., Phan, R.C.W., Chambers, J.A., Parish, D.J., 2013. Facial expression recognition in the encrypted domain based on local Fisher discriminant analysis. IEEE Transactions on Affective Computing 4, 83–92.
[26] Ringeval, F., Eyben, F., Kroupi, E., Yuce, A., Thiran, J.P., Ebrahimi, T., Lalanne, D., Schuller, B., 2015. Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data. Pattern Recognition Letters 66, 22–30.
[27] Sang, Q., Lin, Z., Acton, S.T., 2016. Learning automata for image segmentation. Pattern Recognition Letters 74, 46–52.
[28] Shalev-Shwartz, S., Tewari, A., 2011. Stochastic methods for l1-regularized loss minimization. Journal of Machine Learning Research 12, 1865–1892.
[29] Shalev-Shwartz, S., et al., 2012. Online learning and online convex optimization. Foundations and Trends® in Machine Learning 4, 107–194.
[30] Sharbrough, F., Chatrian, G., Lesser, R., Lüders, H., Nuwer, M., Picton, T., 1991. American Electroencephalographic Society guidelines for standard electrode position nomenclature. J. Clin. Neurophysiol. 8, 200–202.
[31] Soleymani, M., Asghari-Esfeden, S., Fu, Y., Pantic, M., 2016. Analysis of EEG signals and facial expressions for continuous emotion detection. IEEE Transactions on Affective Computing 7, 17–28.
[32] Weisgerber, A., Vermeulen, N., Peretz, I., Samson, S., Philippot, P., Maurage, P., Catherine De Graeuwe, D., De Jaegere, A., Delatte, B., Gillain, B., et al., 2015. Facial, vocal and musical emotion recognition is altered in paranoid schizophrenic patients. Psychiatry Research 229, 188–193.
[33] Zhang, P., Ben, X., Yan, R., Wu, C., Guo, C., 2016. Micro-expression recognition system. Optik-International Journal for Light and Electron Optics 127, 1395–1400.
[34] Zheng, W., 2014. Multi-view facial expression recognition based on group sparse reduced-rank regression. IEEE Transactions on Affective Computing 5, 71–85.
[35] Zheng, W., Xin, M., Wang, X., Wang, B., 2014. A novel speech emotion recognition method via incomplete sparse least square regression. IEEE Signal Processing Letters 21, 569–572.
[36] Zheng, W.L., Santana, R., Lu, B.L., 2015. Comparison of classification methods for EEG-based emotion recognition, in: World Congress on Medical Physics and Biomedical Engineering, June 7-12, 2015, Toronto, Canada, Springer. pp. 1184–1187.
[37] Zhou, F., Qu, X., Jiao, J.R., Helander, M.G., 2014. Emotion prediction from physiological signals: A comparison study between visual and auditory elicitors. Interacting with Computers 26, 285–302.