On stochastic parameter estimation using data assimilation

On stochastic parameter estimation using data assimilation

Physica D 230 (2007) 88–98 www.elsevier.com/locate/physd On stochastic parameter estimation using data assimilation James A. Hansen a,∗ , C´ecile Pen...

821KB Sizes 3 Downloads 95 Views

Physica D 230 (2007) 88–98 www.elsevier.com/locate/physd

On stochastic parameter estimation using data assimilation James A. Hansen a,∗ , C´ecile Penland b a Naval Research Laboratory, Marine Meteorology Division, 7 Grace Hopper Ave, Monterey, CA 93943, United States b NOAA/ESRL/Physical Sciences Division, 325 Broadway, Boulder, CO 80305, United States

Available online 26 December 2006

Abstract Data assimilation-based parameter estimation can be used to deterministically tune forecast models. This work demonstrates that it can also be used to provide parameter distributions for use by stochastic parameterization schemes. While parameter estimation is (theoretically) straightforward to perform, it is not clear how one should physically interpret the parameter values obtained. Structural model inadequacy implies that one should not search for a deterministic “best” set of parameter values, but rather allow the parameter values to change as a function of state; different parameter values will be needed to compensate for the state-dependent variations of realistic model inadequacy. Over time, a distribution of parameter values will be generated and this distribution can be sampled during forecasts. The current work addresses the ability of ensemblebased parameter estimation techniques utilizing a deterministic model to estimate the moments of stochastic parameters. It is shown that when the system of interest is stochastic the expected variability of a stochastic parameter is biased when a deterministic model is employed for parameter estimation. However, this bias is ameliorated through application of the Central Limit Theorem, and good estimates of both the first and second moments of the stochastic parameter can be obtained. It is also shown that the biased variability information can be utilized to construct a hybrid stochastic/deterministic integration scheme that is able to accurately approximate the evolution of the true stochastic system. Published by Elsevier B.V. Keywords: Data assimilation; Parameter estimation; Stochastic differential equations; Central limit theorem

1. Introduction There is great potential for the use of data assimilationbased parameter estimation in the atmosphere/ocean sciences. In addition to the obvious benefits for model tuning, we will show that the results of data assimilation-based parameter estimation can be used to help account for model inadequacy during ensemble forecasting. Because models are likely to be structurally rather than parametrically inadequate (model equations have a different functional form from the true system equations), we should not expect parameter estimation approaches to converge to a single deterministic set of parameter values. Rather, the parameter values should be expected to change as a function of time (and space) in order to offset the state-dependent impact of structural inadequacy. The ultimate product of a properly performed data assimilationbased parameter estimation procedure will be a distribution ∗ Corresponding author. Tel.: +1 831 6564741; fax: +1 831 6564769.

E-mail addresses: [email protected] (J.A. Hansen), [email protected] (C. Penland). 0167-2789/$ - see front matter Published by Elsevier B.V. doi:10.1016/j.physd.2006.11.006

of parameter values that can be used as a basis for stochastic parameterizations in ensemble forecasting. Data assimilation-based parameter estimation is conceptually sound, but there are several issues that must be confronted before proceeding. First, there will be issues regarding the forward integration scheme used by the forecast model; adding random numbers to traditional deterministic forward integrations schemes does not convert a deterministic ordinary differential equation (DODE) to a stochastic differential equation (SDE). One must take extreme care when working with SDEs. A particular challenge of SDE integration is that both the integrand and the variable of integration “wiggle” over any finite period. The practical consequence for the numerical integration of SDEs is the existence of an infinite number of solutions to the integral dependent upon the chosen evaluation points of the integrand. Strictly, each of these solutions corresponds to a different calculus, and while a theoretically infinite number of calculii exist, only two are commonly used: Ito and Stratonovich. Stratonovich calculus follows the standard Riemannian rules of integration; the Ito integral is somewhat different [16]. If the

J.A. Hansen, C. Penland / Physica D 230 (2007) 88–98

integral is approximated by setting the value of the integrand at the beginning of the intervals spanning the integration limits, the result is said to be stochastic in the sense of Ito. If the integral is approximated by setting the value of the integrand in the middle of the intervals spanning the integration limits, the result is said to be stochastic in the sense of Stratonovich. Horsthemke and Lefever [12] physically distinguish the two calculii by arguing that Ito calculus approximates a discrete noise process using continuous time, and that Stratonovich calculus approximates a correlated continuous noise process by assuming the de-correlation time is short relative to the temporal resolution of the numerical model. The relevance of this discussion to the current work is that stochasticity in continuous geophysical systems is typically assumed to be Stratonovich, while the appropriate calculus (and consequent forecast integration scheme) for the stochasticity associated with a trajectory of data assimilation-provided analyses depends on how the data assimilation scheme is employed. In data assimilation the noise (analysis increment) is applied discretely every time observations are assimilated. If observations were assimilated at every model timestep, the trajectory of analyses would be stochastic in the sense of Ito. In general, however, this is not the case and standard rules of numerical integration are appropriate over the time that the parameter is held constant; i.e., the trajectory of analyses is best described as piecewise deterministic. The ability of deterministic models employing data assimilation-based parameter estimation to uncover the correct form of a stochastic system is fundamentally hampered by the inconsistency of treating parameters as random variables in the data assimilation scheme and as constant values in the integration schemes. In addition, the calculus and timescales utilized by the stochastic variables of an SDE are at odds with the calculus and timescales implied by the state estimation procedure. The inconsistencies of integration scheme, timescales, and stochastic calculus compound, making the proper interpretation of parameter estimation results under the impact of structural model inadequacy difficult. The aims of this work are to (1) explore the ability of deterministic models combined with the data assimilation schemes traditionally utilized by the atmospheric science community to correctly uncover the distribution of a stochastic parameter given the complications outlined above, and to (2) explore the ways in which the results of stochastic parameter estimation can be used to improve ensemble forecast systems without recasting our forecast models as SDEs. To obtain unambiguous results, an experimental design is adopted that specifies a system (truth) that is stochastic and a model that is deterministic. Data assimilation-based parameter estimation is performed in an effort to uncover the correct form of the system’s stochasticity using observations drawn from the integrated SDE (truth). This is done using an interpretation of the Central Limit Theorem (CLT: Appendix; [14,21,23]) in a manner opposite to traditional approaches. As described in standard textbooks (e.g., [12,7]), one typically uses the CLT to approximate the impact of a finitely correlated noise process by using a white noise process. In our investigations we need


to do the opposite; we want to uncover the finitely correlated noise process when we are given a white noise process by a data assimilation-based parameter estimation scheme. In order to estimate the white noise variability associated with the true system, we shall apply the CLT to our data assimilation products. These products obviously have a finite correlation time (i.e., the observation interval), but they can still be treated as white noise on timescales large compared with that correlation time, albeit with modified amplitude. It will be shown that this procedure is capable of recovering the correct stochastic information; reasonable estimates of the two moments of the stochastic parameter value(s) can be obtained (at least in the systems considered), but the accuracy of the results are sensitive to quantities such as the level of stochasticity in the system, the size of the observational error, the period between subsequent observations, and the quality of the estimates of state and parametric uncertainty. There are ranges of these quantities where the method fails, and inaccurate estimates of the uncertainty associated with these quantities will also render the procedure inaccurate. The authors appreciate that geophysical modelers are unlikely to recast their governing equations as SDEs, and so we introduce a hybrid stochastic/deterministic integration scheme that utilizes the parameter estimation results to produce forecast statistics that are nearly identical to those produced by a proper SDE integration scheme. These issues will be illustrated using the low-order chaotic system of Lorenz [18], hereafter L63. Deterministic and stochastic versions of the equations will be outlined in Section 2. The relevant integration schemes will be briefly described in Section 3, followed by a description of the data assimilation schemes employed for parameter estimation in Section 4. The experimental design is described in Section 5, followed by results in Section 6. A brief statement of conclusions is given in Section 7. 2. Deterministic and stochastic versions of L63 The issues associated with estimating stochastic parameter values using data assimilation techniques will be elucidated using the L63 system. The deterministic L63 equations are written as: x˙ = −a(x − y) y˙ = r x − x z − y z˙ = x y − bz


where typical chaotic parameter values are a = 10, b = 8/3, and r = 28. We use these parameter values for the results reported in this work, but also considered a combination of parameters that produce a limit cycle solution (a = 10, b = 1/4, r = 30) and obtained qualitatively similar results. Following Hansen and Penland [10], we consider a stochastic version of L63: dx = −a0 (x − y)dt − as (x − y) ◦ dWmx + ηx ◦ dWax dy = (r0 x − x z − y)dt + rs x ◦ dWmy + η y ◦ dWay dz = (x y − b0 z)dt − bs z ◦ dWmz + ηz ◦ dWaz .



J.A. Hansen, C. Penland / Physica D 230 (2007) 88–98

In Eq. (1.2) the deterministic parameters in Eq. (1.1) are augmented by stochastic variables (multiplicative noise), and additive noise is added to each equation. Eq. (1.2) can be written more compactly as: dx = F(x, t)dt + G(x, t) ◦ dWm + B ◦ dWa


where dWm is a vector of independent Wiener processes for the multiplicative noise, dWa is a vector of independent Wiener processes for the additive noise, and the “◦” symbol indicates Eq. (1.3) are Stratonovich SDEs. For simplicity and ease of interpretation, this work is restricted to a single multiplicative stochastic variable. The parameters rs , bs , ηx , η y , and ηz in Eq. (1.2) are all set to zero. Only as is non-zero for the experiments below. Other situations were explored and produced qualitatively similar results. 3. Stochastic integration scheme To have a full appreciation of the implications of data assimilation-based parameter estimation it is necessary to understand the ways in which the SDEs are integrated in time. In this work two different integration schemes are employed: one deterministic and one stochastic. The deterministic scheme is the standard fourth-order Runge–Kutta (RK4) method, Press et al. [20], and the authors assume the reader is familiar with this scheme. The stochastic scheme used is a Stochastic RK4 scheme [10]. The Stochastic RK4 integration scheme is nothing more than appropriately scaling the stochastic terms of an SDE and integrating as with a deterministic RK4 scheme. In brief, given a standard SDE expression like dx = Fdt + G ◦ dW.


4.1. The ensemble Kalman filter The ensemble Kalman filter equations can be written as follows: f

xi (t) = 8(xia (t − τobs )) 1 f f (X f − X )(X f − X )T Pf = Nens − 1 K = P f HT (HP f HT + R)−1 xia (t) = Pa =

Stochastic RK4 recasts it as dW dx =F+G◦ dt dt

in geophysics to provide initial conditions for a model’s prognostic variables [17,5,8], one can augment the model’s prognostic variables with parameters during the data assimilation process to produce an improved estimate of each [1]. In this work, we employ a Kalman filter approach [13,24]. By its nature, data assimilation requires accurate estimates of uncertainty both for observations and for short-term model forecasts. The uncertainty associated with short-term model forecasts will be state dependent. We will account for this state dependent uncertainty by taking an ensemble approach to data assimilation. The sample covariance of short term ensemble forecasts is used to define the uncertainty associated with the model state, and then each of the short term forecasts making up the ensemble are corrected by the available observations. The ensemble Kalman filter [6] and its variants [1,25,3] are examples of this approach. It has also been applied to 4dVar by Hansen [9]. The ensemble Kalman filter is summarized below.


i and recognizes that when dW dt is discretized in time it has 1 1 statistics of N (0, ∆t ), where ∆t is the integration time step. Integration proceeds by forcing a deterministic RK4 scheme with random numbers drawn from N (0, ∆1t ). This approach is a special case of the general stochastic Runge–Kutta methods described by Ruemelin [22]. Hansen and Penland [10] have shown that Stochastic RK4 produces solutions that are almost indistinguishable from those produced by more sophisticated SDE integrators for several simple systems tested, including L63. The benefits of the scheme are that it is simple to implement and is computationally efficient.

4. Data assimilation We employ data assimilation for the purpose of parameter estimation. Data assimilation is the blending of two independent estimates of the state of a system and their associated uncertainty to produce a superior state estimate. Traditionally used 1 The N (a, b) notation indicates a normal distribution with mean a and variance b.

f xi (t) + K(yi

1 Nens − 1


f − Hxi ) a


(Xa − X )(Xa − X )T f

for i = 1, Nens . The xi is the first guess forecast for ensemble member i, obtained by propagating the ith analysis ensemble member forward over the observation interval, τobs , with the full nonlinear model. The sample covariance is given by P f , where X f is an n × Nens matrix where each column is a f forecast ensemble member. The X is an n × Nens matrix where each column is the mean of the forecast ensemble. The gain term, K, can be thought of as the ratio between the first guess uncertainty and the sum of the first guess and observational uncertainty. This weighting is applied to the difference between the observation and the first guess to provide a correction term that is added to the first guess to produce the new analysis. The quantity yi is a so-called perturbed observation. The perturbed √ observations are defined as yi = y + Rη, where R is the observation covariance matrix and where η consists of elements drawn from a Gaussian distribution with a mean of zero and a variance of one. The perturbed observations are necessary for obtaining the correct posterior statistics [4]. The H operator maps from model space, x, to observation space, y, allowing differences between the two to be calculated. The extension to parameter estimation is as follows: The state vector of prognostic variables is augmented with a vector of unknown parameter values, x = [x p : α]T , and an evolution

J.A. Hansen, C. Penland / Physica D 230 (2007) 88–98

equation is provided for the parameter values. In this work, the 2 evolution equation is set to dα dt = 0 (unless otherwise stated). The inclusion of parametric temporal correlation information is more difficult in EnKF than, for example, fourdimensional variational assimilation (4d-Var). While such correlations could be included in the EnKF through a parameter evolution equation(s), the de-correlation parameter in the evolution equation(s) would have to be assimilated as well. An alternative would be to move from the filter framework to the smoother framework, but such an approach is essentially identical to ensemble 4d-Var. Ensemble 4d-Var was utilized in an experimental design identical to the one described in Section 5 below, and the results were found to be qualitatively identical to those of the EnKF. In the interest of brevity the ensemble 4d-Var framework is not introduced and the results are not presented. Even though the inclusion of parameters in the control vector accounts for some of the forecast uncertainty associated with model inadequacy, sampling errors and model inadequacy not well modeled by the chosen parameters will render the uncertainty described by the ensemble based forecast error covariance matrix, P f , inadequate. Additionally, the trivial parametric evolution equation supports no dynamic change in parameter values and no associated uncertainty growth in parameter space. Data assimilation will systematically decrease the variance in the parameters’ uncertainty estimates, and “parametric” filter divergence may result. In the case of a timeinvariant parameter, filter divergence manifests itself by all ensemble members collapsing to the same parameter value. A simple approach to ameliorate these problems is to artificially boost the uncertainty by a constant factor, β (e.g. P f = βP f ). The value of β is tuned empirically (finding the smallest β that allows us to avoid parametric filter divergence), and is used in the experiments discussed below. Recall that in the experimental design considered we expect the parameter estimates to change as a function of time in order to account for model inadequacy. A crucial part of this work is to attempt to relate the statistics estimated through data assimilation using a deterministic model to the stochastic parameters in the true system. The mechanism for doing this employs the dynamical form of the Central Limit Theorem (e.g., [14,21,23] see also Appendix), which supplies a white noise approximation to a random process with finite correlation time. Hasselmann [11] gives a heuristic argument for the more rigorous early work of Khasminskii; one basically replaces the variance of the random process with its spectral value at zero frequency. This is equivalent to integrating the lagged covariance function over all lags. During data assimilation, parameter values are changed each time observations become available and are then held fixed (in our implementation). If we say that the observations are uncorrelated, then the integral over the correlation function is simply the parameter variance multiplied by the size of the assimilation interval. Thus, the amplitude of the stochastic parameter, as in our case, 2 In the results described in Section 6, α is a scalar consisting of the a parameter in Eq. (1.1).


is estimated as the square root of that product. As can be seen from Eq. (1.2), as multiplies an increment of the Wiener process whose variance is proportional to dt. We expect, therefore, for as in Eq. (1.2) to have units of a0 times the square root of time, consistent with the CLT. Thus, to obtain our estimate of as , we normalize the sample standard deviation of our estimate of the parameter a by the square root of the observation period (the time between subsequent observations, τobs ). It will be shown that this can lead to remarkably accurate estimates of moments of the stochastic parameters in the true system’s SDEs. 5. Experimental design The experiments utilize an SDE for the system (truth, see Eq. (1.2)), and a DODE for the model (see Eq. (1.1)). The experiments have two aims: (1) to identify the extent to which parameter statistics estimated by data assimilation-based approaches to parameter estimation and the DODE represent the true distribution of the stochastic parameter in the SDE, and (2) to develop a hybrid stochastic/dynamic integration scheme that utilizes the parameter distribution information in the ensemble-based parameter estimation results for the purpose of improved ensemble forecasts. In the experiments below the only error is in the a parameter (see Eq. (1.1)). We estimate the value of the a parameter using the DODE (Eq. (1.1)) and the EnKF. The SDE system (truth) has a0 = 10, as = 0.1, dWa = 0, and dWm = [dWmx 0 0] (see Eq. (1.2)). We also used as = 0.5, 0.05, and 0.01 and achieved qualitatively similar results. The integration time step for the SDE is set to 0.00025, while the integration time step for the DODE (Eq. (1.1)) is set to 0.01. The time between observations, τobs , is 0.05, which is roughly 1/10th the error doubling time of the L63 system (as estimated by the leading Lyapunov exponent). The EnKF is cycled for 1000 subsequent observation times. All model components (x, y, and z) are observed at each observation time by taking the system value and adding a perturbation drawn from N (0.0,0.01). In all ensemble experiments the ensemble size is Nens = 250. A large ensemble size enables a cleaner interpretation of results by ensuring accurate estimates of P f . The system equations are integrated using the Stochastic RK4 scheme. The initial ensemble of parameter values for the deterministic model was drawn from N (11, 0.5). 6. Results 6.1. Estimating the moments of the stochastic parameter The ensemble mean estimate of the uncertain parameter as a function of time is plotted in Fig. 1. Estimates of the moments of the true stochastic parameter are achieved by treating the collection of mean estimates as a distribution. It is found that the ensemble-based data assimilation approach to parameter estimation can robustly recover a good estimate of the mean of the stochastic variable for the system and situations considered. Whether or not this is a generic result is yet to be shown. In the case shown in Fig. 1, the mean of the estimated distribution is 10.02, compared with the correct value of a0 = 10. In spite of


J.A. Hansen, C. Penland / Physica D 230 (2007) 88–98

Fig. 1. Parameter estimates using a deterministic model to estimate the parameter distribution used by a system described by an SDE. The correct mean value of the system parameter is a0 = 10 (solid horizontal line), and the correct standard deviation is as (dashed lines).

the success of estimates of the mean, we find that the variability of the a parameter, as , is overestimated if interpreted to be the sample standard deviation of the estimated parameter. In the case shown in Fig. 1, the standard deviation of the distribution is 0.44, compared with the correct value of as = 0.1. However, good estimates of as are obtained if the CLT is employed and the sample standard deviation √ is scaled by the square root of the observation period: 0.44 0.05 = 0.098, compared with the correct value of as = 0.1. The CLT rescaling was tested by considering other observation periods. Fig. 2(a) and Fig. 2(b) plot estimates of a0 and as as a function of τobs , respectively. Ten independent parameter estimation experiments were run for each of 5 different τobs and the distribution of these 10 estimates of the moments of the stochastic parameter are plotted for each τobs . Good estimates are seen for each τobs considered (the τobs used to produce Fig. 1 was 0.05, the middle value in Fig. 2). Results for larger τobs become less accurate as the stochasticity of the system renders the first guess of the data assimilation system less accurate. The parameter estimates produced by the EnKF were sensitive to the boost factor applied to the background error covariance. Boosting is traditionally used as a crude method to account for sampling errors and model inadequacy, and that is how it should be interpreted here. The model inadequacy comes from the difference in dynamics between the L63 system in SDE and DODE form. The results in Figs. 1 and 2(a) and (b) are for a boost factor of 5. As mentioned in Section 4.1, the value of the boost factor is determined empirically by finding the smallest boost factor that avoids parametric filter divergence; the boost factor is systematically increased until it reaches a level where all estimates of the parameter do not collapse to the same value. This factor was determined by an analysis of the autocorrelation of the trajectory of tuned parameter values. For

smaller boost factors, parametric filter divergence manifested itself with parameter estimates that were highly correlated in time. The impact of a larger boost factor is shown in Fig. 2(c) and (d). The experiments are identical to those used to produce Fig. 2(a) and (b), but the boost factor was increased from β = 5 to β = 6. While it is still possible to obtain good estimates of the mean (Fig. 2(c)), estimates of the standard deviation are too large by about 20%, even after application of the CLT (Fig. 2(d)). Experiments carried out using different values of as (0.5, 0.05, 0.01) also indicate the ability of ensemble-based parameter estimation to uncover the correct form of the system’s stochasticity, but different values of boosting, and in some instances different values of observational noise, were required. The utility of the approach is governed by a threeway interplay between the “amount” of stochasticity, the boost factor, and the observational error level. If the stochasticity is small (small as ) and the observational error is large the stochastic “signal” is not discernable over the linearizable time scales required by the data assimilation scheme. If the stochasticity is large (large as ) then the deterministic model will likely be a very poor predictor of forecast error, possibly requiring a large boost factor that would essentially render the data assimilation scheme useless. Nevertheless, we were able to identify boosting and observation error levels that enabled us to uncover the correct form of the system stochasticity for all values of as considered in our experiments. 6.2. Approximating SDE integration with a hybrid stochastic/deterministic approach The authors realize that it is unlikely that any operational forecasting center will recast its governing equations as SDEs

J.A. Hansen, C. Penland / Physica D 230 (2007) 88–98


Fig. 2. Sensitivity of estimates of the moments of the stochastic parameter as a function of observation period, τobs , and EnKF boost factor, β. The left column shows estimates of the mean, a0 , as a function of τobs , the right column shows estimates of the standard deviation, as , as a function of τobs , the top row shows the results for a boost factor of 5, and the bottom row shows results for a boost factor of 6. Ten independent parameter estimation runs were performed for each τobs /β combination. The means of the runs are given as the asterisk, the standard deviation of the runs by the error bars.

and solve them using an integration scheme appropriate for stochastic systems. But in the same way that the CLT was leveraged to take us from white noise estimates provided by the parameter estimation scheme to continuous noise for an SDE, we can utilize the CLT to explain the application of a hybrid stochastic/deterministic integration scheme that directly utilizes the distribution provided by the parameter estimation scheme. The parameter estimation results of Fig. 1 indicate the variability of a perturbation that could be applied at observational time-scales and integrated with a deterministic integration scheme. This point bears repeating: the parametric distribution implied by Fig. 1 can be used in a forecasting framework, but only if it is applied in the same manner as it was produced. For ensemble forecasting, one can continue to use the deterministic integration scheme, draw from the distribution defined by the fit parameter values, and hold the parameter values fixed over the time between observations. This forecast model is a kind of piecewise deterministic system that we denote “hybrid stochastic/deterministic”. This approach is justified by the Stochastic RK4 integration scheme [10]. Stochastic RK4 utilizes a deterministic integration scheme after rescaling the stochastic term by the square root of the time step over which the stochasticity is applied (the integration

time step) and holding the stochasticity constant over that time step. The hybrid stochastic/deterministic approach is similar, but with a stochastic time step defined by the observational period, τobs . The performance of ensemble forecasts that utilize the hybrid stochastic/deterministic integration scheme is compared to four other forecasting approaches. Two of the approaches utilize standard deterministic integration schemes, and two utilize a proper SDE integration scheme. In the first of the two deterministic integration schemes we use RK4 and a constant value for the a parameter. The value is given by the parameter’s first guess at the beginning of the parameter estimation experiments, a = 11. We denote this the “Untuned Deterministic” approach and it should be regarded as a null hypothesis. The second of the two deterministic integration schemes uses RK4 and a constant value for the a parameter, but this time the tuned mean value of a provided by the EnKF, a = 10.02, is used. We denote this the “Tuned Deterministic” approach. The first proper stochastic integration scheme utilizes Stochastic RK4 and the two moments of the parameter distribution provided by the EnKF-based parameter estimation procedure, a0 = 10.02, and as = 0.098 (recall the correct values are a0 = 10 and as = 0.1). We denote this


J.A. Hansen, C. Penland / Physica D 230 (2007) 88–98

Fig. 3. Ensemble mean forecast RMS error statistics normalized by expected observational error as a function of forecast lead time for five approaches to ensemble forecasting. The Untuned Deterministic approach (solid) performs the worst, followed by the Tuned Deterministic (dashed). The SDE (dot-dashed), Hybrid (solid grey), and Perfect (dotted) are essentially indistinguishable, indicating the utility of applying the CLT both to the interpretation of the tuned parametric distribution (SDE) and to an approximate stochastic integration scheme (Hybrid).

the “SDE” approach. The final scheme for comparison utilizes Stochastic RK4 and the correct two moments of the parameter distribution, a0 = 10 and as = 0.1. This is denoted the “Perfect” approach. The hybrid stochastic/deterministic scheme (hereafter the “Hybrid” approach) utilizes deterministic RK4 as its integration scheme, but it allows the a parameter to change in exactly the manner in which it was estimated: a parameter value is drawn from N (a0 , as2 ) = N (10.02, 0.442 ) and is held constant during an integration of length τobs . After a time period of τobs a new draw from N (a0 , as2 ) is obtained and is held constant for the next τobs period, etc. Ensemble mean RMSE forecast results for the five approaches to forecasting are shown in Fig. 3. Plotted is the average over 1000 realizations of the ensemble mean forecast errors (Nens = 500) normalized by the expected observational error as a function of lead time. Ensemble initial conditions for each of the approaches were provided by an independent EnKF data assimilation where truth was defined by the SDE of Eq. (1.2) and the same observations were used by each of the independent filters. The Untuned Deterministic approach fared the worst, as shown by the solid black line. This is in part because the Untuned Deterministic approach started with the largest initial errors (very large boost factors were necessary to keep this filter from diverging), but the slope of the error growth curve is such that even if the initial error were brought down to the level of the other three approaches it would still have the largest forecast errors. Next best was the Tuned Deterministic approach (dashed line). It too has slightly larger analysis errors, and the rate of error growth is again larger than the three remaining approaches. The SDE (dot-dashed), Hybrid (solid

grey), and Perfect (dotted) approaches all have comparable skill, with the SDE approach being slightly inferior. Ensemble mean forecast errors for the Hybrid and the Perfect approaches are essentially identical. The second moment error statistics of the ensemble forecasts are assessed by collecting the ratio of the forecast ensemble mean error to the forecast ensemble standard deviation for each component of the model as a function of forecast lead time. If the ensemble forecast has the correct second moment statistics, then the distribution of this ratio should have a mean of zero (unbiased), and a standard deviation of 1. The mean of the ratio as a function of forecast lead for each of the five approaches is shown in Fig. 4. While there is some offset from the zero line, the SDE (dot-dashed), Hybrid (solid grey), and Perfect (dotted) approaches are all qualitatively similar and have relatively small bias for the sample considered. The Tuned Deterministic (dashed) approach has slightly more variability, and the Untuned Deterministic (solid) has a marked positive bias at long forecast leads. The standard deviation of the ratio for each of the methods is plotted in Fig. 5. Both the Untuned and the Tuned Deterministic integration schemes give standard deviations of the ratio that are far in excess of 1 (black solid line and dashed line, respectively). This implies that the forecast ensembles for these approaches have too little spread. In contrast, the SDE (dot-dashed), Hybrid (solid grey), and Perfect (dotted) approach give nearly identical results. The Hybrid and the SDE approaches produce a standard deviation of the ratio that is almost indistinguishable from the Perfect approach, indicating that from the point of view of second moment statistics, the Hybrid integration scheme

J.A. Hansen, C. Penland / Physica D 230 (2007) 88–98


Fig. 4. Mean of the ratio of ensemble mean error to ensemble spread as a function of forecast lead for the five approaches to ensemble forecasting considered. If the forecasts are unbiased the mean of this ratio should be zero. The Untuned Deterministic (solid) and the Tuned Deterministic (dashed) both produce biased forecasts. The SDE (dot-dashed), and Hybrid (solid grey) approaches produce non-zero means, but are nearly indistinguishable from the Perfect (dotted) approach. Experimentation has shown that the apparent bias in these three approaches is due only to sampling; larger samples drive the means towads zero.

Fig. 5. Standard deviation of the ratio of ensemble mean error to ensemble spread as a function of forecast lead for the five approaches to ensemble forecasting considered. If the second moment forecast statistics are correct, the standard deviation of this ratio should be one (for unbiased forecasts). The Untuned Deterministic (solid) and the Tuned Deterministic (dashed) both produce underdispersive ensemble forecasts. The SDE (dot-dashed), Hybrid (solid grey), and Perfect (dotted) approaches all produce forecasts with almost perfect second moment statistics.

using the parameter distribution provided by ensemble-based parameter estimation produces a model that is nearly a perfect representation of the SDE that defines the true system.

Experiments carried out using other parameter values and observation frequencies rendered qualitatively similar results. The tuned distribution changes with τobs (in a manner


J.A. Hansen, C. Penland / Physica D 230 (2007) 88–98

predictable by the CLT), but applying the distribution to the forecasting problem in the same manner it was estimated always produced excellent second moment error statistics. Because the Hybrid approach is an approximation to the SDE approach, both give qualitatively identical results. A large practical advantage of the Hybrid approach is that it is much less cumbersome to implement (we don’t have to rewrite the integration scheme in the deterministic forecast model) and can be run with much larger timesteps than the SDE. 7. Conclusions Data assimilation-based parameter estimation can be effective both for model tuning and for the production of parametric distributions for use during forecasting. When in a situation of structural, rather than parametric, model inadequacy, one should not expect to produce a static estimate of parameter values. Rather, one should expect parameter values to change as a function of time in order to offset statedependent structural error. A parameter estimation approach under these constraints should result in a time-varying parameter estimate that is ultimately interpreted as a probability density function (pdf). One must take care when interpreting the resulting parametric distributions. The distributions only represent the level of parametric variability necessary to mimic observations under the constraints imposed by the data assimilation, namely (i) a deterministic model integration, and (ii) a stochastic timescale dictated by the frequency of observations rather than a Wiener process (in data assimilation the stochasticity is only applied at the end points of the observation periods and held constant in between). It is possible to recover the correct form of the system’s stochasticity if the variability of the tuned parametric distribution is scaled in a manner consistent with the CLT. The validity of the approach assumes that the data assimilation procedure has accurate estimates of all relevant uncertainties (initial condition, parametric, model, observation); data assimilation will always produce estimates of the state and, if so configured, parameters, but the accuracy of those estimates (and the utility of the CLT for rescaling) are critically dependent upon the accuracy of the information used in the data assimilation process. Note that when these conditions are met we are able to apply the CLT to accurately estimate multiplicative noise processes. Sophisticated techniques such as those presented in Miller et al. [19] considered data assimilation in the presence of only additive noise. It has been shown that it is perfectly reasonable to apply the parametric distributions to the forecasting problem so long as the “stochasticity” is applied in exactly the same way it was estimated: a deterministic integration where the parameter value changes only on observation period time-scales. For our system, the equivalence of results using properly integrated SDEs and the hybrid stochastic/deterministic forecasting system indicated that this was true, and suggests that assimilation schemes currently prevalent in the atmospheric

sciences may be adequate to uncover parametric distributions that can skillfully account for model inadequacy. Obviously, there remains work to do on some practical aspects of this approach, particularly those concerning the boost factor applied to the forecast error covariance matrix. At present, we suggest the minimum boost factor necessary to eliminate parametric filter divergence. However, since this is normally what is done anyway, our suggestion does not pose any difficulties. We also wish to understand why a distribution made up of estimates of the mean parameter value provides the desired information; the instantaneous parametric distributions provided by an EnKF analyses do not give the same results. Simultaneously estimating state and parameter values both increases the length of the data assimilation control vector and increases the degree of nonlinearity of the problem (parameters are typically multiplied by prognostic variables or functions of prognostic variables). It remains to be seen the extent to which these techniques can be applied to spatially extended parameters in large geophysical models. An area where this approach may have an immediate impact is parameter tuning in single column models. Such an application is currently under investigation. Acknowledgments The authors are grateful for the comments provided by Brian Ewald, Brian Farrell, Petros Ioannou, and two anonymous referees. The bulk of the work was done during Penland’s visit to MIT as a Houghton Lecturer and Hansen’s visit to NCAR as a guest of the Data Assimilation Initiative. This work was supported in part by NSF grant ATM-0216866, and by the National Oceanic and Atmospheric Administration. Appendix. The central limit theorem applied to data assimilation In much of the mathematical literature (e.g. [2,15]), Khasminskii’s [14] classic paper – entitled “A limit theorem for solutions of differential equations with random right hand side”. – is called the Central Limit Theorem (CLT). As will be made clear in the following, Khasminskii’s approach is essentially a dynamical update to the statistical version of the CLT familiar to most scientists. Extensions to the Khasminskii work, such as Papanicolaou and Kohler [21] (see also the discussion in Gardiner [7]), are also updated versions of the CLT. Consider a dynamical system of equations as follows: dx ˜ ˜ = G(x, t) + F(x, t) (A.1) dt where x is a vector in the N -dimensional Euclidean space ˜ ˜ R N and where G(x, t) and F(x, t) are characterized by short ˜ and long correlation times, respectively. The quantity F(x, t) will be identified with the deterministic Lorenz model L63. For our purposes, an alternative description of x in terms of a dimensionless parameter ε is preferable to Eq. (A.1): dx = εG(x, t) + ε 2 F(x, t), dt



J.A. Hansen, C. Penland / Physica D 230 (2007) 88–98

where now F(x, t) and G(x, t) are of the same order of magnitude. It should be noted that the parameter ε is not intended here to be a measure of the relative importance of ˜ ˜ F(x, t) and G(x, t) but rather of the relative rapidity with which the autocorrelation functions of these terms decay, as will be clear in what follows. The theorem of Papanicolaou and Kohler ([21]; PK74 hereafter; for an earlier, simpler version see Khasminskii [14]) describes the conditions under which a singular scaling of time allows Eq. (A.2) to converge weakly to a stochastic differential equation. This is a dynamical form of the Central Limit Theorem. The conditions require that the fast process εG(x, t) be sufficiently variable and that the probability density function (pdf) of any value of εG(x, t) becomes independent of any initial conditions as time increases indefinitely, and at a sufficiently rapid rate. Further, G(x, t) is required to be sufficiently smooth with respect to the components of x, where “sufficiently rapid” and “sufficiently smooth” are made explicit in PK74. These conditions are easily met for any of the cases we consider in this manuscript. The time coordinate is now scaled s = ε2 t


G ik (x, s)ςk (s/ε 2 ),



where angle brackets denote expectation value. With these restrictions, the theorem by PK74 states that in the limit of long times (t → ∞) and small ε (ε → 0), taken so that s = ε2 t remains fixed, the conditional pdf for x at time s given an initial condition x0 (s0 ) satisfies the backward equation (A.7)

where N X i j=1

a i j (x0 , s0 )

N X ∂2 ∂ + b j (x0 , s0 ) ∂ x0i ∂ x0 j ∂ x 0j j=1

b (x, s) = j





G ik (x, s)

∂G mj (x, s)


∂ xi

+ F j (x, s).


In this limit, the conditional pdf also satisfies a forward (Fokker–Planck) equation in the scaled coordinates: ∂ p(x, s | x0 , s0 ) = L∗ p(x, s | x0 , s0 ) ∂s


where the adjoint operator is L∗ =

N X ∂ j ∂2 a i j (x, s) − b (x, s). ∂ x ∂ x ∂ xj i j i j=1 j=1 N X



Gk (x, s)Skα ◦ dWα ,







dx = F(x, s)ds +

where ζk (s/ε 2 ) is stationary, centered and bounded. Note that L63 (Eq. (1.1)) is of this form when the parameter a has an uncertain perturbation as . The integrated lagged covariance matrix of ζ has elements Z ∞ Ckm = hςk (t)ςm (t + τ )idτ,

∂ p(x, s | x0 , s0 ) = L p(x, s | x0 , s0 ), ∂s0

Ckm G ik (x, s)G mj (x, s)



k, m = 1, 2, . . . K ,



The proof in PK74 that Eq. (A.4) converges weakly to a stochastic differential equation is more general, and more difficult to apply, than what is needed for the cases we shall consider. We therefore restrict the problem by putting more conditions on G(x, s/ε 2 ), and by stating that F, G, and x are all vectors with N elements. Let G(x, s/ε 2 ) be of the form G i (x, s/ε 2 ) =

a i j (x, s) =

As stated above, the conditional pdf of x in the scaled coordinate system obeys Eqs. (A.7) and (A.10) in a weak sense. That is, the moments of x can be approximated by the moments of the solution to the stochastic differential equation,

and Eq. (A.2) becomes 1 dx = G(x, s/ε 2 ) + F(x, s/ε 2 ). ds ε



where the symmetric matrix C has been written as the product of two matrices (C = SST ) and has absorbed the factor of 1/2 present in most formulations of the Fokker–Planck equation. The quantity W is a vector of independent Wiener processes and the expression ◦ dW denotes the fact that Eq. (A.12) is to be interpreted in the sense of Stratonovich. That is, the white noise is an approximation to a continuous system with small but finite decorrelation time so that stochastic integrals reduce to standard Riemann integrals [16]. The interpretation of (A.12) is as follows: both s and t measure time, but not in the same units. A process that goes through a life cycle in a matter of seconds may look like a Wiener process on the timescale of a week. Certainly, the number of seconds (t) equal to one week (s) is very large, so their scaling factor (ε2 ) is very small. If the fast process is to have any effect, though, it has to be big enough so that its effects don’t simply average out over the larger timescale. ˜ must scale as in Eq. (A.2). ˜ and G Hence, the relative sizes of F ˜ over a Since the cumulative effects of the rapid variations in G 2 time interval ε are consolidated into a larger unit of time, and are individually unresolved, these variations act as a Gaussian random process with a variance proportional to the scaled time. That is, the coarse-grained system looks like Eq. (A.12). We now apply the CLT to the process of assimilating an uncertain parameter into L63, identifying the state variable x with the vector (x, y, z). Let ε 2 F(x, t) be identified with the deterministic Lorenz model L63 (Eq. (1.1)), and let us assume that the parameter a is perturbed by a rapidly varying quantity of amplitude as , which we estimate through data assimilation. ˜ Thus, we identify ε G(x, t) in Eq. (A.2) with the expression


J.A. Hansen, C. Penland / Physica D 230 (2007) 88–98

as ξ(x − y), where ξ is a random variable with mean zero and unit standard deviation. From Eq. (A.5), we identify k = 1, i = 1, G 11 = (x − y) and ζ1 = as ξ . We observe as once per time interval τobs . The model is run holding as ξ fixed during this interval, after which an independent sample of as ξ , uncorrelated with the previous sample of as ξ , is observed. Thus, the sample random variable as ξ has an autocorrelation function equal to one for a time τobs and equal to zero thereafter. From the CLT, we note that if the stochastic model can be treated as a stochastic differential equation, it will see a stochastic differential equation in the scaled coordinates s = ε 2 t as follows: √ dx = F(x, s)ds + eˆ x (x − y)as τobs ◦ dWs , (A.13)

[9] [10]

[11] [12]

[13] [14]

where eˆ x is a unit vector in the x-direction and where dW s has variance ds. On timescales large compared with τobs , moments of the discretely assimilated system will be statistically indistinguishable from Eq. (A.13).




[1] J.L. Anderson, An ensemble adjustment Kalman filter for data assimilation, Mon. Wea. Rev. 129 (2001) 2284–2903. [2] L. Arnold, Hasselmann’s Program Revisited: The Analysis of Stochasticity in Deterministic Climate Models, 2001. [3] C.H. Bishop, B.J. Etherton, S.J. Majumdat, Adaptive sampling with the ensemble transform Kalman filter. Part 1: Theoretical aspects, Mon. Wea. Rev. 129 (2001) 420–436. [4] G. Burgers, P.J. van Leeuwen, G. Evensen, Analysis scheme in the ensemble Kalman filter, Mon. Wea. Rev. 126 (1998) 1719–1724. [5] R. Daley, Atmospheric Data Analysis, Cambridge University Press, 1991. [6] G. Evensen, Sequential data assimilation with a nonlinear quasigeostrophic model using Monte Carlo methods to forecast error statistics, J. Geophys. Res. 99 (C5) (1994) 10143–10162. [7] C.W. Gardiner, Handbook of Stochastic Methods, Springer-Verlag, Berlin, 1985. [8] M. Ghil, K. Ide, A. Bennett, P. Courtier, M. Kimoto, M. Nagata, M. Saiki, N. Sato (Eds.), Data Assimilation in Meteorology and Oceanography:


[18] [19] [20] [21] [22] [23]

[24] [25]

Theory and Practice, in: A Collection of Papers Presented at the WMO Second International Symposium on Assimilation of Observations in Meteorology and Oceanography, 13–17 March 1995, Tokyo, Japan, 1997. J. Hansen, Accounting for model error in ensemble-based state estimation and forecasting, Mon. Wea. Rev. 130 (2002) 2373–2391. J. Hansen, C. Penland, Efficient approximation techniques for integrating numerical stochastic differential equations, Mon. Wea. Rev. 134 (2006) 3006–3014. K. Hasselmann, Stochastic climate models, Part I. Theory, Tellus 28 (1976) 473–485. W. Horsthemke, R. Lefever, Noise-Induced Transitions: Theory and Applications in Physics, Chemistry and Biology, Springer-Verlag, Berlin, 1984. R.E. Kalman, A new approach to linear filtering and prediction problems, J. Basic Eng. 82 (1960) 35–45. R.Z. Khasminskii, A limit theorem for solutions of differential equations with random right hand side, Theory of Probability and its Applications 11 (1966) 390–406. Y. Kifer, Averaging and climate models, in: Stochastic Climate Models, Birkh¨auser, Basel, 2001, pp. 171–188. P.E. Kloeden, E. Platen, Numerical Solution of Stochastic Differential Equations, Springer-Verlag, Berlin, 1992. A.C. Lorenc, Analysis method for numerical weather prediction, Quart. J. Royal Meteorological Soc. 112 (1986) 1177–1194. E.N. Lorenz, Deterministic nonperiodic flow, J. Atmos. Sci. 20 (1963) 130–141. R.N. Miller, E.F. Carter, S.T. Blue, Data assimilation into nonlinear stochastic models, Tellus 51A (1999) 167–194. W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in FORTRAN, Cambridge University Press, 1986. G.C. Papanicolaou, W. Kohler, Asymptotic theory of mixing stochastic differential equations, Comm. Pure Appl. Math. 27 (1974) 641–668. W. Ruemelin, Numerical treatment of stochastic differential equations, SIAM J. Numer. Anal. 19 (1982) 605–613. P.D. Sardeshmukh, C. Penland, M. Newman, Rossby waves in a stochastically fluctuating medium, in: Stochastic Climate Models, Birkh¨auser, Basel, 2001, pp. 369–384. M.K. Tippett, J.L. Anderson, C.H. Bishop, T.M. Hamill, J.S. Whitaker, Ensemble square-root filters, Mon. Wea. Rev. 131 (2003) 1485–1490. J.S. Whitaker, T.M. Hamill, Ensemble data assimilation without perturbed observations, Mon. Wea. Rev. 130 (7) (2002) 1923–1924.