A reduced adjoint approach to variational data assimilation

A reduced adjoint approach to variational data assimilation

Comput. Methods Appl. Mech. Engrg. 254 (2013) 1–13 Contents lists available at SciVerse ScienceDirect Comput. Methods Appl. Mech. Engrg. journal hom...

3MB Sizes 1 Downloads 81 Views

Comput. Methods Appl. Mech. Engrg. 254 (2013) 1–13

Contents lists available at SciVerse ScienceDirect

Comput. Methods Appl. Mech. Engrg. journal homepage: www.elsevier.com/locate/cma

A reduced adjoint approach to variational data assimilation M.U. Altaf a,⇑, M. El Gharamti a, A.W. Heemink b, I. Hoteit a a b

King Abdullah University of Science and Technology, Thuwal, Saudi Arabia Delft University of Technology, Delft, The Netherlands

a r t i c l e

i n f o

Article history: Received 2 August 2011 Received in revised form 1 October 2012 Accepted 5 October 2012 Available online 26 October 2012 Keywords: Proper orthogonal decomposition 4DVAR Model order reduction

a b s t r a c t The adjoint method has been used very often for variational data assimilation. The computational cost to run the adjoint model often exceeds several original model runs and the method needs significant programming efforts to implement the adjoint model code. The work proposed here is variational data assimilation based on proper orthogonal decomposition (POD) which avoids the implementation of the adjoint of the tangent linear approximation of the original nonlinear model. An ensemble of the forward model simulations is used to determine the approximation of the covariance matrix and only the dominant eigenvectors of this matrix are used to define a model subspace. The adjoint of the tangent linear model is replaced by the reduced adjoint based on this reduced space. Thus the adjoint model is run in reduced space with negligible computational cost. Once the gradient is obtained in reduced space it is projected back in full space and the minimization process is carried in full space. In the paper the reduced adjoint approach to variational data assimilation is introduced. The characteristics and performance of the method are illustrated with a number of data assimilation experiments in a ground water subsurface contaminant model. Ó 2012 Elsevier B.V. All rights reserved.

1. Introduction The adjoint method is a well-known variational approach for data assimilation. The method aims at adjusting a number of unknown control parameters on the basis of given data. The control parameters might be model initial conditions, or any model parameters and inputs [1,2]. In this approach, an objective function is first defined which, for any model solution over the assimilation interval, measures the misfit between that solution and the available data. This objective function is typically the sum of squared differences between the data and the corresponding model values. One will then look for the model solution that minimizes this objective function by adjusting the chosen set of control parameters. It is now recognized that the adjoint method is computationally the most efficient procedure to calculate the gradients of the objective function. In this method, the gradients are obtained by running the adjoint of the linearized forward model backward in time. These are then used in a gradient-based optimization algorithm to solve the variational data assimilation problem. The adjoint method is widely used and have been successfully applied to many types of data assimilation problems including weather applications, ground water flow studies (e.g., [3,4]), oceanography applications (e.g., [5–9]) and in shallow water flow models (e.g., [10–13]). ⇑ Corresponding author. E-mail address: [email protected] (M.U. Altaf). 0045-7825/$ - see front matter Ó 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.cma.2012.10.003

The adjoint method is an efficient approach because the gradient calculation is independent of the number of control parameters and requires just a single simulation of the model forward in time and a single simulation of the adjoint model backward in time. However, the integration of the adjoint of large scales model backward in time can be equivalent to several forward model simulations [14–16] and thus remains computationally expensive. This is because the model-data differences act as a forcing term in the adjoint model and therefore one needs to store the whole model trajectory to run the adjoint model. Check-pointing strategies are then used to avoid the storage of the model state at every step as described by Griewank [14]. Another drawback of the adjoint method is the important human and computing resources required to implement, maintain, and run the adjoint model. Research has recently been carried out to develop automatic differentiation tools in order to generate the adjoint of a computer code through direct compilation, and adjoint compilers have now become available (see [17,15]). However, even with the use of these adjoint compilers, developing an adjoint model still require significant programming efforts that hampers new applications of the method. Moreover any modification to the forward model requires modifying the adjoint model and therefore one need to keep track of the model code development to update the adjoint accordingly. One way to reduce the computational burden of the adjoint method is to reduce the size of the control parameters. The optimization is then applied in a reduced space requiring less iterations for convergence [18,19]. This reduced control space approach

2

M.U. Altaf et al. / Comput. Methods Appl. Mech. Engrg. 254 (2013) 1–13

reduces the CPU time from reducing forward model and backward adjoint simulations but does not change the complexity of the original variational problem as the original adjoint model is still used to compute the gradients before projection onto the reduced space. To simplify coding efforts and further alleviate computational burden of the adjoint model, one can build an approximate adjoint from a reduced but cheaper forward model. In this reduced model approach, the optimization is again applied in a reduced space. For instance, Courtier et al. [20] proposed to replace the original model by a fast low-resolution model approximating the original forward model [20]. Similarly, Schiller and Willebrand [21] simplified the model physics before building the adjoint model. Reduced order techniques has also been used to apply the variational data assimilation problem on efficient low order approximate linear models for which the adjoint code can be easily developed (see e.g., [22]). The proposed optimization algorithms, although not optimal in a statistical sense, allows calculating dynamically consistent model states estimates that agree as closely as possible with the observed data [21]. Proper orthogonal decomposition (POD) is a model reduction technique considered as an application of the singular value decomposition (SVD) to the approximation of general dynamical systems [23]. The POD method is widely used in many fields like image processing, signal processing, data compression, oceanography, chemical engineering and fluid mechanics (see [24]). In the POD method the reduced (or projection) subspace is determined by processing data obtained from numerical simulations of the model, which is expected to summarize information about the dynamical behavior of the system. POD is often used to determine a low-dimensional basis for order reduction of the control space (e.g. [18,19,25,26]), but can also be used to build a reduced-order forward model, from which a reduced adjoint model can be constructed, via Galerkin projection of the forward model equations on the POD modes [23]. This latter approach is however an intrusive approach in the sense that it requires rewriting the model equations with all associated significant additional coding efforts. Vermeulen and Heemink [27] recently proposed a POD-based non-intrusive and numerically efficient reduced model variational assimilation method. In this method, a reduced-order model that is linear and very fast to integrate is numerically built using finite differences applied in the POD reduced space. The reduced adjoint can then be implemented very easily and the minimizing problem is completely solved in the POD space with a very low computational cost. This POD-based reduced model method was applied for the estimation of depth values for a very large scale tidal model [28,29]. The method has also been applied in petroleum engineering for history matching problems [30]. One drawback of this method is that the numerical construction of POD-reduced model scales linearly with the number of control parameters, and could therefore become computationally very demanding for data assimilation problems with very large dimensional control spaces. Large control spaces can be however often reduced using additional order-reduction techniques. Another more serious drawback of this method is that it is fully applied in the reduced POD space, where the model-data misfit terms in the cost function are computed based on the reduced model state. This could possibly lead to an optimized set of control parameters that is not a solution of the original variational problem. In this work, we propose to apply a simple but very effective modification to the POD-reduced model method of Vermeulen and Heemink [27]. The idea here is to work on solving exactly the original variational assimilation problem where the modeldata misfit terms in the cost function are computed in the original full space, but using approximate gradients computed as the outputs of the POD reduced adjoint model. The numerical integration of the reduced adjoint model requires negligible computa-

tional cost, which greatly alleviates the complexity of the adjoint method. Once the reduced adjoint model is integrated backward in time, its outputs are projected back in the full space where the minimization process is carried. The proposed POD-reduced adjoint method not only avoids the implementation of the adjoint of the original nonlinear model, but is also shown to be very efficient, in term of computation cost and performances, as compared to the full adjoint method in our numerical test experiments with a subsurface contaminant data assimilation problem. The paper is organized as follows. Section 2 briefly recalls the classical four dimensional variational data assimilation method. Sections 3 and 4 describe the construction of the POD reduced space and the POD based model reduction procedure for minimization, respectively. Section 5 presents POD-reduced adjoint method. Section 6 discusses several numerical assimilation experiments with a subsurface contaminant model to evaluate performances. Concluding remarks are offered in Section 7. 2. Four dimensional variational data assimilation (4DVAR) Consider a dynamical system modeled by a discrete nonlinear equation of the form

Xðtiþ1 Þ ¼ M i Xðt i Þ;

i ¼ 1; . . . ; m  1;

ð1Þ

n

where Xðt iþ1 Þ 2 R is a model state vector and Mi is a nonlinear and deterministic dynamics operator that includes inputs and propagates the state from time t i to t iþ1 . Suppose that imperfect observations Yðti Þ 2 Rq of the dynamical system (1) are available that are related to the model state at time t i through

Yðti Þ ¼ HðXðt i ÞÞ þ gðt i Þ; n

ð2Þ

q

where H : R ! R is an observation operator that maps the model fields on observation space, gðti Þ is a white Gaussian observation noise process with zero mean and covariance matrix Ri introduced to model the uncertainties associated with observation process. Variational data assimilation aims at looking for an ‘‘optimal’’ set of parameters that provides the best model fit with available observations, measure through an objective cost function. Here we will present the traditional 4DVAR approach in which only model initial conditions X 0 is considered as control parameters. The method can be however easily generalized to further include any model parameter or input as control. The objective function J to be optimized is defined based on model-data misfit penalty terms as:

JðX 0 Þ ¼

1 b 1X b ðX  X 0 ÞT B1 ðYðt i Þ 0 ðX  X 0 Þ þ 2 2 i  HðXðt i ÞÞÞT R1 i ðYðt i Þ  HðXðt i ÞÞÞ; b

ð3Þ

where X is a prior estimate of X 0 , assumed uncorrelated with covariance matrix B0 , and the term X b  X 0 acts as a regularization term. The errors in the observations and the errors in the prior estimates are also assumed to be uncorrelated. The 4DVAR problem then consists of minimizing J over the window of available data and constrained by the model dynamics (1). The minimization of J is often based on gradient-based methods, as the quasi-Newton methods. These methods require the computation of the gradient of J with respect to X 0 , denoted by rJ. It is now recognized that the most efficient approach to compute the exact gradients of J is to use the adjoint method [1,2]. The principle of the adjoint method is based on the systematic use of the chain rule for differentiation [31,32]. One way to formulate the adjoint method is to use the Lagrange multipliers method to replace this constrained optimization problem (1) by an unconstrained optimization of the functional

M.U. Altaf et al. / Comput. Methods Appl. Mech. Engrg. 254 (2013) 1–13

1X ðYðt i Þ  HðXðt i ÞÞÞT R1 i ðYðt i Þ  HðXðt i ÞÞÞ 2 i X mðtiþ1 ÞT ðXðtiþ1 Þ  Mi ðXðti ÞÞÞ; þ

JðX 0 Þ ¼ J b þ

ð4Þ

i

where mðt i Þ is the vector of Lagrange multipliers, often referred to as the adjoint state variables. Jb is the background term. Now the incremental changes in J; Xðt i Þ and m due to incremental change in one of the components of X 0 gives

DJðX 0 Þ ¼

n

m1 X

o

mðtiþ1 ÞT DXðtiþ1 Þ  MTi DXðti Þ

þ

T

i¼0

ð5Þ

In the above formula, Mi denotes the Jacobian of Mi with respect to X i . The above expression after rearrangement yields

DJðX 0 Þ ¼

n o DXðt i ÞT mðt i Þ  MTi mðt iþ1 Þ  HT R1 i ðYðt i Þ  HðXðt i ÞÞÞ

m 1 X i¼1

ð6Þ The adjoint states mðt i Þ are still free variables. An expression for the adjoint model mðtiþ1 Þ; i 2 fm  1; . . . ; 1g, solved backward in time follows from

ð7Þ

with

mðtm Þ ¼ 0:

ð8Þ

The gradient of the objective function J with respect to X 0 is then given by

rJðX 0 Þ ¼

b B1 0 ðX

 X 0 Þ  mðt 0 Þ:

ð10Þ

A ‘‘centered’’ ensemble is then formed by subtracting the mean from each snapshot

i 2 f1; 2; . . . ; sg:

ð11Þ

ð12Þ

The subspace Sr is then generated by the first few eigenvectors of Q. The dimension n can be of the order of 108 , and even larger, so direct eigenvalue decomposition of Q is not feasible. One can however avoid this problem by applying the eigenvalue decomposition on the matrix G defined as the inner product of E:

GZ i ¼ Et EZ i ¼ ki Z i ;

þ mðt m ÞDXðtm ÞT þ DJb :

mðti Þ ¼ MTi mðtiþ1 Þ þ HT R1 i ðYðt i Þ  HðXðt i ÞÞÞ;

s 1X Xi: s i¼1

Q ¼ EEt :

i¼1

 HðXðt i ÞÞÞDXðti Þ þ DJ b :



The sample covariance matrix Q of the X i is computed from the ensemble E by taking the outer product:

m X Dmðt iþ1 Þ fXðt iþ1 Þ  M i ðXðt i ÞÞg  HT R1 i ðYðt i Þ

m1 X

A set of s snapshots X i of some physical process, each of dimension n, are first collected. The elements within a snapshot represent the signal at a specific location in the model, possibly for multiple quantities. Define the mean vector X

Ei ¼ X i  X;

i¼0

3

ð9Þ

Each step of the gradient iteration process requires one forward solution of the model Eq. (1), starting from the most recent estimate of X 0 , and one backward integration of the adjoint model (4). The most recent estimate of X 0 is then updated according to the computed gradient direction, and a new optimization iteration begins. The adjoint method can be implemented with realistic large scale data assimilation problems if enough human and computing resources are available to develop and run the adjoint model. Despite continuous progress in computing resources, the adjoint method remains computational very demanding for large scale realistic data assimilation problems, as each optimization iteration requires one forward and one backward integration of the model and its adjoint. Moreover, as discussed in Section 1, often checkpointing techniques are needed to avoid storing the while state trajectory needed for the integration of the adjoint, at the cost of increase in number of forward model integrations for each optimization iteration. Another hurdle in the use of the adjoint method is its implementation. Indeed, coding the adjoint model requires tremendous programming efforts.

3. Proper orthogonal decomposition The POD method, also known as principle components analysis in statistics, is used for a broad range of order-reduction applications. The main idea is as follows. Given a signal that lies in a vector space S, to find a subspace Sr of fixed dimension r, such that the projection error of the signal onto Sr is minimized.

i 2 f1; 2; . . . ; sg;

ð13Þ

with Z i are the eigenvectors of G ranked in decreasing order of their associated eigenvalues ki . This matrix is of dimension s  s and is usually much simpler to diagonalize than Q since the number of snapshots s is often much smaller the dimension of the system n. The eigenvectors Z i may be chosen to be orthonormal and the POD modes pi are then given by

pffiffiffiffi pi ¼ EZ i = ki ;

ð14Þ

which in matrix form can be written as

P ¼ EZ^1=2 ;

ð15Þ

with Z ¼ fZ 1 ; Z 2 ; . . . ; Z s g, P ¼ fp1 ; p2 ; . . . ; ps g and ^ is a diagonal matrix containing the eigenvalues on its diagonal. A physical interpretation of the POD subspace is that they maximize the average energy in the projection of the signal snapshots onto the subspace spanned by the POD modes. The eigenvalues ki provide a measure (wi ) of the relative energy associated with the corresponding POD modes pi :

ki wi ¼ Ps

l¼1 kl

 100%;

i ¼ f1; 2; . . . ; sg:

ð16Þ

It can be also shown that the first r POD modes P r ¼ fp1 ; p2 ; . . . ; pr g (r < s  n) such that w1 > w2 >    > wr retains

wr ¼

r X wl

ð17Þ

l¼1

of the total variance of the original set of snapshots X i . Any X i can then be approximately represented in the reduced POD space as

b i  X ¼ P r ni ; X

ð18Þ

with the reduced-order state ni a vector of size r. 4. POD-reduced model variational approach A dynamical model (1) can be reduced whenever its state Xðti Þ can be approximated as a linear combination of a set of normalized basis vectors p1 ; p2 ; . . . ; pr , i.e.

b ðtiþ1 Þ ¼ X þ Pnðt iþ1 Þ; X

ð19Þ

^ i Þ is the approximate (linearized) state, X is a background where Xðt state (often taken as a mean state), P is a matrix composed of the pi , and therefore satisfies PT P ¼ I. In this work, the matrix P is obtained

4

M.U. Altaf et al. / Comput. Methods Appl. Mech. Engrg. 254 (2013) 1–13

Since the reduced model is linear and in reduced space its adjoint is easily available and the reduced adjoint states m^ðtiþ1 Þ; i 2 fm  1; . . . ; 1g, solved backward in time are given by

e Tm ^ T 1 b ^ m^ðti Þ ¼ M i ðt iþ1 Þ þ H Ri ðYðt i Þ  H X ðt i ÞÞ;

ð24Þ

where

b ¼ HP: H

ð25Þ

The gradient of the objective function bJ is then obtained in the reduced space by Fig. 1. 2D saturated flow field with the two major rocks having different permeabilities (the small rock with K2 is located at the center of the large rock).

Table 1 Summary of the performed experiments. Advection

First guess

Observation Prior noise

Experiement 1 Invariant Avge. of ref. No Experiment 2 Case 1 Time varying Avge. of ref. Yes Case 2 Time varying Bottom domain Yes

No No Yes

T b ^ rbJðn0 Þ ¼ PT B1 0 PðP ðX  X 0 ÞÞ  mðt 0 Þ:

ð26Þ

The optimization is then performed in the reduced space and it converges efficiently. One disadvantage of this approach is that the method requires converting the whole nonlinear problem into a linear problem and thus loss accuracy. Secondly the optimization is performed in the reduced space which is finally projected back to the original space, thus the solution from the reduced model approach may not converge to the same global minimum. 5. POD-reduced adjoint method

by collecting the dominant POD modes of an ensemble of snapshots as simulated by a forward model run. n is called the reduced timevarying state vector and it evolves in time according to

e i nðti Þ; nðt iþ1 Þ ¼ M

ð20Þ

e i are simplified dynamics operators defined as where M

e i ¼ PT Mi P: M

ð21Þ

In the reduced model approach [19,25] we look for an optimal solution of (1) to minimize the approximate cost function (bJ)

X bJðn0 Þ ¼ 1 ðPT ðX b  X 0 ÞÞT PT B1 PðPT ðX b  X 0 ÞÞ þ 1 ðYðt i Þ 0 2 2 i¼1 b ðti ÞÞT R1 ðYðt i Þ  H X b ðt i ÞÞ;  HX i

ð22Þ

where

b ðti Þ ¼ Pnðt i Þ: X

ð23Þ

In the proposed reduced adjoint approach the forward model (1) remains the same. This means that we look for an optimal solution of (1) to minimize the same cost function J (3). The adjoint model is built in reduced space by projecting the adjoint states on the dominant POD modes. An expression for the reduced adjoint ^ðt iþ1 Þ; i 2 fm  1; . . . ; 1g, solved backward in time are then states m given by

e Tm ^ T 1 ^ m^ðti Þ ¼ M i ðt iþ1 Þ þ H Ri ðYðt i Þ  HXðt i ÞÞ;

ð27Þ

with

m^ðtm Þ ¼ 0:

ð28Þ

e i are simplified dynamics operators which approximate the full M Jacobians Mi . The dimension on which the reduced adjoint model operates depends on the number of POD modes (r) selected in the POD basis. The gradient of the objective function J is then obtained in the full space by b ^ rJðX 0 Þ ¼ B1 0 ðX  X 0 Þ  P mðt 0 Þ:

Fig. 2. Spatial distribution of the water heads and streamlines inside the porous medium.

ð29Þ

M.U. Altaf et al. / Comput. Methods Appl. Mech. Engrg. 254 (2013) 1–13

5

Fig. 3. The reference states of the contaminant transport model.

the reduced model approach and the proposed reduced adjoint approach can be seen from Eqs. (24) and (27). In the proposed approach the reduced adjoint states are obtained by constraining the system state with the observations Yðti Þ. In the reduced model approach, the reduced adjoint states are obtained by constraining the reduced system state with the observations. This of course means that the original forward model needs to be integrated for every optimization iteration, compared to integrating the PODreduced in the reduced model approach. This was however shown to significantly improve performances in our numerical experiments, at the cost of few more forward model integrations. The reduced adjoint model is used in both approaches. ei 5.1. Numerical construction of the POD-reduced model M

Fig. 4. Pseudo-observations collected from porous medium.

ppm

50

10 40

West

8 30 6 20

4

10

2

5

10

15

20

25

30

35

40

45

50

0

South Fig. 5. First guess of the initial conditions X 0 .

Once the gradient has been computed, the process of minimizing the objective function J can be carried along the direction of the gradient in the full space. Note that the difference in the adjoints of

One simple way to construct the reduced dynamical operator e i is use finite differences to linearize the original dynamical modM el (1) with respect to Xðt i Þ along the directions of the dominant r POD modes pi [27]. This can be done by perturbing the nonlinear operator M i in the direction of each POD mode pi as follows:

@Mi M i ½Xðt i Þ þ eph   M i ½Xðt i Þ ; p ¼ @Xðt i Þ h e

ð30Þ

where e is the size of the perturbation. Following (21), the reduced e i can be computed by pre-multiplying the dynamical operator M T above formulae by P , i.e.

e i ¼ PT M



 @Mi @Mi p1 ; . . . ; pr : @Xðti Þ @Xðti Þ

ð31Þ

The dimension of the reduced adjoint model is much smaller than that of original adjoint model. More importantly the adjoint of the original nonlinear model is not needed. This means that the programming effort to build the adjoint of the original nonlinear model is not required any more. After the minimization process a new set of initial conditions X 0 is obtained. This process of minimization can be repeated by constructing new POD modes with the updated set of initial conditions X up 0 to obtain new optimized initial conditions.

6

M.U. Altaf et al. / Comput. Methods Appl. Mech. Engrg. 254 (2013) 1–13

ing the aquifer but in this model all water sources, that could be contaminated as well, are suppressed so that the term S in Eq. (33) vanishes. Under these assumptions, Eq. (33) reduced to

1

Captured Energy

0.95

 @ ð/C Þ @ @  þ ðU x C Þ þ U y C ¼ 0; @t @x @y

0.9 0.85 0.8 0.75

0.71

4

8

12

16

No. of POD modes Fig. 6. The POD modes captured energy for an ensemble of 60 snapshots.

5.1.1. Procedural flow with the reduced adjoint method The proposed estimation procedure is as follows:

  where U x ; U y represent the velocity field. This is a linear system and would allow us to study the performance of the different methods without the effect of non-linearities. Different assimilation experiments were carried onto assess the usefulness, and eventually limitations, of the proposed reduced adjoint method. The configurations of these experiments are summarized in Table 1. In particular, advection was introduced in the experiments as static or time varying. Experiments were also performed with or without observational noise, different number of retained PODs. We also tested the methods under difficult conditions starting the optimization from a ‘‘bad’’ initial guess. 6.2. Experiment I

1. Initial guess X 0 is used to generate an ensemble of forward model simulation. 2. Solve eigenvalue problem to get dominant eigenmodes pi . e i is obtained on the basis 3. The reduced dynamic operator M of these dominant POD modes pi . 4. Optimization iteration: Perform optimization iterations in full space by running the original nonlinear model (1) and the reduced adjoint model (31) in each iteration of the optimization process to obtain the optimal solution of the objective function J. 5. After minimization the updated initial conditions X 0 can be used again from step 1 as an initial guess if required. 5.1.2. Convergence criterion for the optimization iterations We stop the present optimization iteration following the criterion l, which is defined as:



X k 5 J ki k=k 5 J k0 k 6 j

ð32Þ

k¼1

with 5Jk0 is value of the gradient at the start of minimization iteration, 5Jki is the value of the gradient after each iteration. For convergence the gradient should decrease by at least three order of magnitude from the initial gradient [19]. 6. Numerical application 6.1. The subsurface contaminant model To test the proposed POD reduced adjoint method and to evaluate its performance compared to the standard adjoint method and the POD-reduced model method we use a test case problem consisting of a 2D advection diffusion equation modeling a subsurface contaminant transport

@ ð/C Þ þ r  ðUC  DðU ÞrC Þ ¼ r ðC Þ þ S; @t

ð33Þ

where / is the porosity of the medium, C is the concentration of the contaminant, D is the dispersion/diffusion term, r is the reaction/ adsorption term and S is the source term. Cell-centered finite difference scheme is used to discretize the equations on a H  L grid for a rectangular domain of ½0; 0  ½H; L. The porous medium consists of two major rocks having different permeabilities where one is embedded in the other as shown in Fig. 1. In this study, we consider only the advection part of Eq. (33) and ignore the dispersion and the reaction terms. In real applications, precipitation and dissolution can play an important role for water-

The model domain is shown in Fig. 1. The model is discretized on a 50  50 rectangular grid. The embedded small rock was positioned in the subdomain ½12; 12  ½38; 38 in terms of grid cells with a permeability of 10 millidarcy and the main rock has larger permeability of 100 millidarcy. The velocity field, obtained by solving the Darcy flow equations for incompressible fluids, was assumed to be constant in time meaning that the flow was steady. Considering constant water heads at the western and eastern boundaries of 100 and 10 m-water respectively, the resulting spatial distribution of the water heads and the velocity field inside the domain is shown in Fig. 2. A reference solution was generated by inserting a contaminant plume close to the western boundary in the subdomain ½4; 5  ½6; 45. Initially the domain has a concentration of 100 ppm in the plume and 0 ppm elsewhere. We model the region for 50 years with a time step of 2 months. Under the flow of water, the contaminant moves towards the eastern boundary avoiding the low permeability layer as can be seen from Fig. 3. Pseudo-observations (truth) were generated from the simulated reference concentrations as shown in Fig. 4. The observations were collected over vertical columns every 200 m, forming 160 cells out of 2500. This experiment was performed with perfect observations. b 0 from which the optimization process starts was The first guess X chosen as the average of the simulated state vectors from a long model run initialized with the true initial conditions (see Fig. 5). Starting from the first guess X 0 , an ensemble E of 60 snapshots vectors were simulated by running the subsurface contaminant model (33) over the whole simulation period i.e., 50 years. The snapshots were collected over regularly spaced intervals of five time steps. Using this ensemble E, a basis consisting of only eight dominant POD modes capturing 99.999% of the relative energy was formed. Fig. 6 plots the increase in captured energy (from the 60 snapshots) as function of the number of retained POD modes. The first two POD modes captured more than 95% of the total energy of E. Fig. 7 plots the retained first 8 POD modes. The ree i was obtained on the basis of these 8 duced dynamical operator M dominant POD modes as described in Section 5.1. A quasi-Newton method with LBFGS (Limited Memory Broyden Fletcher Goldfard Shanno) updating formula was used to optimize the objective functions of the different adjoint methods. Fig. 8 plots the reduction of the objective function as function of the number of optimization iterations. Results are shown for the three adjoint methods, the standard, POD-reduced model, and the POD-reduced adjoint. The proposed POD-reduced adjoint method achieves the largest decrease in the objective function

7

M.U. Altaf et al. / Comput. Methods Appl. Mech. Engrg. 254 (2013) 1–13

40

40

40

40

20

20

20

20

20

40

20

40

20

40

40

40

40

40

20

20

20

20

20

40

20

40

20

40

20

40

20

40

0

10

Reduced Adjoint Adjoint Reduced model −2

10

−4

10

0

10

Convergence Criterion (μ)

Normalized Objective Function (Jβ/J ) 0

Fig. 7. The first eight POD modes used for the proposed reduced adjoint method from an ensemble of 60 snapshots.

Reduced Adjoint Adjoint Reduced model

−1

10

−2

10

−3

10

−4

10

−5

10

−6

10

−6

10

1

100

200

300

400

Minimization Iterations (β) Fig. 8. Successive minimization iteration b of the objective function J with the (1) reduced adjoint method (blue), (2) adjoint method (red) and (3) reduced order model (green). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

and has the faster convergence rate, reducing the objective function by more than 90% after only few iterations of the optimization algorithm. Similar convergence rate is obtained with the POD-reduced model, but the final reduction in the objective function is not satisfactory. The convergence rate of the standard adjoint method is very slow compared to the reduced adjoint method. The adjoint method failed to achieve similar model fit to the data as the reduced adjoint method after more than 400 optimization iterations. The minimization process was stopped according to the criterion l, stating that the gradient should decrease by at least three order of magnitude from the initial gradient value. Fig. 9 plots the value of the convergence criterion l at successive optimization iterations b as it results from the three adjoint-based methods. The proposed algorithm achieved the lowest value of l after only 100

1

100

200

300

400

Minimization Iterations (β) Fig. 9. The value of the convergence criterion l at successive minimization iterations b with the (1) reduced adjoint method (blue), (2) adjoint method (red) and the (3) reduced order model (green). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

iterations, as compared with more than 400 iterations for the standard adjoint method. The POD-reduced model approach is computationally the most efficient but strongly lacked in accuracy. The proposed algorithm lead to accurate results and was computationally much more efficient than the adjoint method. Each iteration with the proposed algorithm requires one full simulation with the original model, while the adjoint computation is carried in the reduced POD space with negligible computational cost. The estimated initial conditions after optimization is plotted in Fig. 10 for the standard, POD-reduced model, and the POD-reduced adjoint. The results clearly suggest that only the proposed algorithm was able to recover the ‘‘true’’ initial conditions. Surprisingly, the adjoint method failed to accurately estimate the initial conditions. Our estimation problem is likely ill posed, which means that here is a possibility of more than one solution fitting the mod-

8

M.U. Altaf et al. / Comput. Methods Appl. Mech. Engrg. 254 (2013) 1–13

Fig. 10. Plot of the estimated initial state with the reduced adjoint method (top), the adjoint method (middle) and the reduced order model (bottom) after several minimization iterations.

M.U. Altaf et al. / Comput. Methods Appl. Mech. Engrg. 254 (2013) 1–13 Table 2 Optimum values of the convergence criterion l and the normalized objective function Jfinal for the minimization process with the 95% and 99% relative energy. J 0

Relative energy (%) 95 99

l

J final J0 1

3:59  101

2

1:03  102

5:85  10 1:02  10

Table 3 Optimum values of the convergence criterion l and the normalized objective function Jfinal obtained for the minimization process with 15 and 30 snapshots in an ensemble E. J0 No. of snapshots 30 15

l

J final J0 5

1:99  105

3

8:30  103

2:31  10 5:90  10

el to the available data. We have not included a background term in the objective function J intentionally in the present experiment to show that the POD based approach also serves as a good preconditioner [18]. The proposed reduced adjoint approach was also tested with two more scenarios in which the POD-reduced adjoint was constructed using only 2 and 4 dominant POD modes accounting for 95% and 99% of the relative energies, respectively. The configurations of the experiments are otherwise the same as before. The results of these experiments are summarized in Table 2. As expected, the total decrease in the objective function is not as good as that resulting from the same method but using a reduced space governed by the dominant 8 POD modes. The best convergence improved using less POD modes, but the final solution is not as accurate. These results suggest that the performance of the reduced adjoint method strongly depends on the representativeness of the retained POD modes. The variability captured by the less dominant POD modes could contain important information even though they do not retain significant percentage of the system relative energy. As suggested by Vermeulen and Heemink [27], the selection of the POD modes could also be made according to the assimilated data set. A quantitative criteria can also be used when truncating the low-energy POD modes ([33]). The optimal value of l was again reduced for both cases but the reduced adjoint method that captured more relative energy had performed better since the POD modes pi that were not included in these cases describe model variance that is relevant to the observations. Finally, to study whether an accurate reduced dynamic operator can be built with fewer snapshots in E, two other assimilation experiments were performed in which 15 and 30 snapshots were collected from the same simulation of the original subsurface contaminant model starting from the first guess X 0 . The snapshots were collected with an equal interval of 20 and 10 time steps, respectively. A POD basis consisting of 9 and 8 dominant POD modes were respectively retained from each set of snapshots, capturing 99.999% of the relative energy. The reduced dynamic opere i was then estimated on the basis of these dominant POD ator M modes. Table 3 summarizes the value of the convergence criterion l and the final values of the normalized objective function Jfinal as J0 resulting from the proposed POD-reduced adjoint approach in both cases. The final values of l and normalized objective function with an ensemble of 30 snapshots were very close to those obtained with the ensemble of 60 snapshot vectors. The results from an ensemble of 15 snapshots were however quite different and the fiJ nal values of l and final resulting from this small set of snapshots J0 were not as good as those obtained from the ensembles with 30 and 60 snapshots. This suggests that the set of snapshots from

9

which the POD modes are computed should be large enough to well represent the variability of the system under study. Beyond that, including more snapshots do not bring any new information to the system. 6.3. Experiment II In this second set of experiment, we discretized the model equation on a 60  60 grid. The embedded small rock is positioned in the subdomain ½15; 15  ½45; 45 with a permeability of 20 millidarcy and the main rock has larger permeability of 100 millidarcy. The bottom boundary of the domain is impermeable while the top boundary is permeable and is subject to seasonal precipitation periods. The inflow rate is uniformly distributed on this boundary varying between 5  107 m3 /s in summer to 5  105 m3 /s in winter. The western and eastern boundaries are subject to constant water head distribution of 100 and 20 mwater respectively. The resulting spatial distribution of the water heads and the velocity field inside the domain in four different seasons is shown in Fig. 11. A reference solution (truth) was generated by inserting a contaminant plume close to the western boundary in the subdomain ½3; 4  ½7; 56. Initially, the domain had a concentration of 100 ppm in the plume and 0 ppm elsewhere. We model the region for 50 years with a time step of 3 months. following the flow of water, the contaminant moves towards the eastern boundary avoiding the low permeability layer as shown in Fig. 12. Pseudoobservations were generated from the simulated true concentrations and 220 observations were collected every fifth time step in the model domain as shown in Fig. 13. Random perturbations were added to the pseudo observations to include measurement noise. The observation error is simulated by adding randomly generated Gaussian noise with zero mean and a variance estimated at 10% of the system total variance. 6.3.1. Case I The first guess X 0 of the initial condition was again set as the average of the simulated state vectors from a long model run starting from the true initial condition. An ensemble E of 40 snapshots vectors were collected by running the subsurface contaminant model for 50 years starting from X 0 . The snapshots were collected with an equal interval of 5 model time steps. Using this ensemble E, a basis consisting of the 12 dominant POD modes were retained. These capture 99.999% of the relative energy. The reduced dynamic e i was then constructed on the basis of these 12 modes operator M and the objective function J was minimized with the proposed reduced adjoint method. The reduced adjoint method was again able to significantly improve the model data consistency as can be seen from the final value of J after optimization in Table 4. Overall, J was reduced by about four degrees of magnitude after only few minimization iterations. After the first minimization process was completed, another iteration of the reduced adjoint method was carried as described in Section 5. The updated initial conditions were therefore used again for a second optimization iteration after updating the POD basis and a new solution was obtained. The results summarized in Table 4 demonstrate that only two iterations were needed with the reduced adjoint method to obtain results comparable to those found with the adjoint method. This again resulted in important computational cost saving compared with the full adjoint method. The final estimate of the initial conditions is shown in Fig. 14, after the first and second iteration of the proposed reduced adjoint method and is compared to that of the adjoint method. The results clearly show that although advection was time varying and perturbations were imposed on the observations, the proposed algorithm per-

10

M.U. Altaf et al. / Comput. Methods Appl. Mech. Engrg. 254 (2013) 1–13

Fig. 11. Different seasonal water head configurations of the aquifer with the corresponding streamlines (from top: fall, winter, spring, summer).

11

M.U. Altaf et al. / Comput. Methods Appl. Mech. Engrg. 254 (2013) 1–13

Fig. 12. The reference (true) states of the time varying contaminant transport model.

220 observation locations

60

Domain Observation pts

55 50 45 40 35 30

Low Permeability Rock

25 20 High Permeability Rock

15 10 5

5

10

15

20

25

30

35

40

45

Fig. 13. Pseudo-observations collected from the porous medium.

50

55

60

12

M.U. Altaf et al. / Comput. Methods Appl. Mech. Engrg. 254 (2013) 1–13

Table 4 Optimum values of the objective function J final obtained for the minimization process with adjoint and reduced adjoint methods (case 2). Initial J Adjoint Reduced adjoint

Optimum J

Minimization iterations

1:02  106

7:87  102

97

Iteration 1

6

1:02  10

2:95  103

46

Iteration 2

2:95  103

6:90  102

21

duced adjoint method were obtained by carrying out two more iterations, which required recomputing the POD modes after each iteration. The final results were as good as the full adjoint method. More iterations increase of course the computational burden of the reduced adjoint method, but these are still less important than those required with the full adjoint method, and the complex coding of the adjoint is avoided. These results also suggest that it is very important to construct the POD modes from a set of snapshots that provide complete information about the system variability. When this is not the case, more iterations with the reduced adjoint method that provided a better representative set of snapshots after every iterations can be applied. As shown in our experiments, this greatly improves the performance of the reduced adjoint method and the final solution is comparable to that of the full adjoint method. 7. Conclusions

Fig. 14. Plot of the estimated initial states with the reduced adjoint method (top: 1st iteration, center: 2nd iteration) and the adjoint method (bottom) after the minimization cycle.

formed very well and was able to recover the initial condition within satisfactory range of uncertainty at low computational cost. 6.3.2. Case II In this experiment, the first guess of the initial conditions X 0 was obtained by introducing the contaminant at the bottom of the numerical domain. A background term was also introduced in the experiment to as a prior information in both adjoint and reduced adjoint methods. An ensemble E of 40 snapshots vectors were collected by running the subsurface contaminant model for the 50 years with these initial conditions X 0 . The snapshots now summarized information about the lower part of the domain. Using this ensemble E, a basis consisting of only 11 dominant POD modes were collected that captured 99.999% of the relative e i was obtained on the baenergy. The reduced dynamic operator M sis of these 11 modes. The reduced adjoint method was not able to improve the model data misfit by more than 30%. Although far more minimization cycles were required, the adjoint method successfully converged close to the true initial conditions. Improved results with the re-

The adjoint method is a powerful tool for variational data assimilation, but coding the adjoint model for the computation of gradients is a laborious task for large scale realistic applications. Moreover, even if the adjoint code is available, the adjoint method remains computationally quite demanding as one forward model integration and one backward adjoint model integrations are required for each iteration of the optimization process. The need for check-pointing techniques in large scale applications to avoid storing the whole model trajectory required for the integration of the adjoint model further increase computation burden. A POD-reduced adjoint approach is proposed in this study to tackle the difficulties of implementing the adjoint method for data assimilation with large scale complex models. The main idea behind this new method is to build a linear reduced adjoint model approximating the full adjoint of the nonlinear model based on the projection of the model state onto a set of POD modes. This would not only allow avoiding the complex implementation of the adjoint model, but is also computationally very efficient as compared to the full adjoint method. To determine the POD modes, an ensemble of forward model simulations, called snapshots, is first collected and only the dominant eigenvectors of the sample covariance matrix of this ensemble are retained to define the POD-reduced space. The reduced adjoint is then numerically constructed using a finite differences approach. The reduced adjoint state belongs to the reduced POD modes and its backward integration with the reduced adjoint requires negligible computing. Once the gradient is obtained in the reduced space, it is projected back into the full space where the minimization process is carried. The POD-reduced adjoint method was implemented to estimate the initial conditions of a 2D subsurface contaminant model. Several assimilation experiments under different setups and configurations were performed using both time invariant and time varying flow to study the behavior of the new method and compare its performances compared to the standard/full adjoint method, and the reduced model adjoint method. The later is similar to the reduced adjoint model, but uses the reduced model to evaluate the model fit with the data in the objective function. It therefore solves another optimization problem than the original problem and could therefore provide a final solution that is not the solution of the original variational problem. Results from the these numerical experiments suggest that the proposed method performs very well compared to the full and reduced model adjoint methods. It was able to accurately recover the initial conditions of the model in all experiments at very reasonable computation cost. Our results also show that it is very important to have a POD basis well representative of the variability of the studied system. In case these are not available, an iterative procedure was successfully introduced

M.U. Altaf et al. / Comput. Methods Appl. Mech. Engrg. 254 (2013) 1–13

and tested in which a new reduced adjoint model is constructed starting from the most recent estimate of the initial conditions. Only few iterations of the reduced adjoint method were required in our experiments to recover the initial conditions of the system starting from a bad initial guess. Even with these additional iterations, the reduced adjoint method was still computationally much more efficient than the adjoint method. The results of the preliminary test experiments performed here were very encouraging and suggest that the new reduced adjoint method can be used to efficiently solve the four-dimensional variational data assimilation problem. Further studies need to be conducted to study the proposed method with different and more complex models and setups, including more realistic higher resolution models and assimilation of real data sets. We will report results in this regard in the future. References [1] F.X. Le Dimet, O. Talagrand, Variational algorithms for analysis and assimilation of meteorological observations: theoretical aspects, Tellus 38 (1986) 97–110. [2] W.C. Thacker, R.B. Long, Fitting models to inadequate data by enforcing spatial and temporal smoothness, J. Geophys. Res. 93 (1988) 10655–10664. [3] J. Carrera, S.P. Neuman, Estimation of aquifer parameters under transient and steady state conditions, Part 1: Maximum likelihood method incorporating prior information, Water Resour. Res. 22 (1986) 199–210. [4] P. Courtier, O. Talagrand, Variational assimilation of meteorological observations with direct and adjoint shallow water equations, Tellus 42 (1990). [5] B.F. Farrel, A.M. Moore, An adjoint method for obtaining the most rapidly growing perturbation to ocean floors, J. Phys. Oceangr. 22 (1991) 338–349. [6] D. Stammer, C. Wunsch, I. Fukumori, J. Marchall, State estimation improves prospects for ocean research, EOS Trans. Am. Geophys. Union 83 (2002) 289. [7] A.T. Weaver, J. Vialard, D.L.T. Anderson, Three and four-dimensional variational assimilation with a general circulation model of the tropical pacific ocean. Part 1: Formulation, internal diagnostics and consistency checks, Mon. Wea. Rev. 131 (2003) 477–492. [8] I. Hoteit, B. Cornuelle, A. Kohl, D. Stammer, Treating strong adjoint sensitivities in tropical eddy-permitting variational data assimilation, Quart. J. Roy. Meteorol. Soc. 131 (2005) 3659–3682. [9] I. Hoteit, B. Cornuelle, P. Heimbach, An eddy-permitting, dynamically consistent hindcast of the tropical pacific in 2000 using and adjoint-based assimilation system, J. Geophys. Res. 115 (2010). [10] P.G.J. Ten-Brummelhuis, A.W. Heemink, H.F.P. van den Boogard, Identification of shallow sea models, Int. J. Numer. Methods Fluids 17 (1993) 637–665. [11] R.W. Lardner, A.H. Al-Rabeh, N. Gunay, Optimal estimation of parameters for a two dimensional hydrodynamical model of the arabian gulf, J. Geophys. Res. Oceans 98 (1993) 229–242.

13

[12] D.S. Ulman, R.E. Wilson, Model parameter estimation for data assimilation modeling: temporal and spatial variability of the bottom drag coefficient, J. Geophys. Res. Oceans 103 (1998) 5531–5549. [13] A.W. Heemink, E.E.A. Mouthaan, M.R.T. Roest, Inverse 3D shallow water flow modeling of the continental shelf, Continental Shelf Res. 22 (2002) 465–484. [14] A. Griewank, Achieving logarithmic growth of temporal and spatial complexity in reverse automatic differentiation, Optim. Methods Softw. 1 (1992) 35–54. [15] G.F. Corliss, C. Faure, A. Griewank, L. Hasco, Automatic Differentiation: From Simulation to Optimization, Springer Verlag, New York, 2001. [16] I. Hoteit, B. Cornuelle, V. Thierry, D. Stammer, Impact of resolution and optimized ECCO forcing in simulation of the tropical pacific, J. Atmos. Oceanic Technol. 25 (2008) 131–147. [17] T. Kaminski, R. Giering, M. Scholze, An example of an automatic differentiation-based modeling system, Lect. Notes Comput. Sci. 2668 (2003) 5–104. [18] I. Hoteit, A. Khol, Efficiency of reduced-order, time-dependent adjoint data assimilation approaches, J. Oceanogr. 62 (2006) 539–550. [19] Y. Cao, J. Zhu, I.M. Navon, Z. Luo, A reduced-order approach to fourdimensional variational data assimilation using proper orthogonal decomposition, Int. J. Numer. Methods Fluids 53 (2007) 1571–1583. [20] P. Courtier, J.N. Thepaut, A. Hollingsworth, A strategy for operational implementation of 4D-VAR, using an incremental approach, Quart. J. Roy. Meteorol. Soc. 120 (1994) 1367–1387. [21] A. Schiller, J. Willebrand, A technique for the determination of surface heat and freshwater fluxes from hydrographic observations, using an approximate adjoint ocean circulation model, J. Mar. Res. 53 (1995) 433–451. [22] A.S. Lawless, N.C. Nichols, C. Boess, A. Bunse-Gerstner, Using model reduction methods within incremental 4DVAR, Mon. Wea. Rev. 136 (2008) 1511–1522. [23] A.C. Antoulas, Approximation of Large-scale Dynamical Systems, SIAM, USA, 2005. [24] M.D. Gunzburger, Reduced-order modeling, data compression and the design of experiments, in: Second DOE Workshop on Multiscale Mathematics, Broomfield, Colorado. [25] D.N. Daescu, I.M. Navon, A dual weighted approach to order reduction in 4DVAR data assimilation, Mon. Wea. Rev. 136 (2008) 1026–1041. [26] F. Fang, C.C. Pain, I.M. Navon, D. Piggott, G.J. Gorman, P.E. Farrell, P.A. Allison, A.J.H. Goddard, A pod reduced-order 4D-VAR adaptive mesh ocean modelling approach, Int. J. Numer. Methods Fluids (2008). [27] P.T.M. Vermeulen, A.W. Heemink, Model-reduced variational data assimilation, Mon. Wea. Rev. 134 (2006) 2888–2899. [28] M.U. Altaf, M. Verlaan, A.W. Heemink, Efficient identification of uncertain parameters in a large scale tidal model of European continental shelf by proper orthogonal decomposition, Int. J. Numer. Methods Fluids 68 (2012) 422–450. [29] M.U. Altaf, M. Verlaan, A.W. Heemink, Simultaneous perturbation stochastic approximation for tidal models, Ocean Dynam. 61 (2011) 1093–1105. [30] M.P. Kaleta, R.G. Hanea, A.W. Heemink, J.D. Jansen, Model-reduced gradientbased history matching, Comput. Geosci. 15 (2011) 135–153. [31] R. Courant, D. Hilbert, Methods of Mathematical Physics, Wiley Interscience, New York, 1953. [32] J.M. Lewis, C.J. Derber, The use of adjoint equations to solve a variational adjustment problem with advective constraints, Tellus 37 (1985). [33] P.F. Lermusiaux, Estimation and study of mesoscale variability in the strait of sicily, Dynam. Atmos. Oceans 29 (1999) 255–303.