ADAPTIVE WAVELET MODELLING OF GEOPHYSICAL FLOWS ON THE SPHERE



By MATTHIAS AECHTNER, B.Sc., M.Sc.

A Thesis Submitted to the School of Graduate Studies in Partial Fulfilment of the Requirements for the Degree Doctor of Philosophy

McMaster University

© Copyright by Matthias Aechtner, May 2014

DOCTOR OF PHILOSOPHY (2014)
(Computational Science and Engineering)
McMaster University
Hamilton, Ontario

TITLE: ADAPTIVE WAVELET MODELLING OF GEOPHYSICAL FLOWS ON THE SPHERE
AUTHOR: Matthias Aechtner, B.Sc. (FAU), M.Sc. (KTH)
SUPERVISOR: Dr. Nicholas Kevlahan
NUMBER OF PAGES: viii, 108


Abstract

In this thesis the development of a dynamically adaptive wavelet method for geophysical applications is described. Since it is targeted at geophysical applications, the discrete shallow water model is derived in a way that retains the mimetic properties of the continuous setting. Based on an investigation of the properties of second generation wavelets, wavelet transforms for the sphere are designed for height and non-separable velocity that provide conservation of mass and consistent advection of vorticity. The model has been implemented in Fortran95 with a careful choice of data structures and algorithms, with the result that the computational cost per grid point of the adaptive method is only three times as large as for an optimized non-adaptive method. The Message Passing Interface (MPI) has been used to enable the model to run on 100 to 1000 computer cores with generally high parallel efficiency (above 80%), depending on the test-case. Standard tests by Williamson (1992) and a more recent test-case by Galewsky (2004) have verified the numerical accuracy and convergence of the adaptive method. A simulation of homogeneous shallow water turbulence demonstrates that the model is capable of compression ratios of 20-50 even in a challenging setting. Finally, the 2004 tsunami in the Indian Ocean is computed as a real application to ocean simulation.


Acknowledgements

First and foremost, I offer my sincerest gratitude to my supervisor, Dr. Nicholas Kevlahan, for his encouragement, guidance and advice throughout my doctoral research project and thesis. Further, I am grateful to the members of my Supervisory Committee, Dr. Walter Craig, Dr. Bartek Protas and Dr. Stephen Tullis, for their input and numerous helpful suggestions. I would also like to express my appreciation to Dr. Thomas Dubos and his colleagues at LMD, Polytechnique, for their help and pleasant collaboration. I also want to thank the people behind [email protected] for their prompt and effective help. Last but not least, I would like to thank my family, and also Dmitry, Mattias and Vida, for their support during the time of my studies.


Contents

1 Introduction
  1.1 Geophysics and adaptivity
  1.2 Goal and outline

2 Wavelets: Sparse representations
  2.1 Sparse representations
  2.2 Fourier series representation
  2.3 The Haar System
    2.3.1 Step functions
    2.3.2 The Haar System
  2.4 Multi-resolution analysis
  2.5 Wavelet basis
    2.5.1 Orthogonal wavelets
    2.5.2 Biorthogonal wavelets
  2.6 Discrete wavelet transform using filters
  2.7 Second generation wavelets
    2.7.1 Motivation
    2.7.2 Generalizations
    2.7.3 Lifting
  2.8 Custom wavelet construction
  2.9 Wavelet compression

3 Multidimensional wavelets and adaptivity
  3.1 Adaptive wavelet methods
  3.2 Wavelets on the sphere
  3.3 Butterfly basis

4 The shallow water model in geophysics
  4.1 The spherical shallow water model
  4.2 Discrete shallow water models in geophysics
  4.3 Ringler and Thuburn computer model

5 The adaptive spherical shallow water model
  5.1 Conservative scalar transform
  5.2 Non-separable vector transforms
  5.3 Model overview
  5.4 Computing the adapted trend
  5.5 Flux restriction
    5.5.1 Basic restriction
    5.5.2 Corrective part
  5.6 Grid Optimization

6 High performance parallel implementation
  6.1 Hybrid data structures
  6.2 Masks and adaptation
  6.3 Pentagon points
  6.4 Parallel implementation
    6.4.1 Parallel scaling

7 Model verification
  7.1 Unit testing and mimetic properties
  7.2 Shallow water test suite by Williamson
    7.2.1 Test 1: Advection of cosine bell
    7.2.2 Test 2: Stationary flow
    7.2.3 Test 6: Rossby–Haurwitz Wave
  7.3 Lauritzen nonlinear advection test
  7.4 Unstable jet by Galewsky
    7.4.1 Unperturbed flow
    7.4.2 Perturbed inviscid flow
  7.5 Comparison with other adaptive models

8 Rotational shallow water turbulence
  8.1 Initial condition
  8.2 Structure of solution
  8.3 Efficiency: Compression ratio
  8.4 Energy and spectrum

9 Tsunami simulation
  9.1 Penalization boundary conditions
  9.2 Bathymetry and coastlines
  9.3 2004 Indonesian tsunami

10 Conclusions and outlook
  10.1 Summary
  10.2 Outlook


List of Figures

2.1 A periodic test function (dashed red) is approximated by a truncated Fourier series with N = 16 (solid black).
2.2 The test function (dashed red) is approximated by step-functions with different step-sizes (solid black).
2.3 The Haar functions
2.4 The Haar basis provides a non-linear approximation (solid black) to the test function (dashed red). Blue dotted lines indicate the linear approximation by a step-function on the coarser resolution.
2.5 Wavelets constructed by lifting
2.6 Increasing the number of vanishing moments N of the wavelet reduces the order of magnitude of the wavelet coefficients
2.7 The test function (dashed red) is quite accurately approximated by 35 wavelets with 4 vanishing moments (solid black).
3.1 Wavelet adaptation in one dimension: wavelet coefficients above threshold correspond to large errors in interpolation and activate grid-points (type 2). Then grid-points adjacent in space (3) and scale (4) are added to allow for changes in the solution during the next time-step.
3.2 Nested spherical grid from refining an icosahedron projected onto the sphere
3.3 Butterfly basis
3.4 Adjacency masks
3.5 Representation efficiency of butterfly basis
3.6 Error convergence of wavelet basis
4.1 Longitude-latitude grid: topologically regular quadrangular grid but singularity at poles.
4.2 Cubed sphere grid: composed of six regular quadrangular grids, obtained by projecting a cube onto the sphere.
4.3 Icosahedral grid: triangular grid from refining an icosahedron via edge-bisection.
4.4 Primal (blue) and dual element (dashed) of the triangular-hexagonal C-grid. For irregular geometry the dual locations, z, are circumcentres of the triangles so that edges of dual and primal are always perpendicular to each other.
4.5 Workflow of the TRiSK model to compute the trends from the values of height and velocity through various intermediate quantities according to the discretized shallow water equations.
5.1 Interpolation at fine edges, u1 and u2, from the 13 coarse edges U−6, ..., U6.
5.2 Computing velocity at inner edges from velocity at outer edges by assuming constant vorticity over the coarse triangle.
5.3 An edge with velocity wavelet coefficient above threshold (thick red) adds neighbours in space (dashed blue) and scale (green) to the active mask.
5.4 Height at the green fine level position m is interpolated from red coarse level cells 1, 2, 3 and 4 using weights s computed from the areas Ak,m.
5.5 Computing the height trend in the adaptive setting by restricting the fluxes and computing wavelet transforms to correct mass conservation between levels.
5.6 Small overlap regions between hexagons at successive levels need to be accounted for when restricting the thickness flux.
5.7 Arrangement of fine and coarse scale height nodes used in the calculation of the corrective flux restriction through coarse edge kk2j.
6.1 Icosahedron net obtained from tearing apart the two times refined icosahedron from figure 3.2 (missing two blue nodes). It can be decomposed into ten elements of the type in the bottom right.
6.2 Grid element in primal (left) and dual (right) grid interpretation with surrounding entities in gray.
6.3 Strong parallel efficiency scaling. Perfectly balanced (solid) and realistic turbulence test case (dashed). All numbers are active degrees of freedom (four times the number of height nodes); N is the total over all cores, the numbers in the graph are per core.
6.4 Weak parallel scaling on up to 640 cores for the balanced test-case
7.1 Violation of mass conservation and generation of unphysical vorticity for a wave propagation test case are at the level of round-off error.
7.2 Grid after one rotation with cosine bell in the centre (εh = 4.2, Jmin = 4). The maximum level is determined by the adaptation routine.
7.3 Case 1 adaptive grid Jmin = 5, Jmax = 8: the height after one rotation (left) and for a reduced range to investigate behaviour around 0 (right).
7.4 Case 1 uniform grid J = 6: the height after one rotation for a reduced range.
7.5 Case 1: for an equal number of grid-points, the error over time is lower for the adaptive model (εh = 0.45)
7.6 Case 2: convergence of the normalized error in the height field after 15 days.
7.7 Case 6: convergence of the normalized error in the height field after 12 days.
7.8 Case 6 with ε = 0.05: height after 7 days
7.9 Case 6 with ε = 0.05: height after 14 days
7.10 Lauritzen test-case with ε = 0.001: the grid tracks the solution and refines during the deformation at T/2.
7.11 Error of Lauritzen test-case at final time: while the non-adaptive model (left) shows an error due to a phase shift of the solution, the adaptation introduces some noise but the overall error is much lower for the same number of grid-points.
7.12 Convergence of Lauritzen's nonlinear advection test: the adaptive model achieves lower errors for the same number of nodes.
7.13 Unperturbed zonal jet test case (Galewsky). For similar numbers of active nodes the adaptive wavelet method maintains consistently lower error than the non-adaptive TRiSK scheme.
7.14 Unperturbed zonal jet test case (Galewsky). Vorticity field at day 6. Numerical instabilities have deformed the zonal flow.
7.15 Galewsky test case with tolerance ε = 5 × 10^−3 and Jmax = 9. Height perturbation at 2, 4 and 6 hours (left) and relative vorticity at 4, 5 and 6 days (right). The solution of the Jmax = 10 non-adaptive reference simulation is dashed, but the lines are mostly indistinguishable.
8.1 Zonal flow initial condition to develop homogeneous turbulence.
8.2 Inviscid shallow water turbulence after 132 hours. The grid (right) shows refinement where vortex filaments occur in the solution (left, as relative vorticity).
8.3 Shallow water turbulence after 132 hours with a viscosity of ν = 10^4 kg m^−1 s^−1.
8.4 Compression ratios around 20.
8.5 Constant CPU time per grid-point.
8.6 Behaviour of total energy minus the total energy at rest (right). Energy spectrum for the rotational part of the velocity averaged over the interval t = [132 h, 136 h] (left).
9.1 Smooth indicator function given analytically.
9.2 Smooth χs computed from bathymetry using radial basis functions.
9.3 Maximum wave height
9.4 Time of first wave arrival
9.5 Tsunami after 70 minutes
9.6 Tsunami wavefront at high resolution. The time and the colour scale for height are the same as in figure 9.5.
9.7 Tsunami after 4 hours
9.8 Tsunami after 16 hours

Nomenclature

b        topography
F        flux of height
D        domain
d        length of triangle edge
e        index at edge bisection
E        total energy
f        Coriolis parameter
g        gravitational acceleration
h        height
k        primal index
k        surface normal
K        kinetic energy
l        length of hexagon edge
q        potential vorticity
R_earth  earth radius
t        time
u        velocity vector
x        Cartesian coordinate
z        index at triangle centre

ω        relative vorticity
Ω_earth  rate of earth rotation
η        absolute vorticity
ε        threshold
ν        viscosity
φ        scaling function
ψ        wavelet
γ        wavelet coefficient
λ        scaling coefficient

Chapter 1
Introduction

1.1 Geophysics and adaptivity

Geophysical sciences are the subject of more intense research investigation than ever before. The Intergovernmental Panel on Climate Change [Stocker et al., 2013] has reported evidence for human impact on global warming, and global climate models are developed by scientists all over the world who work to predict the future of our planet and to better understand the underlying processes, in order to aid policy-makers in their decisions. Equally important is the modelling of natural hazards, where accurate and fast prognosis can save thousands of lives. It was as recently as 2004 that the deadliest tsunami in history spread across the Indian Ocean. The limitations in simulating climate, natural hazards or weather on the computer are due to the underlying physics. Ranging from cyclones on synoptic scales of thousands of kilometres, to storms on scales of kilometres, to a multitude of processes on even smaller microscales, the range of scales is so wide that models that capture the largest scales are unable to resolve down to the smallest ones. As a result, processes below the model resolution require further assumptions and approximations that introduce additional uncertainty and errors. The good news is that typically the small scales are present only locally with respect to the entire domain (e.g. the sphere). This leads to the idea of grid adaptation: variable grid spacings are used to resolve every part of the

Ph.D. Thesis - M. Aechtner

McMaster University - CSE

domain only as much as necessary. The earliest approaches to grids with variable resolution defined the grid a priori (i.e. before the simulation is started) and kept it unchanged at all times [Krinner et al., 1997]. This requires prior knowledge about the solution: e.g. the resolution can be increased close to boundaries, or based on prior simulations or experimental data. This strategy has severe limitations when fine scale structures evolve over time. In the remainder of the text we will concern ourselves with dynamically adaptive models, where the grid changes with time. This means the adapted grid is computed based on the current solution of the underlying equation. Choosing the criteria for refinement (and coarsening) of the grid is critical for the success of such an adaptive model. An example of a simple criterion is to adapt locally depending on the magnitude of the gradient of the solution. More sophisticated approaches exist. Using wavelets for the grid adaptation we can make a significant step towards actual control over the numerical error. Essentially, we can define a tolerance for the error and (with some additional assumptions) the dynamic adaptation will ensure that the error tolerance is observed while using only the minimum necessary total number of grid points. (Alternatively, we can choose the number of grid points and the grid adaptation will distribute them so that the error is minimized in the norm of choice.) Wavelet based adaptivity has been applied successfully to many areas of physics and engineering (e.g. [Roussel and Schneider, 2010], [Kevlahan and Vasilyev, 2005]). This text will present the first application of wavelet based grid adaptation to geophysical modelling on the sphere. Despite some hesitation due to the increased complexity and the difficulty of finding suitable refinement criteria, adaptive models could be the future of geophysical science. Even though good progress has been made in separately modelling processes that are not resolved by the grid, we should be aware that processes on different scales interact deeply with each other, and adaptivity provides the possibility to handle this interaction appropriately within one model. Recent progress towards adaptive climate models has been made with two adaptive spherical shallow water models, [St-Cyr et al., 2008] and [Läuter et al., 2007], that use an empirical refinement criterion. Further discussion of adaptivity in atmospheric modelling can be found in [Behrens, 2009].

1.2 Goal and outline

The goal of this work is to design a spherical shallow water model with wavelet based dynamical adaptivity that incorporates the latest developments in dynamical cores (particularly [Ringler et al., 2010]), to implement it for efficiency and parallel execution, and then to validate and test it on real geophysical applications. We begin in chapter 2 by introducing wavelet bases that efficiently approximate functions in one dimension, since compressing a function will be the first step towards wavelet based adaptivity. Moreover, second generation wavelets are designed. Only they permit extension to irregular geometries, and they will be used in chapter 3 to develop spherical wavelets. This chapter also explains how to use wavelets to dynamically adapt the grid of numerical methods for solving partial differential equations. Chapter 4 discusses the shallow water equations and how they are used in geophysical science to construct discrete computer models. The adaptive model that constitutes the core of this work is presented in two chapters: chapter 5 deals with the design of the numerical method and chapter 6 provides details about the implementation, addressing efficiency and parallelization using the Message Passing Interface (MPI). Numerical accuracy and convergence are verified on standard test-cases in chapter 7. In chapter 8 homogeneous rotating shallow water turbulence brings us closer to geophysical application, and finally chapter 9 presents results of a simulation of the 2004 Indian Ocean tsunami. Chapter 10 concludes this work.


Chapter 2
Wavelets: Sparse representations

Since the work presented in this thesis builds on wavelet analysis, this chapter aims to provide an introduction to wavelets and their application to compressing a function. Experiments conducted by the author to construct wavelets with good compression properties conclude this chapter.

2.1 Sparse representations

In this chapter we will look at approximate discrete representations of a continuous function f ∈ L²(D). For the time being we will restrict ourselves to a one-dimensional interval of length P, D = [0, P], and specify f to be periodic at its boundaries. A physical interpretation of this example problem could be that f(t) is one period of a P-periodic signal, with t representing time. We will be concerned particularly with the case that f is sparse in some basis, meaning for example that the information contained in f is to some degree localized in space and also in frequency; or, at least, we expect f to have fine scales (i.e. high frequencies) appear only locally. This is typically the case in physical applications. Then we look for a basis which efficiently represents such a sparse f: since on a computer we can store only a finite amount of information, we look for a basis where a small amount of stored information


gives a relatively accurate approximation of f by exploiting the sparsity of f. Starting from the well-known Fourier series and a simple basis, the Haar system, our search for such a basis will lead us to wavelets. Further details about wavelet theory that extend the presentation in this chapter can be found in [Walnut, 2004], which proceeds in an order vaguely similar to this text. The courageous, and those already familiar with the essentials of wavelets, are recommended [Mallat, 2009], a valuable book by an expert in the field.

2.2 Fourier series representation

A Fourier series represents a periodic function in terms of trigonometric basis functions (letting P = 2π for the remainder of this section):

    f(t) = \frac{A_0}{2} + \sum_{k=1}^{\infty} \left[ A_k \cos(kt) + B_k \sin(kt) \right]    (2.1)

The Fourier coefficients A_k and B_k are amplitudes of sine and cosine functions of different frequencies; thus these coefficients give information about the frequencies present in f. They are defined as the integrals

    A_k = \frac{1}{\pi} \int_D f(t) \cos(kt)\,dt \quad \text{and} \quad B_k = \frac{1}{\pi} \int_D f(t) \sin(kt)\,dt.    (2.2)

The Fourier series (2.1) constitutes an orthogonal basis with trigonometric basis functions cos(kt) and sin(kt). The infinite sum in (2.1) gives an exact representation of f, but needs an infinite number of coefficients. If, instead, we cut off after N terms,

    f_N(t) = \frac{A_0}{2} + \sum_{k=1}^{N} \left[ A_k \cos(kt) + B_k \sin(kt) \right],

we obtain an approximate representation of f using a finite number of coefficients. Figure 2.1 shows f_16(t) (black solid line) and the corresponding f (red dashed line). The truncated Fourier series has difficulties approximating the small scale features and the discontinuity correctly. The smooth features will


Figure 2.1: A periodic test function (dashed red) is approximated by a truncated Fourier series with N = 16 (solid black).

actually be approximated very well if N is increased, but the difficulties at the discontinuity remain. To compare with the bases in the following sections, it is important to note that the sinusoidal basis functions in (2.1) are supported on the whole interval, which makes them ill suited for locally appearing oscillations. From the integrals (2.2) we can see that a local perturbation in f will affect all A_k and B_k; this conflicts with the goal of having many coefficients of small absolute value, so that neglecting them introduces little error into the approximation.
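The behaviour described above, good convergence away from the jump but persistent error at the jump, can be reproduced numerically. The following Python sketch is my own illustration, not part of the thesis's Fortran code, and its discontinuous test function is chosen for convenience rather than to match figure 2.1; it approximates the coefficients (2.2) with an equispaced Riemann sum:

```python
import numpy as np

def truncated_fourier(f, N, n=4096):
    """Evaluate the truncated Fourier series f_N of a 2*pi-periodic f.

    The coefficients A_k, B_k of (2.2) are approximated by an equispaced
    Riemann sum, which is very accurate for periodic integrands."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    ft = f(t)
    fN = np.full(n, ft.mean())                    # the A_0 / 2 term
    for k in range(1, N + 1):
        Ak = 2.0 * np.mean(ft * np.cos(k * t))
        Bk = 2.0 * np.mean(ft * np.sin(k * t))
        fN += Ak * np.cos(k * t) + Bk * np.sin(k * t)
    return t, ft, fN

# A test function with jumps at t = 0 and t = pi:
f = lambda t: np.sin(t) + (t < np.pi)
t, ft, f16 = truncated_fourier(f, 16)
_, _, f64 = truncated_fourier(f, 64)

# Away from the jumps the error shrinks as N grows; at the jumps the
# Gibbs overshoot persists no matter how large N becomes.
away = (np.abs(t - np.pi) > 0.5) & (t > 0.5) & (t < 2.0 * np.pi - 0.5)
assert np.max(np.abs(f64 - ft)[away]) < np.max(np.abs(f16 - ft)[away])
assert np.max(np.abs(f64 - ft)) > 0.1
```

The two assertions make the point of this section quantitative: increasing N from 16 to 64 reduces the error in the smooth regions, while the maximum error at the discontinuities stays large.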

2.3 The Haar System

This section will present another orthogonal basis, the Haar System [Haar, 1910], which builds on basis functions with local support, in the hope of finding a better representation for our sparse function f.

2.3.1 Step functions

As a first step towards the Haar System we want to define step-functions, like the black solid lines in figure 2.2, based on nested subdivisions. Beginning from one interval over the entire domain, I_{0,0} = [0, P] = D, we subdivide intervals into two equal parts, I_{1,0} = [0, P/2] and I_{1,1} = [P/2, P]. One repetition of the subdivision yields the four intervals I_{2,k} = [kP/4, (k+1)P/4], k = 0, 1, 2, 3, and after a total of j repetitions we have

    I_{j,k} = [kP/2^j, (k+1)P/2^j], \qquad k = 0, \dots, 2^j - 1

with k the location of the interval in space and j the level of resolution. A step-function f_j on D is defined as a function that is constant on the intervals I_{j,k}, k = 0, ..., 2^j − 1. It can be represented by a linear combination of basis functions φ_{j,k},

    f_j = \sum_{k=0}^{2^j - 1} \lambda_{j,k} \phi_{j,k},

where

    \phi_{j,k}(t) = \begin{cases} \sqrt{2}^{\,j} & t \in I_{j,k} \\ 0 & \text{otherwise.} \end{cases}

Defining φ(t) to be the indicator function

    \phi(t) = \chi_D = \begin{cases} 1 & t \in D \\ 0 & \text{otherwise,} \end{cases} \qquad t \in \mathbb{R}    (2.3)

allows the functions φ_{j,k} to be obtained as dilations and translations of φ:

    \phi_{j,k}(t) = \sqrt{2}^{\,j} \phi(2^j t - kP)    (2.4)

The factor \sqrt{2}^{\,j} normalizes the basis functions so that ‖φ_{j,k}‖₂ = 1, where the norm is defined by ‖f‖₂² ≡ ∫_D f(t) f(t) dt. The step-function on level j can be interpreted as an approximation to a general function f ∈ L²(D). The difference ‖f − f_j‖ decreases with increasing level of resolution j. This is also illustrated in figure 2.2. Compared to the Fourier series, the discontinuity is approximated better; however, the small scale features have not improved for a similar number of coefficients, and the low frequency background is represented worse. Even though the basis functions themselves are now localized in space, the basis is not efficient at representing local features.
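As a quick numerical check (a Python sketch of my own, the thesis's model itself is written in Fortran95), averaging equispaced samples of a smooth function over the intervals I_{j,k} shows the expected first-order convergence: increasing j by two shrinks the intervals by a factor of four and reduces the L2 error roughly fourfold:

```python
import numpy as np

def step_approximation(samples, j):
    """Average equispaced samples of f over the 2**j intervals I_{j,k}.

    The number of samples is assumed divisible by 2**j; the result holds
    the constant value of the step-function f_j on each interval."""
    return samples.reshape(2 ** j, -1).mean(axis=1)

t = np.linspace(0.0, 1.0, 1024, endpoint=False)
f = np.sin(2.0 * np.pi * t)                       # a smooth test function
errors = {}
for j in (4, 6):
    f_j = np.repeat(step_approximation(f, j), 1024 // 2 ** j)
    errors[j] = np.sqrt(np.mean((f - f_j) ** 2))  # discrete L2 error

# Four times finer intervals (j = 4 -> j = 6) cut the error about fourfold.
assert errors[6] < errors[4] / 2.0
```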

2.3.2 The Haar System

We define a function ψ ∈ L²(ℝ):

[Two panels: j = 4, N = 2^j = 16 (top) and j = 6, N = 2^j = 64 (bottom).]

Figure 2.2: The test function (dashed red) is approximated by step-functions with different step-sizes (solid black).

    \psi(t) = \begin{cases} \sqrt{2/P} & 0 < t < P/2 \\ -\sqrt{2/P} & P/2 < t < P \\ 0 & \text{otherwise} \end{cases}    (2.5)

from which we construct a family of functions by dilation and translation,

    \psi_{j,k}(t) = \sqrt{2}^{\,j} \psi(2^j t - kP), \qquad t \in D, \quad k = 0, \dots, 2^j - 1, \quad j = 0, \dots, \infty,

called the Haar functions. Figure 2.3 shows examples from this family for small j. Again the functions are normalized, ‖ψ_{j,k}‖₂ = 1.

The critical step towards the Haar System is to combine basis functions of type φ_{j,k} with Haar functions ψ_{j,k} on level j to represent a step-function of the finer resolution j + 1. The φ_{j,k} give a coarse resolution approximation and the ψ_{j,k} add the details missing between resolutions j and j + 1:

    \sum_k \lambda_{j+1,k} \phi_{j+1,k} = \sum_k \lambda_{j,k} \phi_{j,k} + \sum_k \gamma_{j,k} \psi_{j,k}    (2.6)

[Panels show the Haar functions ψ_{0,0}; ψ_{1,0}, ψ_{1,1}; ψ_{2,0}, ψ_{2,1}, ψ_{2,2}.]

Figure 2.3: The Haar functions

Looking at one coarse interval I_{j,k}, which is subdivided at the fine resolution j + 1 into the intervals I_{j+1,2k} and I_{j+1,2k+1}, we observe that only the following functions are non-zero on I_{j,k}: φ_{j+1,2k} and φ_{j+1,2k+1} on level j + 1, and φ_{j,k} and ψ_{j,k} on level j. Hence the following relations must hold for the corresponding coefficients.

On I_{j+1,2k}:

    \lambda_{j+1,2k} \phi_{j+1,2k} = \lambda_{j,k} \phi_{j,k} + \gamma_{j,k} \psi_{j,k}
    \lambda_{j+1,2k} \sqrt{2}^{\,j+1} = \lambda_{j,k} \sqrt{2}^{\,j} + \gamma_{j,k} \sqrt{2}^{\,j}
    \lambda_{j+1,2k} = \sqrt{2}^{\,-1} (\lambda_{j,k} + \gamma_{j,k})

On I_{j+1,2k+1}:

    \lambda_{j+1,2k+1} \phi_{j+1,2k+1} = \lambda_{j,k} \phi_{j,k} + \gamma_{j,k} \psi_{j,k}
    \lambda_{j+1,2k+1} \sqrt{2}^{\,j+1} = \lambda_{j,k} \sqrt{2}^{\,j} + \gamma_{j,k} (-\sqrt{2}^{\,j})
    \lambda_{j+1,2k+1} = \sqrt{2}^{\,-1} (\lambda_{j,k} - \gamma_{j,k})

or, solving for λ_{j,k} and γ_{j,k},

    \lambda_{j,k} = \sqrt{2}^{\,-1} (\lambda_{j+1,2k} + \lambda_{j+1,2k+1})
    \gamma_{j,k} = \sqrt{2}^{\,-1} (\lambda_{j+1,2k} - \lambda_{j+1,2k+1})

Given a step-function on level j + 1, f_{j+1}, with coefficients λ_{j+1,k}, this allows us to compute coefficients λ_{j,k} that represent a function f_j which, on every interval I_{j,k}, is the average of the values of f_{j+1} on I_{j+1,2k} and I_{j+1,2k+1}, and ∫_D f_{j+1} = ∫_D f_j. The γ_{j,k} represent the change of f_{j+1} between two neighbouring subintervals and can be used to reconstruct f_{j+1} from f_j using (2.6).
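These analysis and synthesis relations translate directly into code. The following Python sketch (an illustration with names of my own choosing, not the thesis's Fortran implementation) performs one level of the Haar transform and checks that the round trip is exact and that the coefficient energy equals the sample energy, as expected for an orthonormal transform:

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_analysis(lam_fine):
    """Scaling coefficients on level j+1 -> (lambda, gamma) on level j."""
    lam = (lam_fine[0::2] + lam_fine[1::2]) / SQRT2
    gam = (lam_fine[0::2] - lam_fine[1::2]) / SQRT2
    return lam, gam

def haar_synthesis(lam, gam):
    """Inverse step, reconstructing level j+1 via (2.6)."""
    lam_fine = np.empty(2 * lam.size)
    lam_fine[0::2] = (lam + gam) / SQRT2
    lam_fine[1::2] = (lam - gam) / SQRT2
    return lam_fine

x = np.array([4.0, 2.0, 5.0, 5.0, 1.0, -1.0, 0.0, 2.0])
lam, gam = haar_analysis(x)
assert np.allclose(haar_synthesis(lam, gam), x)   # exact reconstruction
# Orthonormality: coefficient energy equals sample energy.
assert np.isclose(np.sum(lam ** 2) + np.sum(gam ** 2), np.sum(x ** 2))
```

Note that the coarse λ are (up to the factor 1/√2) pairwise sums and the γ pairwise differences, exactly as derived above.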

[N = 2^5 + 3 = 35]

Figure 2.4: The Haar basis provides a non-linear approximation (solid black) to the test function (dashed red). Blue dotted lines indicate the linear approximation by a step-function on the coarser resolution.

Using (2.6) recursively, starting from level j = 0, a step-function f_J can be expressed by φ_{0,0} = 1 and a series of Haar functions on different levels:

    f_J = \lambda_{0,0} \phi_{0,0} + \sum_{j=0}^{J} \sum_k \gamma_{j,k} \psi_{j,k}    (2.7)

As J approaches infinity, the step-size of f_J tends to zero. The Haar System {φ_{0,0}} ∪ {{ψ_{j,k}}_{k=0,...,2^j}}_{j=0,...,∞} is an orthogonal basis on D. Like the Fourier series, infinitely many basis functions allow us to represent a continuous function. Now we can approximate f by choosing a particular subset of {{ψ_{j,k}}_k}_j. Figure 2.4 illustrates an approximation of f (red dashes) by a Haar basis. The approximation (black line) has halved step sizes where the irregularities appear, and the blue dotted line indicates how longer steps would have looked. If you subtract the dotted line from the solid line, you get a Haar function as in figure 2.3: Haar functions have been added to locally increase the resolution. We will discuss in section 2.9 the selection of which ψ_{j,k} to include to obtain a good approximation with as few basis functions as possible. But first we will try to find better basis functions that avoid the problems of the Haar system, which has difficulty approximating smooth functions due to the discontinuity of the step-function approximation.
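The recursion (2.7), together with the idea behind figure 2.4 of keeping only the significant wavelet coefficients, can be sketched as follows (again an illustrative Python version with hypothetical names; the selection of coefficients is discussed properly in section 2.9):

```python
import numpy as np

def haar_decompose(x):
    """Full Haar transform as in (2.7): split down to a single lambda_{0,0}.

    Returns lambda_00 and the detail arrays gamma_j ordered coarse to fine;
    the input length must be a power of two."""
    details, lam = [], x.astype(float)
    while lam.size > 1:
        details.append((lam[0::2] - lam[1::2]) / np.sqrt(2.0))
        lam = (lam[0::2] + lam[1::2]) / np.sqrt(2.0)
    return lam[0], details[::-1]

def haar_reconstruct(lam00, details):
    lam = np.array([lam00])
    for d in details:                              # coarse to fine
        fine = np.empty(2 * lam.size)
        fine[0::2] = (lam + d) / np.sqrt(2.0)
        fine[1::2] = (lam - d) / np.sqrt(2.0)
        lam = fine
    return lam

t = np.arange(256) / 256.0
x = np.where(t < 0.3, np.sin(2.0 * np.pi * t), 0.2)  # smooth part + a jump
lam00, details = haar_decompose(x)
assert np.allclose(haar_reconstruct(lam00, details), x)

# Non-linear approximation: drop every detail coefficient below a threshold.
eps = 0.05
kept = [np.where(np.abs(d) > eps, d, 0.0) for d in details]
x_eps = haar_reconstruct(lam00, kept)
n_kept = 1 + sum(int(np.count_nonzero(k)) for k in kept)
assert n_kept < x.size // 2                        # far fewer coefficients...
assert np.sqrt(np.mean((x - x_eps) ** 2)) < eps    # ...at a controlled error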

2.4 Multi-resolution analysis

This and the following section generalize from the Haar system above to a wider class of basis representations: wavelet bases. To this end we first introduce Riesz bases as a generalization of orthogonal bases, before we come to the definition of a multi-resolution analysis, on which we will build wavelet bases in section 2.5.

Definition 2.1. A collection of linearly independent functions {θk}_k is a Riesz basis on D if

1. {θk}_k spans L²(D);

2. there exist finite positive constants A, B such that

    A‖f‖² ≤ Σ_k ⟨f, θk⟩² ≤ B‖f‖².    (2.8)

For an orthogonal basis {θk}_k the energy equivalence ‖f‖² = Σ_k ⟨f, θk⟩² holds ([Walnut, 2004], Theorem 2.57d). When we define the Riesz basis we no longer demand orthogonality, and we lose this property. Instead, we demand the frame condition (2.8), which is a looser criterion than the energy equivalence and guarantees stability of the basis. (See [Walnut, 2004], Remark 10.5, for more details.) In the following we will be dealing with translates of functions that (in contrast to function (2.3)) can overlap; thus we have to make sure that functions are periodic: if they are shifted so that they extend beyond the end of the interval [0, P], they must re-enter the interval at 0. For translates of a function θ ∈ L²(ℝ) we define an operator C : L²(ℝ) → L²([0, P]),

    Cθ(t) = Σ_{i=−∞}^{∞} θ(t + iP).

This way, if a function θ with support only inside [0, P] is shifted by s, then Cθ(t − s) produces a circular shift. In the following the support of θ will always be smaller than P, so that the sum is a selection rather than an actual sum: θ(t + iP) is non-zero for only one i. Now we can define a multi-resolution analysis on an interval with periodic boundaries.

Definition 2.2. A sequence Vj of subspaces of L²(D) on an interval D = [0, P] with j ∈ ℕ0 (and constant c) is a multi-resolution analysis (MRA) iff

1. Vj ⊂ Vj+1

2. lim_{j→∞} Vj is dense in L²(D)

3. There exists a function φ(t) ∈ L²(ℝ) such that {Cφ(t − ck) | k ∈ K(0)} forms a Riesz basis for V0.

4. f(t) ∈ Vj ⇔ f(t − 2^{−j}ck) ∈ Vj for all k ∈ K(j)

5. f(t) ∈ Vj ⇔ f(2t) ∈ Vj+1

As above, j is the level of resolution, in the following sometimes just called the level. Vj is the space containing all functions representable at level j. In the case of the Haar system, Vj is the space of all step-functions at level j, and the first condition is clearly fulfilled since all step-functions at level j are also step-functions at level j + 1. The function φ(t) in condition 3 is called the scaling function. At the coarsest resolution, j = 0, the basis for V0 is a set of translates of φ. K(j) is an index set (for the Haar system K(j) = {0, . . . , 2^j − 1}). From the scaling function at level 0 we construct scaling functions on all levels by dilation and translation, analogously to (2.4) for the step-function basis,

    φj,k(t) = 2^{j/2} φ(2^j t − k),    (2.9)

and define the spaces Vj as Vj = span{φj,k | k ∈ K(j)}. Then for a given j the {φj,k | k ∈ K(j)} form a Riesz basis for Vj (and if the {φ(t − k) | k ∈ K(0)} were orthogonal, the {φj,k | k ∈ K(j)} form an orthogonal basis for Vj, which is the case for the Haar system). Constructing the Vj via (2.9) guarantees that conditions 4 and 5 are fulfilled. Condition 2 means that the error of the best approximation in Vj of a continuous function converges to zero as j goes to infinity.

Since φj,k is in Vj (as one of its basis vectors), by condition 1 it is also in Vj+1. So φj,k can be represented by a linear combination of {φj+1,l | l ∈ K(j + 1)}, and we can write

    φj,k = Σ_{l∈K(j+1)} h_{l−2k} φj+1,l.    (2.10)

We will come back to this equation, called the refinement equation, and to the coefficients hk. From here there are two possible directions: requiring {φ(t − ck) | k ∈ K(0)} to be an orthonormal basis (which is a stronger requirement than a Riesz basis) leads to orthogonal wavelet analysis. Alternatively, we can work with two MRAs whose scaling functions form dual Riesz bases. This will result in the biorthogonal wavelets.

2.5  Wavelet basis

In this section we introduce wavelets: functions with zero mean and finite support. Together with the MRA we will use these wavelets to construct the bases we have been looking for, with the ability to represent sparse functions efficiently. The Haar system is a simple example of such a wavelet basis.

2.5.1  Orthogonal wavelets

In the context of an MRA, wavelets can be defined from the scaling function via

    ψ(t) = √2 Σ_k gk φ(2t − k).    (2.11)

A central result in wavelet analysis then states (for a proof see e.g. [Walnut, 2004]) that for an orthogonal MRA (i.e. where the basis in condition 3 is orthogonal), if the wavelets are defined by (2.11) and hk and gk by

    hk = √2 ⟨φ(t), φ(2t − k)⟩,    gk = (−1)^k h_{1−k},    (2.12)

then the families of wavelets and scaling functions,

    {ψj,m | j ∈ {0, . . . , J}, m ∈ M(j)} ∪ {φ0,k | k ∈ K(0)},    (2.13)

form an orthogonal basis. Here the ψj,k are again dilations and translations:

    ψj,k(t) = 2^{j/2} ψ(2^j t − k).    (2.14)

This means that, given a scaling function φ, it is possible to compute the coefficients hk and gk and use them in (2.11) to compute a wavelet; from this wavelet we can construct a new orthogonal basis, a wavelet basis, which will be discussed in a moment, after applying the procedure to the Haar example. Equations (2.12) give

    hk = 1/√2 for k ∈ {0, 1} and hk = 0 otherwise;
    gk = 1/√2 for k = 0, gk = −1/√2 for k = 1, and gk = 0 otherwise.

Using these gk in (2.11) and comparing to (2.5), we can confirm that the Haar system is indeed an orthogonal wavelet basis.

The basis (2.13) consists of scaling functions at level 0, φ0,k, that give a coarse approximation, and wavelets ψj,k that add details at different levels j and positions k. A finite J makes (2.13) an orthogonal basis for the subspace VJ; if J is replaced by infinity, we get an orthogonal basis for L²(D). In addition to the space Vj, which is an approximation at level j spanned by the scaling functions on this level (or, as we have now discovered, by a combination of scaling functions and wavelets on lower levels), we define the space Wj as the span of all the wavelets at level j: Wj = span{ψj,k | k ∈ M(j)}. This space Wj adds the details from Vj+1 that are missing in Vj:

    Vj+1 = Vj ⊕ Wj.

With the orthogonal basis (2.13) we can represent f as

    f(t) = Σ_{k∈K(0)} λ0,k φ0,k + Σ_{j=0}^{∞} Σ_{k∈M(j)} γj,k ψj,k    (2.15)

with the coefficients defined as

    γj,k = ⟨f(t), ψj,k⟩,    λj,k = ⟨f(t), φj,k⟩.    (2.16)

Computation of the coefficients by (2.16) is called the (orthogonal) forward wavelet transform, and reconstructing f from the coefficients via (2.15) the inverse wavelet transform. If we limit j, replacing f by fJ = Σ_{k∈K(J)} λJ,k φJ,k, then the transforms amount to computing the coefficients γj,k (and λ0,k) from the scaling coefficients λJ,k and back; a fast algorithm for this exists and is explained in section 2.6.
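For the Haar case, the full transform between the λJ,k and the coefficients {λ0,0, γj,k} can already be sketched; the following Python fragment is an illustration under the conventions above, not the thesis's implementation:

```python
import math

SQRT2 = math.sqrt(2.0)

def fwt(lam_J):
    """Forward transform: level-J scaling coefficients (length 2**J)
    -> (lam_0, details), where details[j] holds the level-j gammas."""
    lam, details = list(lam_J), []
    while len(lam) > 1:
        half = len(lam) // 2
        details.append([(lam[2 * k] - lam[2 * k + 1]) / SQRT2 for k in range(half)])
        lam = [(lam[2 * k] + lam[2 * k + 1]) / SQRT2 for k in range(half)]
    details.reverse()  # details[0] is the coarsest level
    return lam[0], details

def iwt(lam0, details):
    """Inverse transform back to the level-J scaling coefficients."""
    lam = [lam0]
    for gam in details:  # coarse to fine
        lam = [v for l, g in zip(lam, gam)
               for v in ((l + g) / SQRT2, (l - g) / SQRT2)]
    return lam
```

Both directions cost O(2^J) operations in total, since each level halves the number of coefficients that remain to be processed.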

2.5.2  Biorthogonal wavelets

If the basis in condition 3 of the MRA is not orthogonal but only a Riesz basis, the construction of a wavelet basis as in the previous section is no longer possible. Instead we make use of the fact that for every Riesz basis {θk} there exists a dual Riesz basis {θ̃k} such that a function f can be represented as

    f = Σ_k ⟨f, θ̃k⟩ θk.

So every MRA has a dual MRA that replaces the basis in condition 3 by its dual basis. In the special case of an orthogonal basis, the primal MRA and the dual MRA are identical. Primal basis and dual basis are interchangeable, so that f = Σ_k ⟨f, θk⟩ θ̃k would equally be possible. With hk defined as in (2.12), and h̃k defined by replacing φ by the dual φ̃, we define

    gk = (−1)^k h̃_{1−k},    g̃k = (−1)^k h_{1−k},

and the wavelet and dual wavelet are defined as in (2.11):

    ψ(t) = √2 Σ_k gk φ(2t − k),    ψ̃(t) = √2 Σ_k g̃k φ̃(2t − k),    (2.17)

from which two wavelet families are constructed as in (2.14). With the two dual bases we can state the biorthogonal wavelet transform, consisting of the forward transform

    γj,k = ⟨f(t), ψj,k⟩,    λj,k = ⟨f(t), φj,k⟩    (2.18)

and the inverse transform

    f(t) = Σ_{k∈K(0)} λ0,k φ̃0,k + Σ_{j=0}^{∞} Σ_{m∈M(j)} γj,m ψ̃j,m.    (2.19)

We have chosen ψ̃ as the synthesis wavelet and ψ as the analysis wavelet. (This choice is inconsistent in the literature, where ψ̃ is often called the dual wavelet and is sometimes used for synthesis and other times for analysis.) The biorthogonal wavelets allow more freedom in their design, which will be discussed in section 2.8. They owe their name to the fact that elements in the primal basis are orthogonal to elements in the dual basis:

    ⟨ψj,k, ψ̃j′,k′⟩ = 0  if  k ≠ k′ or j ≠ j′.

2.6  Discrete wavelet transform using filters

Given the coefficients {λJ,k | k ∈ K(J)}, which correspond to a fine representation at level J, the structure of the MRA enables an efficient computation of the wavelet coefficients on all levels, {γj,m | j = 0, . . . , J − 1, m ∈ M(j)}, and of the scaling coefficients on the lowest level, {λ0,k | k ∈ K(0)}. In the following only the biorthogonal setting will be considered, since the orthogonal case follows by simply setting φ = φ̃ (and accordingly for the wavelet and the filters). Combining definition (2.14) with equation (2.11), and then using (2.9) and l = 2k + k′, we obtain

    ψj,k = 2^{(j+1)/2} Σ_{k′} g_{k′} φ(2^{j+1} t − 2k − k′) = Σ_l g_{l−2k} φj+1,l.    (2.20)

Together with the refinement relation (2.10) and the forward transform (2.18), this yields

    λj,k = ⟨f, Σ_l h_{l−2k} φj+1,l⟩ = Σ_l h_{l−2k} ⟨f, φj+1,l⟩ = Σ_l h_{l−2k} λj+1,l    (2.21)

    γj,k = ⟨f, Σ_l g_{l−2k} φj+1,l⟩ = Σ_l g_{l−2k} ⟨f, φj+1,l⟩ = Σ_l g_{l−2k} λj+1,l.    (2.22)

This allows us to compute coarse-level coefficients from fine-level scaling coefficients. Conversely, we can also derive

    Σ_k λj+1,k φ̃j+1,k = Σ_k γj,k ψ̃j,k + Σ_k λj,k φ̃j,k
                      = Σ_k γj,k Σ_l g̃_{l−2k} φ̃j+1,l + Σ_k λj,k Σ_l h̃_{l−2k} φ̃j+1,l
                      = Σ_l ( Σ_k g̃_{l−2k} γj,k + Σ_k h̃_{l−2k} λj,k ) φ̃j+1,l,

where the first equation is related to Vj+1 = Vj ⊕ Wj (a consequence of the definitions of MRA and wavelet [Walnut, 2004]), and the second step inserts the dual versions of (2.10) and (2.20). Matching coefficients allows us to conclude

    λj+1,l = Σ_k g̃_{l−2k} γj,k + Σ_k h̃_{l−2k} λj,k,    l ∈ K(j + 1).    (2.23)

The notation can be simplified by defining linear operators such that the forward step (2.21), (2.22) becomes

    λj = Hj λj+1    and    γj = Gj λj+1,    (2.24)

and the reconstruction/backwards step (2.23) reads

    λj+1 = H̃jᵀ λj + G̃jᵀ γj,    (2.25)

where H and G are non-square matrices and λj and γj are vectors of coefficients. Repeating (2.24) for j = J − 1, . . . , 0, we can compute wavelet coefficients on all levels and scaling coefficients at level 0 from the scaling coefficients at level J (the discrete forward transform); and by repeatedly applying (2.25) we can recover the λJ from the wavelet coefficients (the discrete inverse transform). These transforms are referred to as fast wavelet transforms. They have an interpretation that connects to filter banks [Mallat, 2009]: the low-pass filter Hj extracts the low frequencies and the high-pass filter Gj the high frequencies (details). The conditions for biorthogonality, which guarantee that f or fJ can be exactly reconstructed from the coefficients γj,k and λ0,k, are related to the following conditions on the filters:

    H̃j Hjᵀ = 1,    G̃j Gjᵀ = 1,    G̃j Hjᵀ = 0,    H̃j Gjᵀ = 0,    (2.26)

    Hjᵀ H̃j + Gjᵀ G̃j = 1.    (2.27)
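These filter conditions can be checked numerically for a concrete case. A small Python sketch (illustrative helper names; the orthogonal Haar case, so the dual filters equal the primal ones) that forms the decimated filter matrices on four fine-level coefficients and evaluates the products in (2.26) and (2.27):

```python
import math

s = 1.0 / math.sqrt(2.0)
# Haar filters as decimated matrices acting on four fine-level coefficients.
# Orthogonal case: the dual filters H~, G~ equal H, G.
H = [[s, s, 0.0, 0.0],
     [0.0, 0.0, s, s]]
G = [[s, -s, 0.0, 0.0],
     [0.0, 0.0, s, -s]]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def T(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(A, B)]

HHt = matmul(H, T(H))   # H~ Ht -> identity
GGt = matmul(G, T(G))   # G~ Gt -> identity
GHt = matmul(G, T(H))   # G~ Ht -> zero
PR = add(matmul(T(H), H), matmul(T(G), G))   # Ht H~ + Gt G~ -> identity
```

The last product, PR, is the perfect-reconstruction condition (2.27): analysing and then synthesising returns the original fine-level coefficients.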

We have now created a framework on which to build wavelets that form an efficient representation of a sparse signal, together with fast algorithms to compute the coefficients. Before we come to the actual design of wavelets, we introduce second generation wavelets, which we will use for the remainder of this text for reasons explained in the following.

2.7  Second generation wavelets

2.7.1  Motivation

The wavelets presented above are subject to certain restrictions. While they have been extended to several dimensions for regular domains, it does not seem possible to apply them to a spherical domain. Sweldens [Sweldens, 1997] generalized the above definitions and constructed second generation wavelets that allow for an even more efficient discrete transform and easy application of wavelets in the following scenarios:

• boundaries (non-periodic)

• irregularly spaced grid-points

• irregular multidimensional domains

The last item means in particular that second generation wavelets can be, and have been, applied to represent a function defined on the sphere [Schröder and Sweldens, 1995]. In this section second generation wavelets are introduced by generalizing the definitions of the MRA and the wavelet, and they are compared to "traditional" wavelets as presented so far. We will conclude by replacing the fast wavelet transform with the even more efficient lifting scheme.

2.7.2  Generalizations

Removing conditions 4 and 5, which are related to translation and dilation, makes the definition of the MRA less restrictive.

Definition 2.3. A sequence Vj of subspaces of L²(D) with D ⊂ ℝⁿ and j ∈ ℕ0 is a second generation multi-resolution analysis (MRA2) if

1. Vj ⊂ Vj+1

2. lim_{j→∞} Vj is dense in L²(D)

3. for each j there exists a set of functions {φj,k ∈ L²(D) | k ∈ K(j)} that forms a Riesz basis for Vj.

Condition 3 has been extended to all levels and dissociated from translation. The locations of the basis functions are nested in the sense that K(j) ⊂ K(j + 1). Accordingly, the definition of the wavelet is also adjusted.

Definition 2.4. A set of functions {ψj,m | j ∈ ℕ0, m ∈ M(j)} with M(j) = K(j + 1) \ K(j) forms a set of second generation wavelets if

1. Wj = span{ψj,m | m ∈ M(j)} is a complement of Vj in Vj+1 and Wj ⊥ Ṽj.

2. {ψj,m | j ∈ ℕ0, m ∈ M(j)} ∪ {φ0,k | k ∈ K(0)} forms a Riesz basis for L²(D).

Generally speaking, certain properties that were previously consequences of translation and dilation are now explicitly included in the definitions. While offering more flexibility, this generalized setting still maintains the important properties of wavelets: localization in space, in order to approximate functions efficiently, and fast computation of the coefficients.

2.7.3  Lifting

An important aspect of second generation wavelet analysis is the possibility to construct wavelets through lifting. Starting from a biorthogonal wavelet, the operators are modified in a way that improves properties of the wavelets while retaining biorthogonality.

Theorem 2.7.1 ([Sweldens, 1997]). Given a set of biorthogonal filters Hj⁰, H̃j⁰, Gj⁰, G̃j⁰ (i.e. filters that fulfil (2.26) and (2.27)), the lifting

    Hj¹ = Hj⁰    (2.28)
    H̃j¹ = H̃j⁰ + Sj G̃j⁰    (2.29)
    Gj¹ = Gj⁰ − Sjᵀ Hj⁰    (2.30)
    G̃j¹ = G̃j⁰    (2.31)

yields new filter operators Hj¹, H̃j¹, Gj¹, G̃j¹ which are biorthogonal, too.

Proof. Because of the biorthogonality, conditions (2.26), (2.27) hold for the old filters. Then, with (2.28)-(2.31) we compute (dropping the index j and the superscript 1):

    Hᵀ H̃ + Gᵀ G̃ = H⁰ᵀ (H̃⁰ + S G̃⁰) + (G⁰ − Sᵀ H⁰)ᵀ G̃⁰
                = H⁰ᵀ H̃⁰ + G⁰ᵀ G̃⁰ + H⁰ᵀ S G̃⁰ − H⁰ᵀ S G̃⁰ = 1
    H̃ Hᵀ = (H̃⁰ + S G̃⁰) H⁰ᵀ = H̃⁰ H⁰ᵀ + S G̃⁰ H⁰ᵀ = 1
    G̃ Gᵀ = G̃⁰ (G⁰ − Sᵀ H⁰)ᵀ = G̃⁰ G⁰ᵀ − G̃⁰ H⁰ᵀ S = 1
    G̃ Hᵀ = G̃⁰ H⁰ᵀ = 0
    H̃ Gᵀ = (H̃⁰ + S G̃⁰)(G⁰ − Sᵀ H⁰)ᵀ = H̃⁰ G⁰ᵀ − H̃⁰ H⁰ᵀ S + S G̃⁰ G⁰ᵀ − S G̃⁰ H⁰ᵀ S = 0

Hence conditions (2.26), (2.27) hold for the new filters too, which makes them biorthogonal filters.

The lifting (2.28) to (2.31) is combined with a second lifting step, affecting the dual wavelet (hence called the dual lifting):

    H̃j² = H̃j¹    (2.32)
    Hj² = Hj¹ + S̃j Gj¹    (2.33)
    G̃j² = G̃j¹ − S̃jᵀ H̃j¹    (2.34)
    Gj² = Gj¹    (2.35)

The choice of S and S̃ will be discussed in section 2.8.

To carry out the discrete transforms, the new operators H², H̃², G², G̃² are not actually computed. Instead, in every step the old operators are applied; these are chosen to be very simple, so that they typically have little to no computational cost (see the lazy wavelet in section 2.8):

    λj⁰ = Hj⁰ λj+1,    γj⁰ = Gj⁰ λj+1,

followed by applying the operators S and S̃. Applying S,

    γj = γj⁰ − S λj⁰,    (2.36)

is called the predict step, based on the idea that S predicts the values γj⁰ from λj⁰, making γj the error of the prediction. The application of S̃ is referred to as the update step:

    λj = λj⁰ + S̃ γj.    (2.37)

It has been shown that first generation wavelets can be factorized into simple wavelets and lifting steps [Daubechies and Sweldens, 1998], leading to more efficient computations. I have confirmed the theoretical results from [Daubechies and Sweldens, 1998] through computer experiments for the orthogonal Daubechies wavelets.
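The predict/update structure can be illustrated with the classic linear-prediction lifting on a periodic signal. This is the standard textbook construction, shown here as a sketch rather than the thesis's own operators (which are built analogously on other grids): S predicts each odd sample by the average of its even neighbours, and S̃ with weight 1/4 keeps the mean of the coarse coefficients equal to the mean of the signal.

```python
def lift_forward(x):
    """One lifting step on a periodic signal of even length:
    lazy split into evens/odds, linear predict (S), mean-preserving update (S~)."""
    even, odd = x[0::2], x[1::2]
    m = len(odd)
    gam = [odd[k] - 0.5 * (even[k] + even[(k + 1) % m]) for k in range(m)]   # predict
    lam = [even[k] + 0.25 * (gam[(k - 1) % m] + gam[k]) for k in range(m)]   # update
    return lam, gam

def lift_inverse(lam, gam):
    """Undo the steps in reverse order, which is always possible exactly."""
    m = len(lam)
    even = [lam[k] - 0.25 * (gam[(k - 1) % m] + gam[k]) for k in range(m)]
    odd = [gam[k] + 0.5 * (even[k] + even[(k + 1) % m]) for k in range(m)]
    return [v for pair in zip(even, odd) for v in pair]
```

Because each step only adds or subtracts a quantity computed from the other half of the coefficients, the inverse is exact in floating point regardless of what S and S̃ are; this is the practical appeal of lifting.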

2.8  Custom wavelet construction

A wavelet defined on the entire real line is considered to have d vanishing moments if it is orthogonal to polynomials of order smaller than d,

    ∫_D ψj,k(t) tⁿ dt = 0,    n = 0, . . . , d − 1,    (2.38)

and smooth functions can be represented more efficiently by wavelets with more vanishing moments [Mallat, 2009]: if locally the function is well represented by a polynomial of order d, the magnitude of the wavelet coefficient will be small. We begin the design of wavelets by defining the lazy wavelet: K(j) is chosen to be the even and M(j) the odd indices of K(j + 1), and Hj⁰ and Gj⁰ simply sub-sample,

    Hj⁰:  λj,k = λj+1,k  if  k ∈ K(j),        Gj⁰:  γj,m = λj+1,m  if  m ∈ M(j).

These filters fulfil the criteria for biorthogonality (2.26), (2.27), have zero computational cost, and are thus a great starting point for lifting. Now we want to find the lifting operator S such that the number of vanishing moments of the analysis wavelet is increased. To this end we note that from equation (2.36) or (2.30) we can conclude for the wavelet

    ψ¹j,m = ψ⁰j,m − Σ_k s_{j,m,k} φ⁰j,k,

where Sj is now written out as coefficients. If we integrate this equation against the monomial tⁿ, then because of (2.38) we require

    ∫_D ψ¹j,m(t) tⁿ dt = ∫_D ψ⁰j,m(t) tⁿ dt − Σ_k s_{j,m,k} ∫_D φ⁰j,k(t) tⁿ dt = 0

for n = 0, . . . , d − 1 in order for ψ¹j,m to have d vanishing moments. Since the lazy wavelets are Dirac functions, they have the property that ∫_D f(t) ψ⁰j,m(t) dt = f(t_m) and ∫_D f(t) φ⁰j,k(t) dt = f(t_k), where t_m and t_k are the locations of ψ⁰j,m and φ⁰j,k respectively. Therefore we can write

    t_mⁿ = Σ_k s_{j,m,k} t_kⁿ.

From this equation we can determine the coefficients s_{j,m,k} by choosing the number of non-zero coefficients to match the number of vanishing moments d and solving a linear system of equations.

Just as the predict operator S can be used to improve properties of the analysis wavelet, the update operator S̃ can be used for the synthesis wavelet. It is possible to use it to increase the number of vanishing moments of the synthesis wavelet, too [Mallat, 2009], [Claypoole and Baraniuk, 1998], but this is more difficult than for the analysis wavelet. [Mallat, 2009] recommends having an equal or similar number of vanishing moments for both wavelets and claims that this improves the stability of the basis. The argument is that the shapes of both wavelets are then more similar, and thus closer to an orthogonal basis. I have tried to verify this claim by computing the discrete wavelet transform for different wavelets, disturbing the wavelet coefficients, and measuring how the error after computing the inverse transform depends on the magnitude of the disturbance. Different test functions have been used: a cusp singularity, a discontinuous step function and a Gaussian. The results of the experiments show no amplification of the disturbance and the same stability for bases with the same number of vanishing moments (2, 6 and 8 for both wavelets) as for bases with differing numbers of vanishing moments (8 for the analysis and 0 or 2 for the synthesis wavelet), contrary to the claim in [Mallat, 2009]. In the remainder of this work, the freedom from lifting the synthesis wavelet will be used to guarantee conservation between different levels: ∫_D fj = ∫_D fj′ for any j and j′. In fact, for the wavelets constructed in this section, conservation is equivalent to the synthesis wavelet having 1 vanishing moment.

Figure 2.5 shows different wavelets that have been constructed by lifting the lazy wavelet basis. The wavelets and scaling functions from rows 2 to 4 show the effect of increasing numbers of vanishing moments: the basis functions become smoother. On the first row, results from lifting with different filter lengths for the predict and update operators are shown. The size of the wavelet coefficients for the constructed wavelets is shown in figure 2.6. As expected, the magnitude of the coefficients decreases as the number of vanishing moments of the wavelet increases.
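The linear system t_mⁿ = Σ_k s_{j,m,k} t_kⁿ for the predict coefficients is small enough to solve directly. A Python sketch (illustrative, not thesis code) for a uniform dyadic grid, where the wavelet at t_m = 0 is predicted from scaling neighbours at ±1, ±3:

```python
def predict_weights(t_nodes, t_m):
    """Solve sum_k s_k * t_k**n = t_m**n for n = 0..d-1, d = len(t_nodes),
    by Gaussian elimination with partial pivoting (pure Python)."""
    d = len(t_nodes)
    A = [[float(t) ** n for t in t_nodes] for n in range(d)]
    b = [float(t_m) ** n for n in range(d)]
    for i in range(d):
        p = max(range(i, d), key=lambda r: abs(A[r][i]))
        A[i], A[p], b[i], b[p] = A[p], A[i], b[p], b[i]
        for r in range(i + 1, d):
            f = A[r][i] / A[i][i]
            A[r] = [arc - f * aic for arc, aic in zip(A[r], A[i])]
            b[r] -= f * b[i]
    s = [0.0] * d
    for i in reversed(range(d)):
        s[i] = (b[i] - sum(A[i][c] * s[c] for c in range(i + 1, d))) / A[i][i]
    return s

# wavelet at t_m = 0 predicted from scaling neighbours at -3, -1, 1, 3
w = predict_weights([-3.0, -1.0, 1.0, 3.0], 0.0)
```

Requesting d = 4 vanishing moments reproduces the familiar interpolating weights −1/16, 9/16, 9/16, −1/16, and d = 2 (nodes ±1) gives the linear weights 1/2, 1/2.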

2.9  Wavelet compression

We saw that most wavelet coefficients are very small while some are large. If we choose a threshold ε and separate small and large wavelet coefficients,

    Λε = {(j, m) | |γj,m| ≥ ε},    Λ̄ε = {(j, m) | |γj,m| < ε},

then we can write (2.19) as

    fJ = Σ_{(j,m)∈Λε} γj,m ψ̃j,m + Σ_{(j,m)∈Λ̄ε} γj,m ψ̃j,m + Σ_{k∈K(0)} λ0,k φ̃0,k.

Including only the coefficients above the threshold gives the non-linear approximation

    fJ,ε = Σ_{(j,m)∈Λε} γj,m ψ̃j,m + Σ_{k∈K(0)} λ0,k φ̃0,k

[Figure: analysis wavelet, synthesis wavelet and synthesis scaling function (columns), wavelet magnitude plotted against grid points, for the parameter choices N = Ñ = 2, N = Ñ = 4, N = Ñ = 8 and N = 8, Ñ = 2.]

Figure 2.5: Wavelets constructed by lifting

[Figure: frequency distribution of wavelet coefficient magnitudes, ranging from 10^−19 to 10^−3, for N = 2, 4, 6 and 8.]

Figure 2.6: Increasing the number of vanishing moments of the wavelet N reduces the order of magnitude of the wavelet coefficients

with error

    fJ,ε − fJ = − Σ_{(j,m)∈Λ̄ε} γj,m ψ̃j,m.

For interpolating wavelets as discussed in section 2.8 and smooth functions, [Donoho, 1992] has shown that there exists a constant c such that the error is bounded by the threshold:

    ‖fJ,ε − fJ‖∞ ≤ cε.

Note that here the supremum norm is used. Wavelets can be constructed using any norm, as long as it is used consistently. In this chapter we have used the L²-norm; later, when we come to solving differential equations, we will use the supremum norm in order to select the coefficients so that the maximum error is minimized. Figure 2.7 shows the test-function used in previous sections (red dashes) compared to an approximation fJ=10,ε=0.05 (black line) in a wavelet basis whose wavelets have been lifted to have four vanishing moments. The threshold ε = 0.05 has been chosen so that 35 coefficients are included, |Λ0.05| = 35. Compared to the Haar wavelet (figure 2.4), the smooth parts are now approximated better. The discontinuity poses difficulties for the smooth wavelets

Figure 2.7: The test function (dashed red) is quite accurately approximated by 35 wavelets with 4 vanishing moments (solid black).

(though less severe than for the Fourier series); however, since the selection of basis functions has now been made so as to distribute the error equally, these difficulties do not show up locally as a deficit in the approximation but cause an increased consumption of coefficients at this location. The true strength of the lifted wavelets (compared to the Haar system, equally-spaced piecewise polynomial approximations or Fourier series) will come to the fore when dealing with larger problem sizes arising from small tolerances and/or several dimensions (see section 3.2).
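The thresholding of this section can be demonstrated end to end. A self-contained Python sketch using the orthonormal Haar basis (an illustration, not the lifted wavelets of the text), for which Parseval's relation makes the squared L² error of the compressed signal exactly the squared l² norm of the discarded coefficients:

```python
import math

def haar_analysis(x):
    """Full Haar transform of a length-2**J signal: (lam0, detail levels)."""
    lam, levels = list(x), []
    while len(lam) > 1:
        half = len(lam) // 2
        levels.insert(0, [(lam[2 * k] - lam[2 * k + 1]) / math.sqrt(2) for k in range(half)])
        lam = [(lam[2 * k] + lam[2 * k + 1]) / math.sqrt(2) for k in range(half)]
    return lam[0], levels

def haar_synthesis(lam0, levels):
    lam = [lam0]
    for gam in levels:  # coarse to fine
        lam = [v for l, g in zip(lam, gam)
               for v in ((l + g) / math.sqrt(2), (l - g) / math.sqrt(2))]
    return lam

def compress(x, eps):
    """Keep only |gamma| >= eps; return the approximation and |Lambda_eps|."""
    lam0, levels = haar_analysis(x)
    kept = sum(abs(g) >= eps for gam in levels for g in gam)
    levels = [[g if abs(g) >= eps else 0.0 for g in gam] for gam in levels]
    return haar_synthesis(lam0, levels), kept
```

For a smooth test signal most of the 2^J − 1 detail coefficients fall below a modest threshold, so the count returned by `compress` is a direct measure of the compression ratio.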

Chapter 3

Multidimensional wavelets and adaptivity

This chapter continues the introduction to wavelets. It discusses how wavelets relate to grid adaptivity and how they can be extended to irregular geometries, particularly the sphere. This sets the basis for adaptive wavelet methods in spherical geometries. Towards the end of the chapter, work of the author on spherical wavelets is presented: it is shown that the butterfly wavelet basis does not have very good compression properties on the sphere, and a modification is proposed that restores fourth order interpolation on the sphere and thereby achieves the high compression that can be obtained on the plane.

3.1  Adaptive wavelet methods

The approximations of section 2.9, obtained by selecting the basis functions whose coefficients are above the threshold, can be considered adaptive, in contrast to non-adaptive approximations where all wavelets up to level J − 1 are included (or, equivalently, only scaling functions on level J are used). All wavelets and scaling functions are associated with a location in space, and locally increasing the resolution by adding wavelets where required in order to control the approximation error corresponds to adapting the resolution to the behaviour

of the test-function f. In the following we want to use this wavelet adaptation to adapt the computational grid while numerically solving a time-dependent hyperbolic differential equation,

    ∂q/∂t = F(t, q, ∂q/∂x, ∂²q/∂x²),    (3.1)

that can be treated by an explicit numerical scheme. The initial condition q(t = 0, x) is assumed to be known. First, we let x refer to one spatial dimension restricted to an interval, x ∈ I = [0, P], with periodic boundaries; later we will proceed to more dimensions. Also note that while previously we were representing a time-periodic signal f(t) in a wavelet basis, from now on x will be the dimension of wavelet adaptation. Using wavelets to construct adaptive methods for solving hyperbolic equations dates back to 1990 [Liandrat and Tchamitchian, 1990]. According to [Vasilyev and Bowman, 2000] there are two different types: wavelet-Galerkin methods (starting with [Liandrat and Tchamitchian, 1990]), which integrate the equation in wavelet space, and wavelet-collocation methods (like [Vasilyev and Bowman, 2000] itself), which integrate in physical space (i.e. work with scaling coefficients), with the advantage that it is easier to deal with non-linearities in the equation. For us, working in physical space also has the advantage that the wavelet adaptation can be combined with existing methods for solving the differential equation. That is important when applying the method to specific areas where particular methods with application-relevant properties are preferred. Since the aim of this work is to develop an adaptive model for geophysics, we will make use of this flexibility by integrating a geophysical model, presented in section 4.3, within a wavelet-based adaptive model. In order to find a numerical solution to the differential equation (3.1), we discretize uniformly in space x to obtain 2^J grid points, approximating derivatives by finite differences, and in time t using a time-integration scheme, e.g. of Runge-Kutta type.
This would lead to a simple non-adaptive method; now we want to construct an adaptive model. From here on we will work with the supremum norm, with the consequence that scaling functions are now normalized to have the same maximum value on each level, and scaling coefficients are approximately the value of the function at their x-location. We


begin by applying the initial condition to the grid points at level J, and accept these values as approximations to the scaling coefficients. Then we carry out the following steps repeatedly until we have reached the final time (see also figure 3.1):

1. The fast forward wavelet transform computes wavelet coefficients and scaling coefficients on each level. (We do not delete or overwrite scaling coefficients during this, so the solution at the current time is available at all resolutions in the form of the scaling coefficients.)

2. A mask is created that consists of all grid points that correspond to a wavelet coefficient above the threshold ε.

3. The mask is extended by all grid points that are adjacent to a grid point masked in the previous step. This allows the grid to adapt to a solution that is advected in space.

4. Additionally, the mask is extended by all grid points adjacent to the nodes from step 2 on the next finer level (adjacency in scale). This accounts for the solution developing finer scales.

5. All grid points on the coarsest level 0 are added.

6. The mask is further extended to ensure that for every node in the mask, all nodes (on the next coarser level) that are required to compute the associated wavelet coefficient are included in the mask as well.

7. At all points that are now contained in the mask, F in (3.1) is computed using finite differences. Neighbouring values required for the finite-difference stencils can be obtained by interpolation using the operator S. Then F can be used in the time-integration method to advance to the next time-step.

This adaptation algorithm (steps 1-5) is also illustrated in figure 3.1. The smooth black line is the current solution (e.g. a wave propagating to the right). The small vertical lines are the values of the wavelet coefficients γj,k, and the grey

Figure 3.1: Wavelet adaptation in one dimension: wavelet coefficients above threshold correspond to large errors in interpolation and activate grid points (type 2). Then grid points adjacent in space (3) and scale (4) are added to allow for changes in the solution during the next time-step.

horizontal lines mark the threshold ε. One wavelet coefficient is coloured red and its computation is illustrated. Using a lifted wavelet with two vanishing moments, the predict operator S in (2.36) corresponds to linear interpolation. The distance between the two points supporting the interpolation is due to the level (J − 3) of the red coefficient. That the scaling coefficients on lower levels have the same value as on the finest level (i.e. the function) means that for this example (for ease of illustration) the synthesis wavelet has not been lifted: S̃ = 0. For all coefficients that reach outside the grey lines, Λε, the corresponding grid point is activated (red 2). The other numbers correspond to the other steps of the adaptation algorithm.
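Steps 2-5 of the algorithm are pure index bookkeeping. A Python sketch of how the mask might be assembled from the wavelet coefficients, under hypothetical conventions for this illustration (periodic indices, n0·2^j points per level, and step 4 applied to the points gathered in steps 2 and 3):

```python
def build_mask(gamma, eps, J, n0=4):
    """Sketch of adaptation steps 2-5.
    gamma maps (level j, position k) to a wavelet coefficient."""
    npts = lambda j: n0 * 2 ** j
    mask = {jk for jk, g in gamma.items() if abs(g) >= eps}           # step 2
    for j, k in list(mask):                                           # step 3
        mask |= {(j, (k - 1) % npts(j)), (j, (k + 1) % npts(j))}
    for j, k in list(mask):                                           # step 4
        if j + 1 < J:
            mask |= {(j + 1, 2 * k), (j + 1, (2 * k + 1) % npts(j + 1))}
    mask |= {(0, k) for k in range(n0)}                               # step 5
    return mask
```

A single above-threshold coefficient thus activates a small neighbourhood in space and scale, while coefficients below ε contribute nothing beyond the always-present coarsest level.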

3.2  Wavelets on the sphere

First generation wavelets can be, and have been, used in two and three dimensions (e.g. [Roussel et al., 2003]). However, due to the dilation and translation property of the wavelets, they are limited to simple regular domains. Second generation wavelets do not have this restriction and have successfully been applied to represent a function defined on a spherical domain [Schröder and Sweldens, 1995]. In this section these spherical wavelets will be introduced, as the basis for an adaptive model for spherical differential equations. As in the one-dimensional case, every index corresponds to a location and the levels are nested:

    K(j + 1) = K(j) ∪ M(j).    (3.2)

To construct a nested grid on the sphere, we begin with an icosahedron whose 12 vertices constitute the positions of the scaling coefficients at the lowest level, K(0) (blue circles in figure 3.2). The wavelet coefficients M(0) (green squares) are then located at the bisections of the edges, projected onto the sphere. The positions of wavelets on finer levels are obtained by subdividing the triangular faces and repeating the edge bisection. Wavelets can be constructed by the same lifting procedure as in sections 2.7 to 2.8, except that the predict operator S now interpolates in two dimensions, and S̃ will be designed to conserve the integral over the sphere between different levels.
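The growth of the nested index sets is easy to tabulate; a Python sketch (illustrative) that counts vertices, edges and faces under repeated subdivision, confirming the well-known vertex count |K(j)| = 10·4^j + 2:

```python
def icosahedron_counts(levels):
    """Vertex/edge/face counts for repeated subdivision of the icosahedron:
    each refinement adds one vertex per edge and splits each triangle
    into four (so E -> 2E + 3F and F -> 4F)."""
    V, E, F = 12, 30, 20
    out = [(V, E, F)]
    for _ in range(levels):
        V, E, F = V + E, 2 * E + 3 * F, 4 * F
        out.append((V, E, F))
    return out
```

The sequence of vertex counts 12, 42, 162, 642, ... quadruples asymptotically per level, which is why the adaptive masks of section 3.1 matter so much on fine spherical grids.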

Ph.D. Thesis - M. Aechtner

McMaster University - CSE

Figure 3.2: Nested spherical grid from refining the icosahedron projected onto the sphere (blue circles: K(0); green squares: M(0); with M(1) on the next level).

Spherical wavelets have been used to solve differential equations on a dynamically adaptive icosahedral grid [Mehra and Kevlahan, 2008], with a collocation method that follows the algorithm described in section 3.1, but on a spherical domain. Figure 3.4 shows how the masking of neighbours in space and scale is extended to nested triangular grids (colours and numbers are the same as in figure 3.1 and the algorithm in section 3.1). Two different wavelet bases are used in [Schröder and Sweldens, 1995] and [Mehra and Kevlahan, 2008]. For both bases the prediction S interpolates the value at grid points belonging to M(j), marked M in figure 3.3, from neighbouring nodes of K(j). A linear basis can be obtained by taking the average of the values at E and W:

s_W = s_E = 1/2.

Using all the neighbours shown in figure 3.3 (called the butterfly basis because of the look of this figure), the following choice of weights leads to fourth-order interpolation provided the triangles are equilateral:

s_W = s_E = 1/2,   s_N = s_S = 1/8,   s_SW = s_SE = s_NW = s_NE = −1/16.
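As a quick planar sanity check (an illustrative Python sketch, not part of the model code), the butterfly weights — with the wing weight taken as −1/16, so that the weights sum to one — reproduce cubic polynomials exactly on an equilateral stencil, which is the sense in which the interpolation is fourth order:

```python
import numpy as np

# Butterfly prediction on an equilateral planar stencil: the predicted
# value at the edge midpoint M = (0.5, 0) of the W-E edge reproduces any
# cubic polynomial exactly.
s3 = np.sqrt(3) / 2
pts = {  # stencil of an equilateral triangulation around M
    "W": (0, 0), "E": (1, 0), "N": (0.5, s3), "S": (0.5, -s3),
    "NW": (-0.5, s3), "NE": (1.5, s3), "SW": (-0.5, -s3), "SE": (1.5, -s3),
}
w = {"W": 0.5, "E": 0.5, "N": 0.125, "S": 0.125,
     "NW": -1/16, "NE": -1/16, "SW": -1/16, "SE": -1/16}

def f(x, y):  # an arbitrary cubic test polynomial
    return x**3 - 2*x**2*y + 0.5*x*y**2 + y**3 + x**2 + x*y - y + 1

pred = sum(w[k] * f(*pts[k]) for k in pts)
print(pred, f(0.5, 0.0))  # identical up to rounding
```

With the wing weight +1/16 instead, the weights would sum to 1.5 and even constants would not be reproduced.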

I have implemented both wavelet bases in Fortran with MPI. The implementation includes adaptive data structures that ensure that the memory consumption depends only on the grid points included in the masks and not on the number of grid levels. This way I have been able to use very fine resolutions of less than 10 metres on the Earth, corresponding to 20 and more refinements of the icosahedron (J ≥ 20), for a highly localized test function.

Figure 3.3: Butterfly basis (stencil points N, S, E, W, NE, NW, SE, SW around the predicted point M)

Figure 3.4: Adjacency masks (colours and numbers as in figure 3.1)

3.3

Butterfly basis

As mentioned above, the butterfly basis achieves its fourth-order accuracy of interpolation only on equilateral triangles. However, the nested icosahedral grids have irregular geometry: in the limit J → ∞ the average angle at valence-five points (the vertices of the original icosahedron) must be 72 degrees, while at the other, valence-six points it must be 60 degrees. Convergence experiments (blue markers on the dashed line “bfly” in figure 3.6) show that in terms of the maximum error the butterfly interpolation is only first-order accurate, and thus worse than the linear basis, whose maximum error shows second-order convergence. In two dimensions an n-th order accurate interpolation operator can be constructed using n(n+1)/2 (linearly independent) neighbours. Hence in general a fourth-order operator would require 10 neighbours, whereas the butterfly operator uses only 8. Including 2 additional neighbours means losing the nice

Figure 3.5: Representation efficiency of the butterfly basis (number of coefficients above threshold versus threshold ε for the linear, fixed-weight, variable-weight and refined butterfly bases, with reference slopes N ∼ 1/√ε and N ∼ 1/ε)

Figure 3.6: Error convergence of the wavelet bases (maximum and mean errors versus level j for the linear, bfly, bfly2 and bfly3 bases, with reference curves for first-, second- and fourth-order convergence)

symmetry of the butterfly stencil, the maximum distance from a neighbour to M increases, and problems can arise where the topology changes around the 12 valence-five points. I have been able to construct an operator that uses only the 8 neighbours of the butterfly operator, is third-order accurate for non-equilateral triangles and converges to the butterfly operator for equilateral triangles (where it thus achieves fourth-order accuracy) by solving a least-squares problem for each m in M(j): a two-dimensional version of an 8 × 6 Vandermonde matrix X with i-th row

X_i = [1, x_i, y_i, x_i², x_i y_i, y_i²],

each row corresponding to one neighbour i = E, W, S, N, SE, SW, NW, NE, is constructed, where x, y are local coordinates obtained by projecting each three-dimensional coordinate x onto a plane tangential at M. Then the system Xᵀs = X_M is handed to a least-squares solver (e.g. from the LAPACK library) to obtain the interpolation weights s. (Mathematically the s constitute the non-zero elements in one row of the operator S_j, which is now different on every level j.) The convergence of this variable-weight butterfly interpolation operator is


shown in figure 3.6 under the name “bfly2”. Another strategy to improve the original butterfly operator has been suggested by [Dubos, 2012]: the constant weights are kept and instead the grid-point locations are modified. Instead of bisecting edges, which is equivalent to using the linear interpolation to compute the coordinate, grid points on the next finer level are obtained by using the weights of the butterfly operator. This still leads to second-order accuracy for non-equilateral triangles and obviously also converges to the original butterfly operator when the triangles are equilateral (“bfly3” in figure 3.6). Figure 3.5 shows the impact of the accuracy of the interpolation on the efficiency of the wavelet basis. When refining the icosahedron, the number of non-equilateral triangles grows only as 2^j, whereas the number of nearly equilateral triangles increases (like the total number of triangles and nodes) as 4^j, and accordingly the average quality of the triangles improves with j. This is the reason why the convergence of the interpolation in the supremum norm only needs to be of at least half the order of that in the 1-norm (i.e. the average) for the asymptotic behaviour of the efficiency not to decline.
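The variable-weight construction can be sketched as follows (an illustrative Python version using NumPy's least-squares solver in place of LAPACK; the distorted stencil coordinates are made up for the demonstration). The minimum-norm solution of the underdetermined system Xᵀs = X_M yields weights that reproduce quadratics exactly on an arbitrarily distorted stencil:

```python
import numpy as np

# Variable-weight ("bfly2") sketch: 6 quadratic moment conditions on 8
# weights, solved in the least-squares (minimum-norm) sense.  The weights
# then interpolate any quadratic exactly at M, i.e. third-order accuracy
# on non-equilateral stencils.
rng = np.random.default_rng(0)

def vrow(x, y):  # one row of the two-dimensional Vandermonde matrix
    return np.array([1.0, x, y, x * x, x * y, y * y])

s3 = np.sqrt(3) / 2
base = [(-0.5, 0), (0.5, 0), (0, s3), (0, -s3),
        (-1, s3), (1, s3), (-1, -s3), (1, -s3)]
# deliberately distorted stencil around the target point M = (0, 0)
pts = [(x + 0.1 * rng.standard_normal(), y + 0.1 * rng.standard_normal())
       for x, y in base]

X = np.array([vrow(x, y) for x, y in pts])                # 8 x 6
s, *_ = np.linalg.lstsq(X.T, vrow(0.0, 0.0), rcond=None)  # weights

def q(x, y):  # an arbitrary quadratic test function
    return 3 - x + 2 * y + x * x - 0.5 * x * y + 4 * y * y

print(s @ np.array([q(x, y) for x, y in pts]), q(0.0, 0.0))
```

For an 8-point stencil in general position the 6 moment equations are consistent, so the least-squares residual is zero and the quadratic reproduction is exact up to rounding.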


Chapter 4 The shallow water model in geophysics

Being equipped with tools for dynamical grid adaptation, this chapter takes a look at geophysical models, so that we have everything we need to build an adaptive model for geophysical flows in chapter 5. In terms of the mathematical model we limit ourselves to the shallow water equations, “one of the simplest useful models in geophysical fluid dynamics” [Vallis, 2006], which are presented in section 4.1. Sections 4.2 and 4.3 then provide an overview of discrete shallow water models, with a focus on geophysical applications to climate and weather simulation. As a reference for more detailed presentations of the shallow water model in geophysics, [Vallis, 2006] is recommended.

4.1

The spherical shallow water model

In the atmosphere and oceans, vertical accelerations are small compared to gravity. This justifies the hydrostatic assumption that the change of pressure p in the vertical direction r is balanced by gravity,

∂p/∂r = −ρg,

with gravitational acceleration g and density ρ (which is assumed constant). Then a thin layer of fluid of height h can be described by the spherical rotating


shallow water equations

∂h/∂t + ∇·F = 0,   (4.1)

∂u/∂t + (u·∇)u + f k × u = −g∇(h + b) + µk.   (4.2)

The velocity field u exists in R³ and is restricted to the sphere via the Lagrange multiplier µk. The vector k is normal to the sphere and of unit length. The Coriolis parameter f accounts for rotation, b includes the bottom topography in the model, and we have used the height flux F = hu. Using the identity

(u·∇)u = (∇×u)×u + ∇(|u|²/2),

equation (4.2) can be stated in “vector invariant form”

∂u/∂t + [(∇×u) + fk]×u = −g∇(h + b) + µk − ∇(|u|²/2).

Defining the potential vorticity

q = (ω + f)/h,

with (relative) vorticity ω = k·∇×u, and the kinetic energy K = |u|²/2, the momentum equation can be written as

∂u/∂t + q h u⊥ = −g∇(h + b) − ∇K,   (4.3)

where u⊥ = k×u is the velocity perpendicular to u. By multiplying (4.3) from the left by k·∇× and ∇· we obtain equations for the absolute vorticity η = ω + f = qh and the velocity divergence δ = ∇·u:

∂η/∂t + ∇·[ηu] = 0,   (4.4)

∂δ/∂t + ∇·[ηu⊥] = −g∇²(h + b) − ∇²K.   (4.5)

The Helmholtz decomposition theorem states that vorticity η and divergence δ form a complete description of the underlying velocity vector: u is uniquely defined by η and δ.


Equation (4.4) together with (4.1) yields the conservation of potential vorticity q:

Dq/Dt = 0,

with the material derivative defined as

Dq/Dt = ∂q/∂t + u·∇q.

The equation for the evolution of kinetic energy is

∂(hK)/∂t + ∇·(hKu) = hu·[−g∇(h + b)],

and for potential energy

∂/∂t [gh(h/2 + b)] = −g(h + b)∇·(hu).

The equation for the total energy E = hK + gh(h/2 + b) is obtained by adding the equations for kinetic and potential energy:

∂E/∂t + ∇·(hKu) = −∇·[g(h + b)hu].

Energy conservation states

d/dt ∫_D E dx = 0.

4.2

Discrete shallow water models in geophysics

A wide variety of discretizations for spherical differential equations exists. The icosahedron subdivision from figure 3.2 leads to a triangular grid, and by interchanging faces and nodes a dual grid consisting of hexagons (and 12 pentagons) can be obtained (figure 4.3, with the dotted lines showing the triangular grid). Additionally, irregular triangular or hexagonal grids are possible. Traditionally, longitude–latitude grids (figure 4.1) were popular in geophysical models; their disadvantage is that singularities occur at the poles. Another choice is the cubed-sphere grid (figure 4.2), where a cube with a regular grid on each face is projected onto the sphere. For each of these grids different staggerings are possible. If all variables are located at the same positions the grid is called unstaggered, otherwise


Figure 4.1: Longitude-latitude grid: topologically regular quadrangular grid but singularity at poles.


Figure 4.2: Cubed sphere grid: composed of six regular quadrangular grids, obtained by projecting cube onto the sphere.

Figure 4.3: Icosahedral grid: triangular grid from refining an icosahedron via edgebisection.

(if there is at least one variable that is located at a different position) the grid is staggered. An unstaggered grid using height and velocity as prognostic variables is called an Arakawa A-grid. This grid type has been found to lead to numerical modes that need to be removed by adding artificial viscosity (e.g. [Randall, 1994]). Examples of the Arakawa A type are the high-order model MCore [Ullrich and Jablonowski, 2012] on a cubed-sphere grid and the FIM model [Lee et al., 2008] using an icosahedral grid. The problem of the A-grid can be avoided by staggering the grid so that the velocity components are located at edges, leading to the C-grid and D-grid staggerings. The C-grid works particularly well for quadrilateral grids (cubed sphere and longitude–latitude), which have exactly twice as many edges (for the two velocity components) as nodes (for the height), in contrast to the icosahedral grids, where the ratio is 3:1 or 3:2 depending on whether the height is at the hexagon centres or the triangle centres. The CAM-FV model [Neale et al., 2010] and the ENDGame model, both for longitude–latitude grids, use the D-grid and C-grid staggering respectively. Still, the triangular grid could be preferable because it does not suffer from problems at the poles or at the edges of the projected cube (as is the case for the cubed sphere). A triangular icosahedral C-grid will be used in section 4.3. The MPAS model [Skamarock et al., 2012] uses C-grids on unstructured/non-icosahedral triangulations. Another approach, named the Z-grid, is to use divergence and vorticity as prognostic variables instead of the two-dimensional velocity [Randall, 1994]. Even though the Z-grid is unstaggered, it does not exhibit the numerical noise problem of the A-grid, and it has the advantage that the evolution of potential vorticity in the discrete setting can be controlled (e.g. to enforce monotonicity). This comes at the cost that a system of equations needs to be solved in every time step (while the other grid types, which use velocity, allow for an explicit method). A shallow water model using the Z-grid is presented in [Heikes and Randall, 1995]. Instead of using finite differences/finite volumes, spectral transform models expand the solution over spherical harmonics, and spectral element models over local basis functions (most often Legendre polynomials). This allows very high-order methods but also causes some restrictions. An example of a spectral element model built on a cubed-sphere grid is CAM-SE [Neale et al., 2010].

4.3

Ringler and Thuburn C-grid model

In this section a shallow water model for an icosahedral grid with Arakawa C staggering [Ringler et al., 2010] is presented. This model is designed to assure consistency with explicit advection of potential vorticity, thus incorporating a main advantage of Z-grid methods. We have chosen this model as the basis for the adaptive model presented in chapter 5 because of the benefits arising from the combination of a C-grid staggering and explicit advection of potential vorticity, discussed later in this section. The icosahedral grid allows the use of the spherical wavelets described in section 3.2 (with some modifications). Figure 4.4 shows two hexagonal elements of the grid with dual triangles. Corners of the primal mesh, x_z, are circumcentres of dual mesh cells. Centres of the hexagons of the primal mesh, x_k, are defined as the corners of the dual mesh. The edges of the primal and dual mesh are perpendicular to each other. The positions of the variables are C-grid staggered: height is defined as a cell average on the primal mesh; and velocity is defined at the bisection of the edges of the dual mesh, x_e, as the projection along these dual edges, i.e. the


Figure 4.4: Primal (blue) and dual element (dashed) of the triangular–hexagonal C-grid, with hexagon centres k1, k2, k3, triangle circumcentres z1, z2 and edges e−4, …, e6. For irregular geometry the dual locations, z, are circumcentres of the triangles, so that edges of the dual and primal grids are always perpendicular to each other.

velocity is normal to the hexagon edges. (Note that this means that the velocity is nowhere given as a full vector; if the velocity vector is needed it has to be reconstructed from the projected values.) Our goal is to find discrete approximations to all spatial derivatives in equations (4.1) and (4.3), so that we can bring the system into a form that is a two-dimensional version of (3.1) and hand it to a method for numerical integration in time. Equation (4.2) does not contain a viscous term. While the model will be extended to include physical viscosity in chapter 8, the basic model presented here and applied in chapter 7 is inviscid. Artificial viscosity is not required because the TRiSK scheme is stable without dissipation, although small scales can be generated by high Reynolds number flows. These small scales can be tracked by the scheme, or removed by the adaptivity, depending on the threshold. First some geometric quantities need to be defined; then the discrete operators are presented using the numbering in figure 4.4. As in figure 3.2, all geometry is on a sphere with radius a: coordinates x are always on the sphere, ‖x‖ = a, edges are spherical arcs connecting two coordinates, and areas are spherical polygons. Edge bisection can be carried out in two ways that result in the same coordinate: bisecting the straight line segment between the arc endpoints and projecting onto the sphere, or dividing the arc into

Ph.D. Thesis - M. Aechtner

McMaster University - CSE

two parts of equal arc-length. (Note that the maximum occurring arc-lengths are limited by the arc-dimensions of the projected icosahedron.) Handling of the valence-5/pentagon cases is ignored here and discussed in section 6.3. The lengths of an edge of the primal (hexagon/pentagon) grid, l, and of an edge of the dual (triangular) grid, d, are defined as arc-lengths/spherical distances:

d_1 = d_{e1} = arcdist(x_{k1}, x_{k2}),   l_2 = l_{e2} = arcdist(x_{z1}, x_{z2}).

The areas of hexagons, A_k, and triangles, A_z, are defined as spherical areas. In figure 4.4, A_{k1} is the area of the spherical hexagon enclosed by the solid blue line and A_{z1} the area of the spherical triangle enclosed by the dashed black lines. (The figure is to be understood to be at a fine enough resolution that the fact that edges are arcs is not visible.) Double indices for areas mean that the area corresponds to the intersection of two spherical polygons, e.g. A_{k1,z1} is the area of the spherical quadrangle with vertices x_{k1}, x_{e1}, x_{z1}, x_{e2}. In the following the word spherical will be omitted and assumed to be implied for all geometry. The C-grid staggering allows for simple discretizations of the differential operators. In the limit of equilateral triangles, cell-average values form a second-order approximation to values at the cell centre (otherwise order one to two). Thus the gradient, in the direction of the projected velocity u_e at an edge location x_e, of a function defined on the primal grid cells (whether as centre or average value) can simply be approximated from the values at the endpoints of the dual edge without decreasing the order of accuracy:

grad_{k→e}:   (∇h)_{e1} = (h_{k1} − h_{k2}) / d_1.

With Gauss' divergence theorem, the divergence at a hexagon location of a vector field, defined by the same projection as the velocity, is obtained by integrating along the primal edges:

div_{e→k}:   (∇·u)_{k1} = (1/A_{k1}) Σ_{i=1}^{6} (−1)^i u_{ei} l_i.
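The finite-volume character of div_{e→k} is what conserves mass: each edge value enters the two adjacent cells with opposite signs, so the area-weighted divergences always sum to zero on a closed mesh. This can be checked with a small illustrative sketch using a hypothetical random connectivity (not an actual icosahedral mesh):

```python
import numpy as np

# Each edge flux appears in exactly two cells with opposite signs, so the
# area-weighted sum of the discrete divergences telescopes to zero for ANY
# edge values -- this is discrete mass conservation.
rng = np.random.default_rng(1)
n_cells, n_edges = 8, 20
D = np.zeros((n_cells, n_edges))         # signed cell/edge incidence
for e in range(n_edges):
    a, b = rng.choice(n_cells, size=2, replace=False)
    D[a, e], D[b, e] = 1.0, -1.0         # edge e leaves cell b, enters cell a

A = rng.uniform(1.0, 2.0, n_cells)       # cell areas
u = rng.standard_normal(n_edges)         # normal velocities at edges
l = rng.uniform(0.5, 1.5, n_edges)       # edge lengths

div = (D @ (u * l)) / A                  # discrete divergence per cell
print(np.sum(A * div))                   # 0 up to rounding
```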


Similarly the vorticity, over a triangle or at the circumcentre x_z, is computed by a discrete curl operator that uses the Kelvin–Stokes theorem:

curl_{e→z}:   (k·∇×u)_{z1} = −(1/A_{z1}) Σ_{i=0}^{2} u_{ei} d_i.

While their definitions are given for the grid element in figure 4.4, the operators grad_{k→e}, div_{e→k} and curl_{e→z} are understood as operators acting on vectors consisting of all values at the corresponding locations. (Care has to be taken with the signs at locations other than the one illustrated.) Discretization of the height evolution (4.1) is completed by computing fluxes along dual edges, F_e, using linear interpolation of the height from the endpoints to the edge bisection:

F_{e1} = ½ (h_{k1} + h_{k2}) u_{e1}.

For equation (4.3) we need to compute the kinetic energy at hexagons, K_k, and the term u⊥hq. The discretization of these two terms is critical for achieving energy conservation and consistent advection of potential vorticity. In order to understand the behaviour of the potential vorticity we look at its evolution on the dual grid. Assuming the height to be defined on triangles, h_z, the divergence operator in the height evolution could be discretized as

div_{e→z}:   (∇·F)_{z1} = −(1/A_{z1}) Σ_{i=0}^{2} F⊥_{ei} d_i.   (4.6)

The discrete divergence operator is the same as the curl operator, except that the fluxes need to be perpendicular (to the dual grid edges). A Z-grid method on a triangular grid would advect the absolute vorticity η = qh explicitly by discretizing the divergence term in equation (4.4) to yield

∂η_{z1}/∂t = (1/A_{z1}) Σ_{i=0}^{2} F⊥_{ei} q_{z1} d_i.

This equation, which is consistent with the discrete height evolution, can also be obtained by applying the operator curl_{e→z} to the discrete velocity evolution and using the fact that curl_{e→z}(grad_{k→e}(h)) = 0 due to cancellation. That


means that the vorticity obtained from the edge velocity through curl_{e→z} is consistent with h_z. The key is now to make a connection between the evolution of h_k and h_z. To that end [Thuburn et al., 2009] derives two operators: a mapping M_{e→e} from fluxes F to fluxes F⊥ on the dual grid,

M_{e→e}:   F⊥_{e1} = Σ_{i=−4, i≠1}^{6} w_{e1,ei} l_{ei} F_{ei},

with weights computed from areas,

(−1)^n w_{e1,en} = (1/A_{k1}) Σ_{i=2}^{n} A_{z_{i−1},k1} − 1/2   for n > 1 (left side),
(−1)^n w_{e1,en} = (1/A_{k2}) Σ_{i=n}^{0} A_{z_{i+1},k2} − 1/2   for n < 1 (right side),

and an area-weighted mapping Z_{k→z} (there called I) from the height h_k at hexagons to the height h_z at triangles,

Z_{k→z}:   h_{z1} = (1/A_{z1}) Σ_{i=1}^{3} A_{z1,ki} h_{ki},

such that

div_{e→z}(M(F)) = Z(div_{e→k}(F)).   (4.7)

Since −div_{e→k}(F) is the derivative of h_k, −Z(div_{e→k}(F)), and thus also −div_{e→z}(M(F)), is equal to the derivative of h_z. Therefore, (4.7) connects the height evolution on the primal and dual grids in the sense that integrating h_z explicitly as a prognostic variable is equivalent to integrating h_k and computing h_z from Z_{k→z}; and η_z computed from curl_{e→z} at any time is the same as integrating it explicitly using (4.4) with div_{e→z}. The potential vorticity is then obtained as q_z = η_z/h_z, and since it is required at edge positions, q_e = ½(q_{z1} + q_{z2}). This scheme conserves enstrophy; however, the F⊥_e q_e term can produce spurious kinetic energy. Defining Q⊥ = F⊥q and changing the M_{e→e} operator to

Q⊥_{e1} = Σ_{i=−4, i≠1}^{6} w_{e1,ei} l_{ei} F_{ei} (q_{ei} + q_{e1})/2   (4.8)

produces a Q⊥_e that is energetically neutral, at the cost of no longer conserving enstrophy.

Figure 4.5: Workflow of the TRiSK model to compute the trends from the values of height and velocity through various intermediate quantities (fluxes, vorticity, potential vorticity, Q⊥, kinetic energy and the Bernoulli term) according to the discretized shallow water equations.

Energy will be conserved [Ringler et al., 2010] if, besides choosing (4.8), the kinetic energy K is computed at hexagons as

K_{k1} = (1/(4A_{k1})) Σ_{i=1}^{6} u_{ei}² l_i d_i.

The model in this section was introduced in [Ringler et al., 2010] and is commonly referred to as TRiSK. Figure 4.5 summarizes the workflow for computing the trend. In summary, the model has the following mimetic features (properties that the discrete model shares with the continuous model):

• Mass is conserved by the finite volume technique.

• Energy is conserved (provided (4.8) is chosen): the u⊥hq term is energetically neutral, and kinetic and potential energy are converted into each other without loss or generation of spurious energy.

• Potential vorticity is advected consistently with the way height is advected.

• The gradient produces no vorticity.
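The last property is a pure cancellation and can be checked in isolation: composing the discrete gradient and the discrete curl around a single triangle telescopes to zero for any heights, edge lengths and area (an illustrative sketch):

```python
import numpy as np

# "The gradient produces no vorticity": around one dual triangle the
# discrete curl of the discrete gradient telescopes to zero, independent
# of edge lengths and areas.  h holds the heights at the three hexagon
# centres surrounding one triangle circumcentre.
rng = np.random.default_rng(2)
h = rng.standard_normal(3)               # arbitrary cell heights
d = rng.uniform(0.5, 2.0, 3)             # arbitrary dual-edge lengths
A = 1.3                                  # arbitrary triangle area

# gradient along each dual edge, oriented cyclically: k1->k2, k2->k3, k3->k1
grad = np.array([(h[0] - h[1]) / d[0],
                 (h[1] - h[2]) / d[1],
                 (h[2] - h[0]) / d[2]])
curl = -np.sum(grad * d) / A             # discrete curl (Kelvin-Stokes)
print(curl)                              # 0 up to rounding
```

The edge lengths cancel against the gradient denominators, leaving the telescoping sum (h1 − h2) + (h2 − h3) + (h3 − h1) = 0.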


Chapter 5 The adaptive spherical shallow water model

In this chapter the actual adaptive shallow water model is introduced. Since the underlying equations (4.1), (4.3) involve two types of prognostic variables (scalars and vectors), we require two different wavelet transforms: the transform for the scalar height h is discussed in section 5.1 and the one for the vector-valued velocity u is presented in section 5.2. Section 5.3 then provides an overview of the model, and the remainder of the chapter discusses the details, mainly dealing with challenges originating from the non-regular spherical geometry. The spherical model presented in this chapter is based on a model for planar geometry by [Dubos and Kevlahan, 2013]. The main challenge introduced by the spherical geometry lies in dealing with the irregularity of the grid: in particular, the triangles are no longer equilateral as in the planar case. This requires generalizations of the construction of the operators, and includes dealing with overlap areas between grid cells that did not exist for the regular geometry. This chapter presents original work by the author in collaboration with the authors of [Dubos and Kevlahan, 2013].


5.1


Conservative scalar transform

Since the height is a scalar value on an icosahedral grid, in contrast to the velocity, the spherical wavelet basis from section 3.2 can in principle be used, though with some modifications. In particular, we have to account for the fact that height is now defined as cell averages (see section 4.3). We thus want to construct wavelets that conserve mass in a scenario where the integral over the domain is the sum of the average values times the area of each corresponding hexagon/pentagon. For this section we borrow the notation from section 2.6 that λ_j is a vector containing all λ_{j,k}, k ∈ K, and since the coefficients are the cell-average values from section 4.3, we will use h_j and h_{j,k} to denote the scaling coefficients for height. The corresponding wavelet coefficients will be denoted with a hat, ĥ_{j,m}. Then we write out the predict step (2.36) as

ĥ_{j,m} = h_{j+1,m} − Σ_{k∈K(j)} s^j_{k,m} h_{j+1,k}   (5.1)

and the update step (2.37) as

h_{j,k} = h_{j+1,k} + Σ_{m∈M(j)} s̃^j_{k,m} ĥ_{j,m}.   (5.2)

We also define a restriction operator for height, R^j_h, that computes the height on level j from the values on level j+1. Combining (5.1) and (5.2), its action is given as

h_{j,k} = h_{j+1,k} + Σ_{m∈M(j)} s̃^j_{km} ( h_{j+1,m} − Σ_{k'∈K(j)} s^j_{k'm} h_{j+1,k'} ).   (5.3)

Accordingly, for the fast transform for lifted wavelets, the inverse operations of (5.2) and (5.1) are given as

h_{j+1,k} = h_{j,k} − Σ_{m∈M(j)} s̃^j_{k,m} ĥ_{j,m},   (5.4)

h_{j+1,m} = ĥ_{j,m} + Σ_{k∈K(j)} s^j_{k,m} h_{j+1,k}.   (5.5)


The following theorem proposes lifting operators S and S̃ and proves that they lead to the desired wavelets: wavelets that assure mass conservation between levels, and mass conservation when wavelets are set to zero during the adaptation step. As in section 4.3, double indices for areas denote intersection areas. This time hexagons at different levels are intersected: A_{k,m} is the area of the intersection between hexagon k on level j and hexagon m on level j+1.

Theorem 5.1.1. Given the weights

s^j_{km} = A^{j+1}_{k,m} / A^{j+1}_m,   s̃^j_{km} = A^{j+1}_{k,m} / A^j_k,   (5.6)

1. Mass is conserved between levels:

Σ_{k∈K(j)} A^j_k h_{j,k} = Σ_{k∈K(j+1)} A^{j+1}_k h_{j+1,k},  where h_j = R_h h_{j+1}.   (5.7)

2. For any perturbation of a wavelet coefficient, δĥ_{j,m}, the total mass is not affected: if there exists (j', m') such that

η̂_{j,m} = ĥ_{j,m} + δĥ_{j,m} if (j,m) = (j',m'), and η̂_{j,m} = ĥ_{j,m} otherwise, with η_{0,m} = h_{0,m},

then

Σ_{k∈K(j+1)} A^{j+1}_k η_{j+1,k} = Σ_{k∈K(j+1)} A^{j+1}_k h_{j+1,k}.

Proof. Define

R^j_basic:   h*_{j,k} = h_{j+1,k} + Σ_{m∈M(j)} s̃^j_{km} (h_{j+1,m} − h_{j+1,k})

and

R^j_corr:   h_{j,k} = h*_{j,k} + Σ_{m∈M(j)} s̃^j_{km} ( h_{j+1,k} − Σ_{k'∈K(j)} s^j_{k'm} h_{j+1,k'} ),

so that

R^j_h = R^j_basic + R^j_corr.   (5.8)

Intersection areas sum up as

A^{j+1}_m = Σ_{k∈K(j)} A^{j+1}_{km},   (5.9)

A^j_k = A^{j+1}_k + Σ_m A^{j+1}_{km}.   (5.10)

1. Using the assumption (5.6) and A^{j+1}_m − A^{j+1}_{km} = Σ_{k'≠k} A^{j+1}_{k'm} (a consequence of (5.9)) gives

Σ_{k'} s^j_{k'm} h_{j+1,k'} = Σ_{k'} (A^{j+1}_{k'm} / A^{j+1}_m) h_{j+1,k'}
= (A^{j+1}_{km} / A^{j+1}_m) h_{j+1,k} + Σ_{k'≠k} (A^{j+1}_{k'm} / A^{j+1}_m) h_{j+1,k'}
= h_{j+1,k} − Σ_{k'≠k} (A^{j+1}_{k'm} / A^{j+1}_m) (h_{j+1,k} − h_{j+1,k'}),

from which follows

R^j_corr:   h_{j,k} = h*_{j,k} + Σ_{m∈M(j)} s̃^j_{km} Σ_{k'≠k} s^j_{k'm} (h_{j+1,k} − h_{j+1,k'}).   (5.11)

When summing over k, cancellation occurs because of the antisymmetry in k and k', and hence Σ_k A^j_k h_{j,k} = Σ_k A^j_k h*_{j,k}.

Inserting (5.6), multiplying by A^j_k and using A^{j+1}_k = A^j_k − Σ_m A^{j+1}_{km} (a consequence of (5.10)), the equation for R^j_basic becomes

A^j_k h*_{j,k} = A^j_k h_{j+1,k} + Σ_{m∈M(j)} A^{j+1}_{km} (h_{j+1,m} − h_{j+1,k}) = A^{j+1}_k h_{j+1,k} + Σ_{m∈M(j)} A^{j+1}_{km} h_{j+1,m}.

Summing over k, with (5.9) and (3.2), leads to

Σ_{k∈K(j)} A^j_k h*_{j,k} = Σ_{k∈K(j)} A^{j+1}_k h_{j+1,k} + Σ_{k∈K(j)} Σ_{m∈M(j)} A^{j+1}_{km} h_{j+1,m}
= Σ_{k∈K(j)} A^{j+1}_k h_{j+1,k} + Σ_{m∈M(j)} A^{j+1}_m h_{j+1,m}
= Σ_{k∈K(j+1)} A^{j+1}_k h_{j+1,k}.

2. Equations (5.4) and (5.5) for the perturbed coefficients η̂_{j,m} with j = j' read

η_{j+1,k} = h_{j,k} − Σ_m s̃^j_{km} η̂_{j,m} = h_{j,k} − ( s̃^j_{km'} δĥ_{j,m'} + Σ_m s̃^j_{km} ĥ_{j,m} ),

η_{j+1,m} = η̂_{j,m} + Σ_k s^j_{km} η_{j+1,k},

where k and m run over the indices affected by m'. Subtracting the unperturbed from the perturbed transformation step gives, at the scaling function positions,

η_{j+1,k} − h_{j+1,k} = −s̃^j_{km'} δĥ_{j,m'},

and at the wavelet function positions (this time integrated),

Σ_m A^{j+1}_m η_{j+1,m} − Σ_m A^{j+1}_m h_{j+1,m} = A^{j+1}_{m'} δĥ_{j,m'} + Σ_m A^{j+1}_m Σ_k s^j_{km} (η_{j+1,k} − h_{j+1,k})
= A^{j+1}_{m'} δĥ_{j,m'} − Σ_m A^{j+1}_m Σ_k (A^{j+1}_{k,m} A^{j+1}_{k,m'}) / (A^{j+1}_m A^j_k) δĥ_{j,m'}.

In the following the proof is completed by using first (3.2), then the last two equations, followed by rearrangement, then applying (5.10) and finally (5.9):

Σ_{k∈K(j+1)} A^{j+1}_k η_{j+1,k} − Σ_{k∈K(j+1)} A^{j+1}_k h_{j+1,k}
= Σ_m A^{j+1}_m η_{j+1,m} − Σ_m A^{j+1}_m h_{j+1,m} + Σ_k A^{j+1}_k η_{j+1,k} − Σ_k A^{j+1}_k h_{j+1,k}
= δĥ_{j,m'} [ A^{j+1}_{m'} − Σ_m Σ_k A^{j+1}_{k,m} A^{j+1}_{k,m'} / A^j_k − Σ_k A^{j+1}_k A^{j+1}_{k,m'} / A^j_k ]
= δĥ_{j,m'} [ A^{j+1}_{m'} − Σ_k (A^{j+1}_{k,m'} / A^j_k) ( Σ_m A^{j+1}_{k,m} + A^{j+1}_k ) ]
= δĥ_{j,m'} [ A^{j+1}_{m'} − Σ_k A^{j+1}_{k,m'} ] = 0.  ∎

In the proof the height restriction is split into basic restriction and correction. Basic restriction yields the same total mass on the coarse level as on the fine level. The correction only redistributes mass between neighbours and integrates to zero.
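The mechanism of theorem 5.1.1 can be illustrated in a one-dimensional, periodic analogue (an illustrative sketch, not the spherical implementation), where cells are intervals, scaling coefficients are cell averages, and the area-ratio weights of (5.6) reduce to s = 1/2 (predict) and s̃ = 1/4 (update):

```python
import numpy as np

# 1D periodic analogue of the conservative scalar transform: fine cells
# have area 1, coarse cells area 2; the intersection-area weights give
# s = 1/2, s~ = 1/4.  Total mass survives the transform and thresholding.
rng = np.random.default_rng(3)
N = 16                                   # coarse cells; fine grid has 2N
f = rng.standard_normal(2 * N)           # fine cell averages

k = f[0::2].copy()                       # values at coarse positions K(j)
m = f[1::2].copy()                       # values at wavelet positions M(j)

hat = m - 0.5 * (k + np.roll(k, -1))     # predict step, cf. (5.1)
c = k + 0.25 * (hat + np.roll(hat, 1))   # update step, cf. (5.2)

mass_fine = np.sum(1.0 * f)
mass_coarse = np.sum(2.0 * c)            # part 1 of the theorem
print(mass_coarse - mass_fine)           # 0 up to rounding

hat[np.abs(hat) < 0.3] = 0.0             # adaptation: drop small wavelets

k2 = c - 0.25 * (hat + np.roll(hat, 1))  # inverse update, cf. (5.4)
m2 = hat + 0.5 * (k2 + np.roll(k2, -1))  # inverse predict, cf. (5.5)
g = np.empty_like(f)
g[0::2], g[1::2] = k2, m2
print(np.sum(g) - mass_fine)             # still 0: part 2 of the theorem
```

The coarse mass equals the fine mass because the update redistributes exactly the mass the predict step removed, and thresholding the ĥ does not change the coarse averages, so the reconstructed fine mass is unchanged as well.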

5.2

Non-separable vector transforms

On every level the velocity is represented by the edge projection described in section 4.3. This requires a non-separable wavelet transform. In this section an interpolation-based wavelet transform for vector-valued functions is presented. The goal is to have a second-order accurate interpolation for the prediction operator, in order to match the accuracy of the discrete differential operators and of the height transformation, which are first- to second-order accurate. While the height restriction has been designed to conserve mass, the restriction for the velocity will be designed to retain the property of the discrete differential operators that the curl of the gradient is zero. The wavelet transform in this section is a generalization of [Dubos and Kevlahan, 2013] to geometry with non-equilateral triangles. The interpolation distinguishes between outer edges, which have been created by subdividing an edge (all edges u_i, v_i, w_i with i > 0 in figures 5.1 and 5.2), and inner edges, which result from subdividing the triangles (edges u_0, v_0, w_0). Steps in the wavelet transform that require interpolation, namely the predict step and its inverse, are first computed for all outer edges, so that the computation at inner edges has the interpolated values at outer edges available. Looking at figure 5.1, fine level values at outer edges, u_1, u_2, are computed by interpolating from the 13 values on the coarse grid (capital letters). In the following an interpolation I⁺ from the velocities at the solid lines (U_{−2} to U_6) will be defined. The final interpolation used in the adaptive model then takes the average of this interpolation and a similar interpolation I⁻ using the values at the locations symmetric with respect to U_0 (U_{−6} to U_2).

In order for the interpolation I⁺ to be second-order accurate we reconstruct a

Figure 5.1: Interpolation at fine edges, u1 and u2, from the 13 coarse edges U−6, …, U6.

Figure 5.2: Computing velocity at inner edges from velocity at outer edges by assuming constant vorticity over the coarse triangle.

linear function

(u, v)ᵀ = (a11, a21)ᵀ + (a12, a22)ᵀ x + (a13, a23)ᵀ y

on a plane tangential to the sphere at the location of U0, onto which the coordinates of all involved velocities are projected. Directions are defined within a local coordinate system oriented along the direction of U0 (which lies inside this plane). The six unknowns a_ik require six equations. The following choice avoids singularities from linear dependencies in the system: from the Helmholtz decomposition, the three velocities U0, U1, U2 represent the rotational component and three sums of opposite-edge pairs, such as (U4 + (−U5)), represent the irrotational component. Values at the inner edges can be obtained from values at the outer edges by assuming that the vorticity inside a coarse level triangle is constant, meaning that the vorticities on the four fine level triangles, the ω_i in figure 5.2, are equal to each other and equal to the coarse level vorticity Ω:

ω0 = ω1 = ω2 = ω3 = Ω.

For example,

w0 d_{w0} = ω2 A_{ω2} − u2 d_{u2} − v1 d_{v1}


and thus the inner velocity w0 can be computed as
$$w_0 = \frac{1}{d_{w_0}}\left(\Omega A_{\omega_2} - u_2\, d_{u_2} - v_1\, d_{v_1}\right).$$

The restriction operator Ru is defined as
$$R_u:\quad U_0 = \frac{1}{2}\left(u_1 + u_2\right).$$

Due to the bisection, d_{U_0} = 2d_{u_1} = 2d_{u_2}, and the restriction Ru implies U_0 d_{U_0} = u_1 d_{u_1} + u_2 d_{u_2}. With the vorticities ω_i understood as cell averages, the integral of the vorticity over the coarse triangle is the same on the coarse and the fine level,
$$A_\Omega\, \Omega = U_0 d_{U_0} + U_1 d_{U_1} + U_2 d_{U_2} = u_1 d_{u_1} + u_2 d_{u_2} + v_1 d_{v_1} + v_2 d_{v_2} + w_1 d_{w_1} + w_2 d_{w_2} = A_{\omega_0}\omega_0 + A_{\omega_1}\omega_1 + A_{\omega_2}\omega_2 + A_{\omega_3}\omega_3,$$
so the velocity restriction Ru conserves vorticity.

The forward wavelet transform is then computed as follows: first, velocity scaling coefficients on level j are computed by restricting the velocities from level j + 1 using Ru. Then the wavelet coefficients are computed as the difference between the velocities on the fine level and the interpolated values. Note that this is a wavelet transform in a more general sense: it still adheres to the concept of restriction and of computing wavelet coefficients as interpolation errors, but does not use nested grids.
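The structure of this forward transform (restrict with Ru, predict back, store the mismatch as wavelet coefficients) can be sketched generically. The following Python sketch uses simple one-dimensional stand-ins for the restriction and prediction operators (pairwise averaging and periodic linear interpolation, both hypothetical placeholders for the spherical edge operators), so it illustrates only the structure of the transform, not the actual model operators.

```python
import numpy as np

def restrict(fine):
    # Stand-in for the edge restriction R_u: a pairwise average, as in
    # U0 = (u1 + u2)/2 over the two sub-edges of a bisected edge.
    return 0.5 * (fine[0::2] + fine[1::2])

def predict(coarse):
    # Stand-in for the interpolation-based prediction; here periodic
    # linear interpolation instead of the second-order vector scheme.
    fine = np.empty(2 * coarse.size)
    fine[0::2] = coarse
    fine[1::2] = 0.5 * (coarse + np.roll(coarse, -1))
    return fine

def forward_transform(u, levels):
    """Return coarsest scaling coefficients and per-level details."""
    details = []
    for _ in range(levels):
        coarse = restrict(u)                 # scaling coefficients, level j
        details.append(u - predict(coarse))  # wavelet coefficients, level j+1
        u = coarse
    return u, details[::-1]                  # details ordered coarse to fine
```

The inverse transform reverses the loop: starting from the coarsest scaling coefficients, each finer level is recovered as predict(coarse) + detail, which reconstructs the input exactly by construction.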

5.3 Model overview

The algorithm for computing the solution to the shallow water equations (4.1), (4.3) builds on the grid adaptation algorithm from section 3.1 with the following modifications to the individual steps:

1. The wavelet coefficients for height and velocity are computed by the interpolating wavelet transforms described in sections 5.1 and 5.2, respectively.


2. The mask of wavelet coefficients above threshold is now created using different thresholds for height, εh, and velocity, εu. These two thresholds are computed from a set threshold ε so that ε controls the error in the trend, and thus the error in the solution. Analysis of the shallow water system shows [Dubos and Kevlahan, 2013] that for the quasi-geostrophic regime, i.e. when Ro = U/(fL) ≪ 1,
$$\varepsilon_u = f U L\,\mathrm{Ro}\,\varepsilon^{3/2}, \qquad \varepsilon_h = U\,\mathrm{Ro}\,\varepsilon^{3/2},$$
and in the inertia-gravity regime, T ∼ L/c,
$$\varepsilon_u = c\, U\,\varepsilon^{3/2}, \qquad \varepsilon_h = U\,\varepsilon^{3/2},$$
are suitable choices for the two thresholds, where c = \sqrt{gH} is the wavespeed, and characteristic values for the length, L, height, H, Coriolis parameter, f, velocity, U, and time, T, have to be determined.

3. For the height, the adjacent zone masking in figure 3.4 can be used. For velocity the adjacency masking is depicted in figure 5.3: the thick red edge adds neighbours in space (long blue dashed lines) and scale (short green lines), and the dotted gray edges are added because outer edges must always come in pairs (when refining a coarse level edge, both new fine level edges need to be present for consistency). Additionally there is a cross-masking step where height points are added if all adjacent velocity edges are already present, and edges are added if both endpoints are already in the mask.

4. (see 3. above)

5. The coarsest level will no longer be 0. Instead we have a coarsest level Jmin that corresponds to Jmin refinements of the icosahedron, so that the number of height locations on Jmin is Nh = 10 · 4^{Jmin} + 2. The finest level will be called Jmax. All grid-points on Jmin are added to the masks.

6. In order to interpolate height at point m in figure 5.4 by the predict operator, values at the hexagons 1, 2, 3 and 4 are required. Thus the mask is extended to include every point that is required to interpolate height at a point that is already in the mask. Similarly, edges are added that are required for interpolating velocity according to the procedure described in section 5.2.

Figure 5.3: An edge with velocity wavelet coefficient above threshold (thick red) adds neighbours in space (dashed blue) and scale (green) to the active mask.

Figure 5.4: Height at the green fine level position m is interpolated from red coarse level cells 1, 2, 3 and 4 using weights s computed from the areas Ak,m.

7. Naively applying the differential operators would lead to losing the mimetic properties of the TRiSK scheme described in section 4.3. Instead section 5.4 will discuss how to compute the trend in the solution while retaining important properties of the continuous system.

Given an algorithm for grid adaptation and discretization of the differential operators, a numerical solution to the shallow water equations is computed by integrating in time via an explicit four-stage third-order strong-stability-preserving Runge–Kutta method (RK34) [Spiteri and Ruuth, 2002], which has the corresponding Butcher tableau:
$$\begin{array}{c|cccc}
0 & & & & \\
1/2 & 1/2 & & & \\
1 & 1/2 & 1/2 & & \\
1/2 & 1/6 & 1/6 & 1/6 & \\
\hline
& 1/6 & 1/6 & 1/6 & 1/2
\end{array}$$

Optionally the model supports fourth, second and first order Runge-Kutta schemes. The choice of this method stands in the context of the Courant-Friedrichs-Lewy (CFL) condition. In order to produce a stable solution, an explicit method for solving the hyperbolic shallow water equations requires a restriction on the time step ∆t such that ∆t < C min d / max |u|, where C depends on the time integration scheme and is typically C = 1. However, the RK34 method allows for C = 2. The first order Euler method is not suitable since it cannot even match the order of the space discretization. Comparing second order methods (e.g. Heun or modified Euler) to RK34, we observe that the required amount of computation is the same: the second order methods with C = 1 need twice as many steps, whereas RK34 requires twice as much computation per step. In conclusion, RK34 provides an increased order of accuracy at no extra cost. Additionally, the grid adaptation is carried out after each time step, and thus RK34 needs only half as many adaptation steps. Multi-step methods like the leap-frog scheme are less interesting for the adaptive method, since the fact that the grid will be different at each previous time step introduces additional complications.

For the adaptive shallow water model a viscous term is added to the equations (4.1) and (4.3). This additional term can be deactivated by setting the viscosity ν to zero. The equations then become:
$$\frac{\partial h}{\partial t} = -\nabla \cdot F$$
$$\frac{\partial u}{\partial t} = -q\, h u^\perp - g\nabla(h + b) - \nabla K + \nu \nabla^2 u \qquad (5.12)$$
On the staggered grid the extra term is discretized by diffusing the rotational and the divergent part of the velocity separately:
$$\left.\nu \nabla^2 u\right|_e = \nu\left(\mathrm{grad}_{k\to e}\, \mathrm{div}_{e\to k}\, u_e - \mathrm{grad}_{z\to e}\, \mathrm{curl}_{e\to z}\, u_e\right)$$


Without the viscous term the adaptive method, unlike the TRiSK scheme, still provides some implicit numerical diffusion. This is because neglecting small wavelet coefficients during grid compression dissipates energy on the locally smallest scales. The dissipation introduced this way is controlled by the threshold.

5.4 Computing the adapted trend

In this section we discuss how the trend can be computed on the adapted grid while retaining the mimetic properties of the TRiSK scheme (section 4.3). In this scheme mass is conserved by the discrete divergence operator because of cancellation: every flux is added to the two adjacent cells with alternating sign. Similarly, the property that the curl of the gradient is zero is retained in the discrete setting due to cancellation: looking at one triangle, the curl is computed by summing the integrated velocities at the edges, while the integrated velocities are the differences of the values at the endpoints, and hence every triangle vertex contributes to the curl twice with alternating sign. In the adaptive method we compute on several levels of different resolution and want the levels to be connected by the wavelet transform such that the values on level j are the restriction of level j + 1. This can be accomplished naively by computing the prognostic values on level j by the restriction operators wherever values are available on level j + 1. However, if we look at level j, this will break the effect of the cancellation: in the case of the divergence, one of two adjacent cells could be computed by summing the integrated fluxes and the other from restriction; and for the curl, the integrated velocities on the triangle sides can be a mixture of restricted velocities and velocities computed from the gradient on level j, which then leads to using different values at the vertices. As a result, mass conservation or the zero curl of the gradient is lost on level j.

The remedy to this situation is not to restrict the prognostic variables (or their trend) but to restrict the fluxes Fe and Bernoulli function values Bk = hk + Kk. Then divergence and gradient are always computed by the TRiSK operators, which guarantee the mimetic properties. This however requires careful design of the restriction operators for F and B in order not to lose the property that different resolutions are connected by the wavelet operators. In particular, we would require restriction operators to fulfil the commutation relationships:
$$\mathrm{div}^j_{e\to k} \circ R_F = R_h \circ \mathrm{div}^{j+1}_{e\to k} \qquad (5.13)$$
$$\mathrm{grad}^j_{k\to e} \circ R_B = R_u \circ \mathrm{grad}^{j+1}_{k\to e} \qquad (5.14)$$

Figure 5.5: Computing the height trend in the adaptive setting by restricting the fluxes and computing wavelet transforms to correct mass conservation between levels.

A restriction operator for the Bernoulli values, RB, to fulfil (5.14) can be obtained by simple injection. This can be seen by looking at R_u ∘ grad^{j+1}_{k→e}: on the triangular dual grid, the level-j edge e0 connecting endpoints k1 and k2 is bisected at point km into edges e1 (connecting k1 and km) and e2. Then R_u ∘ grad^{j+1}_{k→e} at this edge yields
$$(\nabla B)_{e_0} = \frac{1}{2}\left((\nabla B)_{e_1} + (\nabla B)_{e_2}\right) = \frac{1}{2}\left(\frac{B_{k_m} - B_{k_1}}{d_{e_1}} + \frac{B_{k_2} - B_{k_m}}{d_{e_2}}\right) = \frac{B_{k_2} - B_{k_1}}{d_{e_0}},$$
where the bisection property d_{e_1} = d_{e_2} = d_{e_0}/2 has been used in the last step. An operator RF to fulfil (5.13) is the most challenging to design amongst all operators of this model and earns its own section 5.5.

Figure 5.5 shows the complete computation of the trend for the height. All fluxes between a cell where (the trend of) the height is to be computed from restriction and a cell where the trend of the height is computed by the divergence operator must be computed from RF. All height trends to be restricted can be computed in two ways which give the identical result (using either the right or left hand side of (5.13)). As the figure also shows, the trend computation is not finished at this point. Wavelet coefficients are computed from a forward wavelet transform and used, together with the restricted values, to compute scaling coefficients on all levels. This step is only required at transitions between different resolutions: at active points with inactive neighbours it corrects the value for mass conservation, and at inactive points with active neighbours an interpolated value is obtained to be used in the next step. Everywhere else it would have no effect and is thus omitted.

Computation of the velocity trend is similar. The Bernoulli function B = K + h and Q⊥e are computed by the TRiSK operators on the fine level and restricted by RB and Ru, respectively. Then the gradient is computed everywhere and added to Q⊥e. Computing wavelet coefficients and the inverse transform completes the trend computation.
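The commutation relation (5.13) is the essential point of figure 5.5: restricting the fluxes and then applying the coarse divergence must give the same trend as restricting the fine trends. The sketch below demonstrates the idea on a hypothetical one-dimensional finite-volume analogue (coarse cells of size two, flux restriction by injection at coarse interfaces, height restriction by averaging), which is far simpler than the spherical TRiSK operators but satisfies the analogue of (5.13) exactly.

```python
import numpy as np

def div_fine(F):
    # Fine-level divergence: flux differences over unit cells.
    return F[1:] - F[:-1]

def div_coarse(F):
    # Coarse-level divergence: flux differences over cells of size 2.
    return (F[1:] - F[:-1]) / 2.0

def restrict_flux(F_fine):
    # 1-D analogue of R_F: keep the fluxes at coarse cell interfaces
    # (every second fine interface).
    return F_fine[::2]

def restrict_height_trend(dh_fine):
    # 1-D analogue of R_h: average the trends of the two fine cells
    # making up each coarse cell.
    return 0.5 * (dh_fine[0::2] + dh_fine[1::2])
```

In this toy setting, div_coarse(restrict_flux(F)) and restrict_height_trend(div_fine(F)) agree for any flux array F, which is exactly the commutation property the spherical operator RF is designed to guarantee.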

5.5 Flux restriction

This section presents a flux restriction operator RF that fulfils the commutation property (5.13) for the height restriction operator Rh defined in (5.3). Additionally, RF is constructed to be of second order accuracy, to be consistent with the rest of the model. We split Rh similarly to (5.8), and look for a basic flux restriction operator RF0 that fulfils
$$\mathrm{div}^j_{e\to k} \circ R_{F0} = R_{\mathrm{basic}} \circ \mathrm{div}^{j+1}_{e\to k} \qquad (5.15)$$
and a corrective flux restriction operator RF∗ for which
$$\mathrm{div}^j_{e\to k} \circ R_{F*} = R_{\mathrm{corr}} \circ \mathrm{div}^{j+1}_{e\to k}$$
holds, such that RF = RF0 + RF∗.

5.5.1 Basic restriction

In the following, notation from figure 5.6 will be used, where all quantities (particularly u, v, w, x and F) are integrated fluxes (total flux through an edge or part of an edge), except that A stands for area. As shown in figure 5.6,



Figure 5.6: Small overlap regions between hexagons at successive levels need to be accounted for when restricting the thickness flux.

the total area A of the central green fine scale hexagon is decomposed as A = A1 + A2 + A3 + A4 according to the way it overlaps with the two adjacent red coarse scale hexagons sharing the solid red edge. We assume that we are given all fluxes u, v, w on the fine grid (green) and want to compute the flux F = F1 + F2 through the solid red coarse edge shown in figure 5.6. F2 is the flux through the part of the coarse edge outside the green fine hexagon and F1 is the flux through the part of the coarse edge inside the green fine hexagon. Here we consider the case where one end of the coarse edge is inside the fine (green) hexagon and the other end is outside. In this way the procedure for both cases (end inside and end outside) is explained. In the case that both ends, or neither end, are inside the fine hexagon, the same procedure is simply applied at both ends. The sum of the fluxes entering the fine (green) hexagon on the left of the F1 + x−1 connection is defined as Fin and the flux leaving on the right is


defined as Fout:
$$F_{\mathrm{in}} = -w_{-1,1} + u_{-1,0} - v_{-1,-1} + w'_{-1,1} \qquad (5.16)$$
$$F_{\mathrm{out}} = -v_{1,1} + u_{1,0} - w_{-1,1} + w'_{-1,1} \qquad (5.17)$$

The divergence theorem says that the average divergence of a vector field over an area A, div_A, is equal to the net flux through the boundary of A divided by A,
$$\mathrm{div}_A = \left(F_{\mathrm{out}} - F_{\mathrm{in}}\right)/A. \qquad (5.18)$$
Assuming the divergence to be uniform over the fine hexagon, the average divergence over its two parts shown in figure 5.6 may be equated,
$$\frac{F_{\mathrm{out}} - (F_1 + x_{-1})}{A_2 + A_4} = \frac{(F_1 + x_{-1}) - F_{\mathrm{in}}}{A_1 + A_3}.$$
The above expression can be solved for the flux F1 + x−1, using A1 + A2 + A3 + A4 = A, to find
$$F_1 + x_{-1} = \frac{F_{\mathrm{in}}\,(A_2 + A_4) + F_{\mathrm{out}}\,(A_1 + A_3)}{A}.$$
An expression similar to (5.18) also holds for the small triangle associated with area A−1,
$$A_{-1}\, \mathrm{div}_{A_{-1}} = -F_2 - w'_{-1,1} - x_1.$$
Solving for F2 yields an expression for the flux F2,
$$F_2 = -A_{-1}\, \mathrm{div}_{A_{-1}} - w'_{-1,1} - x_1.$$
Combining these results, w′−1,1 cancels, and we find that the total basic restricted flux F0 corresponding to the action of the operator RF0 on the fine scale fluxes is
$$F_0 = F_1 + F_2 = \frac{A_2 + A_4}{A}\left(-w_{-1,1} + u_{-1,0} - v_{-1,-1}\right) + \frac{A_1 + A_3}{A}\left(-v_{1,1} + u_{1,0} - w_{-1,1}\right) - A_{-1}\, \mathrm{div}_{A_{-1}} - x_1 - x_{-1}. \qquad (5.19)$$


The remaining step is to compute the fluxes x1 and x−1 through the small boundaries shown in the zooms in figure 5.6. The flux through boundary x1 is interpolated using the fluxes at fine edges on the upper half: u0,1, v1,1, w−1,1, u−1,0 − v−1,1, v−1,2 − w1,2, w1,1 − u1,0. The flux on the lower half through x−1 is found in the same way from the fluxes u0,−1, w−1,1, v−1,−1, u−1,0 − w−1,−1, w−1,−2 − v1,−2, v1,−1 − u1,0. We employ the interpolation formula used for interpolating velocities in [Dubos and Kevlahan, 2013], which has the following advantages:

1. It is second-order accurate.

2. It is reliable in the case of equilateral triangles.

3. It is computationally efficient as it reuses components.

(Note that the commutation relation (5.13) is satisfied irrespective of the interpolation formula used to compute the fluxes x1 and x−1.) This completes the computation of the restricted flux obtained from the basic operator RF0 in (5.15). We now explain how to compute the corrective part RF∗ of the flux restriction in order to obtain the full flux restriction operator RF.
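A direct transcription of (5.19) is straightforward once the fine fluxes, the overlap areas and the small fluxes x1, x−1 are in hand. The Python sketch below uses hypothetical argument names for the subscripted quantities of figure 5.6 (e.g. w_m11 for w−1,1) and assumes that x_1, x_m1 and the small-triangle divergence are supplied by the interpolation just described.

```python
def basic_flux_restriction(w_m11, u_m10, v_m1m1, v_11, u_10,
                           A1, A2, A3, A4, A_m1, div_A_m1, x_1, x_m1):
    """Basic restricted flux F0 through a coarse edge, following (5.19).

    The u, v, w, x arguments are integrated fine-level fluxes; the A's
    are the overlap areas of figure 5.6 and div_A_m1 is the divergence
    on the small triangle of area A_m1.
    """
    A = A1 + A2 + A3 + A4
    F_in = -w_m11 + u_m10 - v_m1m1   # without the cancelling w' term
    F_out = -v_11 + u_10 - w_m11
    return ((A2 + A4) / A * F_in + (A1 + A3) / A * F_out
            - A_m1 * div_A_m1 - x_1 - x_m1)
```

As a sanity check, with equal overlap areas, vanishing small-triangle contribution and zero x fluxes, the restricted flux reduces to the plain average of F_in and F_out.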

5.5.2 Corrective part

The commutation equation restricts the flux divergence c^{j+1}_k = \mathrm{div}^{j+1}_{e\to k} F^{j+1}_e with the operator Rh, so we use Rcorr in the form (5.11) applied to the divergence:
$$c_{j,k} = c^*_{j,k} + \sum_{m\in M(j)} \sum_{k'\neq k} \tilde{s}^j_{km}\, s^j_{k'm} \left(c_{j+1,k} - c_{j+1,k'}\right). \qquad (5.20)$$
This computes the flux-divergence on level j from flux-divergences on level j + 1. We now want to split this operator Rcorr into the corrective flux restriction operator (acting on fine level flux-divergences) and the divergence operator on the coarse grid, with coarse level fluxes as intermediate quantities. The double sum in (5.20) is the contribution of all fine divergences to the coarse divergence. In the following we use the notation in figure 5.7 and consider only neighbours for which the weights s and s̃ are non-zero (meaning that the corresponding areas overlap).
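The form (5.20) can be transcribed directly. The sketch below treats the weights s and s̃ as given dense arrays (their actual values come from the height restriction (5.11), which is not reproduced here), and c_fine[k] denotes the fine-level divergence at the coarse location k; all names are hypothetical.

```python
import numpy as np

def corrected_divergence(c_star, c_fine, s, s_tilde):
    """Apply the correction of (5.20):
    c[k] = c*[k] + sum_m sum_{k'!=k} s~[k,m] s[k',m] (c_fine[k] - c_fine[k']).

    s and s_tilde have shape (K, M): K coarse cells, M fine points m.
    """
    K, M = s.shape
    c = np.array(c_star, dtype=float)
    for k in range(K):
        for m in range(M):
            for kp in range(K):
                if kp != k:
                    c[k] += s_tilde[k, m] * s[kp, m] * (c_fine[k] - c_fine[kp])
    return c
```

A basic consistency check: if the fine divergence is constant, all differences vanish and the correction leaves c* unchanged.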



Figure 5.7: Arrangement of fine and coarse scale height nodes used in the calculation of the corrective flux restriction through the coarse edge k l2j.

If we compute the integrated coarse fluxes as
$$F_{m_{2i}} = (k\, m_{2i}\, l_{2i}) + (k\, m_{2i+2}\, l_{2i}) + (k\, m_{2i-2}\, l_{2i}) + (k\, m_{2i+1}\, l_{2i}) + (k\, m_{2i-1}\, l_{2i}) + \tfrac{1}{2}(k\, m_{2i+1}\, l_{2i+1}) + \tfrac{1}{2}(k\, m_{2i-1}\, l_{2i-1}) + \tfrac{1}{2}(l_{2i+4}\, m_{2i+2}\, l_{2i}) + \tfrac{1}{2}(l_{2i-4}\, m_{2i-2}\, l_{2i}), \qquad (5.21)$$
with
$$(k\, m\, k') = \frac{A^{j+1}_{km}\, A^{j+1}_{k'm}}{A^{j+1}_m}\left(c^{j+1}_k - c^{j+1}_{k'}\right), \qquad (5.22)$$
then summing over the fluxes on the left hand side corresponds to the divergence operator on the coarse grid, whereas summing over the right hand side results in (5.20).


5.6 Grid Optimization

In section 3.3 we have already been confronted with non-equilateral triangles reducing the order of convergence of operators that perform very well on regular planar geometry. On the icosahedral grid obtained from edge bisection, the Laplace operator shows no convergence [Heikes and Randall, 1995]. The operators of the TRiSK scheme can reduce from second to first order in the non-equilateral case. For most operators the accuracy depends on the distance between the bisection points of primal and dual edges, and for some, second order is achieved if they coincide even if the triangles are not equilateral. The model can read in optimized grids provided by [Heikes et al., 2013]. These grids have modified locations of the dual grid vertices that have been determined by minimizing the distance between the primal and dual edge bisection points. If the dual grid vertices at a fine resolution J are changed, the grid is no longer nested with respect to the coarser levels j < J. Since the nesting property is critical for some of the inter-scale operators, the adaptive model uses the optimized grids for the minimum level Jmin, and levels j > Jmin are obtained from bisection. When using the optimized grids, a larger Jmin means an increase in accuracy due to a stronger effect of the optimization, but also a reduced degree of adaptivity.

The model also implements the grid optimization suggested by [Xu, 2006] as an alternate option. This algorithm works locally and replaces all coordinates xk by
$$x_k \leftarrow \frac{1}{4} \sum_{k'\in A(k)} \left(\cot\alpha_{kk'} + \cot\beta_{kk'}\right)\left(x_k - x_{k'}\right),$$
where A(k) are the indices of all nodes that are connected through an edge to the node at xk, and αkk′ and βkk′ are the angles opposite the edge from xk to xk′ in the two triangles that share this edge. This is followed by projecting all xk onto the sphere, and the procedure is repeated until the desired convergence is reached.

where A(k) are the indices of all nodes that are connected through an edge to the node at xk . and αkk0 and βkk0 are the angles opposite the edge from xk to xk0 in the two triangles that share this edge. Which is followed by projecting all xk onto the sphere. This is repeated until desired convergence is reached.

64

Chapter 6

High performance parallel implementation

This chapter discusses the implementation of the adaptive model, first introducing efficient data-structures, then proceeding with some specific details, and finally arriving at the parallel implementation with MPI. The goal is to develop a computer model that is not only a prototype but an actually efficient program. The work presented in this and the following chapters has been carried out by the author.

6.1 Hybrid data structures

In terms of data-structures there are two difficulties: the spherical geometry (non-regular domain) and the adapted grid. Additionally, the grid is staggered and one has to deal with nodes, edges and triangles (or, from the dual grid perspective, hexagons/pentagons, edges and nodes). In this section we begin by suggesting efficient data-structures for the non-adaptive case, and then extend them for the multilevel adaptive scenario, which will lead to a hybrid grid approach. (Note that the non-adaptive TRiSK scheme in section 4.3 can be obtained from the adaptive model by setting Jmin = Jmax.) Figure 6.1 shows how the icosahedron from figure 3.2 is unfolded. Its net (on the left) can be subdivided into ten lozenges (like the one shown in the bottom right).


Figure 6.1: Icosahedron net obtained from tearing apart the two times refined icosahedron from figure 3.2 (missing two blue nodes). It can be decomposed into ten elements of the type in the bottom right.

On level j = 1 (where green and blue nodes are joined) each lozenge can be further subdivided into four elements of the type shown in the top right. More generally, a non-adaptive icosahedral grid of resolution j (i.e. a grid obtained by refining the icosahedron j times) can be stored on ten regular domains consisting of 2^j × 2^j elements, where each element consists of one node, three adjacent edges and two triangles. Figure 6.2 (right) shows one such element (solid lines) with surrounding grid entities as dotted lines. On the primal grid an element accordingly consists of one hexagon, three edges and two nodes (figure 6.2, left). This means that a triangular grid (with hexagonal dual on the plane) can be accommodated within a regular data-structure, and the icosahedral grid can be stored on ten regular grids (with the exception of the poles). At this point we are neglecting the fact that in the data-structures and figure 6.1 there are two nodes missing, one of which connects the five ends at the top of the net while the other connects the ends at the bottom. We will call these two special nodes the poles, since the icosahedron can be oriented so that these two nodes are located at the north and south pole, and discuss this issue later. The advantages of this data-structure are to be understood in comparison to the alternative of an irregular triangular grid data-structure, which could easily be used on any domain, hence also on the sphere. The irregular grid stores nodes, edges and faces as separate lists where each entity (node, edge or triangle) references adjacent entities by knowing their indices. The disadvantage of this solution is that finding neighbours becomes expensive if they are

[Figure: one grid element with its node, the three edges labelled UP, DG and RT, and the two triangles UPLT and LORT.]

Figure 6.2: Grid element in primal (left) and dual (right) grid interpretation with surrounding entities in gray.

further away, additional data (for the references) has to be stored and moved around, and it is difficult to keep data locality under control. Therefore the data-structure with the ten lozenges that can be treated as structured grids is preferable. There, the elements inside one lozenge can be numbered and stored consecutively, and neighbours can be found at constant index offsets. This regular approach allows for a reduced storage requirement (since no grid topology needs to be stored) and promises a significant improvement in computational efficiency. This data-structure, however, makes it necessary to deal with edges of the icosahedron that connect different lozenges, especially where the icosahedron has been torn apart and neighbouring lozenges are rotated with respect to each other. With the subsequent parallel implementation (section 6.4) in mind, each lozenge is treated as a separate grid. The grids are extended by two rows of ghost elements, which are elements that are not computed themselves but are used as neighbours by the discrete differential operators on “real” elements. Each value on a ghost element is connected to a real element where the value is computed; it is copied to the ghost element before it is required there by the computations that follow. The situation gets more complicated for the adapted grid, which consists of


several, possibly incomplete, levels. The nested grid can be stored in a tree data-structure where each element on level j has four elements on the finer level j + 1, whose 8 triangles arise from subdividing the 2 triangles of the element on level j. An adaptive data-structure can then be obtained by removing subtrees if they contain no element whose node or edges are in the mask or required as neighbours. However, with the tree data-structure the advantages of the regular grids are lost. We confront this problem by grouping elements into patches. Each patch is a small regular grid of typically 8 × 8 elements. The patches are then used as the smallest elements in the tree structure, so the model is implemented using hybrid data-structures that combine the adaptation feature of quad-trees with the efficiency of regular grids. Similar hybrid approaches have also been suggested in [Behrens, 2009] and [Hejazialhosseini et al., 2010]. Further, the patches are connected to their neighbours via references so that neighbours on patch boundaries can be found without a tree-traversal. The relative overhead of the references decreases as the patch size increases; of course, too large patch sizes mean an overhead of storing elements that are not in any mask. Patches of 8 × 8 elements provide a good compromise, and overheads are negligible if the total problem size is large (so that patches are small compared to geometrical scales). The data structures and the algorithms managing them are designed in a way that neighbouring patches can differ by an arbitrary number of refinement levels. In fact we have a hierarchical structure of the grid: the entire domain (the sphere) is divided into sub-domains, which can be the ten lozenges or 10 · 4^k lozenges if each lozenge is sub-divided k times. A sub-domain has a tree structure of patches and is surrounded by ghost elements. A patch consists of a number of elements arranged regularly, and each element is made up of one node, three edges and two triangles.
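The hybrid structure can be sketched as follows: each patch stores its element fields in flat, fixed-size arrays (so intra-patch neighbours are plain index offsets), while the patches themselves form a quadtree with direct neighbour references. The class below is a schematic Python rendering with assumed field names (the three edge slots follow the UP, DG, RT labelling of figure 6.2); the actual model is implemented in Fortran95.

```python
import numpy as np

PATCH = 8  # elements per patch side

class Patch:
    """One node of the patch quadtree: a small regular grid of elements.

    Each element carries one height value and three edge velocities,
    stored in dense arrays padded with two ghost rows/columns so that
    neighbours inside the patch are found by constant index offsets.
    """
    def __init__(self, level):
        self.level = level
        n = PATCH + 4                        # two ghost layers on each side
        self.height = np.zeros((n, n))
        self.velocity = np.zeros((n, n, 3))  # UP, DG, RT edge per element
        self.children = [None] * 4           # quadtree refinement
        self.neighbours = [None] * 4         # direct refs, no tree walk

    def refine(self):
        # Create the four child patches once; repeated calls reuse them.
        if self.children[0] is None:
            self.children = [Patch(self.level + 1) for _ in range(4)]
        return self.children
```

A patch is refined as a whole, which matches the design above: the quadtree granularity is the patch, not the individual element.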

6.2 Masks and adaptation

Given the mask of all nodes and edges where height and velocity wavelet coefficients, respectively, are above threshold, we have already discussed how a mask is created that contains adjacent zones in space and scale. In order


to avoid unnecessary computations, further masks are created. Here it is to be noted that the model is optimized for multi-stage time integration schemes, which means that the grid is adapted once every time step while operators are evaluated once per stage, and hence several times per time step. Therefore, after the adaptation, masks are computed and stored for F, B and qe, recording where values are restricted and where they are computed, so that during the stages the mask is accessed instead of recomputed. This also avoids unnecessary computation by skipping the expensive flux-mapping operator M at edges where qe is in no mask. Besides the masks, during grid adaptation new patches need to be created to guarantee that all operators have all required neighbours available on any patch. We assume that the grid at time t has enough neighbours for all operators and then ensure that at time t + ∆t this will be the case as well. Therefore, after the new wavelet coefficients have been computed, all patches that do not have children in the patch tree structure check whether children will be required in the next step (e.g. to hold nodes adjacent in space to nodes where the wavelet coefficient has just risen above threshold).

6.3 Pentagon points

Pentagon points are all nodes of the original icosahedron (including the two poles), since they have only 5 adjacent edges and the primal cell to which they correspond is a pentagon. Since the discretization of the operators in section 4.3 and chapter 5 in principle works for any Voronoi diagram, pentagon and hexagon points are treated consistently in the discrete model. For the implementation it is preferable, in terms of performance and code-complexity, to be able to treat all elements equally (as hexagons). Therefore the strategy is employed to take the pentagon points into consideration when computing geometric quantities, so that during the performance critical evaluation of the operators all nodes can be treated equally. That means that e.g. the divergence operator at a pentagon adds six fluxes, one of which is zero since it is multiplied with a zero-length edge, or the flux mapping M uses some zero weights. In cases where this strategy does not work, operators are still computed equally everywhere, and the wrong values at the twelve pentagon points are corrected afterwards, in order not to clutter the main loops of the operators with specialized code. Two special pentagon points are the two poles, since they do not fit into the data structure in the sense that they are not uniquely assigned to an element, patch or sub-domain. They are stored and computed on a ghost element on each sub-domain that neighbours the pole. The special treatment of the poles takes place when copying the values for ghost elements: only one sub-domain is considered to have the correct value, and the others use this value to override theirs. This way no inconsistencies can build up from round-off errors. Note that this concerns only quantities defined on nodes and operators that compute these quantities, while the most complicated operators work on edges.
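The zero-length-edge trick can be illustrated with a schematic divergence sum (not the model's actual TRiSK operator): every cell loops over six edges, and a pentagon simply carries one edge of length zero, which contributes nothing.

```python
import numpy as np

def cell_divergence(normal_velocity, edge_lengths, area):
    # Schematic finite-volume divergence: outward normal velocities
    # weighted by edge lengths, divided by the cell area. The loop
    # (here a dot product) is always over six edges, hexagon or not.
    return np.dot(normal_velocity, edge_lengths) / area

# A hexagon with six unit edges, and a pentagon treated as a hexagon
# whose sixth edge has zero length (hypothetical uniform outflow of 1).
hex_div = cell_divergence(np.ones(6), np.ones(6), area=6.0)
pent_div = cell_divergence(np.ones(6), np.array([1.0, 1, 1, 1, 1, 0]), area=5.0)
```

Both cells see the same uniform outflow, and the pentagon's sixth (zero-length) edge drops out of the sum without any branch in the loop.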

6.4 Parallel implementation

The parallel implementation makes use of the data structures, which already consist of several independent domains. In order for the method to run on large numbers of cores, the sub-domains can be recursively subdivided into four parts until the desired number of sub-domains is reached. With the sub-domains distributed over the cores, inter-process communication is required to replace the copying of values to the ghost elements. This has been done using the Message Passing Interface (MPI). Additionally, communication is also required during grid adaptation. When a new patch is created on one sub-domain, its index is communicated through its parent. Here, too, the different orientation of sub-domains that are neighbours where the icosahedron was opened has to be considered. The algorithm has been carefully designed to have communications during grid adaptation rather than during the more frequent evaluation of differential operators whenever possible. Updating the ghost boundaries is implemented by marking values: methods that modify values mark these values as old, and methods that require values on a boundary communicate them only if they are old and then mark them as up-to-date. This way unnecessary duplication of expensive communication is avoided in the complex algorithm, and the scheme adjusts automatically if the algorithm is modified. Critical communications are carried out locally point-to-point rather than using global communication where possible. Where applicable, communication is non-blocking so that the computations can continue while communication is taking place in the background. When the grid is adapted, the computational load on each core can change as nodes are added or removed. This requires a redistribution of the grid over the cores, called load balancing. Currently METIS [Karypis and Kumar, 1995] is used to compute the load distribution, which is then realized by redistributing the grid during check-pointing.
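The old/up-to-date marking can be sketched as a small state machine (hypothetical class and method names; the MPI exchange is replaced by a counter):

```python
class GhostField:
    """Sketch of the staleness marking used for ghost updates.

    require_boundary() stands in for the consumer side: the exchange
    runs only when the boundary is marked old, so several consumers
    in a row trigger a single communication.
    """
    def __init__(self):
        self.old = True
        self.exchanges = 0

    def mark_modified(self):
        # Called by methods that modify interior values.
        self.old = True

    def require_boundary(self):
        # Called by methods that need ghost values.
        if self.old:
            self.exchanges += 1    # stand-in for the MPI communication
            self.old = False
```

Because producers only set the flag and consumers only test it, adding or reordering methods in the algorithm cannot introduce duplicate exchanges or stale reads.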

6.4.1  Parallel scaling

We quantify parallel performance with respect to both weak and strong scaling efficiency. All calculations are performed on the SHARCNET cluster requin, which has 1541 AMD Opteron cores and a Quadrics Elan4 interconnect. Each processor has two cores and 8 GB of local memory. Strong scaling efficiency $E_S$ is defined as

$$E_S = \frac{t_1}{N\,t_N} \leq 1,$$

where $t_1$ is the time to do a given computation on one core, $N$ the number of cores used, and $t_N$ the time to do the computation on $N$ cores. $E_S$ measures how the CPU time decreases as the number of cores is increased for a fixed problem size. Ideally, $t_N$ should decrease proportionally with increasing $N$, since the processes divide the (constant) work. In practice, however, when the portion of the total computation allocated to each core reaches a lower bound, $t_N$ no longer decreases, due to the non-parallelized part of the code (Amdahl's law) or because of communication overhead. Figure 6.3 shows the strong parallel scaling efficiency for a perfectly balanced load (solid line) and the turbulence test-case (dashed line). As expected, the unbalanced test case has a lower



Figure 6.3: Strong parallel scaling efficiency. Perfectly balanced (solid) and realistic turbulence test case (dashed). All numbers are active degrees of freedom (four times the number of height nodes); N is the total over all cores, while the numbers in the graph are per core.

efficiency. The reason for the lower efficiency is explained below. This result suggests that for both balanced and unbalanced problems strong parallel efficiency is acceptable for at least $10^2$ CPUs. In practice, weak scaling efficiency is a more useful measure, since high performance codes are intended for large problems. To measure weak scaling efficiency, the computation per core is kept approximately constant as the number of processors is increased. Weak scaling efficiency $E_W$ is defined as

$$E_W = \frac{t_1}{t_N} \leq 1,$$

where $t_N$ is the time needed when running on $N$ processors. Unlike strong parallel efficiency, an efficient parallel code should maintain $E_W \approx 1$ independently of the number of cores used. Weak scaling is shown in figure 6.4 for a balanced test case. It demonstrates that good weak parallel efficiency can be achieved for at least 640 cores. In particular, if there are at least 20 000 height nodes per core, the weak scaling efficiency is 90% on 640 cores. Scaling for larger numbers of cores could not be tested given resource limitations, although these results suggest that the code should have acceptable parallel


Figure 6.4: Weak parallel scaling on up to 640 cores for a balanced test-case.

performance for at least 1000 cores.
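Both efficiency measures can be evaluated directly from wall-clock timings; a minimal sketch (the timing values below are hypothetical, not measurements from requin):

```python
def strong_efficiency(t1, tN, N):
    """E_S = t1 / (N * tN): equals 1.0 for an ideal N-fold speedup
    of a fixed-size problem."""
    return t1 / (N * tN)

def weak_efficiency(t1, tN):
    """E_W = t1 / tN for constant work per core: an efficient code
    keeps this near 1 as cores are added."""
    return t1 / tN

# hypothetical timings in seconds: 64 cores give a ~52x speedup
print(strong_efficiency(t1=1000.0, tN=19.2, N=64))  # ~0.81
print(weak_efficiency(t1=100.0, tN=110.0))          # ~0.91
```

With these definitions the 90% weak efficiency quoted above corresponds to a per-core runtime growing by only about 11% from 1 to 640 cores.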


Chapter 7

Model verification

Before we come to real physical applications, we want to verify the correct implementation of the model and investigate its numerical accuracy by comparison with established test-cases.

7.1  Unit testing and mimetic properties

In a series of tests the correct functioning of individual components has been validated. The accuracy of the velocity interpolation (section 5.2) and the flux restriction operator (section 5.5) has been confirmed to be second order in a convergence study where the error was compared to the grid spacing. The order of the individual differential operators has been tested similarly, using spherical harmonics to construct test functions with known derivatives. The commutation properties of divergence with restriction (5.13) and of the flux mapping M with height interpolation Z (4.7) have been confirmed to hold for the implemented model through numerical experiments. The mimetic properties have also been confirmed by tests. The mass, which is proportional to the integral of the height, stays equal to the initial mass during time integration, including the zeroing of wavelet coefficients, and the height integrates to the same value on each level of resolution. Starting from a non-rotating initial condition (zero vorticity), the solution stays non-rotating at all times. For all tests, differences that are


Figure 7.1: Violation of mass conservation and generation of unphysical vorticity for a wave propagation test case are at the level of round-off error.

caused by round-off errors are ignored. Figure 7.1 shows the conservation of mass and the property of the method not to generate spurious vorticity. For this purpose a wave propagation test has been used, which has zero initial vorticity and hence should not develop any vorticity at any time. The initial condition for this test case is zero velocity and a two-dimensional exponential for the height,

$$h = e^{\alpha^2 r^2}, \qquad r^2 = x^2 + y^2,$$

projected onto the north pole. The solution is a wave that spreads away from the pole towards the equator. The exact solution develops a velocity field with zero vorticity, and the model can thus be tested not to develop spurious vorticity. For the figure, a threshold of ε = 0.01 has been used. Mass conservation is measured as the deviation of the integrated height from the integral of the initial height, and shows similar patterns for the test-cases presented in the following sections. In order to put the following simulations into physical context, table 7.1 provides translations between the number of refinements of the icosahedron, the number of grid-points (for uniform grids) and the resolution. For all simulations the time step size ∆t has been computed from the solution to assure that ∆t

J    N            d.o.f.       ∆x [km]  ∆x [deg]
3    642          2,562        959.3    8
4    2,562        10,242       479.6    4
5    10,242       40,962       239.8    2
6    40,962       163,842      119.9    1
7    163,842      655,362      60.0     1/2
8    655,362      2,621,442    30.0     1/4
9    2,621,442    10,485,762   15.0     1/8
10   10,485,762   41,943,042   7.5      1/16
11   41,943,042   167,772,162  3.7      1/32
12   167,772,162  671,088,642  1.9      1/64

Table 7.1: Number of bisection refinement levels J of the icosahedron, number of height nodes N, total degrees of freedom (d.o.f.), average edge length ∆x (on Earth) and resolution in degrees.

adheres to the CFL condition, taking both the velocity and the wave speed $c_{wave} = \sqrt{gh}$ into consideration.

7.2  Shallow water test suite by Williamson

The test-cases for the spherical rotating shallow water equations by [Williamson et al., 1992] are considered the de facto standard in the geophysics community. In this section we will be using the following parameters to model the dimensions of the Earth: gravitational acceleration $g = 9.80616\ \mathrm{m\,s^{-2}}$, rotation rate $\Omega_{earth} = 7.292 \times 10^{-5}\ \mathrm{s^{-1}}$ and radius $R_{earth} = 6.37122 \times 10^6\ \mathrm{m}$.

7.2.1  Test 1: Advection of cosine bell

In the first case presented in [Williamson et al., 1992] a cosine bell is advected around the sphere. A fixed velocity is prescribed as

$$u(\theta, \lambda) = U (\cos\theta \cos\alpha + \sin\theta \cos\lambda \sin\alpha),$$
$$v(\theta, \lambda) = -U \sin\lambda \sin\alpha,$$

Ph.D. Thesis - M. Aechtner

McMaster University - CSE

with $U = 2\pi R_{earth}/(12\ \mathrm{days})$, α the angle of rotation, and the longitude λ and latitude θ related to Cartesian coordinates (x, y, z) by

$$\theta = \arcsin(z/R), \qquad \lambda = \operatorname{atan2}(y, x).$$

Even though only the equation for height evolution is integrated, this is a good test of the grid adaptation routines and grid stability. The solution is known analytically: it is the initial condition rotated around the sphere, so the creation of finer scales can be attributed solely to numerical errors. In order to investigate grid stability we do not set the maximum resolution, which is then determined automatically by the grid adaptation routine. The angle of rotation is fixed to the equatorial direction (α = 0), since for the icosahedron the poles are not special; the test-cases have, however, been rerun with a rotated geometry, confirming that the position of the icosahedron vertices does not have a significant influence on the global error norms. Besides the initial condition for height suggested in [Williamson et al., 1992],

$$h = \frac{H}{2}\left(1 + \cos(\pi r/L)\right), \qquad (7.1)$$

the experiment is also carried out with a smooth bell inspired by [Galewsky et al., 2004],

$$h = H\, e^{r^2/(r^2 - 2L^2)}.$$

In both cases

$$r = R_{earth} \arccos(\cos\theta \cos\lambda),$$

with H = 1000 m and L = R/3. The second initial condition is included because the cosine bell is only $C^1$ continuous at r = L. Because our grid adaptivity routine is based on second-order interpolation, this non-smoothness at the edge of the cosine bell could potentially affect grid stability. Both initial conditions constitute a localized bell that is advected once around the sphere. Figure 7.2 shows the grid after one full rotation of the cosine bell; the cosine bell is thus located in the centre. The height field is visually indistinguishable from the initial condition. In this case the threshold for height has been set directly to $\varepsilon_h = 4.2$ (note that the height is of the order of 1000) and the minimum resolution is $J_{min} = 4$. While there was no restriction on the


Figure 7.2: Grid after one rotation with cosine bell in the centre (εh = 4.2, Jmin = 4). The maximum level is determined by the adaptation routine.

maximum level, the grid adaptation uses only resolutions up to j = 9 throughout the simulation, and the grid refinement is stable. Due to numerical errors, the east-moving bell leaves a short trail, which is cleaned up by the adaptation routine after a while. That unphysical wiggles are present in the solution can also be seen in figure 7.3, which shows the cosine bell after one rotation. On the left the displayed range has been reduced to [−15, 15], which clearly shows that the bell is followed by a wave of negative height (blue colour). These errors are also present in the non-adaptive model (figure 7.4), and are actually less pronounced for adaptive runs with the same number of grid-points, since in that case the resolution at the bell is increased. The convergence of the error after one rotation has been studied: in both the maximum and $L_2$ norm it decreases quadratically with increasing number of grid points and decreasing threshold, which confirms second order accuracy. The results are similar for both bell shapes. Furthermore, figure 7.5 shows the behaviour of the normalized maximum error with time. The adaptive model


Figure 7.3: Case 1 adaptive grid Jmin = 5, Jmax = 8: The height after one rotation (left) and for a reduced range to investigate behaviour around 0 (right).


Figure 7.4: Case 1 uniform grid J = 6: The height after one rotation for a reduced range.


Figure 7.5: Case 1: For an equal number of grid-points, the error over time is lower for the adaptive model (εh = 0.45).

has been run with $\varepsilon_h = 0.45$, leading to an average of 37,509 nodes, and produces a lower error than the non-adaptive method with 40,962 nodes.
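The two bell profiles used in this test can be sketched directly; a Python illustration of (7.1) and the smooth bell (the model itself is written in Fortran95, and the function names here are ours):

```python
import math

R_EARTH = 6.37122e6          # Earth radius [m]
H = 1000.0                   # bell height [m]
L = R_EARTH / 3.0            # bell radius

def cosine_bell(r):
    """C1 bell of (7.1); identically zero outside r = L."""
    return 0.5 * H * (1.0 + math.cos(math.pi * r / L)) if r < L else 0.0

def smooth_bell(r):
    """Smooth bell inspired by Galewsky et al. (2004); the exponent
    tends to -infinity as r -> sqrt(2) L, so the bell decays to zero."""
    if r * r >= 2.0 * L * L:
        return 0.0
    return H * math.exp(r * r / (r * r - 2.0 * L * L))

print(cosine_bell(0.0), smooth_bell(0.0))  # 1000.0 1000.0
```

At r = L the cosine bell has a kink (only $C^1$), while the smooth bell is still positive there (H/e), which is exactly the distinction the text draws for grid stability.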

7.2.2  Test 2: Stationary flow

The second test case uses the full shallow water equations. The height h is defined by

$$gh = gH - \left(R\Omega U + \frac{U^2}{2}\right)\sin^2\theta,$$

and the velocity as $u = U\cos\theta$, with $U = 2\pi R_{earth}/(12\ \mathrm{days})$ and $gH = 2.94 \times 10^4\ \mathrm{m^2/s^2}$. The flow is in geostrophic balance, so that the exact solution is equal to the initial condition at all times (steady solution). The adaptive method has been run for different thresholds, and figure 7.6 compares the degrees of freedom (i.e. the sum of height and velocity values) used for the thresholds to the resulting errors. It shows


Figure 7.6: Case 2: convergence of the normalized error in the height field after 15 days.


Figure 7.7: Case 6: convergence of the normalized error in the height field after 12 days.

that the convergence of the global time integration error is between first and second order.
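The convergence orders quoted throughout this chapter can be estimated by comparing errors from two runs at different resolutions; a sketch (the error values below are hypothetical, chosen to illustrate a second-order method):

```python
import math

def observed_order(e_coarse, e_fine, n_coarse, n_fine):
    """Observed convergence order from two runs on the sphere.
    With N grid points the spacing scales as h ~ N**-0.5, so
    p = log(e_coarse / e_fine) / log(sqrt(n_fine / n_coarse))."""
    return math.log(e_coarse / e_fine) / math.log(math.sqrt(n_fine / n_coarse))

# a second-order method roughly halves h (quadruples N, cf. table 7.1)
# and divides the error by 4
print(observed_order(4e-3, 1e-3, 40962, 163842))  # ~2.0
```

For the adaptive runs the same estimate is applied with the number of active grid points resulting from each threshold.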

This test-case has also been run with a statically refined grid. The grid is refined locally in a round patch, similar to the initial grid of the cosine bell, and provided to the solver, which for this test does not adapt to the solution. The purpose of this experiment is to investigate the errors introduced by the adapted grid. We compute the $L_2$ error for a non-adaptive run after 15 days with 163,842 d.o.f. and compare to a run with a refined patch of 63,074 additional d.o.f. The error is expected to increase, since the local refinement does not benefit the global solution and, on the other hand, has the potential to introduce errors at the transitions between different levels of refinement. The experiments show a moderate increase of the $L_2$ error by 17%.


Figure 7.8: Case 6 with ε = 0.05: height after 7 days

7.2.3  Test 6: Rossby–Haurwitz Wave

Rossby–Haurwitz waves are a standard test case for the full shallow water equations. The initial conditions are a non-divergent velocity field

$$u = a\omega \cos\theta + aK \cos^{R-1}\theta \left(R\sin^2\theta - \cos^2\theta\right)\cos R\lambda,$$
$$v = -aKR \cos^{R-1}\theta \,\sin\theta \sin R\lambda,$$

and a height chosen to ensure the flow is in geostrophic balance. This initial field rotates around the North–South axis with slight deformations. For an adaptive run with ε = 0.05, after 7 days the height still closely resembles the initial condition (figure 7.8); after 14 days (figure 7.9) the instability starts to develop that will eventually break the wave. Increasing or decreasing the threshold results in the wave breaking earlier or later, similar to increasing J for the non-adaptive TRiSK model. Figure 7.7 shows that the convergence for this test case is second order. The thresholds ε used for this figure are 0.2, 0.1414, 0.1, 0.0707, 0.05, 0.03535, 0.025 (steps of $\sqrt{2}$ on a logarithmic scale). Again the experiments are from the adaptive model, and the number of grid-points is a result of the changing thresholds. The slightly irregular behaviour of the normalized error norms is due to the fact that the error drops more rapidly when the reduced


Figure 7.9: Case 6 with ε = 0.05: height after 14 days

threshold causes the lowest completely full level to increase by one. Since analytical solutions are not available, solutions from the Spectral Transform Shallow Water Model stswm at resolution T514 are used as a reference.

7.3  Lauritzen nonlinear advection test

In order to further test the grid adaptation, an advection test-case suggested by [Nair and Lauritzen, 2010] has been applied. Contrary to the cosine bell, the initial scalar undergoes severe deformation into filament structures by the prescribed flow. After half of the simulation time T, a reverse flow brings the scalar field back to its initial shape and position. This is a more challenging test of advection and grid adaptation than simple cosine bell advection. In the following, results will be discussed for the variant with two cosine bells with a shape as defined in (7.1), but with a maximum height of 1 (for this case dimensions are normalized, $R_{earth} = 1$), at positions (θ = −π/3, λ = π) and (θ = π/3, λ = π). The non-divergent advecting velocity field changes in time


(case 1 in [Nair and Lauritzen, 2010]):

$$u(\lambda, \theta, t) = U \sin^2(\lambda/2) \sin(2\theta) \cos(\pi t/T), \qquad (7.2)$$
$$v(\lambda, \theta, t) = \frac{U}{2} \sin(\lambda) \cos(\theta) \cos(\pi t/T), \qquad (7.3)$$

with a velocity scale U = 2.4 and final time T = 1. The goals of this test are to examine whether the adaptation introduces additional errors or grid imprinting compared to the non-adaptive TRiSK scheme, and whether the adaptive simulation is able to outperform the non-adaptive simulation in terms of the error relative to the number of grid points. Figure 7.10 shows how the grid adapts to the solution for a run with ε = 0.001, $J_{min} = 5$ and $J_{max} = 9$. From the image at half time t = T/2 (bottom) we can see that additional grid levels are added during the deformation in order to control the error. At the final time t = T (top right) the grid resolution has been reduced again; throughout the simulation the grid is automatically cleared as soon as the perturbation has passed. The error at the final time (i.e. the difference between the numerical solution at the final time and the exact solution, which is equal to the initial condition) is shown in figure 7.11. The plot on the left corresponds to a non-adaptive grid with a uniform resolution of 60 km; on the right the adaptive method has been used with a threshold of ε = 0.001, resulting in about the same number of grid points for both grids (160,962 and 169,248, respectively). The figure indicates that on both grids the solution is subject to a phase shift, which dominates the error especially for the non-adaptive run. Clearly the adaptive run has led to a lower overall error; it does, however, show some numerical noise caused by the adaptation. That the adaptive simulations produce better results in terms of a lower maximum error can also be seen from figure 7.12; e.g. the adaptive run can achieve a maximum error of 2% with about 20,000 nodes, whereas the non-adaptive case requires about 160,000 nodes for the same maximum error.
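The advecting field (7.2)-(7.3) can be sketched directly; note that the cos(πt/T) factor makes the velocity vanish at t = T/2 and reverses it afterwards (a Python illustration, not the model code):

```python
import math

U, T = 2.4, 1.0  # velocity scale and final time from the test definition

def velocity(lam, theta, t):
    """Time-reversing non-divergent flow of (7.2)-(7.3)."""
    reverse = math.cos(math.pi * t / T)
    u = U * math.sin(lam / 2.0)**2 * math.sin(2.0 * theta) * reverse
    v = 0.5 * U * math.sin(lam) * math.cos(theta) * reverse
    return u, v

# the flow at t = T exactly negates the flow at t = 0,
# so the exact solution returns to the initial condition
u0, v0 = velocity(1.0, 0.5, 0.0)
uT, vT = velocity(1.0, 0.5, T)
print(u0 + uT, v0 + vT)  # 0.0 0.0
```

This exact time reversal is what makes the final-time error in figure 7.11 a pure measure of the numerical scheme and the grid adaptation.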


Figure 7.10: Lauritzen test-case with ε = 0.001: The grid tracks the solution and refines during the deformation at T /2.



Figure 7.11: Error of the Lauritzen test-case at final time: While the non-adaptive model (left) shows an error due to a phase shift of the solution, the adaptation introduces some noise, but the overall error is much lower for the same number of grid-points.


Figure 7.12: Convergence of Lauritzen’s nonlinear advection test: The adaptive model achieves lower errors for the same number of nodes.

86

7.4  Unstable jet by Galewsky

The standard test cases above are supplemented by a strongly nonlinear test case proposed by [Galewsky et al., 2004]. The initial conditions consist of a basic zonal flow and a height perturbation. For the basic flow, the zonal velocity component u is only a function of latitude θ:

$$u(\theta) = \begin{cases} 0 & \theta \leq \theta_0 \\ \dfrac{u_{max}}{e_n} \exp\!\left(\dfrac{1}{(\theta - \theta_0)(\theta - \theta_1)}\right) & \theta_0 < \theta < \theta_1 \\ 0 & \theta \geq \theta_1 \end{cases} \qquad (7.4)$$

with $e_n = \exp(-4/(\theta_1 - \theta_0)^2)$, and the height is obtained by integrating the balance equation

$$gh(\theta) = gH - R_{earth} \int^{\theta} u(\theta')\left(f + \frac{\tan(\theta')}{R_{earth}}\, u(\theta')\right) d\theta'$$

using 64 Gaussian quadrature points. The constant H = 10157.9 m is chosen to yield a global mean layer depth of 10 km. The other constants are $u_{max} = 80\ \mathrm{m\,s^{-1}}$, $\theta_0 = 2\pi/14$ and $\theta_1 = 5\pi/14$, so that the jet's midpoint is at $\theta_2 = \frac{1}{2}\left(\frac{2\pi}{14} + \frac{5\pi}{14}\right) = \pi/4$. This balanced flow is perturbed by adding a localized bump $h'(\lambda, \theta)$ of the form

$$h'(\lambda, \theta) = 120\ \mathrm{m}\, \cos(\theta) \exp\!\left(-(\lambda/\alpha)^2 - [(\theta_2 - \theta)/\beta]^2\right)$$

with α = 1/3 and β = 1/15. This perturbation leads to a barotropic instability which eventually develops turbulent vortex structures. This test case involves both a fast-moving pressure wave and slower vortex dynamics.
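A sketch of the jet profile (7.4); by construction the exponential factor equals $e_n$ at the midpoint, so the profile attains exactly $u_{max}$ there (a Python illustration, not the Fortran model code):

```python
import math

U_MAX = 80.0                                    # m/s
THETA0 = 2.0 * math.pi / 14.0                   # southern jet edge
THETA1 = 5.0 * math.pi / 14.0                   # northern jet edge
EN = math.exp(-4.0 / (THETA1 - THETA0)**2)      # normalization e_n

def u_jet(theta):
    """Zonal jet of (7.4): compactly supported yet infinitely smooth,
    since the exponent tends to -infinity at both edges."""
    if theta <= THETA0 or theta >= THETA1:
        return 0.0
    return (U_MAX / EN) * math.exp(1.0 / ((theta - THETA0) * (theta - THETA1)))

theta2 = 0.5 * (THETA0 + THETA1)   # jet midpoint, pi/4
print(u_jet(theta2))               # peak value, ~80 m/s
```

The balance integral for gh would then be evaluated by Gaussian quadrature of this profile, as described above; that step is omitted here.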

7.4.1  Unperturbed flow

As suggested in [Galewsky et al., 2004], the simulation is first run without the perturbation to ensure that the numerical scheme is able to maintain balance for at least five days. Figure 7.13 shows the error in height for the first five days for the non-adaptive TRiSK scheme at resolution $J_{max} = 7$, compared with results from the adaptive method with the threshold chosen so that the total number of degrees of freedom is comparable to the non-adaptive simulation (6 ×


Figure 7.13: Unperturbed zonal jet test case (Galewsky). For similar numbers of active nodes the adaptive wavelet method maintains consistently lower error than the non-adaptive TRiSK scheme.

$10^5$). These results show that, for a similar number of degrees of freedom and the same discretization scheme, the adaptive wavelet method maintains a significantly lower error (about three times lower). Since this simulation uses no viscosity and the method cannot represent zonal flow exactly, numerical errors trigger an instability that deforms the initially zonal flow. Figure 7.14 shows the vorticity field after six days. The deformations reflect the icosahedral grid, which has the same geometry repeating five times in the longitudinal direction. The deformations are also present in the non-adaptive model, and disappear at day six (occurring only later) as finer resolutions are used. Indeed, the adaptive method leads to less deformation for the same number of nodes as the non-adaptive model.

7.4.2  Perturbed inviscid flow

We now consider the results for the perturbed jet flow after the instability develops. Results for tolerance $\varepsilon = 5 \times 10^{-3}$, coarse scale $J_{min} = 7$, and finest


Figure 7.14: Unperturbed zonal jet test case (Galewsky). Vorticity field at day 6. Numerical instabilities have deformed the zonal flow.

scale $J_{max} = 9$ are shown in figure 7.15. This simulation uses about $2 \times 10^6$ degrees of freedom, for a compression ratio of 5.25. The contours (solid) of the adaptive wavelet simulation nearly overlap with those of a reference simulation with the non-adaptive TRiSK scheme at the finer uniform resolution $J_{max} = 10$, showing that the adaptive wavelet simulation is quite accurate, even for this highly nonlinear time-dependent test case.

7.5  Comparison with other adaptive models

The advecting cosine bell (Williamson case 1) has been applied to two adaptive shallow water models presented in [St-Cyr et al., 2008]. These models use a height threshold of 5.3% as a criterion for adapting the grid to add 3 refinement levels. Combined with additional refinement at intermediate levels caused by the requirement that adjacent blocks can differ by only one level, this refinement strategy is favourable for this simple test-case in the sense that refinements due to numerical noise, as in figure 7.2, are not possible. For the Rossby–Haurwitz wave (Williamson case 6), [St-Cyr et al., 2008] compare the height error for a low uniform resolution to one level of local refinement. For one of the models (using finite volumes) the error is considerably increased by the refined region, whereas the other model (using spectral elements) produces only little additional error. For this test-case the wavelet method produces a moderate amount of error for comparable setups, which in terms of magnitude lies between the two models in [St-Cyr et al., 2008].


Figure 7.15: Galewsky test case with tolerance $\varepsilon = 5 \times 10^{-3}$ and $J_{max} = 9$. Height perturbation at 2, 4 and 6 hours (left) and relative vorticity at 4, 5 and 6 days (right). The solution of the $J_{max} = 10$ non-adaptive reference simulation is dashed, but the lines are mostly indistinguishable.



The Galewsky test-case has also been simulated in [St-Cyr et al., 2008], adding four levels of refinement using a criterion based on vorticity. They present results for the perturbed setup without explicit viscosity and show that the vorticity after 6 days is visually indistinguishable for both models, both in the adaptive setting and with a uniform grid at the resolution of the refined region. The wavelet method also produces visually the same vorticity field. For the unperturbed case (the only case with an analytical solution) the wavelet method has a lower error than the non-adaptive TRiSK scheme for the same number of nodes, whereas the errors produced with our implementation of TRiSK are consistent with those shown in [Weller, 2012].


Chapter 8

Rotational shallow water turbulence

8.1  Initial condition

Closer to geophysically relevant applications, we consider initial conditions designed to generate shallow water turbulence. As shown in figure 8.1, eight zonal jets are arranged from North to South, with the zonal velocity u set to

$$u = u_{-0.16\pi,\,0} + u_{-0.26\pi,\,-0.1\pi} + u_{-0.36\pi,\,-0.2\pi} + u_{-0.46\pi,\,-0.3\pi} + u_{0,\,0.16\pi} + u_{0.1\pi,\,0.26\pi} + u_{0.2\pi,\,0.36\pi} + u_{0.3\pi,\,0.46\pi},$$

where $u_{\theta_0,\theta_1}$ is the function defined in (7.4) with parameters $\theta_0$ and $\theta_1$ given through the subscripts. The height is again computed by numerical integration so that the unperturbed flow is in geostrophic balance, and H = 10000.
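The superposition can be sketched by reusing the jet profile (7.4) with configurable edges (a Python illustration; the function names are ours, not from the model code):

```python
import math

U_MAX = 80.0

def u_jet(theta, theta0, theta1):
    """Galewsky-style jet (7.4) with configurable edges theta0 < theta1."""
    if theta <= theta0 or theta >= theta1:
        return 0.0
    en = math.exp(-4.0 / (theta1 - theta0)**2)
    return (U_MAX / en) * math.exp(1.0 / ((theta - theta0) * (theta - theta1)))

PI = math.pi
JETS = [(-0.16 * PI, 0.0), (-0.26 * PI, -0.1 * PI), (-0.36 * PI, -0.2 * PI),
        (-0.46 * PI, -0.3 * PI), (0.0, 0.16 * PI), (0.1 * PI, 0.26 * PI),
        (0.2 * PI, 0.36 * PI), (0.3 * PI, 0.46 * PI)]

def u_zonal(theta):
    """Superposition of the eight jets of the turbulence initial condition."""
    return sum(u_jet(theta, t0, t1) for t0, t1 in JETS)

print(u_zonal(0.08 * PI))   # midpoint of the jet spanning (0, 0.16*pi): ~80
print(u_zonal(-0.5 * PI))   # at the south pole the velocity vanishes: 0.0
```

Since the jet edges overlap at most pairwise, each latitude sees essentially one jet; the profiles all share the same width of 0.16π.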

8.2  Structure of solution

After two days vortices form on each of the jets; these then interact to generate the approximately homogeneous and isotropic turbulence shown in figures 8.2 and 8.3, which show the simulation results after 132 hours with a threshold of ε = 0.05 for the inviscid and viscous runs, respectively. Since the


Figure 8.1: Zonal flow initial condition to develop homogeneous turbulence.


Figure 8.2: Inviscid shallow water turbulence after 132 hours. The grid (right) shows refinement where vortex filaments occur in the solution (left, shown as relative vorticity).



Figure 8.3: Shallow water turbulence after 132 hours with a viscosity of ν = 104 kg m−1 s−1 .

viscosity is used in order to simulate a turbulent flow with a defined Reynolds number, rather than to stabilize the simulation, the same amount of viscosity is used on every level. The left-hand figures show the relative vorticity and the right-hand figures show the adapted grid. Each grid level is identified by a distinct colour, with the most refined regions corresponding to the darkest colours; these are located near the intense vorticity filaments that characterize two-dimensional turbulence.

8.3  Efficiency: Compression ratio

Figure 8.4 shows that the compression ratio at t = 132 hours is about 15 for the inviscid case (figure 8.2) and 21 for the viscous case (figure 8.3). It is important to note that at this time the compression ratio is at its lowest level, since the turbulence is most intense (compared with both the initial conditions and the dissipated flow at later times). Figure 8.5 shows that the CPU time per active point remains roughly constant, even though the compression ratio changes significantly


Figure 8.4: Compression ratios around 20.


Figure 8.5: Constant CPU time per grid-point.

when turbulence first develops and then decays again. This shows that there is no appreciable computational overhead associated with the degree of grid compression (sparse or dense grids). For this run the metis partitioner [Karypis and Kumar, 1995] has been used, with a resulting load-imbalance of 5% to 7%. The amount of imbalance has been about the same at different times during the simulation. Since this homogeneous test-case is rather well balanced, even without load-balancing the imbalance stays at moderate levels (10% - 12%). Not surprisingly, the compression ratio is lowest when the flow is most turbulent. Nevertheless, the viscous adaptive wavelet code is still about six times faster than the non-adaptive TRiSK code at this time for an equivalent maximum resolution, and about four times faster than the optimized spectral code swbob [Rivier et al., 2002]. This result confirms that adaptive methods can still be advantageous for statistically homogeneous and isotropic flows, like fully-developed two-dimensional turbulence.

8.4  Energy and spectrum

The total energy E(t) is defined as

$$E(t) = \int \frac{1}{2}\, gh \left(gh + |u|^2\right) dS - \frac{1}{2}\, c_{wave}^4 A_S,$$


Figure 8.6: Behaviour of the total energy minus the total energy at rest (right). Energy spectrum for the rotational part of the velocity, averaged over the interval t = [132 h, 136 h] (left).

where $A_S$ is the area of the sphere and the wave speed $c_{wave}$ is

$$c_{wave} = \sqrt{\frac{g}{A_S} \int h \, dS}.$$

Due to mass conservation, $c_{wave}$ is constant. Figure 8.6 (right) shows that the total energy for both the viscous and inviscid runs first decreases and then stays at about the same level once the turbulence has developed. The energy spectrum of the turbulent flows can be estimated by interpolating the adaptive results onto a uniform grid and using spherical harmonics

$$f = \sum_{l=0}^{N} \sum_{m=-l}^{l} F_{lm} Y_{lm}.$$

The power spectrum is then defined as

$$S_f(l) = \sum_{m=-l}^{l} |F_{lm}|^2.$$

Figure 8.6 (left) shows the spectrum of the rotational part of the velocity, $\omega_v = \operatorname{curl}_v u$. The energy spectrum has a clearly defined power-law range, with a slope of about −2.2. This slope is flatter than the slope of −3 that is expected for two-dimensional turbulence spectra on the plane (see


[Boffetta and Ecke, 2012]), and closer to the −2 spectrum of shocks in the divergent part of the flow. The discrepancy from the −3 slope may be due to the spherical geometry and the Earth's rotation. In general there are not many results on shallow water turbulence theory, and none that present or discuss spectra on the sphere.
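The spectrum computation above reduces to summing squared coefficient magnitudes over m for each degree l. A sketch, assuming the spherical-harmonic coefficients are available in a dict keyed by (l, m) (a hypothetical storage layout; the transform itself is not shown):

```python
def power_spectrum(flm):
    """S_f(l) = sum over m of |F_lm|**2, for coefficients supplied
    as a dict {(l, m): F_lm} with complex values."""
    lmax = max(l for l, _ in flm)
    s = [0.0] * (lmax + 1)
    for (l, m), c in flm.items():
        s[l] += abs(c)**2
    return s

# two modes at l = 1 with |F| = 1 each contribute S_f(1) = 2
coeffs = {(0, 0): 0.5, (1, -1): 1.0, (1, 0): 0.0, (1, 1): 1j}
print(power_spectrum(coeffs))  # [0.25, 2.0]
```

The −2.2 slope quoted above would then be obtained by a least-squares fit of log S against log l over the power-law range.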


Chapter 9

Tsunami simulation

In this chapter the adaptive method is applied to ocean modelling. In order to account for continents, the model is extended by penalization boundary conditions. With this extension, the 2004 tsunami in the Indian Ocean is simulated. This simulation tests the adaptive model in a scenario of fast dynamics (inertia-gravity regime) with a highly localized initial condition, and it involves interaction with coastlines and topography, which leads to the development of small isolated structures.

9.1  Penalization boundary conditions

Brinkman penalization has been used to model solid boundaries with the shallow water equations by [Reckinger et al., 2012], who also use an adapted grid (based on wavelets, but on the plane and disregarding the mimetic properties). Rather than treating boundaries explicitly, the Brinkman volume penalization technique enforces no-slip boundaries by an additional term in the governing equations. This method is therefore particularly well suited for complex geometry (like the continents) and for adaptive methods, which make explicit treatment of boundaries cumbersome. At the same time, the grid adaptation ensures adequate resolution at the boundary. The principal approach of the Brinkman penalization is to treat the solid part as a porous medium. It has been shown that the penalized equations converge to the exact solution in the limit as the penalization parameter (hence the porosity) tends to zero. The following penalization is a modification of [Reckinger et al., 2012] where the penalization is included in the prognostic height variable, with the advantage that coasts are refined only where the height perturbation (i.e. the deviation from sea-level) is non-zero. This is particularly important in order to unfold the full potential of the adaptive method for highly localized cases like the tsunami during the first hours after the earthquake. An indicator function χ is defined that is zero in the fluid and one in the solid part of the domain. The penalization is then controlled by two small positive parameters: α and the Brinkman penalization parameter $\eta_p$. The penalized height $h_p$ is the deviation of the physical height from sea-level multiplied by φ:

$$h_p(t) = h(t)\,\phi, \qquad \phi = 1 + (\alpha - 1)\chi_s,$$

where the smoothed indicator function $\chi_s$ will be discussed below. Therewith the shallow water equations become

$$\frac{\partial h_p}{\partial t} = -\nabla \cdot \left[\mathbf{u}\,(h_p + b\phi)\right],$$
$$\frac{\partial \mathbf{u}}{\partial t} = -q\,(h_p + b\phi)\,\mathbf{u}^{\perp} - \nabla\left(g\frac{h_p}{\phi} + K\right) - \frac{\chi_s}{\eta_p}\,\mathbf{u},$$

with

$$q = \frac{\eta}{h_p + b\phi}.$$

The sudden change in the indicator function χ from zero to one, and the resulting discontinuity in the penalized height $h_p$, can cause the height interpolation to produce negative values. This can be avoided by replacing the indicator function χ by a function $\chi_s$ that has smooth transitions at the coast. For simple geometry $\chi_s$ can be defined analytically:

$$\chi_s(r) = \begin{cases} 1 & r - R < -a \\ \dfrac{1}{2}\left(1 - \sin\dfrac{(r-R)\pi}{2a}\right) & |r - R| \leq a \\ 0 & r - R > a \end{cases}$$


Figure 9.1: Smooth indicator function χ_s given analytically (transition of width 2a centred at r = R).

Figure 9.2: Smooth χ_s computed from bathymetry using radial basis functions.

Figure 9.1 shows this χ_s for the one-dimensional case with r = |x|. The parameter a = 0.01 controls the width of the transition. In two dimensions we can set r = \sqrt{x^2 + y^2}, which models a lake, or an island (if changing to χ_s ← 1 − χ_s). For the sphere, χ_s can be defined on a tangent plane and projected onto the hemisphere (assuming R is much smaller than the radius of the sphere).
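As an illustration, the smooth transition and the resulting factor φ = 1 + (α − 1)χ_s can be sketched as follows. This is a minimal Python sketch, not the thesis's Fortran95 implementation; the parameter values are hypothetical.

```python
import math

def chi_s(r, R, a):
    """Smoothed indicator: 1 deep inside the solid (r - R < -a), 0 in the
    fluid (r - R > a), with a sine-shaped transition of width 2a at r = R."""
    d = r - R
    if d < -a:
        return 1.0
    if d > a:
        return 0.0
    return 0.5 * (1.0 - math.sin(d * math.pi / (2.0 * a)))

def phi(r, R, a, alpha):
    """Penalization factor phi = 1 + (alpha - 1) * chi_s."""
    return 1.0 + (alpha - 1.0) * chi_s(r, R, a)

R, a, alpha = 1.0, 0.01, 1.0e-4
print(chi_s(R - 2 * a, R, a))       # deep in the solid -> 1.0
print(chi_s(R, R, a))               # at the coast      -> 0.5
print(chi_s(R + 2 * a, R, a))       # in the fluid      -> 0.0
print(phi(R + 2 * a, R, a, alpha))  # phi -> 1 in the fluid
```

Note that φ tends to α inside the solid and to 1 in the fluid, so the penalized height h_p = hφ is strongly damped over land.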

9.2 Bathymetry and coastlines

In order to simulate real flows in the oceans of the earth, the model must use the correct bathymetry b, and it requires an indicator function that resembles the coastlines of our planet and provides smooth transitions. Therefore, the model has been extended to read in bathymetry data provided by [Amante and Eakins, 2009] in longitude-latitude coordinates with 1 arc-minute resolution. The raw read-in bathymetry data b_r naturally tends to zero as we approach the coast; however, the shallow water model requires a certain minimum depth H_min. Therefore we set
\[
b = \begin{cases}
b_r & b_r \le -H_{\min} \\
-H_{\min} & -H_{\min} < b_r < 0 \\
0 & \text{otherwise.}
\end{cases}
\]
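This clipping can be sketched in a few lines (a NumPy illustration under the assumption, as in the text, that b_r is negative below sea level; all names and the sample values are hypothetical):

```python
import numpy as np

def clip_bathymetry(br, H_min):
    """Clip raw bathymetry br (negative below sea level) so that wet cells
    are at least H_min deep, while dry cells (br >= 0) become land (b = 0)."""
    return np.where(br <= -H_min, br,                 # deep enough: keep br
                    np.where(br < 0.0, -H_min, 0.0))  # shallow: deepen; land: 0

# Hypothetical samples in metres: deep ocean, shelf, near-coast, coast, mountain.
br = np.array([-4000.0, -50.0, -2.0, 0.0, 300.0])
print(clip_bathymetry(br, H_min=10.0))  # [-4000.  -50.  -10.    0.    0.]
```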


The non-smooth indicator function is
\[
\chi = \begin{cases} 0 & b_r < 0 \\ 1 & \text{otherwise.} \end{cases}
\]
Radial basis functions (RBF) allow us to define a smooth function everywhere on a domain D based on a finite number of samples. Hence they provide interpolation from the longitude-latitude grid to any coordinate desired for our model. Here we use the simple exponential RBF
\[
\mathrm{RBF}(r) = e^{-(ar)^2} .
\]
To do the actual interpolation, the RBF are used as weights,
\[
f(\mathbf{x}) = \frac{\displaystyle \sum_{s=-N_s}^{N_s} \sum_{t=-N_s}^{N_s} \mathrm{RBF}\bigl(\operatorname{arcdist}(\mathbf{x}, \mathbf{x}_{i+s,j+t})\bigr)\, f_{i+s,j+t}}{\displaystyle \sum_{s=-N_s}^{N_s} \sum_{t=-N_s}^{N_s} \mathrm{RBF}\bigl(\operatorname{arcdist}(\mathbf{x}, \mathbf{x}_{i+s,j+t})\bigr)} ,
\]
such that the interpolated value f at x is influenced by neighbouring longitude-latitude samples f_{i,j} depending on their distance to x. The number of neighbours N_s is chosen so that the truncation error is small, taking into consideration the fast decay of RBF(r). The RBF can introduce a certain amount of smoothing, which is controlled by the parameter a: decreasing a increases the influence of samples further away. This provides us with a single tool that we can use to interpolate from the longitude-latitude grid, to introduce smoothing for b if fluctuations in the measured b_r require it, and to obtain χ_s. The result for χ_s at the coast of a half-island is shown in figure 9.2.

During grid refinement the bathymetry is computed (from the RBF) at the newly introduced grid-points. As a consequence, the total mass of water is no longer conserved; instead, the mass relative to sea-level is conserved by the numerical scheme. That means that the mass defect introduced by the discrete model, which can accumulate over time, is still of the order of round-off errors, while the mass defect caused by changes in the bathymetry cannot accumulate and is bounded at all times.

Figure 9.3: Maximum wave height (colour scale: wave amplitude [m], 0.0 to 0.7).

Figure 9.4: Time of first wave arrival (colour scale: first arrival time [h], 0 to 8).

9.3 2004 Indonesian tsunami

As a final test of the model and the extensions presented in this chapter, the 2004 tsunami in the Indian ocean is simulated. The initial height is computed as suggested in [Okada, 1985]. Since the solution is very localized, the dynamical adaptation comes into effect and, although the model is global, resolutions of up to 0.5 km are feasible at the wave fronts. The thresholds ε_h and ε_u are adjusted as h and u change during the simulation. This allows the adaptation to keep tracking the waves after several hours, when their height has dropped to less than 10% of its initial value.

Figure 9.3 shows the maximum absolute height relative to sea-level over the whole simulation. The Maldives and especially the Seychelles can be recognized in the picture, since they locally strengthen the wave and weaken it in their wake. The light blue stripe on the bottom left (south of Madagascar) shows how the Southwest Indian Ridge supports wave propagation. Figure 9.4 shows the time when a wave of at least 5 cm first arrives. Again the Seychelles have an effect, since the waves in their wake are too low to be tracked (i.e. less than 5 cm in amplitude) at some places.

Figure 9.5 shows the tsunami 70 minutes into the simulation, when it has reached Indonesia and is approaching Sri Lanka. The small square in this figure is magnified in figure 9.6, which shows a close-up of the solution at the wave-front.

Figure 9.5: Tsunami after 70 minutes (colour scales: height [m], −0.6 to 0.6; grid level, 9 to 14).

This illustrates the relation of the 0.5 km resolution at level J = 14 to the dimensions of the tsunami. After 4 hours the tsunami has passed India, and figure 9.7 shows it approaching the Maldive islands. The grid on the right shows that refinement is focused at the wave fronts. Figure 9.8 shows the solution after 16 hours. The colour scale for the height has been changed, since the wavefronts are of the order of 10 centimetres by the time the tsunami has reached the Atlantic Ocean. At that point the tsunami has become a far less localized test-case; this shows how the adaptive method is able to track even smaller waves accurately. However, for hazard prediction the first few hours would be the most important.

The tsunami simulation uses the simple initial condition of a single surface elevation suggested in [Okada, 1985] and thus does not primarily aim at reproducing the real propagation of the tsunami as closely as possible. Instead it demonstrates the adaptive method for a fast-evolving and highly localized test case. Local resolutions below 1 km have

Figure 9.6: Tsunami wavefront at high resolution (local resolution from 500 m to 100 km). The time and the colour scale for height are the same as in figure 9.5.

Figure 9.7: Tsunami after 4 hours (colour scales: height [m], −0.6 to 0.6; grid level, 8 to 13).

Figure 9.8: Tsunami after 16 hours (colour scales: height [m], −0.05 to 0.05; grid level, 8 to 13).

been achieved in a global model with modest consumption of computational resources: simulating up to the arrival at the African coast requires only two to three days on 100 cores of a computing cluster.


Chapter 10

Conclusions and outlook

10.1 Summary

A dynamically adaptive computer model for the shallow water equations on the sphere has been developed. Wavelets and their ability to compress a function and to adapt the corresponding grid have been studied. This included the design of wavelets on an interval and on the sphere, considering stability as well as the efficiency and quality of the wavelet representation. Existing results have thereby been reproduced and in some cases extended by new developments.

With an aim towards geophysical applications, the adaptive model has been designed to retain mimetic properties of the continuous system in the discrete setting, most importantly mass conservation. This included developing inter-scale operators that take into account the irregularity of the geometry: flux restriction, and wavelet transforms for conserved height and non-separable velocity.

The model has been implemented in Fortran with a focus on efficiency and parallel execution, with the result that (for serial execution) the computational cost per grid point is only three times as large as for a non-adaptive implementation. This is a very moderate overhead, considering that the adaptive setting increases the cost in several ways: it is more difficult to achieve good memory access patterns; symmetries cannot be exploited as in the non-adaptive case (where there are, e.g., 10 lozenges with the same areas and lengths); and furthermore there is overhead for the inter-scale operators and for the grid adaptation routine. Typically, adaptivity adds overhead in memory footprint and operation count, leading to non-linear growth of computing time. On the other hand, the increased cost per grid point has to be compared to compression ratios of 10 to 100, meaning an improved efficiency of 3 to 30 times.

The evaluation of the parallel performance shows that the method achieves good computational efficiency (75-90%) for up to 100 cores for realistic test cases, and scales up to 1000 cores for load-balanced test-cases. Moreover, the model has potential for further improvements, e.g. by using a hybrid shared-distributed memory approach, multi-constraint load-balancing and/or timing-based load-balancing.

The model has been extended to allow explicit vorticity and no-slip boundary conditions based on volume penalization. Therewith rotating shallow water turbulence and the 2004 tsunami have been simulated. This has shown that high compression ratios can be achieved even in a challenging homogeneous turbulence case, and that for very localized test-cases (like the tsunami) the model is able to resolve scales of half a kilometre.

An advantage of the model is that the wavelet components have the potential to be reused with different discretizations. For a Z-grid method, the flux restriction and conservative scalar transform could be used for conservative advection of height, vorticity and velocity divergence. Since this method needs to solve a system in every time-step, the relative overhead of the adaptivity would be reduced. Furthermore, the nested structure invites multigrid solvers, whose optimal complexity keeps the cost of solving the equation system at a minimum.
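The net gain quoted above is simply the compression ratio divided by the per-grid-point overhead; a trivial sanity check of this arithmetic (assuming the factor-of-three overhead stated in the text):

```python
def net_speedup(compression_ratio, overhead_per_point=3.0):
    """Effective speedup of the adaptive method over a uniform grid:
    fewer active points (compression) at a higher cost per point (overhead)."""
    return compression_ratio / overhead_per_point

print(net_speedup(10))   # 10x compression  -> roughly 3x faster
print(net_speedup(100))  # 100x compression -> roughly 33x faster
```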

10.2 Outlook

Work is currently in progress to apply the adaptive model with penalization boundary conditions to gyre flow. This will complement the fast and localized tsunami simulation with an application of much slower dynamics, and test the model in a scenario where integration over several months is required.

From there, there are two possible directions. Extending the model to multiple layers would allow a more realistic simulation of atmospheric flow. In this case the adaptation strategy would only be applied horizontally; a reduced overhead of the adaptive grid is therefore to be expected for this hydrostatic extension. The alternative is to stay with oceans and extend the model to a full ocean model.

Potential future work further comprises research on the interplay of the adaptive model and sub-grid parametrizations, particularly cumulus clouds. This is considered one of the hardest challenges for adaptive methods, because current sub-grid models, while already complex, are designed on the basis of uniform grid resolution. Overcoming this hurdle would pave a crucial part of the way towards adaptive dynamical cores for the next generation of climate models.


Bibliography

[Amante and Eakins, 2009] C. Amante and B.W. Eakins. ETOPO1 1 arc-minute global relief model: Procedures, data sources and analysis. NOAA Technical Memorandum NESDIS NGDC-24, National Geophysical Data Center, NOAA, 2009.

[Behrens, 2009] Jörn Behrens. Adaptive Atmospheric Modelling. Springer, 2009.

[Boffetta and Ecke, 2012] Guido Boffetta and Robert E. Ecke. Two-dimensional turbulence. Annu. Rev. Fluid Mech., 44:427–451, 2012.

[Claypoole and Baraniuk, 1998] Roger L. Claypoole and Richard G. Baraniuk. Flexible wavelet transforms using lifting. In Proceedings of the SEG Meeting, New Orleans, Louisiana/USA, Sep. 1998.

[Daubechies and Sweldens, 1998] I. Daubechies and W. Sweldens. Factoring wavelet transforms into lifting steps. J. Fourier Anal. Appl., 4(3):245–267, 1998.

[Donoho, 1992] David L. Donoho. Interpolating wavelet transforms. Technical report, Stanford University, 1992.

[Dubos and Kevlahan, 2013] T. Dubos and N.K.-R. Kevlahan. A conservative adaptive wavelet method for the shallow water equations on staggered grids. Q. J. R. Meteorol. Soc., 2013.

[Dubos, 2012] Thomas Dubos. Personal communication, 2012.


[Galewsky et al., 2004] Joseph Galewsky, Richard K. Scott, and Lorenzo M. Polvani. An initial-value problem for testing numerical models of the global shallow-water equations. Tellus, 56A:429–440, 2004.

[Haar, 1910] A. Haar. Zur Theorie der orthogonalen Funktionensysteme. Math. Ann., 69:331–371, 1910.

[Heikes and Randall, 1995] Ross P. Heikes and David A. Randall. Numerical integration of the shallow-water equations on a twisted icosahedral grid. Part I: Basic design and results of tests. Mon. Weather Rev., 123:1862–1880, 1995.

[Heikes et al., 2013] Ross P. Heikes, David A. Randall, and Celal S. Konor. Optimized icosahedral grids: Performance of finite-difference operators and multigrid solver. Mon. Weather Rev., 141:4450–4469, 2013.

[Hejazialhosseini et al., 2010] Babak Hejazialhosseini, Diego Rossinelli, Michael Bergdorf, and Petros Koumoutsakos. High order finite volume methods on wavelet-adapted grids with local time-stepping on multicore architectures for the simulation of shock-bubble interactions. J. Comput. Phys., 229(22):8364–8383, 2010.

[Karypis and Kumar, 1995] George Karypis and Vipin Kumar. METIS - Unstructured graph partitioning and sparse matrix ordering system, version 2.0. Technical report, Karypis Lab, University of Minnesota, 1995.

[Kevlahan and Vasilyev, 2005] Nicholas K.-R. Kevlahan and Oleg V. Vasilyev. An adaptive wavelet collocation method for fluid-structure interaction at high Reynolds numbers. SIAM J. Sci. Comput., 26(6):1894–1915, 2005.

[Krinner et al., 1997] G. Krinner, C. Genthon, Z.-X. Li, and P. Le Van. Studies of the Antarctic climate with a stretched-grid general circulation model. J. Geophys. Res. D: Atmos., 102:13731–13745, 1997.

[Läuter et al., 2007] Matthias Läuter, Dörthe Handorf, Natalja Rakowsky, Jörn Behrens, Stephan Frickenhaus, Meike Best, Klaus Dethloff, and Wolfgang Hiller. A parallel adaptive barotropic model of the atmosphere. J. Comput. Phys., 223(2):609–628, 2007.

[Lee et al., 2008] J. Lee, R. Bleck, A. MacDonald, J. Bao, S. Benjamin, J. Middlecoff, N. Wang, and J. Brown. FIM: A vertically flow-following, finite-volume icosahedral model. In Preprints, 22nd Conf. Wea. Analysis Forecasting / 18th Conf. Num. Wea. Pred., Park City, UT, Amer. Meteor. Soc., 2008.

[Liandrat and Tchamitchian, 1990] J. Liandrat and Ph. Tchamitchian. Resolution of the 1D regularized Burgers equation using a spatial wavelet approximation. Technical report, NASA, 1990.

[Mallat, 2009] Stephane Mallat. A Wavelet Tour of Signal Processing. Elsevier Inc., 2009.

[Mehra and Kevlahan, 2008] M. Mehra and N.K.-R. Kevlahan. An adaptive wavelet collocation method for the solution of partial differential equations on the sphere. J. Comput. Phys., 227:5610–5632, 2008.

[Nair and Lauritzen, 2010] Ramachandran D. Nair and Peter H. Lauritzen. A class of deformational flow test cases for linear transport problems on the sphere. J. Comput. Phys., 229(23):8868–8887, 2010.

[Neale et al., 2010] R. B. Neale, C.-C. Chen, A. Gettelman, P. H. Lauritzen, S. Park, D. L. Williamson, A. J. Conley, R. Garcia, D. Kinnison, J.-F. Lamarque, D. Marsh, M. Mills, A. K. Smith, S. Tilmes, F. Vitt, H. Morrison, P. Cameron-Smith, W. D. Collins, M. J. Iacono, R. C. Easter, S. J. Ghan, X. Liu, P. J. Rasch, and M. A. Taylor. Description of the NCAR Community Atmosphere Model (CAM 5.0). NCAR Technical Note NCAR/TN-486+STR, National Center for Atmospheric Research, Boulder, Colorado, June 2010. 268 pp., available from http://www.cesm.ucar.edu/models/cesm1.0/cam/.

[Okada, 1985] Y. Okada. Surface deformation due to shear and tensile faults in a half-space. Bull. Seismol. Soc. Am., 75(4):1135–1154, 1985.


[Randall, 1994] David A. Randall. Geostrophic adjustment and the finite-difference shallow-water equations. Mon. Weather Rev., 122:1371–1377, 1994.

[Reckinger et al., 2012] S. M. Reckinger, O. V. Vasilyev, and B. Fox-Kemper. Adaptive volume penalization for ocean modeling. Ocean Dynamics, 62(8):1201–1215, 2012.

[Ringler et al., 2010] T.D. Ringler, J. Thuburn, J.B. Klemp, and W.C. Skamarock. Unified approach to energy conservation and potential vorticity dynamics for arbitrarily-structured C-grids. J. Comput. Phys., 229:3065–3090, 2010.

[Rivier et al., 2002] L. Rivier, R. Loft, and L. M. Polvani. An efficient spectral dynamical core for distributed memory computers. Mon. Weather Rev., 130(5):1384–1396, 2002.

[Roussel and Schneider, 2010] Olivier Roussel and Kai Schneider. Coherent vortex simulation of weakly compressible turbulent mixing layers using adaptive multiresolution methods. J. Comput. Phys., 229(6):2267–2286, 2010.

[Roussel et al., 2003] Olivier Roussel, Kai Schneider, Alexei Tsigulin, and Henning Bockhorn. A conservative fully adaptive multiresolution algorithm for parabolic PDEs. J. Comput. Phys., 188:493–523, 2003.

[Schröder and Sweldens, 1995] Peter Schröder and Wim Sweldens. Spherical wavelets: Efficiently representing functions on the sphere. Computer Graphics Proceedings (SIGGRAPH 95), pages 161–172, 1995.

[Skamarock et al., 2012] William C. Skamarock, Joseph B. Klemp, Michael G. Duda, Laura D. Fowler, Sang-Hun Park, and Todd D. Ringler. A multiscale nonhydrostatic atmospheric model using centroidal Voronoi tesselations and C-grid staggering. Mon. Weather Rev., 140:3090–3105, 2012.

[Spiteri and Ruuth, 2002] Raymond J. Spiteri and Steven J. Ruuth. A new class of optimal high-order strong-stability-preserving time discretization methods. SIAM J. Numer. Anal., 40:469–491, 2002.


[St-Cyr et al., 2008] Amik St-Cyr, Christiane Jablonowski, John M. Dennis, Henry M. Tufo, and Stephen J. Thomas. A comparison of two shallow-water models with nonconforming adaptive grids. Mon. Weather Rev., 136(6):1898–1922, June 2008.

[Stocker et al., 2013] Thomas F. Stocker, D. Qin, Gian-Kasper Plattner, Melinda M.B. Tignor, Simon K. Allen, Judith Boschung, Alexander Nauels, Yu Xia, V. Bex, and P.M. Midgley (eds.). Working Group I Contribution to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press, 2013.

[Sweldens, 1997] W. Sweldens. The lifting scheme: A construction of second generation wavelets. SIAM J. Math. Anal., 29(2):511–546, 1997.

[Thuburn et al., 2009] J. Thuburn, T. Ringler, J. Klemp, and W. Skamarock. Numerical representation of geostrophic modes on arbitrarily structured C-grids. J. Comput. Phys., 228:8321–8335, 2009.

[Ullrich and Jablonowski, 2012] Paul A. Ullrich and Christiane Jablonowski. MCore: A non-hydrostatic atmospheric dynamical core utilizing high-order finite-volume methods. J. Comput. Phys., 231(15):5078–5108, 2012.

[Vallis, 2006] Geoffrey K. Vallis. Atmospheric and Oceanic Fluid Dynamics. Cambridge University Press, 2006.

[Vasilyev and Bowman, 2000] Oleg V. Vasilyev and Christopher Bowman. Second-generation wavelet collocation method for the solution of partial differential equations. J. Comput. Phys., 165:660–693, 2000.

[Walnut, 2004] David F. Walnut. An Introduction to Wavelet Analysis. Birkhäuser Boston, 2004.

[Weller, 2012] Hilary Weller. Controlling the computational modes of the arbitrarily structured C grid. Mon. Weather Rev., 140:3220–3234, 2012.

[Williamson et al., 1992] David L. Williamson, John B. Drake, James J. Hack, Rudiger Jakob, and Paul N. Swarztrauber. A standard test set for numerical


approximations to the shallow water equations in spherical geometry. J. Comput. Phys., 102(1):211–224, 1992.

[Xu, 2006] G. Xu. Discrete Laplace-Beltrami operator on sphere and optimal spherical triangulations. Int. J. Comput. Geom. Appl., 16:75–93, 2006.
