# Data Science

Data-driven models, inverse problems, uncertainty quantification

### Data-driven dimensionality reduction in probabilistic prediction

High-dimensional noisy time series generated by complex underlying dynamics often contain redundant information and can be compactly represented by a dynamical process on a low-dimensional manifold; this is commonly referred to as the ‘manifold hypothesis’ and is related to the concentration of measure phenomenon in high-dimensional data sets. Because classical dimension reduction methods, such as Principal Component Analysis, are linear, they are ill-suited to recovering the nonlinear structure of the underlying state manifold. Robust and accurate dynamical predictions, based on reduced-order models extracted from empirical data, require a systematic understanding of how to combine manifold learning methods with techniques from analysis and probability for extracting dynamical features from noisy or incomplete time-dependent data. This project will approach this problem from the probabilistic/stochastic viewpoint. For details please contact Michal Branicki (m.branicki@ed.ac.uk).
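
The limitation of linear methods mentioned above can be seen in a toy setting. The sketch below is an illustration only, under stated assumptions: the data are synthetic points on a noisy circle (a one-dimensional manifold embedded in the plane), and the helper name `pca_explained_variance_ratio` and all parameter values are hypothetical choices, not part of the project. Although a single angle describes the data, PCA spreads the variance roughly evenly over both components, so no single linear direction captures the structure.

```python
import numpy as np

def pca_explained_variance_ratio(X):
    """Fraction of the total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                      # centre the data
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values of centred data
    var = s ** 2                                 # proportional to PCA variances
    return var / var.sum()

# Synthetic data (assumption): points on the unit circle plus small noise.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=2000)  # the single intrinsic coordinate
circle = np.column_stack([np.cos(theta), np.sin(theta)])
noisy_circle = circle + 0.01 * rng.standard_normal(circle.shape)

ratios = pca_explained_variance_ratio(noisy_circle)
print("explained variance ratios:", ratios)
```

Both ratios come out close to one half, so truncating to one principal component would discard about half of the variance even though the data are intrinsically one-dimensional.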

### Information theory, coarse-graining and predictability of complex dynamics

For details on the range of potential topics in this area please contact Michal Branicki (m.branicki@ed.ac.uk).

### A stochastic framework for microseism generation and for uncertainty quantification in seismic tomography

The study of terrestrial microseismic noise correlations holds great promise as a means of learning the structure of the Earth’s crust and studying its temporal variations. Apart from the practical applications, seismic tomography is wide open to developing systematic mathematical approaches for robust inverse methods and uncertainty quantification. Microseisms are seismo-acoustic waves excited by nonlinear interactions of ocean waves whose energy is trapped within a wave guide established by the seafloor and the steep gradients in elastic velocities in the crust and upper mantle. The microseism sources illuminating the Earth’s crust are typically not co-spatial with tectonically active regions, which implies a continuous ability to monitor the crust. Longuet-Higgins (1950) first argued that the interaction between surface gravity waves could lead to the excitation of high phase velocity acoustic components. However, there is no theory for the temporal intermittency and non-Gaussianity which are apparent in the microseism time series. Understanding these non-Gaussian effects is often crucial for carrying out robust full wave field inversion, which is essential in the petroleum industry. This project will be concerned with developing a stochastic approach to dynamic microseism generation. Informal enquiries can be made to Michal Branicki (m.branicki@ed.ac.uk).

### Data-driven models in molecular dynamics

Multiscale modelling plays an essential role in molecular simulation as the range of scales involved precludes the use of a single, unified system of equations. The most accurate model is quantum mechanics, which describes the evolution of a system of nuclei and electrons. When a modest-sized quantum system is discretized for numerical solution, the result is an unimaginably large number of equations, which can swamp even the most powerful computer systems. A classical model based on potential energy functions for the interaction of atomic nuclei provides a much simplified description, but one that precludes many important effects (breakage of bonds, quantum tunnelling, etc.). Even the classical description must be further 'coarse-grained' to provide an effective scheme for large-scale or slow-developing processes that would otherwise remain inaccessible in computer simulation. In a multiscale model, different models are unified by the use of bridging algorithms, numerical and analytical averaging, and reliance on the principles of statistical mechanics. In this project, the goal is to use experimental data in place of simulation data to capture complex local processes and low-level interactions in a molecular system. A system is no longer viewed as being described by a single inter-molecular potential energy surface, but rather by a collection of surfaces which can be locally determined, on-the-fly, from tabulated data. The resulting procedures will engender methodological changes in order to retain statistical properties that are relevant for the simulator. This project has aspects of molecular dynamics, computational statistical mechanics and quantum mechanics. It further relates to machine learning and has applications in materials modelling. Informal enquiries can be made to Ben Leimkuhler (b.leimkuhler@ed.ac.uk).

### Techniques for Uncertainty Quantification

Data assimilation techniques can be used to combine a numerical model with observations: the numerical model captures the physics of the problem, while the observations provide information about the real system. However, observations have associated errors, and these errors lead to uncertainty in state estimates. This project will study the application of uncertainty quantification techniques to the study of geophysical fluid flows, including techniques based on algorithmic differentiation or Monte Carlo methods. Informal enquiries can be made to James Maddison (j.r.maddison@ed.ac.uk).
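
To make the link between observation error and state-estimate uncertainty concrete, here is a minimal sketch, assuming a single scalar state with Gaussian forecast and observation errors. It is not the method of the project: the function name `scalar_analysis` and the numerical values are hypothetical. The analytic variance of the combined estimate is compared with a Monte Carlo estimate obtained from many synthetic forecast and observation pairs.

```python
import numpy as np

def scalar_analysis(x_b, var_b, y, var_o):
    """Combine a model forecast x_b with an observation y (scalar, Gaussian errors)."""
    gain = var_b / (var_b + var_o)       # weight given to the observation
    x_a = x_b + gain * (y - x_b)         # combined state estimate
    var_a = (1.0 - gain) * var_b         # error variance of the combined estimate
    return x_a, var_a

# Hypothetical values: true state, forecast error variance, observation error variance.
x_true, var_b, var_o = 1.0, 0.5, 0.2

# Monte Carlo: draw many forecast/observation pairs with the stated error variances.
rng = np.random.default_rng(1)
n_samples = 200_000
x_b = x_true + np.sqrt(var_b) * rng.standard_normal(n_samples)
y = x_true + np.sqrt(var_o) * rng.standard_normal(n_samples)

x_a, var_a = scalar_analysis(x_b, var_b, y, var_o)
mc_var = np.var(x_a - x_true)            # sampled error variance of the estimate

print("analytic variance:   ", var_a)
print("Monte Carlo variance:", mc_var)
```

The two variances agree up to sampling error, and both are smaller than either the forecast or the observation error variance; increasing `var_o` increases the uncertainty of the combined estimate.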

### PDE-constrained optimization in scientific processes

A vast number of important and challenging applications in mathematics and engineering are governed by inverse problems. One crucial class of these problems, which has significant applicability to real-world processes, including those of fluid flow, chemical and biological mechanisms, medical imaging, and others, is that of PDE-constrained optimization. However, whereas such problems can typically be written in a precise form, generating accurate numerical solutions on the discrete level is a highly non-trivial task, due to the dimension and complexity of the matrix systems involved. In order to tackle practical problems, it is essential to devise strategies for storing and working with systems of huge dimensions, which result from fine discretizations of the PDEs in space and time variables. In this project, "all-at-once" solvers coupled with appropriate preconditioning techniques will be derived for these systems, in such a way that one may achieve fast and robust convergence in theory and in practice. Informal enquiries can be made to John Pearson (j.pearson@ed.ac.uk). This project is related to the EPSRC Fellowship http://gow.epsrc.ac.uk/NGBOViewGrant.aspx?GrantRef=EP/M018857/1.
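
The matrix systems referred to above can be illustrated on a small model problem. The sketch below is an assumption-laden toy, not the project's solver: it takes a one-dimensional Poisson control problem (minimise the misfit between the state `y` and a target `y_hat` plus `beta` times the control cost, subject to a discretized Poisson equation), uses a lumped mass matrix, and solves the coupled optimality system for state, control and adjoint in one go with a dense direct solver. All names and parameter values are hypothetical.

```python
import numpy as np

# Discretization of -y'' = u on (0, 1) with zero boundary values (assumed model problem).
n = 50                                   # number of interior grid points
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h   # stiffness matrix
M = h * np.eye(n)                        # lumped mass matrix

beta = 1e-4                              # regularization parameter (hypothetical value)
y_hat = np.sin(np.pi * x)                # target state (hypothetical choice)

# Coupled optimality system for state y, control u and adjoint p:
#   M y            + K p = M y_hat
#        beta M u  - M p = 0
#   K y  - M u           = 0
Z = np.zeros((n, n))
A = np.block([[M, Z, K],
              [Z, beta * M, -M],
              [K, -M, Z]])
rhs = np.concatenate([M @ y_hat, np.zeros(n), np.zeros(n)])

sol = np.linalg.solve(A, rhs)            # dense direct solve, for illustration only
y, u, p = np.split(sol, 3)

rel_misfit = np.linalg.norm(y - y_hat) / np.linalg.norm(y_hat)
print("system size:", A.shape, "relative misfit:", rel_misfit)
```

Even in this toy the coupled system is three times the size of the discretized PDE, and it grows with every refinement in space and time. A dense direct solve stops being viable very quickly, which is the regime where iterative solvers with suitable preconditioners become necessary.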

### Numerical analysis of Bayesian inverse problems

In areas as diverse as climate modelling, geosciences and medicine, mathematical models and computer simulations are routinely used to inform decisions and assess risk. However, the parameters appearing in the mathematical models are often unknown, and have to be estimated from measurements. This project is concerned with the inverse problem of determining the unknown parameters in the model, given some measurements of the output of the model. In the Bayesian framework, the solution to this inverse problem is the probability distribution of the unknown parameters, conditioned on the observed outputs. Combining ideas from numerical analysis, statistics and stochastic analysis, this project will address questions related to the error introduced in the distribution of the parameters, when the mathematical model is approximated by a numerical method. Informal enquiries can be made to Aretha Teckentrup (A.Teckentrup@ed.ac.uk).
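
The question of how a numerical approximation of the model perturbs the posterior can be illustrated in one dimension. The following is a sketch under stated assumptions, not the analysis proposed in the project: the model is the decay equation y' = -θy with y(0) = 1 observed at time 1, the prior is flat on [0, 3], the noise is Gaussian, and the numerical method is forward Euler. The function names and all values are hypothetical. The posterior computed with the exact model is compared, in total variation distance, with posteriors computed using Euler approximations of increasing accuracy.

```python
import numpy as np

def posterior_on_grid(theta, forward, y_obs, sigma):
    """Normalised posterior density on a uniform grid (flat prior, Gaussian noise)."""
    log_like = -0.5 * ((y_obs - forward(theta)) / sigma) ** 2
    dens = np.exp(log_like - log_like.max())     # subtract max for numerical stability
    dtheta = theta[1] - theta[0]
    return dens / (dens.sum() * dtheta)

def exact_model(theta):
    """y(1) for y' = -theta * y, y(0) = 1."""
    return np.exp(-theta)

def euler_model(theta, n_steps):
    """Forward Euler approximation of y(1) using n_steps time steps."""
    return (1.0 - theta / n_steps) ** n_steps

theta = np.linspace(0.0, 3.0, 3001)              # parameter grid (support of the flat prior)
dtheta = theta[1] - theta[0]
sigma = 0.05                                     # assumed noise standard deviation
y_obs = np.exp(-1.0)                             # synthetic, noise-free data for theta = 1

post_exact = posterior_on_grid(theta, exact_model, y_obs, sigma)

tv_errors = {}
for n_steps in (10, 100, 1000):
    post_approx = posterior_on_grid(
        theta, lambda t: euler_model(t, n_steps), y_obs, sigma)
    # Total variation distance between exact and approximate posteriors.
    tv_errors[n_steps] = 0.5 * np.sum(np.abs(post_exact - post_approx)) * dtheta

for n_steps, tv in tv_errors.items():
    print(f"Euler steps = {n_steps:5d}   TV distance to exact posterior = {tv:.2e}")
```

In this toy the distance between the posteriors shrinks as the time step is refined; quantifying such rates in general settings is the kind of question described above.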

### Optimal construction of statistical interpolants

Many problems in science and engineering involve an unknown complex process, which it is not possible to observe fully and accurately. The goal is then to reconstruct the unknown process, given a small number of direct or indirect observations. Mathematically, this problem can be reformulated as reconstructing a function from the limited information available, such as a small number of function evaluations. Statistical approaches, such as interpolation or regression using Gaussian processes, provide us with a best guess of the unknown function, as well as a measure of how confident we are in our reconstruction. Combining ideas from machine learning, numerical analysis and statistics, this project will address questions related to optimal reconstructions, such as the optimal choice of the location of the function evaluations used for the reconstruction. Informal enquiries can be made to Aretha Teckentrup (A.Teckentrup@ed.ac.uk).
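
A minimal sketch of Gaussian process interpolation shows both ingredients mentioned above, the best guess and the confidence measure. The assumptions are loud ones: a zero-mean prior with a squared-exponential kernel of unit variance, a hypothetical test function, and a simple greedy rule (pick the point of largest posterior variance) as one possible way to choose the next evaluation location. None of this is claimed to be the approach of the project.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale):
    """Squared-exponential covariance between point sets a and b (unit prior variance)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_interpolate(x_train, y_train, x_test, length_scale=0.2, jitter=1e-10):
    """Posterior mean and variance of a zero-mean GP conditioned on exact evaluations."""
    K = sq_exp_kernel(x_train, x_train, length_scale) + jitter * np.eye(len(x_train))
    K_s = sq_exp_kernel(x_test, x_train, length_scale)
    mean = K_s @ np.linalg.solve(K, y_train)
    # Posterior variance: prior variance (1) minus the part explained by the data.
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
    return mean, np.maximum(var, 0.0)    # clip tiny negative values from round-off

def f(x):
    """Hypothetical 'unknown' function, used only to generate evaluations."""
    return np.sin(2.0 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 6)       # a small number of function evaluations
y_train = f(x_train)
x_test = np.linspace(0.0, 1.0, 101)

mean, var = gp_interpolate(x_train, y_train, x_test)

# Greedy design rule (an assumption): evaluate next where the GP is least certain.
x_next = x_test[np.argmax(var)]
print("next evaluation point suggested by maximum variance:", x_next)
```

The posterior mean reproduces the data at the evaluation points, where the variance drops to zero, and the variance grows between them. Which locations are optimal depends on the criterion used, which is the type of question raised in the project description.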

### Sampling methods in uncertainty quantification

For details on the range of potential topics in this area please contact Aretha Teckentrup (A.Teckentrup@ed.ac.uk).

### Stochastic differential equations, sampling and big data

For details on the range of potential topics in this area please contact Konstantinos Zygalakis (kzygalakis@ed.ac.uk).