Potential projects in statistics
Opportunities for PhD projects in Statistics with internationally leading academics in the Statistics group, School of Mathematics
Unless otherwise stated, the projects are funded by a University of Edinburgh scholarship, which fully covers tuition fees and provides an annual stipend, and are open to home, EU, and overseas students. The following list is not exhaustive, and more projects will become available. Prospective students should contact staff members in their area of interest as soon as possible. Statistics projects listed at the SENSE and NERC DTPs are also available.
To apply for a PhD in Statistics, visit the University of Edinburgh DegreeFinder webpage.
Global and regional variability and uncertainty in extreme sea states across climate-quality observations and other long term data sets
(part of the SENSE Centre for Doctoral Training)
Accurate knowledge and understanding of the sea state and its variability is crucial to numerous oceanic and coastal engineering applications, but also to climate change and related impacts including coastal erosion and inundation, and changing sea-ice interaction. The largest impacts and highest risks are associated with the most energetic conditions.
Various studies have examined the extremes of wave height both locally and globally, on a range of temporal scales and using a range of data sources (Izaguirre et al., 2011; Timmermans et al., 2017; Kumar et al., 2018; Stopa et al., 2019; Young & Ribal, 2019). Simulated hindcasts, re-analyses and satellite observations have been used as source data, but more recently independent groups (Young & Ribal, 2019; European Space Agency, Climate Change Initiative) have produced updated, quality-controlled and calibrated data sets based on the most recent available satellite observations. These offer potentially the most accurate and up-to-date insight into the spatial and temporal variability of recent (extreme) wave climate. Furthermore, the larger samples arising from the use of more recent observations may provide the opportunity to examine extremes at fairly small spatial scales, such as coastal regions. However, recent comparative analysis of several global gridded products has revealed considerable disagreement in the temporal and geographic characteristics of (mean) wave climate, with indications that similar, if not more dramatic, discrepancies exist for more extreme conditions. The development of statistical models is crucial because the underlying causes of these discrepancies remain unclear, as do the potential implications for projected impacts in relevant regions.
The primary application area of this proposal is the characterisation of extreme conditions in oceans worldwide, but the proposed statistical methodology is generic. Different oceans exhibit very different behaviour, and statistical descriptions need to be sensitive and flexible in this respect. The proposed statistical approach will be based on graphical models for multivariate extremes and will facilitate spatio-temporal modelling of variables such as significant wave height, wind speed and current speed, as well as combinations of these variables within one model, with quite general extremal dependence structure. This is particularly useful when computer simulators for extremes of an environment are compared with actual observations, and also because the most important characteristics of environmental extremes are not contemporaneous. For example, the extremum of significant wave height within a storm may not coincide with the extrema of wind speed or wave peak frequency, and the proposed statistical models will aim to capture and explain such temporal incoherences better. In this project the student will exploit multiple sea state data sets in order to identify geographic regions of interest characterised by variability and disagreement across data sets, taking particular account of regions potentially vulnerable to coastal impact. The effects of interannual and decadal variability may be relevant, as well as the influence of severe weather systems such as tropical cyclones.
The output from this research will help to identify more broadly and consistently (geographically) where and why uncertainty affects temporal characteristics of the sea state representation across the climate-quality data record, and how this might influence longer term projections and planning in regions where impacts could become important.
The student is expected to be based at the School of Mathematics at the University of Edinburgh, with good opportunities to interact with researchers at the NOC sites. The project would suit a student with strengths primarily in statistics and the physical sciences (Physics, Maths, Engineering), familiarity with programming (typically R, Python or similar) and with the handling and analysis of large data sets, and an interest in oceanographic and coastal processes and their interactions with, and impacts on, coastal communities and industry.
This PhD is part of the NERC and UK Space Agency funded Centre for Doctoral Training "SENSE": the Centre for Satellite Data in Environmental Science. SENSE will train 50 PhD students to tackle cross-disciplinary environmental problems by applying the latest data science techniques to satellite data. All our students will receive extensive training on satellite data and AI/Machine Learning, as well as attending a field course on drones, and residential courses hosted by the Satellite Applications Catapult (Harwell), and ESA (Rome). All students will experience extensive training on professional skills, including spending 3 months on an industry placement. See http://www.eo-cdt.org
Contact: Ioannis Papastathopoulos, mailto:email@example.com
Statistical Analysis of Literature and Social Media
The field of stylometry uses statistical techniques to analyse literature and answer questions about authorship. A typical question would be “Given two different pieces of writing, is it possible to determine whether both pieces have been written by the same author, or by two different authors?”. This is often phrased as a supervised learning problem, where the goal is to build a statistical or machine learning model from a training set consisting of previous (known) works by each candidate author, and to use this model to make inferences about the probability that each candidate is the author of the new text.
Such techniques have previously been used for the analysis of literary works, such as detecting forgeries when a newly discovered work is claimed to have been written by some famous author (e.g. Shakespeare). Recently, there has been an increased interest in applying these techniques to the analysis of social media data. Questions here might include:
- If a person claims that their social media account has been hacked, is it possible to determine whether posts that have been made after the hack were really written by the original author?
- If we suspect that two user accounts on a platform are controlled by the same person, is it possible to confirm this using statistical analysis?
This project aims to develop new methodology for the analysis of writing, and apply it to both literary and social media applications. There are many potential projects in this area, and potential methodological topics include: the use of hierarchical modelling or regularisation to help scale traditional stylometric methods up to large social media datasets; nonparametric modelling of authorship style; and unsupervised learning where we do not have a training set for each author.
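As a rough illustration of the supervised formulation described above, the following Python sketch attributes a disputed text to the candidate author whose function-word usage is closest. It is a toy example under stated assumptions, not the methodology of the project: the word list, the Manhattan distance, the helper names (profile, attribute) and the example sentences are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical short list of function words; real studies use far longer lists.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it")


def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty input
    counts = Counter(tokens)
    return [counts[w] / total for w in FUNCTION_WORDS]


def attribute(disputed_text, training_texts):
    """Return the candidate author whose profile is nearest (Manhattan distance).

    training_texts maps each author name to a list of known writings.
    """
    target = profile(disputed_text)
    distances = {}
    for author, texts in training_texts.items():
        reference = profile(" ".join(texts))
        distances[author] = sum(abs(a - b) for a, b in zip(target, reference))
    return min(distances, key=distances.get)


# Toy training set (invented sentences, for illustration only).
training = {
    "A": ["the cat and the dog and the bird sat in the garden"],
    "B": ["it seems that a storm is coming to a town that it may ruin"],
}
print(attribute("the fox and the hare ran in the field and the wood", training))
```

A nearest-profile rule like this returns a single label; the probabilistic statements mentioned above would require a proper statistical model on top of such features.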
Contact: Gordon Ross, mailto:firstname.lastname@example.org
Theory of Nonparametric Bayesian Inference
Bayesian nonparametric methods with Dirichlet process and Gaussian priors, among others, are widely used in practice and are part of some machine learning methods. However, it is known that they can lead to inconsistent inference. Hence, it is important to study the large-sample properties of inference based on these models, such as the rate of contraction of the posterior distribution and the local concentration of the posterior around the true parameter (known as the Bernstein-von Mises theorem). There are various open problems in this area that can be chosen as a potential PhD project, which will be supervised by Dr Natalia Bochkina.
Contact: Natalia Bochkina, mailto:email@example.com
Statistical modelling of grid-cell firing using log-Gaussian Cox processes through the SPDE approach
One of the most important unsolved problems in science today is to understand the codes that neurons in our brains use to communicate with one another and, collectively, to generate phenomena such as perception and cognition. Currently, approaches to this problem are limited by our ability to analyze neural codes. Grid cells are nerve cells in the entorhinal cortex that represent the location of an animal in its environment and, in combination with place cells, form a coordinate system that allows spatial navigation and learning of maps of the world. There are many open problems in this area that relate to the accurate identification of covariate effects on the patterns of grid-cell firing, and they offer an opportunity to develop new statistical methodology for point processes. This project will look at developing novel methods for the statistical modelling of neural firing based on the class of log-Gaussian Cox processes. The methods will be based on the SPDE (stochastic partial differential equation) approach to Gaussian fields and will accommodate spatial, temporal and directional covariate effects on the intensity of grid-cell point patterns. This methodology will provide a key basis for understanding and quantifying the grid field and, more importantly, for investigating how additional information can be multiplexed within the grid representation.
Full funding for UK/EU only.
Contact: Ioannis Papastathopoulos, mailto:firstname.lastname@example.org
Sequential Bayesian inference in complex and realistic dynamical systems
This PhD position sits at the overlap between statistical signal processing, statistics, and machine learning, and is motivated by applications that aim to improve human life and the environment. The successful applicant will be supervised by Dr. Victor Elvira. Several international collaborations with scientists in France and the USA are also expected.
Many problems in different scientific domains can be described through statistical models that relate sequentially observed data to a hidden process through some unobserved parameters. In the Bayesian framework, the probabilistic estimation of the unknowns is represented by the posterior distribution of these parameters. However, in most realistic models the posterior is intractable and must be approximated. Importance Sampling (IS)-based algorithms are Monte Carlo methods that have shown satisfactory performance in many problems of Bayesian inference, including the sequential setting.
In this thesis, we will develop novel IS-based methods for Bayesian inference in complex systems (high-dimensional, with large amounts of data, non-linear and non-Gaussian relations, model misspecification, etc.). More specifically, we will propose novel, efficient computational methods to deal with these complex models in order to overcome current limitations of more traditional Monte Carlo techniques in such a challenging context. Many applications can benefit from the development of these methodologies, including inferential problems in climatology, biological systems, and ecology, among many others.
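To make the idea of importance sampling concrete, here is a minimal Python sketch of self-normalised importance sampling on a toy model chosen purely for illustration: a N(0, 1) prior on a scalar parameter and a single observation y with N(theta, 1) likelihood, for which the exact posterior mean is y/2. The model, the Gaussian proposal, and the function names are assumptions of this example, not part of the project description.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log-density of a normal distribution with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def snis_posterior_mean(y, n_samples=20000, proposal_sd=2.0, seed=0):
    """Self-normalised IS estimate of E[theta | y] for the toy model.

    Returns the estimate and the effective sample size of the weights.
    """
    rng = random.Random(seed)
    # 1. Draw from the proposal q = N(0, proposal_sd^2).
    samples = [rng.gauss(0.0, proposal_sd) for _ in range(n_samples)]
    # 2. Log-weights: log prior + log likelihood - log proposal.
    log_w = [
        log_normal_pdf(t, 0.0, 1.0)
        + log_normal_pdf(y, t, 1.0)
        - log_normal_pdf(t, 0.0, proposal_sd)
        for t in samples
    ]
    # 3. Normalise in a numerically stable way (subtract the maximum first).
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    # 4. Weighted average and effective sample size.
    estimate = sum(wi * t for wi, t in zip(w, samples))
    ess = 1.0 / sum(wi * wi for wi in w)
    return estimate, ess


estimate, ess = snis_posterior_mean(y=1.0)
print(estimate, ess)
```

The effective sample size gives a quick diagnostic of how well the proposal matches the posterior; in high-dimensional or sequential settings it tends to collapse, which is one of the limitations the text alludes to.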
Contact: Victor Elvira, mailto:email@example.com
Earthquake Forecasting Using Machine Learning and Statistics
Although the occurrence time of individual earthquakes cannot be predicted exactly, statistical models are able to give relatively accurate forecasts of the long-term probability of large earthquakes occurring in particular geographic regions. Such forecasts are useful for risk management, as well as for allowing insurance companies to price their policies accurately.
Many statistical forecasting approaches model earthquakes as a point process, which is then fitted to data from particular earthquake regions. The best-known model is the Epidemic Type Aftershock Sequence (ETAS) model. This project will explore and extend the use of ETAS-type models for specific earthquake forecasting scenarios. There are many open problems in this area, including the opportunity to develop new methodology for point processes and to apply such point processes to a rich variety of spatial datasets.
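For readers unfamiliar with ETAS, the sketch below evaluates the conditional intensity of a purely temporal ETAS model in its commonly used form: a constant background rate plus, for each past event, a contribution that grows exponentially with magnitude and decays as a power law in elapsed time. The parameter values and the tiny catalogue are invented for illustration and are not fitted to any data; the spatial component relevant to this project is omitted.

```python
import math


def etas_intensity(t, event_times, magnitudes,
                   mu=0.1, K=0.05, alpha=1.0, c=0.01, p=1.2, m0=3.0):
    """Conditional intensity of a temporal ETAS model at time t.

    lambda(t) = mu + sum over past events i of
                K * exp(alpha * (m_i - m0)) * (t - t_i + c) ** (-p)

    All default parameter values are illustrative assumptions.
    """
    rate = mu  # background seismicity rate
    for t_i, m_i in zip(event_times, magnitudes):
        if t_i < t:  # only earlier events can trigger aftershocks
            productivity = K * math.exp(alpha * (m_i - m0))
            rate += productivity * (t - t_i + c) ** (-p)
    return rate


# Invented catalogue: event times (days) and magnitudes.
times = [0.0, 2.5, 2.7]
mags = [5.1, 3.4, 4.0]
for t in (1.0, 3.0, 10.0):
    print(t, etas_intensity(t, times, mags))
```

Fitting such a model in practice means estimating the parameters from an earthquake catalogue, for example by maximising the point-process likelihood, which is where much of the methodological work lies.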
Contact: Gordon Ross, mailto:firstname.lastname@example.org