Research and Teaching in Statistical and Data Sciences

Organisers

Wei Zhang, University of Glasgow

Past Organisers

Ben Swallow, University of Glasgow
Dirk Husmeier, University of Glasgow

About:

This diverse seminar series will highlight novel advances in methodology and application in statistics and data science, and will take the place of the University of Glasgow Statistics Group seminar during this period of remote working. We welcome all interested attendees at Glasgow and further afield.

For more information, please see the University of Glasgow webpage.

Call details will be sent out 30 minutes before the start of the seminar.

These seminars are recorded. All recordings can be found here.

THESE SEMINARS WILL BE FORTNIGHTLY.

PLEASE NOTE THE MOVE TO BRITISH SUMMER TIME (BST) FOR THE NEXT SEMINAR

Links and recordings

Past Events:

23 Apr 2020

Neil Chada, National University of Singapore

Advancements of non-Gaussian random fields for statistical inversion

14 May 2020

Roberta Pappadà, University of Trieste

Consensus clustering based on pivotal methods

21 May 2020

Ana Basiri, UCL

Who Are the "Crowd"? Learning from Large but Patchy Samples

04 Jun 2020

Colin Gillespie, University of Newcastle

Getting the most out of other people's R sessions

18 Jun 2020

Jo Eidsvik, NTNU

Autonomous Oceanographic Sampling Designs Using Excursion Sets for Multivariate Gaussian Random Fields

28 Jan 2022

Vinny Davies, University of Glasgow

Computational Metabolomics as a game of Battleships

Liquid chromatography (LC) coupled to tandem mass spectrometry (MS/MS) is widely used to identify small molecules in untargeted metabolomics. However, strategies for acquiring data in LC-MS/MS are very limited: we usually acquire only around 30% of the available data, to the detriment of follow-up experiments. In our recent work, we have developed a Virtual Metabolomics Mass Spectrometer (ViMMS), which allows us to develop, evaluate and test new data acquisition strategies without the cost of using valuable mass spectrometer time. These methods can be used in simulation or in real experiments through an Application Programming Interface to the mass spectrometer. In this talk, I will briefly describe ViMMS, before using the game Battleship to demonstrate and describe the new innovations we are developing using machine learning and statistics.
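A common baseline acquisition strategy in untargeted LC-MS/MS is Top-N data-dependent acquisition (DDA): after each survey (MS1) scan, the N most intense precursor ions are selected for fragmentation, and recently fragmented ions are temporarily excluded. The sketch below is a toy, standalone version of that rule, included to make the "choose where to fire next" framing concrete. It is not ViMMS's API; the function name `top_n_dda`, the `(m/z, intensity)` peak format and the m/z tolerance are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # hypothetical (m/z, intensity) pair from an MS1 survey scan


def top_n_dda(
    scans: List[List[Peak]],
    n: int,
    exclusion_scans: int,
    mz_tol: float = 0.01,
) -> List[List[float]]:
    """Toy Top-N data-dependent acquisition (illustrative, not ViMMS).

    For each MS1 survey scan, pick up to `n` of the most intense precursor
    m/z values for MS/MS fragmentation, skipping any m/z within `mz_tol` of a
    precursor fragmented during the previous `exclusion_scans` scans
    (dynamic exclusion).
    """
    excluded: Dict[float, int] = {}  # m/z -> first scan index at which it may be picked again
    chosen_per_scan: List[List[float]] = []
    for i, peaks in enumerate(scans):
        # Forget exclusions that have expired by scan i.
        excluded = {mz: until for mz, until in excluded.items() if until > i}
        candidates = [
            (mz, intensity)
            for mz, intensity in peaks
            if all(abs(mz - ex) > mz_tol for ex in excluded)
        ]
        # Most intense first, then keep the top n.
        candidates.sort(key=lambda peak: peak[1], reverse=True)
        chosen = [mz for mz, _ in candidates[:n]]
        for mz in chosen:
            excluded[mz] = i + 1 + exclusion_scans
        chosen_per_scan.append(chosen)
    return chosen_per_scan
```

With `n=2` and `exclusion_scans=1`, the least intense ion in three identical-looking scans is only fragmented in the second scan, once the two most intense ions are excluded. This illustrates why fixed rules like Top-N can leave much of the available data unacquired.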

11 Feb 2022

Oliver Stoner, University of Glasgow

Statistical methods for nowcasting daily hospital deaths from COVID-19

Delayed reporting is a significant problem for effective pandemic surveillance and decision-making. In the absence of timely data, statistical models which account for delays can be adopted to nowcast and forecast cases or deaths. I will first explain four key sources of systematic and random variability in available data for daily COVID-19 deaths in English hospitals. Then, I will present a general hierarchical approach which I claim can appropriately separate and capture this variability better than competing approaches. I will back up my claim with theoretical arguments and with results from a rolling prediction experiment imitating real-world use of these methods throughout the second wave of COVID-19 in England.
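To see why delays matter, consider the simplest possible correction: estimate, from fully reported past days, what fraction of deaths is typically reported within d days, then scale up today's partial count by that fraction. The sketch below is that naive multiplicative baseline, not the hierarchical model described in the talk. It ignores the systematic (e.g. day-of-week) and random variability the talk addresses, and the reporting-triangle layout and the function names `delay_cdf` and `nowcast` are assumptions for illustration.

```python
from typing import List


def delay_cdf(reporting_triangle: List[List[int]], max_delay: int) -> List[float]:
    """Estimate the cumulative fraction of deaths reported within d days.

    Assumed layout: reporting_triangle[t][d] is the number of deaths occurring
    on (fully reported) past day t that were first reported d days later.
    """
    totals = [0] * (max_delay + 1)
    for row in reporting_triangle:
        for d in range(max_delay + 1):
            totals[d] += row[d]
    grand_total = sum(totals)
    cdf: List[float] = []
    running = 0
    for count in totals:
        running += count
        cdf.append(running / grand_total)
    return cdf


def nowcast(reported_so_far: int, days_since: int, cdf: List[float]) -> float:
    """Naive nowcast: scale a partial count by the estimated fraction reported so far."""
    d = min(days_since, len(cdf) - 1)  # beyond max_delay, treat the count as complete
    return reported_so_far / cdf[d]
```

For example, if the history suggests that 80% of deaths are reported within one day, then 36 deaths reported one day after the date of death are nowcast as 36 / 0.8 = 45. A point estimate like this carries no uncertainty, which is one motivation for the model-based approaches in the talk.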

25 Feb 2022

Mark Keane, University College Dublin

Explaining artificial intelligence: contrastive explanations for AI black boxes and what people think of them

Abstract: In recent years, there has been a lot of excitement around the apparent success of Deep Learning in AI. There has also been considerable skepticism about whether we know what these models are actually doing when they succeed. This has led to the emerging area of Explainable AI, where techniques have been developed to explain a model’s workings to end-users and model developers. Recently, contrastive explanations (counterfactual and semi-factual) have become very popular for explaining the predictions of such black-box AI systems. For example, if you are refused a loan by an AI and ask “why”, a counterfactual explanation might tell you, “well, if you had asked for a smaller loan, then you would have been granted the loan.” These counterfactuals are generated by methods that perturb the feature values of the original situation (e.g., we perturb the value of the loan). In this talk, I review some of the contrastive methods we have developed for different types of data (tabular, image, time-series) based on a case-based reasoning approach. I also review some of our very recent work on user studies testing whether these AI methods are comprehensible to users in the ways that AI researchers assume (Spoiler Alert: they often aren’t).
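The loan example above can be made concrete with a brute-force search that perturbs a single feature until the black-box decision flips. In the sketch below, the `approve` rule (loan at most four times income), the feature names and the step size are hypothetical stand-ins for an opaque model. Real counterfactual methods, including the case-based approaches in this talk, search over several features and favour plausible, minimal changes.

```python
from typing import Callable, Dict, Optional

Applicant = Dict[str, float]


def approve(applicant: Applicant) -> bool:
    # Hypothetical black-box model: approve if the loan is at most 4x income.
    return applicant["loan"] <= 4 * applicant["income"]


def loan_counterfactual(
    model: Callable[[Applicant], bool],
    applicant: Applicant,
    step: float = 1000.0,
) -> Optional[Applicant]:
    """Lower only the 'loan' feature in fixed steps until the model approves.

    Returns the first approved perturbed applicant, or None if the original
    applicant is already approved or no approving loan amount is found.
    """
    if model(applicant):
        return None  # nothing to explain: the loan was granted
    candidate = dict(applicant)  # copy so the original situation is unchanged
    while candidate["loan"] > 0:
        candidate["loan"] = max(candidate["loan"] - step, 0.0)
        if model(candidate):
            return candidate
    return None
```

For an applicant with an income of 30,000 asking for 125,000, the search returns a loan of 120,000, which reads as the counterfactual "if you had asked for 120,000, you would have been granted the loan." A fixed step size means the answer is only as fine-grained as the step.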

11 Mar 2022

Yunpeng Li, University of Surrey

Going with flow: transport methods and neural networks for sequential Monte Carlo methods

Abstract: Sequential state estimation in non-linear and non-Gaussian state spaces has a wide range of applications in signal processing and statistics. Particle filters, a.k.a. sequential Monte Carlo methods, are among the most effective non-linear filtering approaches, but they suffer from weight degeneracy in high-dimensional filtering scenarios. A particular challenge for the deployment of particle filters is the need to specify the often non-linear models that simulate state dynamics and their relation to measurements. This becomes non-trivial for practitioners when dealing with complex environments and big data. In the first part of the talk, I will present new filters which incorporate physics-inspired particle flow methods into an encompassing particle filter framework. The valuable theoretical guarantees concerning particle filter performance still apply, but we can exploit the attractive performance of the particle flow methods. The second part of the talk will focus on learning different components of particle filters through neural networks, particularly normalizing flows, to provide the flexibility to apply particle filters in large-scale real-world applications.
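The weight degeneracy mentioned above can be observed in a minimal bootstrap particle filter, where the effective sample size (ESS) of the importance weights is monitored and the particles are resampled when it collapses. The sketch below assumes a one-dimensional Gaussian random-walk model chosen purely for illustration. It is a plain sequential importance resampling filter, not the particle-flow or neural-network filters described in the talk, and the function name and parameters are hypothetical.

```python
import math
import random
from typing import List, Tuple


def bootstrap_particle_filter(
    observations: List[float],
    n_particles: int = 500,
    process_sd: float = 1.0,
    obs_sd: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Bootstrap particle filter for the toy model
    x_t = x_{t-1} + N(0, process_sd^2),  y_t = x_t + N(0, obs_sd^2),
    with prior x_0 ~ N(0, 1).

    Returns the filtered posterior means and the ESS after each reweighting.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    log_weights = [0.0] * n_particles  # equal weights, stored in log space
    means: List[float] = []
    ess_history: List[float] = []
    for y in observations:
        # Propagate through the state dynamics (the bootstrap proposal).
        particles = [x + rng.gauss(0.0, process_sd) for x in particles]
        # Multiply in the Gaussian observation likelihood (log space for stability).
        log_weights = [
            lw - 0.5 * ((y - x) / obs_sd) ** 2 for lw, x in zip(log_weights, particles)
        ]
        max_lw = max(log_weights)
        unnorm = [math.exp(lw - max_lw) for lw in log_weights]
        total = sum(unnorm)
        weights = [u / total for u in unnorm]
        # ESS = 1 / sum(w^2): n_particles for uniform weights, near 1 when
        # almost all weight sits on a single particle (degeneracy).
        ess = 1.0 / sum(w * w for w in weights)
        ess_history.append(ess)
        means.append(sum(w * x for w, x in zip(weights, particles)))
        # Multinomial resampling when the ESS drops below half the particle count.
        if ess < n_particles / 2:
            particles = rng.choices(particles, weights=weights, k=n_particles)
            log_weights = [0.0] * n_particles
    return means, ess_history
```

Resampling only when the ESS falls below a threshold is a common design choice: it limits the extra Monte Carlo noise that resampling introduces. In high-dimensional state spaces, however, the ESS can collapse at almost every step, which is the degeneracy problem that motivates the methods in this talk.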

25 Mar 2022

Wanyu Lin, Hong Kong Polytechnic University

Generating Causal Explanations for Graph Neural Networks

PLEASE NOTE THIS SEMINAR WILL TAKE PLACE AT 13:00 GMT

In recent years, deep learning on graphs with graph neural networks (GNNs) has attracted increasing attention from both academia and industry. GNNs have exhibited superior performance across various disciplines, such as healthcare systems, financial systems, and social information systems. These systems are typically required to make critical decisions, such as disease diagnosis in healthcare systems. With global calls for the accountable and ethical use of artificial intelligence (AI), model explainability has been broadly recognized as one of the fundamental principles of using machine learning technologies in decision-critical applications. However, despite their practical success, most GNNs are deployed as black boxes, lacking explicit declarative knowledge representations. The lack of explanations for the decisions of GNNs significantly hinders the applicability of these models in decision-critical settings, where both predictive performance and interpretability are of paramount importance. For example, medical decisions are increasingly assisted by complex predictions that should be easy for human experts to verify. Model explanations allow us to justify model decisions and to reveal situations in which algorithmic decisions might be biased or discriminatory. In addition, precise explanations may facilitate model debugging and error analysis, which may help decide which model better describes the data's underlying semantics. In this seminar, we will unveil the inner workings of GNNs through the lens of causality.

08 Apr 2022

Víctor Elvira, University of Edinburgh

High-Performance Importance Sampling Schemes for Bayesian Inference

PLEASE NOTE THE MOVE TO BRITISH SUMMER TIME (BST) FOR THIS SEMINAR

Importance sampling (IS) is an elegant, theoretically sound, flexible, and simple-to-understand methodology for approximating moments of distributions in Bayesian inference (and beyond). The only requirement is the point-wise evaluation of the targeted distribution. The basic mechanism of IS consists of (a) drawing samples from simple proposal densities, (b) weighting the samples to account for the mismatch between the targeted and the proposal densities, and (c) approximating the moments of interest with the weighted samples. The performance of IS methods depends directly on the choice of the proposal functions. For that reason, the proposals have to be updated and improved over iterations so that samples are generated in regions of interest. In this talk, we will first introduce the basics of IS and multiple IS (MIS), motivating the need for several proposal densities. Then, the focus will be on motivating the use of adaptive IS (AIS) algorithms, describing an encompassing framework of recent methods in the current literature. Finally, we will briefly present some numerical examples where we will study the performance of various IS-based algorithms.
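Steps (a) to (c) can be sketched with self-normalized importance sampling, which only needs point-wise evaluation of the target density up to a normalizing constant. To illustrate multiple IS, the sketch draws from several Gaussian proposals and weights each sample against the equally weighted mixture of all proposals (deterministic-mixture weighting); with a single proposal it reduces to standard IS. The Gaussian proposals, the toy N(3, 1) target and the function names are assumptions for illustration. The sketch does not adapt its proposals, as the AIS methods discussed in the talk do.

```python
import math
import random
from typing import Callable, List, Tuple


def normal_logpdf(x: float, mu: float, sd: float) -> float:
    """Log density of N(mu, sd^2) at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def mis_estimate(
    log_target: Callable[[float], float],
    proposals: List[Tuple[float, float]],
    samples_per_proposal: int,
    f: Callable[[float], float] = lambda x: x,
    seed: int = 0,
) -> float:
    """Self-normalized multiple importance sampling estimate of E_pi[f(X)].

    `log_target` may be unnormalized; each proposal is a Gaussian (mean, sd).
    Weights use the deterministic-mixture denominator (1/J) * sum_j q_j(x).
    """
    rng = random.Random(seed)
    n_props = len(proposals)
    xs: List[float] = []
    log_ws: List[float] = []
    for mu, sd in proposals:
        for _ in range(samples_per_proposal):
            # (a) Draw a sample from a simple proposal density.
            x = rng.gauss(mu, sd)
            # (b) Weight by target / mixture of proposals, using log-sum-exp.
            lps = [normal_logpdf(x, m, s) for m, s in proposals]
            top = max(lps)
            log_mix = top + math.log(sum(math.exp(lp - top) for lp in lps)) - math.log(n_props)
            log_ws.append(log_target(x) - log_mix)
            xs.append(x)
    # (c) Approximate the moment of interest with the normalized weights.
    max_lw = max(log_ws)
    ws = [math.exp(lw - max_lw) for lw in log_ws]
    return sum(w * f(x) for w, x in zip(ws, xs)) / sum(ws)
```

For instance, `mis_estimate(lambda x: -0.5 * (x - 3.0) ** 2, [(-2.0, 2.0), (4.0, 2.0)], 10000)` should return a value close to 3, the mean of the N(3, 1) target. Moving the proposal means further from the target typically degrades the estimate, which illustrates why the proposals need to be adapted.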