Computational Strategies for Large-Scale Statistical Data Analysis

 02 - 06 Jul 2018
 ICMS, 15 South College Street, Edinburgh

Scientific Organisers:

  • Guang Cheng, Purdue University
  • Chenlei Leng, University of Warwick

Large-scale data is increasingly encountered in biology, medicine, engineering, the social sciences and economics as measurement technology advances. A distinctive feature of such data is that it usually comes with a large sample size and/or a large number of features, creating challenges for storage, processing and analysis. Classical statistical methodology, theory and computation, on the other hand, were developed under the assumption that the entire dataset resides in a central location. As a result, most classical statistical methods face computational challenges when analysing large-scale data. Specifically, big data is often characterised by the so-called 4D features: Distributed, Dirty, Dimensionality and Dynamic. These features make it very challenging to apply traditional statistical thinking to massive data.

The main aims of this workshop were:

  • to exchange developments in distributed data analysis and aggregated inference, with consideration of the computational complexity and statistical properties of the relevant estimators;
  • to discuss open challenges, exchange research ideas and forge collaborations across three research areas: statistics, machine learning and optimisation;
  • to promote the development of software with justified statistical properties and efficient computation;
  • to engage more young UK researchers to work at the interface of computing and statistics.
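To make the "distributed data analysis and aggregated inference" theme concrete, the sketch below illustrates the simplest divide-and-conquer estimator: fit ordinary least squares on each data shard and average the local estimates. This is an illustrative toy on simulated data (the function names are ours), not a method presented at the workshop.

```python
import numpy as np

def ols(X, y):
    # Ordinary least squares via the normal equations.
    return np.linalg.solve(X.T @ X, X.T @ y)

def divide_and_conquer_ols(X, y, n_shards):
    # Split the rows into shards (as if stored on separate machines),
    # fit OLS locally on each shard, then average the local estimates.
    shards = np.array_split(np.arange(len(y)), n_shards)
    local_estimates = [ols(X[idx], y[idx]) for idx in shards]
    return np.mean(local_estimates, axis=0)

# Simulated data: y = X @ beta + noise.
rng = np.random.default_rng(0)
n, p = 10_000, 5
X = rng.standard_normal((n, p))
beta = np.arange(1.0, p + 1.0)
y = X @ beta + rng.standard_normal(n)

full = ols(X, y)                              # estimate from the full data
dc = divide_and_conquer_ols(X, y, n_shards=10)  # aggregated estimate
```

With a fixed number of shards, the averaged estimator is typically close to the full-data estimator while only ever touching one shard at a time; much of the workshop concerned when (and at what rates) such aggregation retains full statistical efficiency.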

Talks:
  • David Dunson, Duke University - Scaling up Bayesian Inference

  • Ata Kaban, University of Birmingham - Structure Aware Generalisation Error Bounds Using Random Projections

  • Eric Xing, Carnegie Mellon University - On System and Algorithm Co-Design and Automatic Machine Learning

  • Yoonkyung Lee, The Ohio State University - Dimensionality Reduction for Exponential Family Data

  • Jinchi Lv, University of Southern California - Asymptotics of Eigenvectors and Eigenvalues for Large Structured Random Matrices

  • Faming Liang, Purdue University - Markov Neighbourhood Regression for High-Dimensional Inference

  • Ping Ma, University of Georgia - Asympirical Analysis: a New Paradigm for Data Science

  • Mladen Kolar, University of Chicago - Recovery of Simultaneous Low Rank and Two-Way Sparse Coefficient Matrices, a Nonconvex Approach

  • Guang Cheng, Purdue University - Large-Scale Nearest Neighbour Classification with Statistical Guarantee

  • Yining Chen, LSE - Narrowest-Over-Threshold Detection of Multiple Change-points and Change-point-like Features

  • Haeran Cho, University of Bristol - Multiscale MOSUM Procedure with Localised Pruning

  • Jason Lee, University of Southern California - Geometry of Optimization Landscapes and Implicit Regularization of Optimization Algorithms

  • Chen Zhang, University College London - Variational Gaussian Approximation for Poisson Data

  • Jeremias Knoblauch, University of Warwick - Bayesian Online Changepoint Detection and Model Selection in High-Dimensional Data

  • Stanislav Volgushev, University of Toronto - Distributed Inference for Quantile Regression Processes

  • Hua Zhou, University of California, Los Angeles - Global Solutions of Generalized Canonical Correlation Analysis Problems

  • Matteo Fasiolo, University of Bristol - Calibrated Additive Quantile Regression

  • Wenxuan Zhong, University of Georgia - Leverage Sampling to Overcome the Computational Challenges for Big Spatial Data

  • Moulinath Banerjee, University of Michigan - Divide and Conquer in Nonstandard Problems: the Super-Efficiency Phenomenon

  • Qifan Song, Purdue University - Bayesian Shrinkage Towards Sharp Minimaxity

  • Binyan Jiang, The Hong Kong Polytechnic University - Penalized Interaction Estimation for Ultra High Dimensional Quadratic Regression

  • Chao Zheng, Lancaster University - Revisiting Huber’s M-Estimation: a Tuning-Free Approach

  • Xin Bing, Cornell University - A Fast Algorithm with Minimax Optimal Guarantees for Topic Models with an Unknown Number of Topics

  • Cheng Qian, LSE - Covariance and Graphical Modelling for High-Dimensional Longitudinal and Functional Data

  • Didong Li, Duke University - Efficient Manifold and Subspace Approximations with Spherelets

  • Xiaoming Huo, Georgia Institute of Technology - Non-Convex Optimization and Statistical Properties

Participants:
Moulinath Banerjee, University of Michigan
Xin Bing, Cornell University
Yining Chen, London School of Economics
Guang Cheng, Purdue University
Haeran Cho, University of Bristol
David Dunson, Duke University
Matteo Fasiolo, University of Bristol
Zhenzhong Huang, University of Warwick
Xiaoming Huo, Georgia Institute of Technology
Ata Kaban, University of Birmingham
Jeremias Knoblauch, University of Warwick
Mladen Kolar, University of Chicago
Jason Lee, University of Southern California
Yoonkyung Lee, The Ohio State University
Chenlei Leng, University of Warwick
Didong Li, Duke University
Junchi Li, Princeton University
Faming Liang, Purdue University
Ping Ma, University of Georgia
Cheng Qian, London School of Economics
Sandipan Roy, University College London
Stefan Stein, University of Warwick
Weijie Su, University of Pennsylvania
Lizhu Tao, University of Warwick
Stanislav Volgushev, University of Toronto
Xiangyu Wang, Google LLC
Chao Zheng, Lancaster University
Wenxuan Zhong, University of Georgia
Hua Zhou, UCLA