Computational Strategies for Large-Scale Statistical Data Analysis

Guang Cheng, Purdue University
Chenlei Leng, University of Warwick

About:

Large-scale data is increasingly encountered in biology, medicine, engineering, the social sciences and economics as measurement technology advances. A distinctive feature of such data is a large sample size and/or a large number of features, which creates challenges for data storage, processing and analysis. Classical statistical methodology, theory and computation, on the other hand, were developed under the assumption that the entire data set resides in a central location. As a result, most classical statistical methods face computational challenges when applied to large-scale data. Specifically, big data is known to possess the so-called 4D features: Distributed, Dirty, Dimensionality and Dynamic. These features make it very challenging to apply traditional statistical thinking to massive data.

The main aims of this workshop were to exchange developments in distributed data analysis and aggregated inference, with attention to the computational complexity and statistical properties of the relevant estimators; to discuss open challenges, exchange research ideas and forge collaborations across three research areas (statistics, machine learning and optimisation); to promote the development of software that is statistically justified and computationally efficient; and to encourage more young UK researchers to work at the interface of computing and statistics.

Speakers

David Dunson, Duke University	Scaling up Bayesian Inference
Ata Kaban, University of Birmingham	Structure Aware Generalisation Error Bounds Using Random Projections
Eric Xing, Carnegie Mellon University	On System and Algorithm Co-Design and Automatic Machine Learning
Yoonkyung Lee, The Ohio State University	Dimensionality Reduction for Exponential Family Data
Jinchi Lv, University of Southern California	Asymptotics of Eigenvectors and Eigenvalues for Large Structured Random Matrices
Faming Liang, Purdue University	Markov Neighbourhood Regression for High-Dimensional Inference
Ping Ma, University of Georgia	Asympirical Analysis: a New Paradigm for Data Science
Mladen Kolar, University of Chicago	Recovery of Simultaneous Low Rank and Two-Way Sparse Coefficient Matrices, a Nonconvex Approach
Guang Cheng, Purdue University	Large-Scale Nearest Neighbour Classification with Statistical Guarantee
Yining Chen, LSE	Narrowest-Over-Threshold Detection of Multiple Change-points and Change-point-like Features
Haeran Cho, University of Bristol	Multiscale MOSUM Procedure with Localised Pruning
Jason Lee, University of Southern California	Geometry of Optimization Landscapes and Implicit Regularization of Optimization Algorithms
Chen Zhang, University College London	Variational Gaussian Approximation for Poisson Data
Jeremias Knoblauch, University of Warwick	Bayesian Online Changepoint Detection and Model Selection in High-Dimensional Data
Stanislav Volgushev, University of Toronto	Distributed Inference for Quantile Regression Processes
Hua Zhou, University of California	Global Solutions of Generalized Canonical Correlation Analysis Problems
Matteo Fasiolo, University of Bristol	Calibrated Additive Quantile Regression
Wenxuan Zhong, University of Georgia	Leverage Sampling to Overcome the Computational Challenges for Big Spatial Data
Moulinath Banerjee, University of Michigan	Divide and Conquer in Nonstandard Problems: the Super-Efficiency Phenomenon
Qifan Song, Purdue University	Bayesian Shrinkage Towards Sharp Minimaxity
Binyan Jiang, The Hong Kong Polytechnic University	Penalized Interaction Estimation for Ultra High Dimensional Quadratic Regression
Chao Zheng, Lancaster University	Revisiting Huber’s M-Estimation: a Tuning-Free Approach
Xin Bing, Cornell University	A Fast Algorithm with Minimax Optimal Guarantees for Topic Models with an Unknown Number of Topics
Cheng Qian, LSE	Covariance and Graphical Modelling for High-Dimensional Longitudinal and Functional Data
Didong Li, Duke University	Efficient Manifold and Subspace Approximations with Spherelets
Xiaoming Huo, Georgia Institute of Technology	Non-Convex Optimization and Statistical Properties