All Hands Meetings on Big Data Optimization - Semester 2, 2016-2017

Venue: James Clerk Maxwell Building ROOM: JCMB 5323 (5th floor)
Time: Tuesdays 12:15 - 13:30 (lunch provided)

We gratefully acknowledge support from the Head of School of Mathematics and the Center for Doctoral Training in Data Science

Date Speaker Paper
May 23, 2017
No meeting due to SIAM Conference on Optimization
May 16, 2017

May 9, 2017

May 2, 2017

April 25, 2017

April 18, 2017

April 11, 2017

April 4, 2017

March 28, 2017

March 21, 2017

March 14, 2017

March 7, 2017

February 28, 2017

February 21, 2017

February 14, 2017 Kostas Zygalakis
A differential equation for modeling Nesterov's accelerated gradient method: theory and insights (Su, Boyd and Candes - NIPS 2014)
February 7, 2017 László A. Végh (LSE)
Rescaled first-order methods for linear programming (Dadush, Végh and Zambelli 11/2016)
January 31, 2017 Filip Hanzely TBA
January 24, 2017 Ion Necoara (Bucharest)
Linear convergence of first order methods for non-strongly convex optimization (Necoara, Nesterov and Glineur - 4/2015)
January 17, 2017 Armin Eftekhari (The Alan Turing Institute)
The alternating descent conditional gradient method for sparse inverse problems (Boyd, Schiebinger and Recht - 7/2015)

Organizers: Nicolas Loizou and Peter Richtárik

All Hands Meetings on Big Data Optimization - Semester 1, 2016-2017

Venue: James Clerk Maxwell Building ROOM: JCMB 6207 (6th floor)
Time: Tuesdays 12:15 - 13:30 (lunch provided: thanks to the support of the Head of School)

Date Speaker Paper
December 13, 2016
Panos Parpas
Using variational techniques to understand accelerated methods (Wibisono, Wilson and Jordan - 3/2016)
December 6, 2016
No meeting (NIPS)
November 29, 2016
Iain Murray
Fitting real-valued conditional distributions.

Abstract: Neural networks can be used for regression. Given an input x, guess the output y. The standard optimization task is to minimize some regularized measure of mismatch between guesses and observed training outputs.

Neural networks can also express their own uncertainty. For example, we can fit two functions, a guess m(x) and an "error-bar" s(x), by maximizing the total log probability of training outputs under a Gaussian model: \sum_n \log N(y_n; m(x_n), s(x_n)^2).

Fitting functions representing Gaussian outputs by stochastic steepest descent can be hard: the gradients of the loss with respect to the mean depend strongly on the standard deviation, making it hard to adapt step-sizes.
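
As a rough illustration of the step-size issue described in the abstract (not taken from the talk itself), the sketch below writes out the per-example negative log-likelihood of the Gaussian model and its gradients. The function names and the example values of y, m and s are hypothetical. For a fixed residual y - m, the gradient with respect to the mean scales as 1/s^2, so a step size that suits one value of s can be far too large or too small for another.

```python
import math

def gaussian_nll(y, m, s):
    """Negative log density of y under N(m, s^2)."""
    return 0.5 * math.log(2 * math.pi * s ** 2) + (y - m) ** 2 / (2 * s ** 2)

def grad_mean(y, m, s):
    """Derivative of the NLL with respect to the mean: (m - y) / s^2."""
    return (m - y) / s ** 2

def grad_std(y, m, s):
    """Derivative of the NLL with respect to the std: 1/s - (y - m)^2 / s^3."""
    return 1.0 / s - (y - m) ** 2 / s ** 3

# Hypothetical values: the same residual under three different error bars.
y, m = 1.0, 0.0
for s in (0.1, 1.0, 10.0):
    print(f"s = {s:5.1f}   dNLL/dm = {grad_mean(y, m, s): .4f}")
```

The printed gradients differ by a factor of 10^4 between s = 0.1 and s = 10, although the residual is identical in all three cases.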

Moving beyond the Gaussian assumption, we might represent p(y|x) with a mixture of Gaussians, or with quantiles. For multivariate y we can use multivariate Gaussians or RNADE. Gaussians are also fitted in stochastic variational inference, sometimes with diagonal covariances, sometimes low-rank + diagonal.

We are able to optimize all these things to some extent, but it is harder than for conventional neural networks, which hinders widespread adoption of the methods.

Relevant papers: Mixture Density Networks (MDNs), Multivariate MDN, RNADE, Bayesian MDN, matrix manifold optimization for Gaussian mixtures
November 22, 2016
Lukasz Szpruch An analytical framework for a consensus-based global optimization method (Carrillo, Choi, Totzek and Tse - 1/2016)
November 15, 2016
Dominik Csiba Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure (Bietti and Mairal - 10/2016)
November 8, 2016
Aretha Teckentrup
Large-scale Gaussian process regression via doubly stochastic gradient descent (Yan, Xie, Song and Boots - 2015)
November 1, 2016
Filip Hanzely
Variance reduction for faster non-convex optimization (Allen-Zhu and Hazan - 3/2016)
October 25, 2016 Dominik Csiba Linear coupling: an ultimate unification of gradient and mirror descent (Allen-Zhu and Orecchia - 1/2015)
October 18, 2016 Jakub Konečný Train faster, generalize better: Stability of stochastic gradient descent (Hardt, Recht and Singer - 7/2016)
October 11, 2016 Nicolas Loizou Convergence rates for greedy Kaczmarz algorithms, and faster randomized Kaczmarz rules using the orthogonality graph (Nutini, Sepehry, Laradji, Schmidt, Koepke, Virani - UAI 2016) supplementary material poster
October 4, 2016 Jakub Konečný Differentially private empirical risk minimization (Chaudhuri, Monteleoni, Sarwate - JMLR 2011)
September 27, 2016 Dominik Csiba Online ad allocation via online optimization (Jenatton, Huang, Csiba and Archambeau - 6/2016)

Organizers: Dominik Csiba and Peter Richtárik

All Hands Meetings on Big Data Optimization - Semester 2, 2015-2016

Venue: James Clerk Maxwell Building ROOM: JCMB 4312 (4th floor)
Time: Tuesdays 12:15 - 13:30 (lunch provided: thanks to the support of the Head of School)

Date Speaker Paper
May 3, 2016
JC Pesquet (Paris)
A stochastic majorize-minimize subspace algorithm with application to filter identification (Chouzenoux and Pesquet - 12/2015)
April 26, 2016
Robert M Gower Open-ended research discussion on the topic: "Newton-type methods for solving the empirical risk minimization problem"
April 19, 2016
Haihao Lu (MIT)
Norm-free methods
April 12, 2016
Sebastian Stich (CORE)
A simple, combinatorial algorithm for solving SDD systems in nearly-linear time (Kelner, Orecchia, Sidford, Allen-Zhu - 1/2013)
April 5, 2016
No meeting
March 29, 2016
No meeting
March 22, 2016
Nicolas Loizou Second order stochastic optimization in linear time (Agarwal, Bullins and Hazan - 2/2016)
March 15, 2016
Robert M Gower Sub-sampled Newton methods I: globally convergent algorithms (Roosta-Khorasani and Mahoney - 1/2016)
March 8, 2016
No meeting
I am in Oberwolfach...
March 1, 2016 Dominik Csiba Local smoothness in variance-reduced optimization (Vainsencher, Liu and Zhang - NIPS 2015)
February 23, 2016 Jaroslav Fowkes
Submodular function maximization (based on a survey of Krause and Golovin 2012)
February 16, 2016 Jakub Konečný Taming the wild: a unified analysis of Hogwild!-style algorithms (De Sa, Zhang, Olukotun, Re - NIPS 2015)
February 9, 2016 No meeting
(Dominik, Jakub, Robert and I will be in Les Houches)
February 2, 2016 Nicolas Loizou Randomized gossip algorithms (Boyd, Ghosh, Prabhakar and Shah - IEEE Transactions on Information Theory 2006 and Dimakis, Kar, Moura, Rabbat and Scaglione - Proceedings of the IEEE)
January 26, 2016 Jakub Konečný On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants (Reddi, Hefny, Sra, Poczos and Smola - NIPS 2015)

Organizers: Jakub Konečný and Peter Richtárik

All Hands Meetings on Big Data Optimization - Semester 1, 2015-2016

Venue: James Clerk Maxwell Building ROOM: JCMB 6311 (6th floor)
Time: 12:15 - 13:15 (lunch provided: thanks to the support of the Head of School)

Date Speaker Paper
November 24, 2015 Nick Polydorides A quasi Monte Carlo method for large-scale inverse problems (Polydorides, Wang & Bertsekas - 2012) more resources: [regression, inverse, DP chapter]
November 17, 2015 Ran Zhang Path-following methods (Chapter 5 of Wright's "Primal-dual interior-point methods" book)
November 10, 2015 Jakub Konečný Why random reshuffling beats stochastic gradient descent (Gurbuzbalaban, Ozdaglar and Parrilo - 10/2015)
November 3, 2015 Nicolas Loizou
Stochastic gradient descent, weighted sampling and the randomized Kaczmarz algorithm (Needell, Srebro and Ward - 10/2013)
October 27, 2015 Dominik Csiba
A universal catalyst for first-order optimization (Lin, Mairal & Harchaoui - 6/2015)
October 20, 2015 No meeting

October 13, 2015 Robert M Gower Convergence rates of sub-sampled Newton methods (Erdogdu & Montanari - 8/2015)
October 6, 2015 Robert M Gower
Newton sketch (Pilanci & Wainwright - 5/2015)
September 29, 2015 Dominik Csiba
Beyond convexity: stochastic quasi-convex optimization (Hazan, Levy and Shalev-Shwartz - 7/2015)
September 22, 2015 Jakub Konečný Communication Complexity of Distributed Convex Learning and Optimization (Arjevani and Shamir - 6/2015)

Organizers: Jakub Konečný and Peter Richtárik

All Hands Meetings on Big Data Optimization - Semester 2, 2014-2015

Venue: James Clerk Maxwell Building ROOM: JCMB 4312 (4th floor)
Time: 12:15 - 13:15 (lunch provided: thanks to the support of the Head of School)

Date Speaker Paper
May 19, 2015 Ian Wallace HELM: Holomorphic Embedding Load flow Method (papers: 1 and 2)
May 12, 2015 Andreas Grothey Contingency generation for AC optimal power flow (Chiang and Grothey - 2012 [Optimization Online])
May 5, 2015 No meeting due to Optimization and Big Data 2015
April 28, 2015 Zheng Qu On lower and upper bounds for smooth and strongly convex optimization problems (Arjevani, Shalev-Shwartz and Shamir - 3/2015)
April 21, 2015 Alessandro Perelli Combining ordered subsets and momentum for accelerated X-ray CT image reconstruction (Kim, Ramani and Fessler - 1/2015, IEEE link)
April 14, 2015 Robert Gower Research discussion
April 7, 2015 No meeting due to Easter Break
March 31, 2015 Dominik Csiba Stochastic Dual Coordinate Ascent (SDCA): A Dual-Free Analysis (Shai Shalev-Shwartz - 2/2015)
March 24, 2015 Jakub Konečný Greedy coordinate descent vs randomized coordinate descent
March 17, 2015 Tom Mayo and Guido Sanguinetti Challenges for predictive modelling in high-throughput biology (papers: [1] and [2])
March 10, 2015 Zheng Qu Complexity bounds for primal-dual methods minimizing the model of objective function (Nesterov - 2/2015)
March 3, 2015 Kimon Fountoulakis Randomized numerical linear algebra meets big data optimization (Yang, Chow, Re and Mahoney - 2/2015 and Yang, Meng and Mahoney - 2/2015)
February 24, 2015 Robert M. Gower Action constrained quasi-Newton methods (Gower and Gondzio - 12/2014)
February 17, 2015 no meeting due to Innovative Learning Week
February 10, 2015 Chris Williams Linear dynamical systems applied to condition monitoring (papers [1] and [2]).

Abstract: We develop a Hierarchical Switching Linear Dynamical System (HSLDS) for the detection of sepsis in neonates in an intensive care unit. The Factorial Switching LDS (FSLDS) of Quinn et al. (2009) is able to describe the observed vital signs data in terms of a number of discrete factors, which have either physiological or artifactual origin. We demonstrate that by adding a higher-level discrete variable with semantics sepsis/non-sepsis we can detect changes in the physiological factors that signal the presence of sepsis. We demonstrate that the performance of our model for the detection of sepsis is not statistically different from the auto-regressive HMM of Stanculescu et al. (2013), despite the fact that their model is given "ground truth" annotations of the physiological factors, while our HSLDS must infer them from the raw vital signs data. Joint work with Ioan Stanculescu and Yvonne Freer.
February 3, 2015 Jakub Konečný Communication efficient distributed optimization using an approximate Newton-type method (Shamir, Srebro and Zhang - 12/2013)
January 27, 2015 Zheng Qu A lower bound for the optimization of finite sums (Agarwal and Bottou - 10/2014)
January 20, 2015 Ilias Diakonikolas Algorithms in Statistics (papers: long version [1] and short version [2])

Blurb: A broad class of big data – such as those collected from financial transactions, seismic measurements, neurobiological measurements, sensor nets, or network traffic records – is best modeled as samples from a probability distribution over a very large domain. One of the most basic statistical inference tasks in this setting is this: learn the underlying distribution that generated the data.

Organizers: Jakub Konečný, Zheng Qu and Peter Richtárik

All Hands Meetings on Big Data Optimization - Semester 1, 2014-2015

Venue: James Clerk Maxwell Building ROOM: 6311 (6th floor)
Time: Tuesdays, 12:15 - 13:15 (lunch provided: thanks to NAIS)

Date Speaker Paper
December 2, 2014 Charles Sutton Optimization in Modern Machine Learning: Four Vignettes (Exploratory data analysis: Mining transaction data, Unsupervised learning in neural networks, Signal disaggregation: Understanding household energy usage, Sampling from high dimensional distributions using continuous relaxations) (papers: [1] [2] [3] )
November 25, 2014 Dominik Csiba Iterative Hessian sketch: fast and accurate solution approximation for constrained least-squares (based on Pilanci and Wainwright - 11/2014)
November 18, 2014 Xavier Cabezas Cycle bases in network synchronization problems (based on [1, 2, 3])
November 11, 2014 Zheng Qu Large-scale randomized-coordinate descent methods with non-separable linear constraints (Reddi, Hefny, Downey, Dubey and Sra - 10/2014)
November 4, 2014 Ademir Ribeiro Towards a direct search method with adaptive directions/geometry (Ademir will describe some challenges of his ongoing research in the area; paper to read: Konečný and Richtárik - 09/2014)
October 28, 2014 Amos Storkey Machine learning markets (abstract)
October 21, 2014 Dominik Csiba A stochastic PCA algorithm with an exponential convergence rate (Shamir - 09/2014)
October 14, 2014 Jakub Konečný Parallelism in optimization (this is a brainstorming session about the limits of parallelism in optimization and is not based on any papers)
October 7, 2014 Robert Gower A stochastic quasi-Newton method for large-scale optimization (Byrd, Hansen, Nocedal and Singer - 2014)
September 30, 2014 Jakub Konečný Trade-offs of large scale learning (papers: 1 - Bottou and Bousquet, 2 - Bottou and Bousquet, 3 - Bottou)
September 23, 2014 Zheng Qu SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives (Defazio, Bach and Lacoste-Julien - 2014)
September 16, 2014 Kimon Fountoulakis Robust block coordinate descent (Fountoulakis and Tappenden - 2014)

Organizers: Jakub Konečný, Zheng Qu and Peter Richtárik

All Hands Meetings on Big Data Optimization - Semester 2, 2013-2014

Venue: James Clerk Maxwell Building NEW ROOM: 4312 (4th floor)
Time: Tuesdays, 12:15 - 13:15 (refreshments provided: thanks to NAIS)

Date Speaker Paper
June 17, 2014 Mojmír Mutný Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization (Martin Jaggi - ICML 2013)
June 10, 2014 no meeting (due to this event)
June 3, 2014 Lukasz Szpruch Multilevel Monte Carlo methods for applications in finance (Giles and Szpruch)
May 27, 2014 Jakub Konečný Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning (Julien Mairal - 2014)
May 13, 2014 Zheng Qu First-order methods of smooth convex optimization with inexact oracle (Devolder, Glineur and Nesterov - 2011). Preprint here.
May 6, 2014 Robert M. Gower A Stochastic Quasi-Newton Method for Large-Scale Optimization (Byrd, Hansen, Nocedal and Singer - 2014). Plus maybe also some background from this paper.
April 30, 2014 Olivier Fercoq Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (Duchi, Hazan and Singer - 2011)
April 22, 2014 no meeting (spring break)
April 15, 2014 no meeting (spring break)
April 8, 2014 no meeting (spring break)
April 1, 2014 Martin Takáč A Proximal Stochastic Gradient Method with Progressive Variance Reduction (Xiao and Zhang - 2014)
March 25, 2014 no meeting
March 18, 2014 Jakub Konečný Subgradient Methods for Huge-Scale Optimization Problems (Nesterov - 2012) [Mathematical Programming 2013]
March 11, 2014 Kimon Fountoulakis Parallel Coordinate Descent Newton for Efficient L1-Regularized Minimization (Bian, Li, Liu and Yang - 2013)
March 4, 2014 Mehrdad Yaghoobi Efficient Projections onto the L1-Ball for Learning in High Dimensions (Duchi, Shalev-Shwartz, Singer, Chandra - 2008)
Feb 25, 2014 Zheng Qu Finding the stationary states of Markov chains by iterative methods (Nesterov and Nemirovski - 2013)
Feb 18, 2014 no meeting as many of us will attend this event
Feb 11, 2014 Olivier Fercoq Efficient Accelerated Coordinate Descent Methods and Faster Algorithms for Solving Linear Systems (Lee and Sidford - 2013)
Feb 4, 2014 Rachael Tappenden Feature Clustering for Accelerating Parallel Coordinate Descent (Scherrer, Tewari, Halappanavar and Haglin - 2012)
Jan 28, 2014 Jakub Konečný Minimizing Finite Sums with the Stochastic Average Gradient (Schmidt, Le Roux and Bach - 2013)

Organizers: Jakub Konečný and Peter Richtárik