July 10, 2017


I am reviewing NIPS papers this week.


July 3, 2017


Jakub's PhD thesis is now on arXiv.



July 3, 2017


I am on my way to the Fields Institute, Toronto, to attend a workshop entitled "Modern Convex Optimization and Applications: AN70". This event is organized in honour of Arkadi Nemirovski's 70th birthday. Arkadi is one of the most influential individuals in optimization, directly responsible for the existence of several of its most important and most beautiful subfields. Here is a very brief profile of this giant of our beloved field, taken from the website of a workshop I co-organized in Edinburgh in 2015.

Update (July 5, 2017): I have given my talk today, here are the slides.

Update (July 7, 2017): Filip delivered his pitch talk and presented his poster "Extending the Reach of Big Data Optimization: Randomized Algorithms for Minimizing Relatively Smooth Functions".



July 2, 2017


New paper out: "A batch-incremental video background estimation model using weighted low-rank approximation of matrices" - joint work with Aritra Dutta (KAUST) and Xin Li (University of Central Florida).

Abstract: Principal component pursuit (PCP) is a state-of-the-art approach for background estimation problems. Due to their high computational cost, PCP algorithms, such as robust principal component analysis (RPCA) and its variants, are not feasible for processing high-definition videos. To avoid the curse of dimensionality in those algorithms, several methods have been proposed to solve the background estimation problem in an incremental manner. We propose a batch-incremental background estimation model using a special weighted low-rank approximation of matrices. Through experiments with real and synthetic video sequences, we demonstrate that our method is superior to state-of-the-art background estimation algorithms such as GRASTA, ReProCS, incPCP, and GFL.
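For readers who want a concrete picture of the batch idea, here is a minimal Python sketch of estimating a background from a batch of frames via a plain rank-1 truncated SVD. This baseline is illustrative only and is not the weighted low-rank model from the paper; the frame layout and the rank are assumptions made for the example.

```python
import numpy as np

def rank1_background(frames):
    """Illustrative rank-1 SVD baseline for background estimation.

    frames: array of shape (n_frames, height, width).
    Returns a (height, width) background estimate.

    The paper replaces this plain SVD with a special weighted low-rank
    approximation and updates the estimate batch by batch.
    """
    n, h, w = frames.shape
    D = frames.reshape(n, h * w).T          # pixels x frames data matrix
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    B = s[0] * np.outer(U[:, 0], Vt[0, :])  # best rank-1 approximation of D
    return B.mean(axis=1).reshape(h, w)     # average background over the batch

# The foreground of frame k could then be estimated as |frames[k] - background|.
```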


June 27, 2017


IMA Fox Prize (2nd Prize) for Robert Mansel Gower

Robert M. Gower was awarded a Leslie Fox Prize (2nd Prize) by the Institute of Mathematics and its Applications (IMA) for the paper Randomized Iterative Methods for Linear Systems (SIAM J. Matrix Anal. & Appl., 36(4), 1660–1690, 2015), coauthored with me. The list of finalists can be found here.

The Leslie Fox Prize for Numerical Analysis of the Institute of Mathematics and its Applications (IMA) is a biennial prize established in 1985 by the IMA in honour of mathematician Leslie Fox (1918-1992). The prize honours "young numerical analysts worldwide" (any person who is less than 31 years old), and applicants submit papers for review. A committee reviews the papers, invites shortlisted candidates to give lectures at the Leslie Fox Prize meeting, and then awards First Prize and Second Prizes based on "mathematical and algorithmic brilliance in tandem with presentational skills".

June 26, 2017


Jakub Konečný defended his PhD thesis "Stochastic, Distributed and Federated Optimization for Machine Learning" today. Congratulations!

Jakub joined my group as a PhD student in August 2013. His PhD was supported in his first year by the Principal's Career Development Scholarship, and in subsequent years by a Google Europe Doctoral Fellowship in Optimization Algorithms. Jakub co-authored 13 papers during his PhD (links to the papers can be found here). He has worked on diverse topics such as distributed optimization, machine learning, derivative-free optimization, federated learning, gesture recognition, semi-stochastic methods and variance-reduced algorithms for empirical risk minimization. He is joining Google Seattle in August 2017 as a research scientist.

June 21, 2017


New paper out: "Privacy Preserving Randomized Gossip Algorithms" - joint work with Filip Hanzely (Edinburgh), Jakub Konečný (Edinburgh), Nicolas Loizou (Edinburgh) and Dmitry Grishchenko (Higher School of Economics, Moscow).

Abstract: In this work we present three different randomized gossip algorithms for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes. We give iteration complexity bounds for all methods, and perform extensive numerical experiments.
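For context, here is a minimal Python sketch of the classical, non-private randomized pairwise gossip method for average consensus, which the privacy-preserving variants in the paper build upon. The graph, sampling scheme and iteration count below are illustrative assumptions, not taken from the paper.

```python
import random

def randomized_gossip(values, edges, n_iters, seed=0):
    """Classical randomized pairwise gossip for average consensus.

    values: dict mapping node -> initial private value.
    edges:  list of (u, v) pairs describing the communication graph.

    At each iteration a random edge is activated and both endpoints replace
    their values by the pairwise average; for a connected graph, all values
    converge to the average of the initial values.
    """
    rng = random.Random(seed)
    x = dict(values)
    for _ in range(n_iters):
        u, v = rng.choice(edges)
        avg = (x[u] + x[v]) / 2.0
        x[u] = x[v] = avg
    return x

# Example: a path graph on 4 nodes; all values approach 4.0.
print(randomized_gossip({0: 1.0, 1: 5.0, 2: 3.0, 3: 7.0},
                        [(0, 1), (1, 2), (2, 3)], n_iters=200))
```

The methods in the paper modify what information the activated nodes exchange, so that the initial private values stored at the nodes are protected.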

June 18, 2017


I am now in Slovakia, visiting Radoslav Harman at Comenius University.

June 15, 2017


New paper out: "Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications" - joint work with Antonin Chambolle (Ecole Polytechnique), Matthias J. Ehrhardt (Cambridge), and Carola-Bibiane Schoenlieb (Cambridge).

Abstract: We propose a stochastic extension of the primal-dual hybrid gradient algorithm studied by Chambolle and Pock in 2011 to solve saddle point problems that are separable in the dual variable. The analysis is carried out for general convex-concave saddle point problems and problems that are either partially smooth / strongly convex or fully smooth / strongly convex. We perform the analysis for arbitrary samplings of dual variables, and obtain known deterministic results as a special case. Several variants of our stochastic method significantly outperform the deterministic variant on a variety of imaging tasks.
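Since the abstract is dense, here is a rough Python sketch of one stochastic primal-dual update of this flavor, for a saddle point problem min_x max_y sum_i <K_i x, y_i> + g(x) - sum_i f_i*(y_i) with a separable dual. Uniform sampling, the step sizes and the extrapolation coefficient below are illustrative choices; the admissible parameters and the precise algorithm are given in the paper.

```python
import numpy as np

def spdhg_sketch(K_blocks, prox_g, prox_fstar, x0, y0, tau, sigma, n_iters, seed=0):
    """Rough sketch of a stochastic primal-dual hybrid gradient iteration.

    One randomly sampled dual block is updated per iteration; the primal step
    uses an extrapolated running aggregate z = sum_i K_i^T y_i.

    K_blocks:   list of matrices K_i (one per dual block).
    prox_g:     prox_g(v, tau)           -> prox of tau * g at v.
    prox_fstar: prox_fstar[i](v, sigma)  -> prox of sigma * f_i* at v.
    """
    rng = np.random.default_rng(seed)
    n = len(K_blocks)
    x = x0.copy()
    y = [yi.copy() for yi in y0]
    z = sum(K.T @ yi for K, yi in zip(K_blocks, y))   # running K^T y
    zbar = z.copy()
    for _ in range(n_iters):
        x = prox_g(x - tau * zbar, tau)               # primal step
        i = rng.integers(n)                           # sample a dual block uniformly
        yi_new = prox_fstar[i](y[i] + sigma * (K_blocks[i] @ x), sigma)
        delta = K_blocks[i].T @ (yi_new - y[i])
        y[i] = yi_new
        z = z + delta
        zbar = z + n * delta                          # extrapolation; 1/p_i = n for uniform sampling
    return x, y
```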

June 4, 2017


New paper out: "Stochastic reformulations of linear systems: algorithms and convergence theory" - joint work with Martin Takáč (Lehigh).

Abstract: We develop a family of reformulations of an arbitrary consistent linear system into a stochastic problem. The reformulations are governed by two user-defined parameters: a positive definite matrix defining a norm, and an arbitrary discrete or continuous distribution over random matrices. Our reformulation has several equivalent interpretations, allowing for researchers from various communities to leverage their domain specific insights. In particular, our reformulation can be equivalently seen as a stochastic optimization problem, stochastic linear system, stochastic fixed point problem and a probabilistic intersection problem. We prove sufficient, and necessary and sufficient conditions for the reformulation to be exact.

Further, we propose and analyze three stochastic algorithms for solving the reformulated problem---basic, parallel and accelerated methods---with global linear convergence rates. The rates can be interpreted as condition numbers of a matrix which depends on the system matrix and on the reformulation parameters. This gives rise to a new phenomenon which we call stochastic preconditioning, and which refers to the problem of finding parameters (matrix and distribution) leading to a sufficiently small condition number. Our basic method can be equivalently interpreted as stochastic gradient descent, stochastic Newton method, stochastic proximal point method, stochastic fixed point method, and stochastic projection method, with fixed stepsize (relaxation parameter), applied to the reformulations.
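To make the "stochastic projection method" interpretation concrete, here is a minimal Python sketch of arguably its simplest special case: the randomized Kaczmarz method, obtained when the norm-defining matrix is the identity and each sketch picks a single row of the system with probability proportional to its squared norm. The paper, of course, covers far more general samplings as well as the parallel and accelerated variants.

```python
import numpy as np

def randomized_kaczmarz(A, b, n_iters=1000, seed=0):
    """Randomized Kaczmarz: a simple instance of the basic sketch-and-project
    method for a consistent linear system Ax = b.

    At each step, sample row i with probability ||A_i||^2 / ||A||_F^2 and
    project the current iterate onto the hyperplane A_i x = b_i.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    probs = np.sum(A**2, axis=1) / np.sum(A**2)
    x = np.zeros(n)
    for _ in range(n_iters):
        i = rng.choice(m, p=probs)
        a = A[i]
        x = x - (a @ x - b[i]) / (a @ a) * a          # project onto the i-th hyperplane
    return x

# Example on a small consistent system.
A = np.random.default_rng(1).normal(size=(20, 5))
x_true = np.arange(5, dtype=float)
print(np.linalg.norm(randomized_kaczmarz(A, A @ x_true, n_iters=5000) - x_true))
```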


Comment: I have taught a course at the University of Edinburgh in Spring 2017 which was largely based on the results in this paper.

June 3, 2017


I am now back at KAUST to welcome Aritra Dutta, who has just joined my group as a postdoc. Aritra: welcome!

May 26, 2017


I am in Edinburgh now - I'll be here until May 30. I am then giving a talk at Plymouth University on May 31 and at Cardiff University on June 1st. I'll be in London on June 2nd.

Update (June 4): This is the paper I talked about in Plymouth and Cardiff.

May 20, 2017


I am in Vancouver as of today, attending the SIAM Conference on Optimization. I am giving a talk on Monday, May 22 (stochastic reformulations of linear systems), as are Nicolas Loizou (stochastic heavy ball method) and Filip Hanzely (randomized methods for minimizing relatively smooth functions). Strangely, none of these three papers is online yet! Robert Gower is giving his talk on Tuesday (sketch and project: a tool for designing stochastic quasi-Newton methods and stochastic variance reduced gradient methods). The first part of the talk is based on this and this paper; the variance reduction part is also new and not online yet. Jakub Konečný is giving his talk on Wednesday (AIDE: fast and communication efficient distributed optimization).

Update (June 4): This is the paper I talked about in Vancouver. Here are the talk slides.

May 15, 2017


Martin Takáč is visiting me at KAUST this week.

May 10, 2017


New Approach to AI: Federated Learning

Standard machine learning approaches require centralizing the training data on one machine or in a datacenter. For models trained from user interaction with mobile devices, Google has just released a new approach, the result of a collaboration between Google, Jakub Konečný and myself.

The new approach is called Federated Learning; it is described in the following four papers:

[1] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal and Karn Seth
Practical Secure Aggregation for Privacy Preserving Machine Learning (3/2017)

[2] Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon
Federated Learning: Strategies for Improving Communication Efficiency (10/2016)

[3] Jakub Konečný, H. Brendan McMahan, Daniel Ramage, Peter Richtárik
Federated Optimization: Distributed Machine Learning for On-Device Intelligence (10/2016)

[4] H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas
Communication-Efficient Learning of Deep Networks from Decentralized Data (2/2016)

Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. This goes beyond the use of local models that make predictions on mobile devices by bringing model training to the device as well. The technology is now in use by around 1 billion Android devices.
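For a concrete, highly simplified picture of the training loop, here is a minimal Python sketch of the Federated Averaging idea from paper [4] above: the server broadcasts the current model, each participating device runs a few local training steps on its own data, and the server forms a data-weighted average of the returned models. Everything below (full client participation, a least-squares model, the step size) is an illustrative assumption and not Google's production system.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few local gradient steps for least squares on one device's data."""
    w = w.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def federated_averaging(clients, w0, rounds=10):
    """Simplified Federated Averaging: raw data never leaves the devices;
    only model parameters are communicated and averaged by the server.

    clients: list of (X_k, y_k) datasets, one per device.
    """
    w = w0.copy()
    total = sum(len(y) for _, y in clients)
    for _ in range(rounds):
        local = [local_update(w, X, y) for X, y in clients]                  # on-device training
        w = sum(len(y) / total * wk for (X, y), wk in zip(clients, local))   # data-weighted average
    return w
```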

The CEO of Google, Sundar Pichai, said:

“… we continue to set the pace in machine learning and AI research. We introduced a new technique for training deep neural networks on mobile devices called Federated Learning. This technique enables people to run a shared machine learning model, while keeping the underlying data stored locally on mobile phones."

The new technology is described for a lay audience in a Google Research Blog post dated April 2017. Selected media coverage:


May 9, 2017


New paper out: "Parallel stochastic Newton method" - joint work with Mojmír Mutný (ETH Zurich).

Abstract: We propose a parallel stochastic Newton method (PSN) for minimizing smooth convex functions. We analyze the method in the strongly convex case, and give conditions under which acceleration can be expected when compared to its serial counterpart. We show how PSN can be applied to the empirical risk minimization problem, and demonstrate the practical efficiency of the method through numerical experiments and models of simple matrix classes.


May 6, 2017


I am hosting two interns from the Indian Institute of Technology Kanpur at KAUST this summer; they have just arrived: Aashutosh Tiwari and Atal Narayan. Welcome!


May 2, 2017


Most Downloaded SIMAX Paper

The paper "Randomized Iterative Methods for Linear Systems", coauthored with Robert M. Gower and published in 2015, is now the Most Downloaded Paper in the SIAM Journal on Matrix Analysis and Applications.

The paper had been the second most downloaded paper since at least August 2016, when Robert noticed this and brought it to my attention. We have just noticed that it has climbed to the #1 position. Thanks for all the interest in our work!

For those who want to pursue the line of work initiated in our paper, we recommend looking at the following follow-up papers where we address several extensions and obtain various additional insights and improvements:

1) Stochastic dual ascent for solving linear systems, arXiv:1512.06890

Here we lift the full-rank assumption from our original SIMAX paper. In doing so, we discover a particularly beautiful duality theory behind the method. This also leads to the design of a novel stochastic method in optimization, which we call Stochastic Dual Ascent (SDA). With a bit of hindsight, we should have called it Stochastic Dual Subspace Ascent (SDSA).

2) Randomized quasi-Newton updates are linearly convergent matrix inversion algorithms, arXiv:1602.01768

The basic idea behind this paper is to apply a methodology similar to the one we used to solve linear systems in the SIMAX paper to the problem of computing the inverse of a (large) matrix (a toy Python sketch of this idea appears at the end of this post). I'll simply copy-paste the abstract here:

We develop and analyze a broad family of stochastic/randomized algorithms for inverting a matrix. We also develop specialized variants maintaining symmetry or positive definiteness of the iterates. All methods in the family converge globally and linearly (i.e., the error decays exponentially), with explicit rates. In special cases, we obtain stochastic block variants of several quasi-Newton updates, including bad Broyden (BB), good Broyden (GB), Powell-symmetric-Broyden (PSB), Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS). Ours are the first stochastic versions of these updates shown to converge to an inverse of a fixed matrix. Through a dual viewpoint we uncover a fundamental link between quasi-Newton updates and approximate inverse preconditioning. Further, we develop an adaptive variant of randomized block BFGS, where we modify the distribution underlying the stochasticity of the method throughout the iterative process to achieve faster convergence. By inverting several matrices from varied applications, we demonstrate that AdaRBFGS is highly competitive when compared to the well established Newton-Schulz and minimal residual methods. In particular, on large-scale problems our method outperforms the standard methods by orders of magnitude. Development of efficient methods for estimating the inverse of very large matrices is a much needed tool for preconditioning and variable metric optimization methods in the advent of the big data era.

3) Stochastic block BFGS: squeezing more curvature out of data, ICML 2016

In this work we apply the stochastic block BFGS method developed in the above paper to empirical risk minimization. Of course, much more is needed than just a straightforward application - but this is the initial idea behind the work.

4) Linearly convergent randomized iterative methods for computing the pseudoinverse, arXiv:1612.06255

Here we show that after suitable insights and modifications, the iterative sketching framework for inverting matrices from the "quasi-Newton" paper above can be used to compute the Moore-Penrose pseudoinverse of arbitrary (rectangular) real matrices. Extension to the complex setting is possible, but we did not do it.

5) Soon I will post a new paper on arXiv which goes much deeper than the SIMAX paper; to the best of my knowledge, this work represents the deepest insight into the sketch-and-project framework we have at the moment.

Update (18.6.2017): The paper I mentioned in item 5 above is now on arXiv.
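Here is the toy Python sketch promised in item 2 above: the sketch-and-project idea applied to matrix inversion, where we repeatedly project the current iterate, in the Frobenius norm, onto a sketched version of the constraint AX = I. Gaussian sketches, the sketch size and the iteration count are illustrative assumptions; the paper analyzes a much broader family of methods, including symmetric, positive definite and adaptive variants.

```python
import numpy as np

def sketch_and_project_inverse(A, sketch_size=5, n_iters=500, seed=0):
    """Toy sketch-and-project iteration for approximating A^{-1}.

    Each step projects X, in the Frobenius norm, onto the sketched
    constraint S^T A X = S^T, i.e.
        X <- X + A^T S (S^T A A^T S)^{-1} S^T (I - A X).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = np.zeros((n, n))
    I = np.eye(n)
    for _ in range(n_iters):
        S = rng.normal(size=(n, sketch_size))
        AS = A.T @ S                                          # A^T S
        M = np.linalg.solve(AS.T @ AS, S.T @ (I - A @ X))     # (S^T A A^T S)^{-1} S^T (I - A X)
        X = X + AS @ M
    return X

# Example: the residual ||X A - I|| should be small after enough iterations.
A = np.random.default_rng(2).normal(size=(10, 10)) + 10 * np.eye(10)
print(np.linalg.norm(sketch_and_project_inverse(A) @ A - np.eye(10)))
```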

April 19, 2017


Dominik Csiba is giving a talk entitled "The role of optimization in machine learning" at the Machine Learning MeetUp (MLMU) in Bratislava today (at 7pm). If you are around and interested in machine learning and/or optimization, I recommend you attend!

Update (May 26, 2017): A video recording of Dominik's talk is now online.

April 17, 2017


Alibek Sailanbayev is visiting me at KAUST this week.

April 10, 2017


I am giving 2 talks this week. Today, I am giving a talk on stochastic Chambolle-Pock algorithm at the Visual Computing Conference held at KAUST. My PhD students Jakub, Nicolas and Filip are presenting posters tomorrow. On Thursday (April 13), I am giving a talk on randomized algorithms in linear algebra at the AMCS Graduate Seminar at KAUST.

April 9, 2017


I have several visitors at KAUST at the moment. Nicolas Loizou arrived in late March and is staying until early May. Filip Hanzely and Jakub Konečný arrived yesterday; Jakub will stay for a couple of weeks and Filip will stay until late May, after which all of us will go to Vancouver for the SIAM Conference on Optimization. Konstantin Mishchenko (Paris Dauphine / Ecole Normale Superieure) visited for a few days recently, and is now attending the OSL 2017 workshop in Les Houches, France. Dmitry I. Grishchenko (Higher School of Economics) arrived yesterday and is staying for 2 weeks.

April 6, 2017


#1 Trending Paper in Mathematical Programming (Series A and B)

About a week ago I received an email (see the screenshot below) in which Springer notified the optimization community about the top 5 trending articles in Mathematical Programming (Series A and B). It was a great surprise (and pleasure) to learn that our paper Parallel coordinate descent methods for big data optimization (coauthored with Martin Takáč) is #1 on the list!

Top 5 trending articles in Mathematical Programming, Series A and B

March 29, 2017


Robert Gower's paper Randomized Iterative Methods for Linear Systems (SIAM Journal on Matrix Analysis and Applications 36(4):1660-1690, 2015), written in collaboration with me, was shortlisted for the 18th Leslie Fox Prize in Numerical Analysis.

This paper has been the 2nd most downloaded SIMAX paper since (at least) August 2016. The most downloaded paper was published in 2000.

The Leslie Fox Prize for Numerical Analysis of the Institute of Mathematics and its Applications (IMA) is a biennial prize established in 1985 by the IMA in honour of mathematician Leslie Fox (1918-1992). The prize honours "young numerical analysts worldwide" (any person who is less than 31 years old), and applicants submit papers for review. A committee reviews the papers, invites shortlisted candidates to give lectures at the Leslie Fox Prize meeting, and then awards First Prize and Second Prizes based on "mathematical and algorithmic brilliance in tandem with presentational skills".

March 17, 2017


As of today, Nicolas Loizou is visiting me at KAUST. He will stay until early May.

March 14, 2017


Filip Hanzely is giving a talk in our big data optimization seminar today. He will speak on "Finding Approximate Local Minima for Nonconvex Optimization in Linear Time".

March 13, 2017


I will be giving a tutorial on randomized optimization methods at the Data Science Summer School, held at CMAP, École Polytechnique, during August 28 - September 1, 2017.

Full list of tutorials:

- Yoshua Bengio (Montreal): Deep Learning
- Pradeep Ravikumar (Carnegie Mellon): Graphical Models
- Peter Richtarik (Edinburgh & KAUST): Randomized Optimization Methods
- Csaba Szepesvari (Alberta): Bandits

Plenary speakers:

- Cedric Archambeau (Amazon)
- Olivier Bousquet (Google)
- Damien Ernst (Liege)
- Laura Grigori (INRIA)
- Sean Meyn (Florida)
- Sebastian Nowozin (Microsoft Research)
- Stuart Russell (Berkeley)

The full program can be found here.

March 7, 2017


Marcelo Pereyra is giving a talk in our big data optimization seminar today. He is talking about his recent paper "Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau".

February 28, 2017


Jakub Konečný is giving a talk in our big data optimization seminar today.

February 26, 2017


I am on a leave from Edinburgh as of today. I have taken up an Associate Professor position at King Abdullah University of Science and Technology (KAUST). I am affiliated with the Computer Science (CS) and Applied Mathematics & Computational Science (AMCS) programs. I am also affiliated with the Extreme Computing Research Center (ECRC) and the Visual Computing Center (VCC). I have several positions open, contact me if interested!

PS: I am pleasantly surprised to see that the weather at KAUST is great at this time of the year!

February 21, 2017


Nicolas Loizou is giving a talk in our big data optimization seminar today.

February 14, 2017


Kostas Zygalakis is giving a talk in our big data optimization seminar today.

February 7, 2017


László Végh is visiting me for a couple days. He is also giving a talk in our big data optimization seminar tomorrow.

January 29, 2017


Between January 29 and February 3, I am in Villars-sur-Ollon, Switzerland, attending the BASP Frontiers 2017 workshop.

January 25, 2017


For administrative reasons, the interviews for the two postdoc positions I have open will take place a bit later than anticipated. I had originally expected them to happen this week; there will be some delay.

January 23, 2017


I have two visitors this week: Ion Necoara (Bucharest) and Elhoucine Bergou (INRA).

January 19, 2017


I am in London today and tomorrow. Today I am discussing research with people at the Alan Turing Institute (we managed to start a new project today and prove a lemma to kick it off), and tomorrow I am giving a seminar talk in the Department of Computing at Imperial College.

January 16, 2017


As of this week, I am teaching an intensive course (6 hours per week) entitled "Modern Optimization Methods for Big Data Problems". This is a rigorous course covering some of the fundamentals of randomized algorithms for optimization problems described by very large quantities of data. It is open to anyone interested (the current composition of the class includes PhD students, Master's students, a few undergraduate students and even some faculty).

January 10, 2017


We have several Lectureships (i.e., positions equivalent to Assistant Professorships in the US) open in the School of Mathematics:

Lectureship in Industrial Mathematics, application deadline: February 1, 2017 (5pm UK time)
Two Lectureships in Statistics and Data Science, application deadline: February 7, 2017 (5pm UK time)

January 7, 2017


I am in Slovakia at the moment, and it's been crazy cold the last few days. Today we have -15 degrees Celsius, but it feels like -25. This is when your eyebrows freeze. Anyway, I am returning to Edinburgh tomorrow.

Some news: Dominik Csiba is now based in Slovakia. He will be finishing his PhD from there and is expected to submit his thesis in the summer of 2017. He will be picking up some optimization/machine learning related activities in Slovakia - do talk to him if you get a chance! For instance, on February 15, Dominik will give a talk at a Slovak Machine Learning Meetup (MLMU). Further, Dominik is a mentor at a Data Science BaseCamp. Here is a blurb from their website: "BaseCamp is an immersive full-time 8-week program for prospective data scientists. During 8 weeks you will deepen your theoretical knowledge, enhance your practical skills and become a qualified data scientist ready for your exciting data science career."


Read Old News (2016 and earlier)
"