### July 10, 2018

Most-read paper in the journal Optimization Methods and Software in 2017

The paper Distributed Optimization with Arbitrary Local Solvers, joint work with Chenxin Ma, Jakub Konečný, Martin Jaggi, Virginia Smith, Michael I. Jordan, and Martin Takáč, was the most-read paper in the OMS journal in 2017.

This is how I know: I clicked on the "Read our most-read article of 2017 for free here" link available on this website, and got a nice surprise.

### July 3, 2018

I have joined the Editorial Board of Optimization Methods and Software.

### July 1, 2018

I am on my way to Bordeaux, to attend ISMP (the 23rd International Symposium on Mathematical Programming). Together with Alexandre d’Aspremont, Olivier Beaumont, and Suvrit Sra, I have organized stream 4a: "Learning: Machine Learning, Big Data, Cloud Computing, and Huge-Scale Optimization". Below is the schedule of sessions featuring talks based on papers I co-authored:

**Coordinate Descent and Randomized Direct Search Methods (Continuous Optimization)**

RandomM - Mo 3:15pm-4:45pm, Format: 3x30 min, Room: Salle KC6, Building: K, Intermediate 1, Zone: 10

Invited Session 211

Organizer: Martin Takáč, Lehigh University, US

- Speaker: Asu Ozdaglar, MIT, US, talk 1486. Co-Authors: Mert Gurbuzbalaban, Nuri Vanli, Pablo Parrilo
- Speaker: El Houcine Bergou, KAUST-INRA, SA, talk 421. Co-Authors: Peter Richtárik, Eduard Gorbunov
- Speaker: Dimitri Papageorgiou, ExxonMobil, US, talk 1275. Co-Authors: Krishnan Kumaran, Martin Takáč

**Machine learning and sparse optimisation (Continuous Optimization)**

NLP - Tu 8:30am-10:30am, Format: 4x30 min, Room: Salle 05, Building: Q, 1st floor, Zone: 11

Invited Session 109

Organizer: Coralia Cartis, University of Oxford, GB

- Speaker: Martin Lotz, The University of Manchester, GB, talk 957. Co-Authors: Dennis Amelunxen, Jake Walvin
- Speaker: Armin Eftekhari, Alan Turing Institute, GB, talk 1199. Co-Authors: Martin Lotz, Peter Richtárik
- Speaker: Florentin Goyens, University of Oxford, GB, talk 1182. Co-Authors: Armin Eftekhari, Coralia Cartis, Greg Ongie
- Speaker: Jared Tanner, University of Oxford, GB, talk 1462. Co-Authors: Armin Eftekhari, Andrew Thompson, Bogdan Toader, Hemant Tyagi

**Recent advances in first-order algorithms for non-smooth optimization (Continuous Optimization)**

NonSmooth - We 8:30am-10:30am, Format: 4x30 min, Room: Salle LC4, Building: L, Intermediate 1, Zone: 9

Invited Session 198

Organizer: Thomas Pock, Graz University of Technology, AT

- Speaker: Peter Ochs, Saarland University, DE, talk 134. Co-Authors: Jalal Fadili
- Speaker: Yura Malitsky, University of Göttingen, DE, talk 155
- Speaker: Matthias Ehrhardt, University of Cambridge, GB, talk 127. Co-Authors: Carola Schoenlieb, Peter Richtárik, Antonin Chambolle
- Speaker: Stanislav Mazurenko, University of Liverpool, GB, talk 1655. Co-Authors: Tuomo Valkonen, C. Clason

**Fast Converging Stochastic Optimization Algorithms (Continuous Optimization)**

RandomM - We 3:15pm-4:45pm, Format: 3x30 min, Room: Salle KC6, Building: K, Intermediate 1, Zone: 10

Invited Session 213

Organizer: Francis Bach, INRIA - ENS, FR

- Speaker: Aymeric Dieuleveut, EPFL, CH, talk 1102. Co-Authors: Alain Durmus, Francis Bach
- Speaker: Aude Genevay, ENS, FR, talk 888
- Speaker: Robert Gower, Telecom Paristech, FR, talk 859. Co-Authors: Peter Richtárik, Francis Bach

**Non-Convex and Second-order Methods in Machine Learning (Continuous Optimization)**

RandomM - We 5:00pm-6:30pm, Format: 4x20 min, Room: Salle KC6, Building: K, Intermediate 1, Zone: 10

Invited Session 33

Organizer: Martin Takáč, Lehigh University, US

- Speaker: Aurelien Lucchi, ETH Zurich, CH, talk 531
- Speaker: Reza Babanezhad, UBC, CA, talk 1135. Co-Authors: Raunak Kumar, Mark Schmidt
- Speaker: Francesco Orabona, Stony Brook University, US, talk 1108
- Speaker: Martin Takáč, Lehigh University, US, talk 1342. Co-Authors: Lam Nguyen, Phuong Nguyen, Marten van Dijk, Peter Richtárik, Katya Scheinberg

**First-order methods for large-scale convex problems (Specific Models, Algorithms, and Software Learning)**

Th 8:30am-10:30am, Format: 4x30 min, Room: FABRE, Building: J, Ground Floor, Zone: 8

Invited Session 316

Organizer: Stephen Vavasis, University of Waterloo, CA

- Speaker: Stephen Vavasis, University of Waterloo, CA, talk 582. Co-Authors: Sahar Karimi
- Speaker: Mert Gurbuzbalaban, Rutgers University, US, talk 1106
- Speaker: Peter Richtárik, KAUST, SA, talk 385. Co-Authors: Ion Necoara, Andrei Patrascu
- Speaker: Yaoliang Yu, University of Waterloo, CA, talk 937. Co-Authors: Xinhua Zhang, Zhan Shi

**Recent Advances in Coordinate Descent and Constrained Problems (Continuous Optimization)**

RandomM - Fr 9:00am-10:30am, Format: 3x30 min, Room: Salle KC6, Building: K, Intermediate 1, Zone: 10

Invited Session 208

Organizer: Ion Necoara, Univ. Politehnica Bucharest, RO

- Speaker: Nicolas Loizou, University of Edinburgh, GB, talk 835 ("Convergence Analysis of Inexact Randomized Iterative Methods"). Co-Authors: Peter Richtárik
- Speaker: Konstantin Mishchenko, KAUST, SA, talk 1264. Co-Authors: Peter Richtárik, Ion Necoara
- Speaker: Ion Necoara, Univ. Politehnica Bucharest, RO, talk 809. Co-Authors: Martin Takáč

### June 29, 2018

Best DS³ Poster Award for Samuel Horváth

Samuel Horváth attended the Data Science Summer School (DS³), which took place during June 25-29, 2018, at École Polytechnique in Paris, France. According to the event website, the event gathered 500 participants from 34 countries and 6 continents, of which 290 were MS and PhD students and postdocs, and 110 professionals. Selected guest/speaker names (out of 41): Cédric Villani, Nicoló Cesa-Bianchi, Mark Girolami, Yann LeCun, Suvrit Sra, Jean-Philippe Vert, Adrian Weller, Marco Cuturi, Arthur Gretton, and Andreas Krause.

The event also included a best-poster competition, with an impressive total of 170 posters. Samuel's poster won the Best DS³ Poster Award. The poster, entitled *Nonconvex variance reduced optimization with arbitrary sampling*, is based on a paper of the same title, joint work with me, and currently under review.

Here is the poster:

And here is the award:

This first prize carries a 500 EUR cash award.

Samuel: Congratulations!!!

### June 18, 2018

I am now in Edinburgh for a week. On Tuesday, I am giving a talk in the ANC Seminar (School of Informatics), and on Wednesday I am giving the same talk in the ERGO Seminar (School of Mathematics).

### June 15, 2018

New paper out: "Improving SAGA via a probabilistic interpolation with gradient descent" - joint work with Adel Bibi, Alibek Sailanbayev, Bernard Ghanem and Robert Mansel Gower.

Abstract: We develop and analyze a new algorithm for empirical risk minimization, which is the key paradigm for training supervised machine learning models. Our method---SAGD---is based on a probabilistic interpolation of SAGA and gradient descent (GD). In particular, in each iteration we take a gradient step with probability $q$ and a SAGA step with probability $1-q$. We show that, surprisingly, the total expected complexity of the method (which is obtained by multiplying the number of iterations by the expected number of gradients computed in each iteration) is minimized for a non-trivial probability $q$. For example, for a well conditioned problem the choice $q=1/(n-1)^2$, where $n$ is the number of data samples, gives a method with an overall complexity which is better than both the complexity of GD and SAGA. We further generalize the results to a probabilistic interpolation of SAGA and minibatch SAGA, which allows us to compute both the optimal probability and the optimal minibatch size. While the theoretical improvement may not be large, the practical improvement is robustly present across all synthetic and real data we tested for, and can be substantial. Our theoretical results suggest that for this optimal minibatch size our method achieves linear speedup in minibatch size, which is of key practical importance as minibatch implementations are used to train machine learning models in practice. This is the first time linear speedup in minibatch size is obtained for a variance reduced gradient-type method by directly solving the primal empirical risk minimization problem.
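The interpolation idea in the abstract is simple enough to sketch in code. The following is a minimal, hypothetical implementation for a least-squares objective, not the authors' reference code: with probability `q` we take a full gradient step (refreshing the SAGA table), and otherwise a standard SAGA step. All names and default parameters are illustrative assumptions.

```python
import numpy as np

def sagd(A, b, q=0.05, lr=0.01, iters=5000, seed=0):
    """Sketch of a SAGA/GD probabilistic interpolation for least squares
    f(x) = (1/2n) * ||Ax - b||^2, with f_i(x) = (a_i^T x - b_i)^2 / 2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    table = np.zeros((n, d))   # stored gradient of each f_i at its last evaluation point
    avg = table.mean(axis=0)
    for _ in range(iters):
        if rng.random() < q:
            # GD step: full gradient; also refresh the whole SAGA table
            table = A * (A @ x - b)[:, None]
            avg = table.mean(axis=0)
            x = x - lr * avg
        else:
            # SAGA step: one random component gradient, variance-reduced
            i = rng.integers(n)
            g_i = A[i] * (A[i] @ x - b[i])
            x = x - lr * (g_i - table[i] + avg)
            avg = avg + (g_i - table[i]) / n
            table[i] = g_i
    return x
```

The GD branch costs `n` gradient evaluations while the SAGA branch costs one, which is exactly the trade-off the expected-complexity analysis in the paper optimizes over `q`.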

### June 10, 2018

I am in Voronovo, Russia, attending the Traditional Youth School in "Control, Information and Optimization" organized by Boris Polyak and Elena Gryazina. This is the 10th edition of the school. I will be teaching a 3h module on stochastic methods in optimization and machine learning.

*Update 1:* Slides from my two talks: TALK 1, TALK 2.

*Update 2:* Nikita Doikov won the Best Talk Award for the paper "Randomized Block Cubic Newton Method", to appear in ICML 2018.

### June 1, 2018

Jingwei Liang (Cambridge) is visiting me at KAUST.

### May 21, 2018

Adil Salim (Télécom ParisTech) is visiting me at KAUST this week.

### May 21, 2018

New paper out: "A nonconvex projection method for robust PCA" - joint work with Aritra Dutta and Filip Hanzely.

Abstract: Robust principal component analysis (RPCA) is a well-studied problem with the goal of decomposing a matrix into the sum of low-rank and sparse components. In this paper, we propose a nonconvex feasibility reformulation of the RPCA problem and apply an alternating projection method to solve it. To the best of our knowledge, we are the first to propose a method that solves the RPCA problem without considering any objective function, convex relaxation, or surrogate convex constraints. We demonstrate through extensive numerical experiments on a variety of applications, including shadow removal, background estimation, face detection, and galaxy evolution, that our approach matches, and often significantly outperforms, the current state-of-the-art in various ways.
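To illustrate the alternating projection idea in the abstract, here is a toy sketch, not the paper's exact algorithm: alternate between projecting onto the set of rank-`rank` matrices (truncated SVD) and onto the set of matrices with at most `card` nonzeros (hard thresholding). The function name and parameters are hypothetical.

```python
import numpy as np

def alt_proj_rpca(M, rank=1, card=None, iters=100):
    """Sketch: decompose M ≈ L + S with L low-rank and S sparse by
    alternating projections; `card` = number of nonzeros kept in S."""
    if card is None:
        card = M.size // 10
    S = np.zeros_like(M)
    for _ in range(iters):
        # project M - S onto rank-`rank` matrices via truncated SVD
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * sig[:rank]) @ Vt[:rank]
        # project M - L onto matrices with at most `card` nonzeros
        R = M - L
        thresh = np.sort(np.abs(R), axis=None)[-card]
        S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S
```

Note there is no objective function here at all; the iteration works purely with the two constraint sets, which is the spirit of the feasibility reformulation.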

### May 19, 2018

The NIPS deadline has passed. My group members and I will probably spend a few days sleeping...

### May 11, 2018

We have got two papers accepted to ICML 2018:

1) Randomized block cubic Newton method (with Nikita Doikov)

2) SGD and Hogwild! convergence without the bounded gradients assumption (with Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Katya Scheinberg and Martin Takáč)

### May 1, 2018

New paper out: "Stochastic quasi-gradient methods: variance reduction via Jacobian sketching" - joint work with Robert Gower and Francis Bach.

Abstract: We develop a new family of variance reduced stochastic gradient descent methods for minimizing the average of a very large number of smooth functions. Our method---JacSketch---is motivated by novel developments in randomized numerical linear algebra, and operates by maintaining a stochastic estimate of a Jacobian matrix composed of the gradients of individual functions. In each iteration, JacSketch efficiently updates the Jacobian matrix by first obtaining a random linear measurement of the true Jacobian through (cheap) sketching, and then projecting the previous estimate onto the solution space of a linear matrix equation whose solutions are consistent with the measurement. The Jacobian estimate is then used to compute a variance-reduced unbiased estimator of the gradient, followed by a stochastic gradient descent step. Our strategy is analogous to the way quasi-Newton methods maintain an estimate of the Hessian, and hence our method can be seen as a stochastic quasi-gradient method. Indeed, quasi-Newton methods project the current Hessian estimate onto a solution space of a linear equation consistent with a certain linear (but non-random) measurement of the true Hessian. Our method can also be seen as stochastic gradient descent applied to a controlled stochastic optimization reformulation of the original problem, where the control comes from the Jacobian estimates.

We prove that for smooth and strongly convex functions, JacSketch converges linearly with a meaningful rate dictated by a single convergence theorem which applies to general sketches. We also provide a refined convergence theorem which applies to a smaller class of sketches, featuring a novel proof technique based on a stochastic Lyapunov function. This enables us to obtain sharper complexity results for variants of JacSketch with importance sampling. By specializing our general approach to specific sketching strategies, JacSketch reduces to the celebrated stochastic average gradient (SAGA) method, and its several existing and many new minibatch, reduced memory, and importance sampling variants. Our rate for SAGA with importance sampling is the current best-known rate for this method, resolving a conjecture by Schmidt et al (2015). The rates we obtain for minibatch SAGA are also superior to existing rates. Moreover, we obtain the first minibatch SAGA method with importance sampling.

### April 29, 2018

I am on my way to Birmingham, and then Coventry. I will be giving a talk at the DIMAP seminar (DIMAP = Centre for Discrete Mathematics and its Applications), University of Warwick, on "Stochastic Quasi-Gradient Methods: Variance Reduction via Jacobian Sketching". The talk is based on joint work with Robert M. Gower and Francis Bach.

### April 25, 2018

Today and tomorrow I am teaching a mini-course on "Optimization for Machine Learning" for students from various Saudi universities who were previously contestants in Saudi National Mathematical Olympiad and/or IMO. Several current contestants are attending as well.

This is a collaborative effort with Diogo Gomes, who is teaching a "Mathematica" mini-course.

### April 18, 2018

New paper out: "Weighted low-rank approximation of matrices and background modeling" - joint work with Aritra Dutta and Xin Li.

Abstract: We primarily study a special weighted low-rank approximation of matrices and then apply it to solve the background modeling problem. We propose two algorithms for this purpose: one operates in batch mode on the entire data, while the other operates in batch-incremental mode, naturally capturing more background variations and being computationally more efficient. Moreover, we propose a robust technique that learns the background frame indices from the data and does not require any training frames. We demonstrate through extensive experiments that by inserting a simple weight in the Frobenius norm, it can be made robust to outliers, similar to the L1 norm. Our methods match or outperform several state-of-the-art online and batch background modeling methods in virtually all quantitative and qualitative measures.

### April 16, 2018

I am giving a talk today at the CS Graduate Seminar at KAUST. I will be talking about "Randomized Methods for Convex Feasibility Problems". This is joint work with Ion Necoara and Andrei Patrascu.

### April 5, 2018

My lab has openings for postdoc (straight after PhD, or a few years after PhD) and research scientist (several to many years after PhD; similar to a RS position at big data companies such as Google, Microsoft Research, Amazon, Baidu, Tencent, Facebook) positions.

Relevant areas: machine learning theory, optimization, algorithms, high performance computing, deep learning, randomized and stochastic algorithms, federated learning, computer vision, machine learning systems, data science, applied mathematics, theoretical computer science. Contact me by email if interested. Please send your CV (including publication record), a brief statement of interest, 3 reference letters (and PhD transcript for postdoc applicants).

Place of work: KAUST. Outstanding working conditions.

Starting date: Fall 2018 (flexible).

Contract duration: based on agreement (e.g., 1-3 years).

### April 2, 2018

Dominik Csiba's PhD thesis "Data sampling strategies in stochastic algorithms for empirical risk minimization" is online now.

### March 2, 2018

I am on vacation this week.

### March 23, 2018

Konstantin and Filip are attending the 2018 INFORMS Optimization Society Conference in Denver, Colorado.

### March 21, 2018

New paper out: "Fastest rates for stochastic mirror descent methods" - joint work with Filip Hanzely.

Abstract: Relative smoothness - a notion introduced by Birnbaum et al. (2011) and rediscovered by Bauschke et al. (2016) and Lu et al. (2016) - generalizes the standard notion of smoothness typically used in the analysis of gradient-type methods. In this work we take ideas from the well-studied field of stochastic convex optimization and use them to obtain faster algorithms for minimizing relatively smooth functions. We propose and analyze two new algorithms: Relative Randomized Coordinate Descent (relRCD) and Relative Stochastic Gradient Descent (relSGD), both generalizing famous algorithms from the standard smooth setting. The methods we propose can in fact be seen as variants of stochastic mirror descent. One of them, relRCD, is the first stochastic mirror descent algorithm with a linear convergence rate.

### March 18, 2018

Sarah Sachs, a master student from Technical University Munich (TUM), arrived at KAUST today. She will spend six months at KAUST (until early September) as a visiting student in my group, and will write her master's thesis under my supervision. In her thesis she is focusing on randomized optimization algorithms. Welcome!

Sarah's bachelor thesis at TUM focused on approximation of the infimal convolution for non-convex functions. She previously worked on finding efficiently computable stopping criteria for ADMM and the Chambolle-Pock algorithm applied to LP relaxations of ILPs with integral extreme points. She is generally interested in optimization with applications to computer vision.

### March 4, 2018

Konstantin Mishchenko is visiting the Cambridge Image Analysis group of Carola-Bibiane Schönlieb at the University of Cambridge. During March 8-11 he is participating in VHacks, the first ever hackathon at the Vatican.

Aritra and El Houcine are also travelling.

Update (March 19): Konstantin, El Houcine and Aritra are back.

### February 25, 2018

I am on my way to Bratislava, Slovakia. Tomorrow, I am giving a statistics seminar talk at "Matfyz" - School of Mathematics, Physics and Informatics, Comenius University.

Title: On stochastic algorithms in linear algebra, optimization and machine learning

Place: FMFI UK, M/XII

Date: Monday, February 26, 2018

Time: 09:50am

If anyone is interested in MS / PhD / postdocs / research scientist positions at KAUST, I will be available to talk to you after the talk.

### February 20, 2018

Videos of the talks from the KAUST Research Workshop on Optimization and Big Data are now available. They can be found here.

Comment: At the moment the videos are accessible to KAUST community only, they will soon be available globally.

### February 13, 2018

New paper out: "SGD and Hogwild! convergence without the bounded gradients assumption" - joint work with Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Katya Scheinberg and Martin Takáč.

Abstract: Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical analysis of convergence of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is always violated for cases where the objective function is strongly convex. In (Bottou et al., 2016), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. Here we show that for stochastic problems arising in machine learning, such a bound always holds. Moreover, we propose an alternative convergence analysis of SGD with a diminishing learning rate regime, which results in more relaxed conditions than those in (Bottou et al., 2016). We then move on to the asynchronous parallel setting, and prove convergence of the Hogwild! algorithm in the same regime, obtaining the first convergence results for this method in the case of diminishing learning rate.

### February 12, 2018

New paper out: "Accelerated stochastic matrix inversion: general theory and speeding up BFGS rules for faster second-order optimization" - joint work with Robert M. Gower, Filip Hanzely and Sebastian Stich.

Abstract: We present the first accelerated randomized algorithm for solving linear systems in Euclidean spaces. One essential problem of this type is the matrix inversion problem. In particular, our algorithm can be specialized to invert positive definite matrices in such a way that all iterates (approximate solutions) generated by the algorithm are positive definite matrices themselves. This opens the way for many applications in the field of optimization and machine learning. As an application of our general theory, we develop the first accelerated (deterministic and stochastic) quasi-Newton updates. Our updates lead to provably more aggressive approximations of the inverse Hessian, and lead to speed-ups over classical non-accelerated rules in numerical experiments. Experiments with empirical risk minimization show that our rules can accelerate training of machine learning models.

### February 10, 2018

New paper out: "Randomized block cubic Newton method" - joint work with Nikita Doikov.

Abstract: We study the problem of minimizing the sum of three convex functions: a differentiable, a twice-differentiable, and a non-smooth term, in a high-dimensional setting. To this effect we propose and analyze a randomized block cubic Newton (RBCN) method, which in each iteration builds a model of the objective function formed as the sum of the {\em natural} models of its three components: a linear model with a quadratic regularizer for the differentiable term, a quadratic model with a cubic regularizer for the twice-differentiable term, and a perfect (proximal) model for the non-smooth term. Our method in each iteration minimizes the model over a random subset of blocks of the search variable. RBCN is the first algorithm with these properties, generalizing several existing methods and matching the best known bounds in all special cases. We establish ${\cal O}(1/\epsilon)$, ${\cal O}(1/\sqrt{\epsilon})$ and ${\cal O}(\log (1/\epsilon))$ rates under different assumptions on the component functions. Lastly, we show numerically that our method outperforms the state-of-the-art on a variety of machine learning problems, including cubically regularized least-squares, logistic regression with constraints, and Poisson regression.

### February 10, 2018

New paper out: "Stochastic spectral and conjugate descent methods" - joint work with Dmitry Kovalev, Eduard Gorbunov and Elnur Gasanov.

Abstract: The state-of-the-art methods for solving optimization problems in big dimensions are variants of randomized coordinate descent (RCD). In this paper we introduce a fundamentally new type of acceleration strategy for RCD based on the augmentation of the set of coordinate directions by a few spectral or conjugate directions. As we increase the number of extra directions to be sampled from, the rate of the method improves, and interpolates between the linear rate of RCD and a linear rate independent of the condition number. We also develop and analyze inexact variants of these methods, where the spectral and conjugate directions are allowed to be only approximate. We motivate the above development by proving several negative results which highlight the limitations of RCD with importance sampling.
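The direction-augmentation idea can be illustrated on a quadratic with a toy sketch (not the paper's algorithm): descend by exact line search along a direction drawn at random from a pool that mixes coordinate directions with a few eigenvectors ("spectral directions") of the system matrix. All names here are hypothetical.

```python
import numpy as np

def random_direction_descent(A, b, dirs, iters=2000, seed=0):
    """Sketch: minimize f(x) = 0.5 x^T A x - b^T x (A positive definite)
    by exact line search along a direction drawn uniformly from the
    columns of `dirs`. With coordinate directions only, this is RCD;
    appending eigenvectors of A mimics the spectral augmentation idea."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[0])
    for _ in range(iters):
        s = dirs[:, rng.integers(dirs.shape[1])]
        g = A @ x - b                          # gradient of f at x
        x = x - ((s @ g) / (s @ A @ s)) * s    # exact line search along s
    return x
```

A possible usage: `dirs = np.hstack([np.eye(d), eigvecs[:, -2:]])`, i.e., the coordinate basis augmented by the top two eigenvectors of `A`.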

### February 5, 2018

OBD 2018 is starting! The KAUST Workshop on Optimization and Big Data just started. We have 19 amazing speakers and 21 deluxe e-posters lined up.

Update (February 12): Thanks to all who participated in the workshop; thanks to you, it was an excellent event! Group photos:

### February 4, 2018

KAUST Research Workshop on Optimization and Big Data is starting tomorrow! We have 19 amazing speakers, and 21 deluxe poster talks and ePoster presentations.

This year, Tamás Terlaky (Lehigh) is the keynote speaker.

Thanks to the KAUST Office for Sponsored Research, The Alan Turing Institute and KICP.

### February 1, 2018

Nicolas Loizou is back at KAUST on a research visit. Welcome!

### January 28, 2018

Aritra Dutta (postdoc), Alibek Sailanbayev (MS/PhD student) and Samuel Horváth (MS/PhD student) are attending the Applied Machine Learning Days at EPFL, Lausanne, Switzerland.

### January 27, 2018

Let me welcome Dmitry Kovalev and Elnur Gasanov (master's students visiting from MIPT, Moscow) and Slavomir Hanzely (an undergraduate student at Comenius University), who arrived at KAUST about a week ago and are working with me as interns. They will be here for about a month.

### January 18, 2018

New paper out: "A randomized exchange algorithm for computing optimal approximate designs of experiments" - joint work with Radoslav Harman and Lenka Filová.

Abstract: We propose a class of subspace ascent methods for computing optimal approximate designs that covers both existing as well as new and more efficient algorithms. Within this class of methods, we construct a simple, randomized exchange algorithm (REX). Numerical comparisons suggest that the performance of REX is comparable or superior to the performance of state-of-the-art methods across a broad range of problem structures and sizes. We focus on the most commonly used criterion of D-optimality that also has applications beyond experimental design, such as the construction of the minimum volume ellipsoid containing a given set of datapoints. For D-optimality, we prove that the proposed algorithm converges to the optimum. We also provide formulas for the optimal exchange of weights in the case of the criterion of A-optimality. These formulas enable one to use REX for computing A-optimal and I-optimal designs.

### January 16, 2018

I was travelling and am back at KAUST now. Let me welcome Eduard Gorbunov (a master's student visiting from MIPT, Moscow; he will be here until February 8), Matthias Ehrhardt (visiting from Cambridge, UK, until February 10) and Elhoucine Bergou (a new postdoc in my group, starting today).

### January 15, 2018

New paper out: "Randomized projection methods for convex feasibility problems: conditioning and convergence rates" - joint work with Ion Necoara and Andrei Patrascu.

Abstract: Finding a point in the intersection of a collection of closed convex sets, that is, the convex feasibility problem, represents the main modeling strategy for many computational problems. In this paper we analyze new stochastic reformulations of the convex feasibility problem in order to facilitate the development of new algorithmic schemes. We also analyze the conditioning of the problem parameters using certain (linear) regularity assumptions on the individual convex sets. Then, we introduce a general random projection algorithmic framework, which extends to the random setting many existing projection schemes designed for the general convex feasibility problem. Our general random projection algorithm allows one to project simultaneously onto several sets, thus providing great flexibility in matching the implementation of the algorithm to the parallel architecture at hand. Based on the conditioning parameters, besides the asymptotic convergence results, we also derive explicit sublinear and linear convergence rates for this general algorithmic framework.
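A minimal sketch of the random projection idea, assuming each set is given by its (exact) projection operator; this is an illustration of the basic single-set variant, not the paper's general framework. The helper names are hypothetical.

```python
import numpy as np

def random_projections(sets_proj, x0, iters=500, seed=0):
    """Sketch: at each step, project the current iterate onto one
    randomly chosen set from the collection `sets_proj`."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        proj = sets_proj[rng.integers(len(sets_proj))]
        x = proj(x)
    return x

# Example sets: the unit ball and halfspaces {x : a^T x <= b}.
def proj_ball(x):
    nrm = np.linalg.norm(x)
    return x if nrm <= 1 else x / nrm

def proj_halfspace(a, b):
    def proj(x):
        viol = a @ x - b
        return x - (viol / (a @ a)) * a if viol > 0 else x
    return proj
```

Running this with, say, the unit ball and the halfspaces `x[0] <= 0.2` and `x[1] <= 0.2` drives the iterate into the intersection of all three sets.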

### December 22, 2017

New paper out: "Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods" - joint work with Nicolas Loizou.

Abstract: In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent. We prove global nonasymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates (in the L2 sense), and dual function values. We also show that the primal iterates converge at an accelerated linear rate in the L1 sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.
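The simplest member of this family is easy to sketch: SGD with a heavy ball momentum term, here applied to a consistent linear system using single-row (Kaczmarz-type) stochastic gradients. This is a hypothetical toy illustration; the step size and momentum values are assumptions, not the paper's tuned parameters.

```python
import numpy as np

def stochastic_heavy_ball(A, b, lr=0.7, beta=0.3, iters=3000, seed=0):
    """Sketch: SGD with heavy ball momentum on a consistent system Ax = b,
    where f_i(x) = (a_i^T x - b_i)^2 / (2 ||a_i||^2)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    x_prev = x.copy()
    for _ in range(iters):
        i = rng.integers(n)
        # stochastic gradient of f_i at x
        g = (A[i] @ x - b[i]) / (A[i] @ A[i]) * A[i]
        # heavy ball update: gradient step plus momentum term
        x, x_prev = x - lr * g + beta * (x - x_prev), x
    return x
```

With `beta = 0` and `lr = 1` this reduces to the randomized Kaczmarz method; the momentum term `beta * (x - x_prev)` is the only change.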

### December 15, 2017

I am now in Havana, Cuba, attending the 4th Conference on Optimization and Software. I am speaking about a stochastic version of the Chambolle-Pock algorithm [1, 2]. Filip Hanzely is here, too - speaking about randomized and accelerated methods for minimizing relatively smooth functions.

### December 12, 2017

A few random updates:

Nicolas and Jakub attended NIPS. Filip and I will soon fly to Cuba to give talks at the 4th Conference on Optimization Methods and Software. Mark Schmidt is joining us in the same session. The Fall 2017 semester at KAUST is over; I had some fantastic students in my CS390FF (Big Data Optimization) class.

### November 30, 2017

Video recordings of the Data Science Summer School (held at Ecole Polytechnique in Aug/Sept 2017) lectures are now online.

Lecturers:

- Yoshua Bengio (Montreal): Deep Learning

- Pradeep Ravikumar (CMU): Graphical Models

- Peter Richtarik (KAUST/Edinburgh): Randomized Optimization Methods

- Csaba Szepesvári (Alberta/Google DeepMind): Bandits

I gave a 5-hour course on Randomized Optimization Methods; the videos are here:

Video Lecture 1

Video Lecture 2

Video Lecture 3

Video Lecture 4

Video Lecture 5

### November 25, 2017

New paper out: "Online and batch supervised background estimation via L1 regression" - joint work with Aritra Dutta.

Abstract: We propose a surprisingly simple model for supervised video background estimation. Our model is based on L1 regression. As existing methods for L1 regression do not scale to high-resolution videos, we propose several simple and scalable methods for solving the problem, including iteratively reweighted least squares, a homotopy method, and stochastic gradient descent. We show through extensive experiments that our model and methods match or outperform the state-of-the-art online and batch methods in virtually all quantitative and qualitative measures.
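Of the solvers listed in the abstract, iteratively reweighted least squares is the easiest to sketch. Below is a minimal, hypothetical implementation for `min_x ||Ax - b||_1` (not the paper's code): each iteration solves a weighted least-squares problem whose weights are the reciprocals of the current absolute residuals.

```python
import numpy as np

def l1_regression_irls(A, b, iters=100, eps=1e-8):
    """Sketch of iteratively reweighted least squares (IRLS) for
    min_x ||A x - b||_1; `eps` guards against division by zero."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]          # ordinary L2 warm start
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(A @ x - b), eps)  # per-row weights ~ 1/|residual|
        Aw = A * w[:, None]                           # rows of A scaled by weights (W A)
        # solve the weighted normal equations  A^T W A x = A^T W b
        x = np.linalg.solve(A.T @ Aw, A.T @ (w * b))
    return x
```

The reweighting downweights rows with large residuals, which is what makes the L1 objective robust to outlier pixels in the background estimation setting.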

### November 24, 2017

Dominik Csiba defended his PhD thesis "Data sampling strategies in stochastic algorithms for empirical risk minimization" today. Congratulations!

### November 16, 2017

Robert Gower is visiting me at KAUST for 2 weeks. He will give two talks during his visit, one at my research group seminar on Nov 21, and one at the CS graduate seminar on Nov 27.

### November 9, 2017

Nicolas Loizou is visiting me at KAUST. He will stay until early December, after which he is heading off for NIPS, to present our work on "Linearly convergent stochastic heavy ball method for minimizing generalization error".

### October 27, 2017

New paper out: "Linearly convergent stochastic heavy ball method for minimizing generalization error" - joint work with Nicolas Loizou.

Abstract: In this work we establish the first linear convergence result for the stochastic heavy ball method. The method performs SGD steps with a fixed stepsize, amended by a heavy ball momentum term. In the analysis, we focus on minimizing the expected loss and not on finite-sum minimization, which is typically a much harder problem. While in the analysis we constrain ourselves to quadratic loss, the overall objective is not necessarily strongly convex.
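A minimal sketch of the method analyzed above (SGD with a fixed stepsize plus a heavy ball momentum term), on a synthetic consistent least-squares problem; the stepsize `gamma` and momentum `beta` values below are placeholder choices of mine, not tuned constants from the paper.

```python
import numpy as np

def stochastic_heavy_ball(A, b, gamma=0.05, beta=0.3, n_iter=5000, seed=0):
    """SGD with heavy ball momentum for min_x E_i [ (a_i^T x - b_i)^2 / 2 ].

    Update: x_{k+1} = x_k - gamma * g_k + beta * (x_k - x_{k-1}),
    where g_k is the gradient at a single randomly chosen row.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x_prev = x = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        g = (A[i] @ x - b[i]) * A[i]  # stochastic gradient from row i
        x, x_prev = x - gamma * g + beta * (x - x_prev), x
    return x

# Consistent system: the expected loss vanishes at x_star, so a fixed
# stepsize can converge to the exact solution
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
x_star = rng.standard_normal(5)
b = A @ x_star
x = stochastic_heavy_ball(A, b)
```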

### October 25, 2017

I am on my way to Moscow, where I will kick-start a research project funded at MIPT by giving a talk at a workshop entitled Optimization at Work. As a part of this project, several MIPT students will join my team. The first three members are:

Dmitry Kovalev

Eduard Gorbunov

Elnur Gasanov

There are two more spots to be filled. If you are an exceptionally strong mathematics or computer science student at MIPT, get in touch with me.

Konstantin Mishchenko and Eduard Gorbunov are giving talks, too.

Update (Oct 27, 2017): I just gave my talk; I talked about the stochastic Chambolle-Pock algorithm (joint work with A. Chambolle, M. J. Ehrhardt, and C.-B. Schoenlieb). See also the follow-up work on an application to PET reconstruction (Proceedings of SPIE, 2017).

### October 9, 2017

Nikita Doikov (Higher School of Economics, Moscow) is visiting me at KAUST. He will stay until late November. Nikita is a PhD student working under the supervision of Yurii Nesterov.

### October 5, 2017

Sebastian Stich (EPFL) is visiting me at KAUST; he will stay for a couple weeks.

### October 1, 2017

Nicolas Loizou is visiting Berkeley for 10 days. He is attending the Fast Iterative Methods in Optimization workshop held at the Simons Institute. The workshop is a part of the program Bridging Continuous and Discrete Optimization. On October 5, Nicolas will give a seminar talk in the Statistics department at Berkeley entitled "Stochastic and doubly stochastic dual heavy ball methods for quadratic optimization with low-rank Hessian". This talk is based on a new joint paper with me which will be posted on arXiv soon.

### September 24, 2017

Aritra Dutta is attending a sectional meeting of the American Mathematical Society in Orlando, Florida. He is giving a talk based on the paper "A Batch-Incremental Video Background Estimation Model using Weighted Low-Rank Approximation of Matrices", co-authored with Xin Li and myself, in a "Special Session on Advances in Dirac Equations, Variational Inequalities, Sequence Spaces and Optimization". He will give the same talk at ICCV / RSL-CV in Venice, Italy on October 28.

### September 23, 2017

I'd like to welcome five new students who joined my group at KAUST in August/September:

Filip Hanzely (PhD student) - coming from Comenius University / University of Edinburgh [Scholar]

Samuel Horváth (MS/PhD student) - coming from Comenius University

Viktor Lukáček (PhD student) - coming from Comenius University

Konstantin Mishchenko (PhD student) - coming from MIPT / ENS Cachan / Paris Dauphine [CV]

Alibek Sailanbayev (MS/PhD student) - coming from Nazarbayev University [Scholar]

Filip Hanzely transferred to KAUST from Edinburgh, where he spent one year as a PhD student under my supervision; he wrapped up that year with an MSc degree. Filip co-authored two papers during his time in Edinburgh: one on privacy-preserving gossip methods, and one on randomized algorithms for minimizing relatively smooth functions (this is the subject of his MSc thesis at Edinburgh; the paper will soon be posted on arXiv). Filip gave a talk on the latter paper at the SIAM Conference on Optimization in Vancouver, and presented a poster at the AN70 meeting at the Fields Institute in Toronto. Before coming to Edinburgh, Filip wrote a paper on testing for causality in reconstructed state spaces. Alibek wrote two papers during his undergraduate studies: one on pattern structures for structured attribute sets and one on data analysis in biomedical studies. Konstantin is writing two papers on distributed optimization based on results obtained during his Summer 2017 internship in Grenoble with Franck Iutzeler and Jerome Malick. He presented these results as a poster entitled "An asynchronous distributed proximal gradient algorithm" at a workshop on Decentralized Machine Learning, Optimization and Privacy held recently in Lille, France.

Filip, Samuel, Viktor, Konstantin and Alibek were all active in various national and international mathematical and computing competitions for high school and university students. Here is a list of some of their achievements:

2017 (Horváth), 37th Place, Vojtech Jarnik International Mathematical Competition, Ostrava, Czech Republic

2016 (Horváth), 36th Place, Vojtech Jarnik International Mathematical Competition, Ostrava, Czech Republic

2016 (Horváth), 3rd Prize, Int. Mathematical Competition for University Students, Blagoevgrad, Bulgaria

2016 (Sailanbayev), Semifinal, Programming contest ACM ICPC in NEERC region, Almaty, Kazakhstan

2015 (Sailanbayev), 2nd Prize, Int. Mathematical Competition for University Students, Blagoevgrad, Bulgaria

2015 (Mishchenko), 1st Prize, HSE Olympiad in Applied Mathematics and Informatics, Moscow, Russia

2014 (Mishchenko), 3rd Prize, MIPT Student Mathematical Olympiad, Moscow, Russia

2014 (Horváth), 18th Place, National Mathematical Olympiad, Bratislava, Slovakia

2014 (Horváth), 1st Place, Nitra District Mathematical Olympiad, Category A, Slovakia

2014 (Sailanbayev), 2nd Prize, Int. Mathematical Competition for University Students, Blagoevgrad, Bulgaria

2014 (Hanzely), 2nd Prize, Int. Mathematical Competition for University Students, Blagoevgrad, Bulgaria

2014 (Hanzely), 9th Place, Vojtech Jarnik International Mathematical Competition, Ostrava, Czech Republic

2014 (Lukáček), 26th Place, Vojtech Jarnik International Mathematical Competition, Ostrava, Czech Republic

2013 (Sailanbayev), Silver Medal, International Mathematical Olympiad, Santa Marta, Colombia

2013 (Hanzely), Bronze Medal, International Mathematical Olympiad, Santa Marta, Colombia

2013 (Sailanbayev), 1st Place, National Mathematical Olympiad, Kazakhstan

2013 (Hanzely), 1st Place, National Mathematical Olympiad, Kosice, Slovakia

2013 (Sailanbayev), Gold Medal, International Zhautykov Olympiad, Almaty, Kazakhstan

2013 (Lukáček), 20th Place, Vojtech Jarnik International Mathematical Competition, Ostrava, Czech Republic

2012 (Lukáček), 3rd Prize, Int. Mathematical Competition for University Students, Blagoevgrad, Bulgaria

2012 (Mishchenko), 1st Prize, Moscow Mathematical Olympiad, Moscow, Russia

2012 (Mishchenko), 1st Prize, PhysTech International Olympiad in Mathematics

2012 (Hanzely), Bronze Medal, Middle European Mathematical Olympiad, Solothurn, Switzerland

2012 (Sailanbayev), Bronze Medal, International Mathematical Olympiad, Mar del Plata, Argentina

2012 (Sailanbayev), Silver Medal, Balkan Mathematical Olympiad, Antalya, Turkey

2012 (Lukáček), 2nd Place, International Correspondence Seminar in Mathematics (iKS)

2011 (Lukáček), Bronze Medal (26th Place), Middle European Mathematical Olympiad, Varaždin, Croatia

It's exciting to have you all here, welcome!

### September 13, 2017

I am back at KAUST now. The Lille workshop was very nice: excellent talks, great group of people.

I will soon start inviting speakers for the Optimization and Big Data workshop which will take place at KAUST during February 5-7, 2018.

### September 12, 2017

New paper out: "Global convergence of arbitrary-block gradient methods for generalized Polyak-Łojasiewicz functions" - joint work with Dominik Csiba.

Abstract: In this paper we introduce two novel generalizations of the theory for gradient descent type methods in the proximal setting. First, we introduce the proportion function, which we further use to analyze all known (and many new) block-selection rules for block coordinate descent methods under a single framework. This framework includes randomized methods with uniform, non-uniform or even adaptive sampling strategies, as well as deterministic methods with batch, greedy or cyclic selection rules. Second, the theory of strongly-convex optimization was recently generalized to a specific class of non-convex functions satisfying the so-called Polyak-Łojasiewicz condition. To mirror this generalization in the weakly convex case, we introduce the Weak Polyak-Łojasiewicz condition, using which we give global convergence guarantees for a class of non-convex functions previously not considered in theory. Additionally, we establish (necessarily somewhat weaker) convergence guarantees for an even larger class of non-convex functions satisfying a certain smoothness assumption only. By combining the two abovementioned generalizations we recover the state-of-the-art convergence guarantees for a large class of previously known methods and setups as special cases of our general framework. Moreover, our framework allows for the derivation of new guarantees for many new combinations of methods and setups, as well as a large class of novel non-convex objectives. The flexibility of our approach offers a lot of potential for future research, as a new block selection procedure will have a convergence guarantee for all objectives considered in our framework, while a new objective analyzed under our approach will have a whole fleet of block selection rules with convergence guarantees readily available.
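For reference, the classical Polyak-Łojasiewicz condition mentioned in the abstract requires that a differentiable function $f$ with minimal value $f^*$ satisfies, for some $\mu > 0$,

```latex
\frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu \left( f(x) - f^* \right) \qquad \text{for all } x.
```

This inequality yields linear convergence of gradient-type methods without requiring convexity; the weak variant introduced in the paper relaxes this inequality further.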

### September 10, 2017

I am on my way to Lille, France, to attend a workshop on Decentralized Machine Learning, Optimization and Privacy (Sept 11-12, 2017).

Update (Sept 11): I have given my talk "Privacy preserving randomized gossip algorithms" today. The talk is based on this paper, and here are the slides.

### September 9, 2017

Dominik Csiba submitted his PhD thesis entitled "Data Sampling Strategies in Stochastic Algorithms for Empirical Risk Minimization" a couple of weeks ago. The thesis consists of 6 chapters; I include links to the papers the chapters are based on.

1. Introduction

2. Stochastic Dual Coordinate Ascent with Adaptive Probabilities

3. Primal Method for ERM with Flexible Mini-batching Schemes and Non-convex Losses

4. Importance Sampling for Minibatches

5. Coordinate Descent Faceoff: Primal or Dual?

6. Global Convergence of Arbitrary-Block Gradient Methods for Generalized Polyak-Lojasiewicz Functions

His defense/viva will take place at some point in the Fall; a pdf of the thesis will be made public afterwards.

### September 8, 2017

I am co-organizing the workshop "Sparse Approximation and Sampling" which is to be held in London sometime in May or June 2019. The precise dates will be fixed soon. This is a joint event of the Isaac Newton Institute and The Alan Turing Institute. This is one of three workshops which are part of a 6 month programme on "Approximation, sampling and compression in high dimensional problems" held at the Isaac Newton Institute during January-June 2019.

- Workshop organizers: Robert Calderbank (Duke, USA), Anders Hansen (Cambridge, UK), Peter Richtarik (KAUST, KSA / Edinburgh, UK / The Alan Turing Institute, UK), Miguel Rodrigues (UCL, UK)

- Programme Scientific Advisory Committee: Robert Calderbank (Duke, USA), Emmanuel Candes (Stanford, USA), Ronald DeVore (Texas A&M, USA), Ingrid Daubechies (Duke, USA), Arieh Iserles (Cambridge, UK)

### September 1, 2017

I am now back at KAUST. The Eid al-Adha holiday started yesterday. I am looking forward to a bit of rest (or "stayvacation", as spending vacation at KAUST as opposed to somewhere else is called).

### August 29, 2017

New paper out: "The complexity of primal-dual fixed point methods for ridge regression" - joint work with Ademir Ribeiro (Federal University of Paraná).

Abstract: We study the ridge regression ($L_2$ regularized least squares) problem and its dual, which is also a ridge regression problem. We observe that the optimality conditions describing the primal and dual optimal solutions can be formulated in several different but equivalent ways. The optimality conditions we identify form a linear system involving a structured matrix depending on a single relaxation parameter which we introduce for regularization purposes. This leads to the idea of studying and comparing, in theory and practice, the performance of the fixed point method applied to these reformulations. We compute the optimal relaxation parameters and uncover interesting connections between the complexity bounds of the variants of the fixed point scheme we consider. These connections follow from a close link between the spectral properties of the associated matrices. For instance, some reformulations involve purely imaginary eigenvalues; some involve real eigenvalues and others have all eigenvalues on the complex circle. We show that the deterministic Quartz method--which is a special case of the randomized dual coordinate ascent method with arbitrary sampling recently developed by Qu, Richtarik and Zhang--can be cast in our framework, and achieves the best rate in theory and in numerical experiments among the fixed point methods we study.

### August 28, 2017

I have arrived in Paris. I am attending the Data Science Summer School (DS3) organized by Ecole Polytechnique. I am giving a 5-hour minicourse on Randomized Optimization Methods (here are the slides).

Some event stats (copy pasted from the event website):

400 participants

220 students (MSc, PhD) & postdocs, 100 professionals

16 experts (speakers, guests)

30 countries

6 continents

200 institutions

50 companies

6 sponsors

120 posters

female : male ratio = 3 : 10

### August 20, 2017

The first day of the Fall 2017 semester at KAUST is today. I am teaching CS390FF: Selected Topics in Data Sciences (Big Data Optimization).

Update (Sept 8): 26 students are enrolled in the course.

### August 13, 2017

I am back at KAUST now. The Fall 2017 semester is starting in a week.

### August 10, 2017

New paper out: "Faster PET reconstruction with a stochastic primal-dual hybrid gradient method" - joint work with Antonin Chambolle (Ecole Polytechnique), Matthias J. Ehrhardt (Cambridge), and Carola-Bibiane Schoenlieb (Cambridge).

Abstract: Image reconstruction in positron emission tomography (PET) is computationally challenging due to Poisson noise, constraints and potentially non-smooth priors, let alone the sheer size of the problem. An algorithm that can cope well with the first three of the aforementioned challenges is the primal-dual hybrid gradient algorithm (PDHG) studied by Chambolle and Pock in 2011. However, PDHG updates all variables in parallel and is therefore computationally demanding on the large problem sizes encountered with modern PET scanners, where the number of dual variables easily exceeds 100 million. In this work, we numerically study the usage of SPDHG, a stochastic extension of PDHG which is still guaranteed to converge to a solution of the deterministic optimization problem with similar rates as PDHG. Numerical results on a clinical data set show that by introducing randomization into PDHG, similar results as with the deterministic algorithm can be achieved using only around 10% of the operator evaluations. This makes significant progress towards the feasibility of sophisticated mathematical models in a clinical setting.

### August 5, 2017

I have just arrived in Sydney, Australia - I am attending ICML. Looking forward to the excellent program!

### July 10, 2017

I am reviewing NIPS papers this week.

Update (after rebuttal): It's never a good strategy for authors to pretend that obvious issues raised by the reviewers simply do not exist.

### July 3, 2017

Jakub's PhD thesis is now on arXiv.

### July 3, 2017

I am on my way to the Fields Institute, Toronto, to attend a workshop entitled "Modern Convex Optimization and Applications: AN70". This event is organized in honour of Arkadi Nemirovski's 70th birthday. Arkadi is one of the most influential individuals in optimization, directly responsible for the existence of several of its most important and most beautiful subfields. Here is a very brief profile of this giant of our beloved field, taken from the website of a workshop I co-organized in Edinburgh in 2015.

Update (July 5, 2017): I have given my talk today, here are the slides.

Update (July 7, 2017): Filip delivered his pitch talk and presented his poster "Extending the Reach of Big Data Optimization: Randomized Algorithms for Minimizing Relatively Smooth Functions".

### July 2, 2017

New paper out: "A batch-incremental video background estimation model using weighted low-rank approximation of matrices" - joint work with Aritra Dutta (KAUST) and Xin Li (University of Central Florida).

Abstract: Principal component pursuit (PCP) is a state-of-the-art approach for background estimation problems. Due to their higher computational cost, PCP algorithms, such as robust principal component analysis (RPCA) and its variants, are not feasible in processing high definition videos. To avoid the curse of dimensionality in those algorithms, several methods have been proposed to solve the background estimation problem in an incremental manner. We propose a batch-incremental background estimation model using a special weighted low-rank approximation of matrices. Through experiments with real and synthetic video sequences, we demonstrate that our method is superior to the state-of-the-art background estimation algorithms such as GRASTA, ReProCS, incPCP, and GFL.
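For context, the simplest unweighted low-rank baseline for a static-camera background is just a truncated SVD of the frame matrix. The weighted model in the paper improves on this; the sketch below is my own toy rank-1 baseline, not the paper's method.

```python
import numpy as np

def rank1_background(frames):
    """Estimate a static background as the best rank-1 approximation of the
    frame matrix (plain SVD baseline, not the paper's weighted model).

    `frames` has shape (n_pixels, n_frames): one vectorized frame per column.
    """
    U, s, Vt = np.linalg.svd(frames, full_matrices=False)
    return s[0] * np.outer(U[:, 0], Vt[0])

# Synthetic "video": a static background plus sparse foreground blips
rng = np.random.default_rng(4)
bg = rng.random(100)                    # static background, 100 pixels
frames = np.tile(bg[:, None], (1, 30))  # 30 identical frames
frames[rng.integers(100, size=30), np.arange(30)] += 1.0  # one blip per frame
B = rank1_background(frames)            # rank-1 estimate of the background
```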

### June 27, 2017

IMA Fox Prize (2nd Prize) for Robert Mansel Gower

Robert M. Gower was awarded a Leslie Fox Prize (2nd Prize) by the Institute of Mathematics and its Applications (IMA) for the paper Randomized Iterative Methods for Linear Systems (SIAM J. Matrix Anal. & Appl., 36(4), 1660–1690, 2015), coauthored with me. The list of finalists can be found here.

The Leslie Fox Prize for Numerical Analysis of the Institute of Mathematics and its Applications (IMA) is a biennial prize established in 1985 by the IMA in honour of mathematician Leslie Fox (1918-1992). The prize honours "young numerical analysts worldwide" (any person who is less than 31 years old), and applicants submit papers for review. A committee reviews the papers, invites shortlisted candidates to give lectures at the Leslie Fox Prize meeting, and then awards First Prize and Second Prizes based on "mathematical and algorithmic brilliance in tandem with presentational skills".

### June 26, 2017

Jakub Konečný defended his PhD thesis "Stochastic, Distributed and Federated Optimization for Machine Learning" today. Congratulations!

Jakub joined my group as a PhD student in August 2013. His PhD was in his first year supported by the Principal's Career Development Scholarship, and in the subsequent years by a Google Europe Doctoral Fellowship in Optimization Algorithms. Jakub has co-authored 13 papers during his PhD (links to the papers can be found here). He has worked on diverse topics such as distributed optimization, machine learning, derivative-free optimization, federated learning, gesture recognition, semi-stochastic methods and variance-reduced algorithms for empirical risk minimization. He is joining Google Seattle in August 2017 as a research scientist.

### June 21, 2017

New paper out: "Privacy Preserving Randomized Gossip Algorithms" - joint work with Filip Hanzely (Edinburgh), Jakub Konečný (Edinburgh), Nicolas Loizou (Edinburgh) and Dmitry Grishchenko (Higher School of Economics, Moscow).

Abstract: In this work we present three different randomized gossip algorithms for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes. We give iteration complexity bounds for all methods, and perform extensive numerical experiments.

### June 18, 2017

I am now in Slovakia, visiting Radoslav Harman at Comenius University.

### June 15, 2017

New paper out: "Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications" - joint work with Antonin Chambolle (Ecole Polytechnique), Matthias J. Ehrhardt (Cambridge), and Carola-Bibiane Schoenlieb (Cambridge).

Abstract: We propose a stochastic extension of the primal-dual hybrid gradient algorithm studied by Chambolle and Pock in 2011 to solve saddle point problems that are separable in the dual variable. The analysis is carried out for general convex-concave saddle point problems and problems that are either partially smooth / strongly convex or fully smooth / strongly convex. We perform the analysis for arbitrary samplings of dual variables, and obtain known deterministic results as a special case. Several variants of our stochastic method significantly outperform the deterministic variant on a variety of imaging tasks.

### June 4, 2017

New paper out: "Stochastic reformulations of linear systems: algorithms and convergence theory" - joint work with Martin Takáč (Lehigh).

Abstract: We develop a family of reformulations of an arbitrary consistent linear system into a stochastic problem. The reformulations are governed by two user-defined parameters: a positive definite matrix defining a norm, and an arbitrary discrete or continuous distribution over random matrices. Our reformulation has several equivalent interpretations, allowing for researchers from various communities to leverage their domain specific insights. In particular, our reformulation can be equivalently seen as a stochastic optimization problem, stochastic linear system, stochastic fixed point problem and a probabilistic intersection problem. We prove sufficient, and necessary and sufficient conditions for the reformulation to be exact.

Further, we propose and analyze three stochastic algorithms for solving the reformulated problem---basic, parallel and accelerated methods---with global linear convergence rates. The rates can be interpreted as condition numbers of a matrix which depends on the system matrix and on the reformulation parameters. This gives rise to a new phenomenon which we call stochastic preconditioning, and which refers to the problem of finding parameters (matrix and distribution) leading to a sufficiently small condition number. Our basic method can be equivalently interpreted as stochastic gradient descent, stochastic Newton method, stochastic proximal point method, stochastic fixed point method, and stochastic projection method, with fixed stepsize (relaxation parameter), applied to the reformulations.
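As a concrete special case: choosing the norm-defining matrix as the identity and the random matrices as uniformly sampled coordinate vectors, the basic method becomes the randomized Kaczmarz method. A minimal sketch of mine, on synthetic data:

```python
import numpy as np

def randomized_kaczmarz(A, b, n_iter=500, seed=0):
    """Basic method with B = I and S = a random coordinate vector e_i.

    Each step projects the iterate onto the solution set of one randomly
    chosen equation a_i^T x = b_i: the randomized Kaczmarz method.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(n_iter):
        i = rng.integers(m)
        a = A[i]
        x = x - ((a @ x - b[i]) / (a @ a)) * a  # orthogonal projection step
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 4))
x_star = rng.standard_normal(4)
b = A @ x_star            # consistent linear system, as the theory requires
x = randomized_kaczmarz(A, b)
```

Other distributions over the sketching matrices give block, Gaussian and further variants of the same template, each with its own condition number governing the rate.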

Comment: I have taught a course at the University of Edinburgh in Spring 2017 which was largely based on the results in this paper.

### June 3, 2017

I am now back at KAUST to welcome Aritra Dutta who just joined my group at KAUST as a postdoc. Aritra: welcome!

### May 26, 2017

I am in Edinburgh now - I'll be here until May 30. I am then giving a talk at Plymouth University on May 31 and at Cardiff University on June 1st. I'll be in London on June 2nd.

Update (June 4): This is the paper I talked about in Plymouth and Cardiff.

### May 20, 2017

I am in Vancouver as of today, attending the SIAM Conference on Optimization. I am giving a talk on Monday, May 22 (stochastic reformulations of linear systems), and so are Nicolas Loizou (stochastic heavy ball method) and Filip Hanzely (randomized methods for minimizing relatively smooth functions). Strangely, none of these three papers are online yet! Robert Gower is giving his talk on Tuesday (sketch and project: a tool for designing stochastic quasi-Newton methods and stochastic variance reduced gradient methods). The first part of the talk is based on this and this paper; the variance reduction part is also new and not online yet. Jakub Konečný is speaking on Wednesday (AIDE: fast and communication efficient distributed optimization).

Update (June 4): This is the paper I talked about in Vancouver. Here are the talk slides.

### May 15, 2017

Martin Takáč is visiting me at KAUST this week.

### May 10, 2017

New Approach to AI: Federated Learning

Standard machine learning approaches require centralizing the training data on one machine or in a datacenter. For models trained on user interaction with mobile devices, Google has just released a new approach, the result of a collaboration between Google, Jakub Konečný and myself.

The new approach is called Federated Learning; it is described in the following four papers:

[1] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal and Karn Seth

Practical Secure Aggregation for Privacy Preserving Machine Learning (3/2017)

[2] Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon

Federated Learning: Strategies for Improving Communication Efficiency (10/2016)

[3] Jakub Konečný, H. Brendan McMahan, Daniel Ramage, Peter Richtárik

Federated Optimization: Distributed Machine Learning for On-Device Intelligence (10/2016)

[4] H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas

Communication-Efficient Learning of Deep Networks from Decentralized Data (2/2016)

Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. This goes beyond the use of local models that make predictions on mobile devices by bringing model training to the device as well. The technology is now in use by around 1 billion Android devices.

The CEO of Google, Sundar Pichai, said:

“… we continue to set the pace in machine learning and AI research. We introduced a new technique for training deep neural networks on mobile devices called Federated Learning. This technique enables people to run a shared machine learning model, while keeping the underlying data stored locally on mobile phones."

The new technology is described in a Google Research Blog, dated April 2017, to a lay audience. Selected media coverage:

- Forbes
- The Verge
- Quartz
- TechRepublic
- ZDNet
- Computer Business Review
- Motherboard Vice
- Infoworld
- Silicon.co.uk
- Venturebeat
- Engadget
- Tech Narratives
- GadgetHacks
- BGR
- AndroidAuthority
- AndroidHeadlines
- Tom's Guide
- Digital Trends
- The Exponential View
- vvcat
- 9to5google

### May 9, 2017

New paper out: "Parallel stochastic Newton method" - joint work with Mojmír Mutný (ETH Zurich).

Abstract: We propose a parallel stochastic Newton method (PSN) for minimizing smooth convex functions. We analyze the method in the strongly convex case, and give conditions under which acceleration can be expected when compared to the serial stochastic Newton method. We show how PSN can be applied to the empirical risk minimization problem, and demonstrate the practical efficiency of the method through numerical experiments and models of simple matrix classes.

### May 6, 2017

I am hosting two interns from the Indian Institute of Technology Kanpur at KAUST this summer; they just arrived: Aashutosh Tiwari and Atal Narayan. Welcome!

### May 2, 2017

Most Downloaded SIMAX Paper

The paper "Randomized Iterative Methods for Linear Systems", coauthored with Robert M. Gower and published in 2015, is now the Most Downloaded Paper in the SIAM Journal on Matrix Analysis and Applications.

The paper had been the second most downloaded paper since at least August 2016, when Robert noticed this and brought it to my attention. We have just noticed that it has climbed to the #1 position. Thanks for all the interest in our work!

For those who want to pursue the line of work initiated in our paper, we recommend looking at the following follow-up papers where we address several extensions and obtain various additional insights and improvements:

1) Stochastic dual ascent for solving linear systems, arXiv:1512.06890

Here we lift the full rank assumption from our original SIMAX paper. In doing so, we discover a particularly beautiful duality theory behind the method. This also leads to the design of a novel stochastic method in optimization, which we call Stochastic Dual Ascent (SDA). With a bit of hindsight, we should have called it Stochastic Dual Subspace Ascent (SDSA).

2) Randomized quasi-Newton updates are linearly convergent matrix inversion algorithms, arXiv:1602.01768

The basic idea behind this paper is to apply a similar methodology we used to solve linear systems in the SIMAX paper to the problem of computing an inverse of a (large) matrix. I'll simply copy-paste the abstract here:

We develop and analyze a broad family of stochastic/randomized algorithms for inverting a matrix. We also develop specialized variants maintaining symmetry or positive definiteness of the iterates. All methods in the family converge globally and linearly (i.e., the error decays exponentially), with explicit rates. In special cases, we obtain stochastic block variants of several quasi-Newton updates, including bad Broyden (BB), good Broyden (GB), Powell-symmetric-Broyden (PSB), Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS). Ours are the first stochastic versions of these updates shown to converge to an inverse of a fixed matrix. Through a dual viewpoint we uncover a fundamental link between quasi-Newton updates and approximate inverse preconditioning. Further, we develop an adaptive variant of randomized block BFGS, where we modify the distribution underlying the stochasticity of the method throughout the iterative process to achieve faster convergence. By inverting several matrices from varied applications, we demonstrate that AdaRBFGS is highly competitive when compared to the well established Newton-Schulz and minimal residual methods. In particular, on large-scale problems our method outperforms the standard methods by orders of magnitude. Development of efficient methods for estimating the inverse of very large matrices is a much needed tool for preconditioning and variable metric optimization methods in the advent of the big data era.
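To make the idea concrete, here is a minimal single-row-sketch instance of the sketch-and-project iteration for AX = I that I wrote for this note; it is an illustration of the general framework, not the code or the specialized variants from the paper.

```python
import numpy as np

def randomized_inverse(A, n_iter=2000, seed=0):
    """Sketch-and-project iteration for AX = I with single-row sketches.

    Sketching with S = e_i leaves one equation, a_i^T X = e_i^T, and each
    step projects X (in the Frobenius norm) onto its solution set,
    driving X towards A^{-1} for invertible A.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = np.zeros((n, n))
    I = np.eye(n)
    for _ in range(n_iter):
        i = rng.integers(n)
        a = A[i]                        # row i of A, i.e. e_i^T A
        r = I[i] - a @ X                # residual of the sketched equation
        X += np.outer(a, r) / (a @ a)   # rank-one projection update
    return X

rng = np.random.default_rng(3)
A = np.eye(5) + 0.1 * rng.standard_normal((5, 5))  # well-conditioned test matrix
X = randomized_inverse(A)  # approximately the inverse of A
```

Each column of X evolves independently as a randomized Kaczmarz iteration on A x = e_j, which is what links this method back to the linear-system setting of the SIMAX paper.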

3) Stochastic block BFGS: squeezing more curvature out of data, ICML 2016

In this work we apply the stochastic block BFGS method developed in the above paper to empirical risk minimization. Of course, much more is needed than just a straightforward application - but this is the initial idea behind the work.

4) Linearly convergent randomized iterative methods for computing the pseudoinverse, arXiv:1612.06255

Here we show that after suitable insights and modifications, the iterative sketching framework for inverting matrices from the "quasi-Newton" paper above can be used to compute the Moore-Penrose pseudoinverse of arbitrary (rectangular) real matrices. Extension to the complex setting is possible, but we did not do it.

5) Soon I will post a new paper on arXiv which will go much deeper than the SIMAX paper - to the best of my knowledge, this work represents the deepest insight into sketch-and-project methods we have at the moment.

Update (June 18, 2017): The paper I mentioned in item 5 above is now on arXiv.

### April 19, 2017

Dominik Csiba is giving a talk entitled "The role of optimization in machine learning" at the Machine Learning MeetUp (MLMU) in Bratislava today (at 7pm). If you are around and interested in machine learning and/or optimization, I recommend you attend!

Update (May 26, 2017): A video recording of Dominik's talk is now online.

### April 17, 2017

Alibek Sailanbayev is visiting me at KAUST this week.

### April 10, 2017

### April 9, 2017

### April 6, 2017

About a week ago I received an email (see the screenshot below) in which Springer notifies the optimization community about the top 5 trending articles in the journal Mathematical Programming (Series A and B). It was a great surprise (and pleasure) to learn that our paper Parallel coordinate descent methods for big data optimization (coauthored with Martin Takáč) is #1 on the list!

### March 29, 2017

This paper has been the 2nd most downloaded SIMAX paper since (at least) August 2016. The most downloaded paper was published in the year 2000.

The Leslie Fox Prize for Numerical Analysis of the Institute of Mathematics and its Applications (IMA) is a biennial prize established in 1985 by the IMA in honour of the mathematician Leslie Fox (1918-1992). The prize honours "young numerical analysts worldwide" (any person under 31 years of age); applicants submit papers for review. A committee reviews the papers, invites shortlisted candidates to give lectures at the Leslie Fox Prize meeting, and then awards a First Prize and Second Prizes based on "mathematical and algorithmic brilliance in tandem with presentational skills".

### March 17, 2017

### March 14, 2017

### March 13, 2017

Full list of tutorials:

- Yoshua Bengio (Montreal): Deep Learning

- Pradeep Ravikumar (Carnegie Mellon): Graphical Models

- Peter Richtarik (Edinburgh & KAUST): Randomized Optimization Methods

- Csaba Szepesvari (Alberta): Bandits

Plenary speakers:

- Cedric Archambeau (Amazon)

- Olivier Bousquet (Google)

- Damien Ernst (Liege)

- Laura Grigori (INRIA)

- Sean Meyn (Florida)

- Sebastian Nowozin (Microsoft Research)

- Stuart Russell (Berkeley)

The full program can be found here.

### March 7, 2017

### February 28, 2017

### February 26, 2017

PS: I am pleasantly surprised to see that the weather at KAUST is great at this time of the year!

### February 21, 2017

### February 14, 2017

### February 7, 2017

### January 29, 2017

### January 25, 2017

### January 23, 2017

### January 19, 2017

### January 16, 2017

### January 10, 2017

Lectureship in Industrial Mathematics, application deadline: February 1, 2017 (5pm UK time)

Two Lectureships in Statistics and Data Science, application deadline: February 7, 2017 (5pm UK time)

### January 7, 2017

*Some news:* Dominik Csiba is now based in Slovakia. He will be finishing his PhD from there, and is expected to submit his thesis in the summer of 2017. He will be picking up some optimization and machine learning related activities in Slovakia - do talk to him if you get a chance! For instance, on February 15, Dominik will give a talk at a Slovak Machine Learning Meetup (MLMU). Further, Dominik is a mentor in a Data Science BaseCamp. Here is a blurb from their website: "BaseCamp is an immersive full-time 8-week program for prospective data scientists. During 8 weeks you will deepen your theoretical knowledge, enhance your practical skills and become a qualified data scientist ready for your exciting data science career."

Read Old News (2016 and earlier)