{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab #2: Randomized Gradient Methods\n", "\n", "by Peter Richtarik\n", "\n", "## The Problem: Ridge Regression\n", "\n", "In this lab we will experiment with the NSync [1] algorithm applied to the ridge regression problem:\n", "\n", "$$\\min_{x\\in \\mathbb{R}^n} \\left[ f(x) = \\frac{1}{2}\\|Ax-b\\|_2^2 + \\frac{\\lambda}{2} \\|x\\|_2^2\\right],$$\n", "\n", "where $A$ is an $m\\times n$ matrix and $b\\in \\mathbb{R}^n$. Recall that the gradient of $f$ is given by\n", "\n", "$$\\nabla f(x) = A^T (Ax-b) + \\lambda x.$$\n", "\n", "[1] Richtarik and Takac. On optimal probabilities in stochastic coordinate descent methods. Optimization Letters, 2015.\n", "\n", "## NSync Algorithm\n", "\n", "The NSync algorithm proceeds as follows:\n", "\n", "- Initialization: \n", " - Start with $x_0\\in \\mathbb{R}^n$\n", " - Fix a proper sampling $\\hat{S}$ and compute positive parameters $v = (v_1,\\dots,v_n)$ such that \n", "\n", "$$ (f,\\hat{S}) \\sim ESO(v)$$\n", "\n", "- In iteration $k$ do:\n", " - Draw a fresh copy of $S_k\\sim \\hat{S}$ (i.e., independently from everything else)\n", " - For $i\\in S_k$ do\n", " $$ x_{k+1}^{(i)} = x_k^{(i)} - \\frac{1}{v_i} e_i^T \\nabla f(x_k) $$\n", " - For $i\\notin S_k$ do\n", " $$ x_{k+1}^{(i)} = x_k^{(i)}$$\n", "\n", "Recall that a sampling $\\hat{S}$ is proper if the quantity $p_i = \\mathbf{Prob}(i\\in \\hat{S})$ is positve for all $i\\in [n]=\\{1,2,\\dots,n\\}$.\n", "\n", "## Computing ESO parameters \n", "\n", "In order to be able to run the NSync method, we need to pre-compute the ESO parameters $v_1,\\dots,v_n$ before the methods starts. The following result will be helpful in this regard:\n", "\n", "${\\bf Theorem}$ Let $P = (p_{ij})\\in \\mathbb{R}^{n\\times n}$ be the probability matrix associated with sampling $\\hat{S}$. That is, $p_{ij} = \\mathbf{Prob}(i\\in \\hat{S}, j\\in \\hat{S})$. Then $(f,\\hat{S})\\sim ESO(v)$ holds if $v$ satisfies the inequality\n", "\n", "$$ P \\bullet (A^TA + \\lambda I) \\preceq Diag(p_1 v_1, \\dots, p_n v_n),$$\n", "\n", "where $p_i = p_{ii}$.\n", "\n", "We have the following useful consequence:\n", "\n", "${\\bf Corollary}$ If $\\hat{S}$ is a serial sampling, i.e., if $p_{ij} = 0$ whenever $i\\neq j$, then $P$ is diagonal and so \n", "\n", "$$P \\bullet (A^TA + \\lambda I) = Diag(p_1 (\\|A_{:1}\\|_2^2 + \\lambda), \\dots, p_n (\\|A_{:n}\\|_2^2 + \\lambda)).$$\n", "\n", "Therefore, $(f,\\hat{S})\\sim ESO(v)$ with \n", "\n", "$$v_i = \\|A_{:i}\\|_2^2 + \\lambda, \\qquad i=1,\\dots,n.$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Data" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1000x1 Array{Float64,2}:\n", " 0.0116559\n", " 0.0 \n", " 0.0 \n", " 0.0 \n", " 0.115493 \n", " -0.214575 \n", " 0.0766919\n", " 0.0104832\n", " -0.0666041\n", " 0.0 \n", " -0.0320028\n", " -0.0290425\n", " -0.0933021\n", " ⋮ \n", " 0.0127912\n", " -0.014639 \n", " 0.0 \n", " -0.0180058\n", " -0.0493315\n", " 0.0 \n", " 0.0124382\n", " 0.0 \n", " 0.0138768\n", " 0.0447612\n", " 0.0154187\n", " 0.0 " ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = 10\n", "n = 1000\n", "A = sprandn(m,n,0.1)/m # sparse random matrix\n", "b = ones(m,1)\n", "lambda = 1\n", "\n", "x_star = (A'*A + lambda*eye(n))\\(A'*b) # finds the point x where derivative is zero (the solution)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. NSync" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "NSync (generic function with 1 method)" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function NSync(A, b, lambda, T)\n", " \n", " m, n = size(A)\n", " \n", " v = zeros(n,1)\n", " x = zeros(n,1) # staring point\n", " g = -b # g will be maintained to be equal to A*x - b\n", " \n", " for i=1:n # compute ESO parameters\n", " v[i] = (norm(A[:,i]))^2 + lambda\n", " end\n", " \n", " for k = 1:T\n", " i = rand(1:n) # choose a random index \n", " h = (1/v[i]) * ( A[:,i]' * g + lambda * x[i] ) # compute the update\n", " x[i] = x[i] - h[1] # perform the update \n", " g = g - h[1] * A[:,i] # update the residual\n", " end\n", " \n", " return x\n", " \n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Testing" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "7.067384249535481e-15" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T = 50*n\n", "x = NSync(A,b,lambda,T)\n", "norm(x-x_star)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem A\n", "\n", "Write a new version of the NSync function and call it NSyncPlot. This function will also visualize the results via a plot, as in Lab 1. Copy paste the relevant code from Lab 1 and modify to make it work." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem B\n", "\n", "Implement NSync for 2 different serial samplings and compare the results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem C\n", "\n", "Implement NSync with the deterministic sampling for which $\\hat{S}=\\{1,2,\\dots,n\\}$ with probability 1. NSync transforms into standard gradient descent. Compute the parameters $v$ satisfying the ESO assumption, and run the method for 50 iterations. Compare with NSync which uses the serial sampling and which is run for $T = 50n$ iterations. Is this comparison fair? \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem D\n", "\n", "Implement a version of NSync for which $|\\hat{S}| = 100$ with probability 1. Compare with the previously developed versions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem E\n", "\n", "Find a way to perform the for loop in NSync in parallel. You will need to study Julia's documentation to pull this off." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem F\n", "\n", "Write a new version of NSync (with any sampling you like), this time applied to logistic regression. " ] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.3.11", "language": "julia", "name": "julia-0.3" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.3.11" } }, "nbformat": 4, "nbformat_minor": 0 }