{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab #4: Curvature\n", "\n", "by Peter Richtarik\n", "\n", "## The Problem: Ridge Regression\n", "\n", "As in Lab 2, in this lab we will consider the ridge regression problem:\n", "\n", "$$\\min_{x\\in \\mathbb{R}^n} \\left[ f(x) = \\frac{1}{2}\\|Ax-b\\|_2^2 + \\frac{\\lambda}{2} \\|x\\|_2^2\\right],$$\n", "\n", "where $A$ is an $m\\times n$ matrix and $b\\in \\mathbb{R}^n$. Recall that the gradient of $f$ is given by\n", "\n", "$$\\nabla f(x) = A^T (Ax-b) + \\lambda x$$\n", "\n", "and that the Hessian of $f$ is given by\n", "\n", "$$\\nabla^2 f(x) = A^T A + \\lambda I \\overset{\\text{def}}{=} H,$$\n", "\n", "where $I$ is the $n\\times n$ identity matrix.\n", "\n", "It is easy to verify that for all $x,h\\in \\mathbb{R}^n$ we have\n", "\n", "$$f(x+h) = f(x) + \\langle \\nabla f(x), h \\rangle + \\frac{1}{2} \\langle H h, h \\rangle.$$ \n", "\n", "Hence, $f$ satisfies the 2 assumptions from today's lecture ($G$-strong convexity and $M$-smoothness) with $G=M=H$.\n", "\n", "## Goals\n", "\n", "In this lab, we shall play with Methods 1, 2 and 3 discussed in today's lecture. Recall that Method 3 (NSync [1]) was covered in Lab 2.\n", "\n", "[1] Richtarik and Takac. On optimal probabilities in stochastic coordinate descent methods. Optimization Letters, 2015." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Data: 3-dimensional problem from lecture" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "3x1 Array{Float64,2}:\n", " -54.3567 \n", " 6.83187\n", " 48.598 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = 3\n", "n = 3\n", "H = [1.0000 0.9900 0.9999; 0.9900 1.0000 0.9900; 0.9999 0.9900 1.0000]\n", "b = ones(m,1)\n", "lambda = 5\n", "I = eye(3)\n", "l = eigmin(H)\n", "lambda = l / 2\n", "B = H - lambda*I\n", "A = chol(B) # makes sure A is such that H = A'*A + lambda*I \n", "\n", "x_star = (A'*A + lambda*eye(n))\\(A'*b) # finds the point x where derivative is zero (the solution)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Ridge regression function" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Ridge (generic function with 1 method)" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function Ridge(A,b,lambda,x)\n", " \n", " g = A*x - b\n", " return 0.5*(g'*g + lambda*x'*x) \n", " \n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. $\\tau$-nice sampling\n", "\n", "Recall that by sampling we mean a random subset of $[n]=\\{1,2,\\dots,n\\}$. More formally, a sampling is a random set-valued mapping with values in $2^{[n]}$, where $[n]$. A sampling $\\hat{S}$ is uniquely determinied by the probabilities $p_S = \\mathbb{P}(\\hat{S}=S)$ attached to all subsets $S\\subseteq [n]$. \n", "\n", "$\\bf Definition.$ $\\tau$-nice sampling is the (unique) sampling $\\hat{S}$ defined by \n", "\n", "$$\\mathbb{P}(\\hat{S}=S) = \\begin{cases} \\frac{1}{n \\choose \\tau}, & \\text{if} \\quad |S| = \\tau, \\\\ 0, & \\text{otherwise.}\\end{cases}$$\n", "\n", "In words, this means that $\\hat{S}$ is a random subset of $[n] = \\{1,2,\\dots,n\\}$ of cardinality $\\tau$, chosen uniformly at random.\n", "\n", "### Problem A \n", "Your task is to implement the $\\tau$-nice sampling as a function. In particular, write function $\\verb\"taunice\"$ whose input is $n$ and $\\tau\\in [n]$ and whose output is a vector $S$ of size $\\tau$ such that $$\\hat{S} = \\{S[1],\\dots,S[\\tau]\\}.$$" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "taunice (generic function with 1 method)" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function taunice(n, tau)\n", " \n", " ??????\n", " \n", " return S\n", " \n", "end" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2x2 Array{Float64,2}:\n", " 50.2513 -49.7487\n", " -49.7487 50.2513" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Testing the taunice function\n", "\n", "S = taunice(n,2)\n", "C = H[S,S] # compute a random submatrix of the Hessian of size 2x2\n", "inv(C)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Method 1 (Randomized Newton)\n", "\n", "The code is written for you. Some pieces are missing.\n", "\n", "### Problem B\n", "\n", "Fill in the 5 question marks so that the method works as it should." ] }, { "cell_type": "code", "execution_count": 141, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Method1 (generic function with 2 methods)" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function Method1(A, b, lambda, tau, T, skip)\n", " \n", " println(\"Method 1 : START\")\n", " \n", " m, n = size(A)\n", " \n", " time = 0\n", " tic()\n", " \n", " H = ??? # compute the Hessian\n", " x = zeros(n,1) # starting point\n", " g = -b # g will be maintained to be equal to A*x - b\n", " \n", " time = time + toq()\n", " \n", " # remembering and plotting \n", " xs = zeros(n, floor(Integer, T/skip) + 1) # will remember the iterates x here\n", " xs[:,1] = x # remember iterate x\n", " println(\"iteration: $(0), function value:$(Ridge(A,b,lambda,x))\")\n", " \n", " \n", " for k = 1:T\n", " \n", " tic()\n", " \n", " S = ???(n, ???) # choose a random index \n", " \n", " d = A[:,S]'*g + lambda*x[S] # compute partial derivatives for coordinates in S\n", " update = ???*d # compute the update\n", " x[S] = x[S] - update # perform the update\n", " g = ??? # update the residual\n", " \n", " time = time + toq()\n", " \n", " # remembering and plotting \n", " if k % skip == 0\n", " xs[:,round(Int,k/skip)+1] = x # remember iterate x\n", " println(\"iteration: $(k), function value:$(Ridge(A,b,lambda,x))\")\n", " end\n", "\n", " end\n", "\n", " println(\"Time = \", time)\n", " return x, xs\n", " \n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Method 3 (NSync, copied from Lab 2)\n", "\n", "The NSync method below is copied from Lab 2. It only works with the serial uniform sampling $\\hat{S}$. That is, the method updates in each iteration a single coordinate of the vector $x\\in \\mathbb{R}^n$ chosen uniformly at random. More formally, $\\hat{S}$ satsifies \n", "\n", "$$\\mathbb{P}(\\hat{S}=\\{i\\}) = \\tfrac{1}{n} \\qquad \\text{for} \\qquad i=1,2,\\dots,n.$$\n", "\n", "### Problem C\n", "\n", "In order to compare it with Method 1, implement NSync using the $\\tau$-nice sampling. Call the new method $\\verb\"Method 3\"$, and include also an additional input parameter: $\\tau$. Be careful to chose the ESO parameters $v_1,\\dots,v_n$ inside the function right! Recall they need to satisfy:\n", "\n", "$$\\mathbf{P} \\bullet H \\preceq Diag(p \\bullet v),$$\n", "\n", "where $\\mathbf{P}$ is the probability matrix of sampling $\\hat{S}$. That is, this is the $n\\times n$ matrix with entries $\\mathbf{P}_{ij} = \\mathbb{P}( \\{i,j\\} \\subset \\hat{S})$. \n", "\n", "If $n$ is small, suitable $v$ can be computed through the computation of a maximum eigenvalue. If $n$ is large however, a simpler way for computing suitable $v$ is needed. " ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "NSync (generic function with 1 method)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function NSync(A, b, lambda, T)\n", " \n", " m, n = size(A)\n", " \n", " v = zeros(n,1)\n", " x = zeros(n,1) # staring point\n", " g = -b # g will be maintained to be equal to A*x - b\n", " \n", " for i=1:n # compute ESO parameters\n", " v[i] = (norm(A[:,i]))^2 + lambda\n", " end\n", " \n", " for k = 1:T\n", " i = rand(1:n) # choose a random index \n", " h = (1/v[i]) * ( A[:,i]' * g + lambda * x[i] ) # compute the update\n", " x[i] = x[i] - h[1] # perform the update \n", " g = g - h[1] * A[:,i] # update the residual\n", " end\n", " \n", " return x\n", " \n", "end" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Method3 (generic function with 1 method)" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function Method3(A, b, lambda, tau, T)\n", " \n", " m, n = size(A)\n", " \n", " v = zeros(n,1)\n", " x = zeros(n,1) # staring point\n", " g = -b # g will be maintained to be equal to A*x - b\n", " \n", " for i=1:n # compute ESO parameters\n", " v[i] = (norm(A[:,i]))^2 + lambda\n", " end\n", " \n", " for k = 1:T\n", " i = rand(1:n) # choose a random index \n", " h = (1/v[i]) * ( A[:,i]' * g + lambda * x[i] ) # compute the update\n", " x[i] = x[i] - h[1] # perform the update \n", " g = g - h[1] * A[:,i] # update the residual\n", " end\n", " \n", " return x\n", " \n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Plotting\n", "\n", "This function was not used anywhere in this lab - but feel free to use it." ] }, { "cell_type": "code", "execution_count": 120, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "PlotResults (generic function with 2 methods)" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using PyPlot\n", "\n", "function PlotResults(x_star, xs, skip)\n", " \n", " ax = axes()\n", " plt[:plot](skip*(0:length(xs)-1), [sqrt((xs[:,i]-x_star)'*(xs[:,i]-x_star)) for i=1:length(xs)] , \":\", linewidth=3.0, label=L\"||x - x^*||\")\n", " legend(loc=\"upper right\")\n", " ylabel(\"error\", fontsize=20)\n", " xlabel(\"iterations\")\n", " ax[:set_yscale](\"log\")\n", " plt[:show]\n", " \n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Testing the method(s)" ] }, { "cell_type": "code", "execution_count": 145, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Method 1 : START\n", "iteration: 0, function value: [1.5]\n", "iteration: 1, function value: [0.5613023240294428]\n", "iteration: 2, function value: [0.5583477258144952]\n", "iteration: 3, function value: [0.5583477258144952]\n", "iteration: 4, function value: [0.5554227474447875]\n", "iteration: 5, function value: [0.5554227474447876]\n", "iteration: 6, function value: [0.5525270919813898]\n", "iteration: 7, function value: [0.2672834911403561]\n", "iteration: 8, function value: [0.26658102573270065]\n", "iteration: 9, function value: [0.2665792957810137]\n", "iteration: 10, function value: [0.26657929152068593]\n", "Time = 0.000442322\n", "Error of Method 1 = 0.00045807242417239335\n", "Error of Method 3 = 72.92402167626439\n" ] } ], "source": [ "T = 10 # number of iterations\n", "skip = 1\n", "\n", "x1, xs = Method1(A,b,lambda,2,T, skip)\n", "println(\"Error of Method 1 = \", norm(x1-x_star))\n", "\n", "x3 = Method3(A,b,lambda,2,T)\n", "println(\"Error of Method 3 = \", norm(x3-x_star))\n", "\n", "\n", "# PlotResults(x_star, xs, skip)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Probem C\n", "\n", "Code up also Method 2. Admittedly, this not a practical method. However, code it up anyway and aim to reconstruct the plot for the 3D function from the lecture." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem D\n", "\n", "Modify the functions in such a way that you obtain semi-logarithmic plots of function values for all 3 methods." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem E\n", "\n", "Rewrite Method 1 so that it solves the logistic regression problem instead." ] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.3.12", "language": "julia", "name": "julia-0.3" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.3.12" } }, "nbformat": 4, "nbformat_minor": 0 }