Blind mountain climbers
How would you go about getting to the top of a mountain if you were blindfolded? Perhaps you would take a few tentative steps in each direction, getting a feel for the land before deciding which way to go. Maybe you would go in the direction where the mountain was steepest, in order to get to the top faster. How would you even know when you had reached the top?
This is the everyday experience of Peter Richtarik, a mathematician working in the field of continuous optimization. Optimization is, unsurprisingly, about optimizing things: finding the fastest algorithm, the best explanation of data, the most cost-effective use of resources. Sometimes the problems involve huge amounts of data; for example, Google having to find the most relevant webpage out of billions of options. It is impossible to 'see' all this data at once, in the same way that our mountain-climber cannot ask to look at a map of the land. The task for mathematicians like Peter is to find an optimum solution by analysing only small amounts of the data at a time.
One of the tools in Peter's field is a technique called Principal Component Analysis. This reveals the underlying structure of the data and whether there is any correlation between the variables. If we think of the data as being a cloud of points, like stars in a galaxy, we can try to see if there are particular directions which contain more data points than others. In the case of the Andromeda galaxy (pictured left), there are two principal components that account for the majority of the stars. This means that instead of locating each star using a 3-dimensional co-ordinate (x,y,z) we can locate it using a combination of the red and blue vectors in the picture. Thus we can think of Andromeda as a 2-dimensional object rather than a 3-dimensional object, and this makes all our calculations simpler.
As a more serious example, suppose that 300 people are sick with a genetic disease and we want to find out which of 20,000 possible genes is responsible. The data is a giant matrix in which the entries reveal how active particular genes are inside particular people. This activity level is called gene expression. By using principal component analysis we can find a few linear combinations of the genes which account for the majority of the 6 million entries.
Whilst this does make the problem easier, it still does not accomplish the goal of finding the precise genes responsible for the disease. Although we can explain the data using just a few combinations of genes, each of these combinations still potentially involves all of the 20,000 genes. (Analogously, the red and blue arrows in Andromeda are still themselves defined by 3 co-ordinates.)
Some of Peter's current research at Edinburgh is focused on ways of eliminating some of the genes from the game altogether. That is, not only finding small numbers of linear combinations of genes which explain the data, but on finding linear combinations of small numbers of genes which explain the data. This method is called Sparse Principal Component Analysis and Peter (and his collaborators) are trying to find methods to do the calculations as quickly as possible.
All of this work involves heavy-duty mathematics and heavy-duty computations. What can companies do when their data sets are so large that their computers simply aren't powerful enough to do the analysis? Contrary to our intuition, it has been shown that sometimes our blind mountain climber is better off picking a random path than trying to compute the best local one. Instead of scouting out the land, the climber flips a coin to decide whether to go north/south or east/west and takes a single step in that direction. They then flip the coin again and take another random step. This random algorithm can be shown to get you as close as you like to the top of the mountain, provided you take enough steps.
The random-walking method has two advantages: flipping a coin to decide a direction is both easier and faster than doing calculations of the slope at each step. When the mountains involved live in a million-dimensional space, this statistical method is the only practical way to solve the problem. Peter's work involves taking these ideas and developing faster and better algorithms to get climbers to the tops of mountains. There are many surprising applications of his work, including building bridges and de-blurring images, and there is absolutely no doubt that in today's data-hungry world these optimization techniques will become increasingly important.
If you are interested in getting more details about Peter's work or in working with him, then visit his website for a list of papers and contact details. The University of Edinburgh also offers a Masters programme in Operational Research which Peter helps to teach.
About our mathematician: Peter is a talented photographer and you can find some of his work on the corridors of the Edinburgh maths department in the form of the staff photos. In his free time, he enjoys playing the guitar, trampolining and debating philosophy. He has also devoted much time to promoting mathematics to school children in Slovakia.