School of Mathematics

Student Research Article - Jacob Bradley

PhD Student Jacob Bradley has written the following article as part of our series of Student Research Articles!

Cancer is a disease that has profound impacts on our society every day. Despite this, we, scientists and doctors, feel like we know very little about cancer relative to how much there is to know. Why is this?

It's simple: cancer is not one disease. Two patients' tumours may be caused by different things, occur in different places, and have very different effects. More so than any other disease, every cancer is unique. To understand this variety, we have to look at the one thing all cancers have in common: mutations. Mutations are where DNA, the information-storing molecule in cells, has been changed by accident somewhere along its sequence. In tumours this causes cells to become detached from the normal rules that govern when cells reproduce and die. Some tumours have lots of mutations throughout their genome (complete DNA sequence), while some have very few. No two tumours share the same pattern of mutations; the human genome is so long that the chances of damage occurring in exactly the same places are minute.

In the situation we've described there are far more possible outcomes (patterns of mutation) than we will ever have samples to work with. This makes doing classical statistics, which often works on the assumption that we have a large sample size compared to the amount of data contained in each sample, very hard. Research addressing this difficulty is called high-dimensional statistics, and is the mathematical side of what I do.
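To get a feel for the scale of this mismatch, here is a rough back-of-the-envelope sketch in Python. The numbers are illustrative assumptions rather than figures from any study: a genome of roughly three billion positions, a hypothetical tumour with just ten mutated positions, and a deliberately generous one million sequenced tumours. It also simplifies a mutation to "this position changed or it didn't", ignoring what it changed to.

```python
import math

# Illustrative assumptions, not figures from any study:
GENOME_LENGTH = 3_000_000_000  # roughly the number of base pairs in a human genome
N_MUTATIONS = 10               # hypothetical number of mutated positions per tumour
N_SAMPLES = 1_000_000          # hypothetical (very generous) number of sequenced tumours


def count_patterns(genome_length: int, n_mutations: int) -> int:
    """Number of ways to choose which positions in the genome are mutated."""
    return math.comb(genome_length, n_mutations)


patterns = count_patterns(GENOME_LENGTH, N_MUTATIONS)

# Report orders of magnitude only, since the exact integers are unwieldy.
print(f"possible patterns: about 10^{len(str(patterns)) - 1}")
print(f"patterns per sample: about 10^{len(str(patterns // N_SAMPLES)) - 1}")
```

Even under these simplified assumptions, the number of possible patterns dwarfs any number of samples we could hope to collect, which is why methods that assume "many samples, few variables" struggle here.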

In statistics there are broadly two types of questions we might like to answer. Firstly, given some data, how is that data structured and what correlations exist within it? Secondly, how does our data relate to another, separate, piece of information? We can illustrate these two types of question quite nicely in the context of cancer genomics.

As we know, tumour cells accumulate mutations. Some are important for the development of the tumour, and some are just along for the ride, caused by the same processes as the important mutations but of little effect. The difficulty is distinguishing the important from the unimportant, when the same set of mutations rarely occurs in two different tumours. To do this we need to build statistical models describing the process by which mutations accumulate, and build into these models structure that reflects our biological knowledge. Relevant knowledge might include how DNA is organised into genes, chromosomes and coding units. We hope to recover information about what locations in the genome may be important for the success or failure of a tumour, or for causing other mutations. This helps biologists refine their experiments to understand exactly what is happening. No matter how clever our method, they get the final say.

Next we need to understand how mutations interact with the busy world of a tumour. There is a decades-old debate in biology about how DNA's role is best interpreted. Some people think of it as an ingredients list for consulting whenever a specific item is needed, others as more like the recipe itself, a set of instructions for running a cell. In cancer, we might ask to what extent the properties of a tumour can be predicted just from its mutations. There is a practical motivation for this, namely that sometimes that's all the information we have. An emerging technology for sampling tumour DNA is liquid biopsy, where DNA is extracted from blood samples. This is in contrast to solid biopsy, where a tumour is surgically removed. Solid biopsy gives us access to more information, but is very invasive and sometimes impossible. In my (biological) work I try to understand the relationship between mutations and the other processes in the tumour environment. In particular I care about two types of molecules, RNAs and neoantigens, which can be directly measured by solid (but not liquid) biopsy. These molecules are important in determining how well a tumour will respond to a specific set of drugs called immunotherapies. If we can predict how they behave while only being able to see the mutations in a tumour, then we only need to use liquid biopsy when assessing patients for immunotherapy.

In essence, my job is to balance the power of using learning techniques that are as broad as possible against the advantages of utilising the wealth of biological knowledge available. Too many biological constraints and we lose the power of statistics to make new discoveries. Too little biological guidance and we're lost in a sea of high-dimensional data. If you know how to strike this balance, please let me know...