The unreasonable effectiveness of the Ewens Sampling Formula
The Ewens Sampling Formula (ESF) arose initially in population genetics, where it gave the distribution of the number of alleles represented once, twice, … in a sample of genes from a large population. Since then, it has appeared in many different areas of mathematics, statistics and computer science. In this talk I will focus on one aspect of this enormous literature, motivated by the following problem: Starting with n cooked spaghetti strands, tie randomly chosen ends together to produce a collection of spaghetti loops. What is the expected number of loops? What can be said about the distribution of the number of loops of length 1, 2, …? What is the behaviour of the longest loops when n is large? What is the probability that all the loops have different lengths?
I will describe a number of related examples, including prime factorisation, random mappings and random permutations, illustrating the central role played by the ESF. I will also discuss methods for simulating decomposable combinatorial structures by exploiting another wonder of the ESF world, namely the Feller Coupling. Analysis of a children’s playground game shows that apparently small departures from the Feller model can open up a number of unsolved problems.