### Esteban Tabak (New York University)

#### Filtering confounding factors from data through optimal transport

*Wednesday 14 October 2015 at 12.10, JCMB 5215*

##### Abstract

Real data is typically multicausal. Clinical and genomic medical data, for
instance, depends not only on an underlying illness that one may seek to
diagnose, but also on factors as diverse as the age, sex and ethnicity of the
patient and the lab where the tests were performed.

This talk will present a methodology for filtering from datasets the effects
of external factors using the mathematical theory of optimal transport. The
procedure eliminates the variability associated with each factor by mapping
the probability distributions conditioned on each value of the factor into a
unified target distribution, while minimizing alterations to all other
variability. This permits cleaning the data of confounding effects, as well
as amalgamating datasets from different sources.
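
As a rough illustration of the idea, and not the speaker's actual algorithm, the following sketch treats the simplest case: one scalar variable `x` and one discrete factor `z`. It assumes a quadratic transport cost, for which the optimal map in one dimension is monotone, and it assumes the target is the Wasserstein barycenter of the conditional distributions, whose quantile function is the weighted average of the group quantile functions. The function name `remove_factor_1d` and the synthetic "lab" data are hypothetical.

```python
import numpy as np


def remove_factor_1d(x, z):
    """Map each group's samples onto the 1D barycenter of all groups.

    x : scalar observations, one per sample
    z : discrete factor label per sample (e.g. the lab that ran the test)
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z)
    groups, counts = np.unique(z, return_counts=True)
    weights = counts / counts.sum()  # weight each group by its sample share

    y = np.empty_like(x)
    for g in groups:
        mask = z == g
        xg = x[mask]
        # Empirical rank in (0, 1) of each sample within its own group.
        ranks = (np.argsort(np.argsort(xg)) + 0.5) / xg.size
        # Barycenter quantile function: weighted average of the
        # group quantile functions, evaluated at those ranks.
        y[mask] = sum(
            w * np.quantile(x[z == h], ranks) for h, w in zip(groups, weights)
        )
    return y


# Synthetic example: the same quantity measured by two labs whose
# instruments differ in offset and scale (assumed data, for illustration).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 2.0, 500)])
z = np.array(["lab_a"] * 500 + ["lab_b"] * 500)
y = remove_factor_1d(x, z)
```

After the mapping, the two groups share the same empirical distribution, so the lab label no longer explains any of the variability in `y`, while the ordering of samples within each lab is preserved. The multivariate, sample-based setting described in the abstract requires considerably more machinery than this quantile-matching shortcut.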

Required extensions to optimal transport theory include making it data-driven
(i.e., with all information available as samples rather than explicit
probability distributions), mapping more than one distribution (up to a
continuum) into a common, unknown target, and combining the maps arising from
the various factors into individual maps per sample.
