Esteban Tabak (New York University)

Filtering confounding factors from data through optimal transport
Wednesday 14 October 2015 at 12.10, JCMB 5215


Real data is typically multicausal. Clinical and genomic medical data, for instance, depends not only on an underlying illness that one may seek to diagnose, but also on factors as diverse as the age, sex and ethnicity of the patient and the lab where the tests were performed.

This talk will present a methodology for filtering the effects of external factors from datasets using the mathematical theory of optimal transport. The procedure eliminates the variability associated with each factor by mapping the probability distributions conditioned on each value of the factor into a unified target distribution, while minimizing alterations to all other variability. This permits cleaning the data of confounding effects, as well as amalgamating datasets from different sources.

Required extensions to optimal transport theory include making it data-driven (i.e. with all information available in terms of samples instead of explicit probability distributions), mapping more than one distribution (up to a continuum) into a common, unknown target, and combining maps arising from the various factors into individual maps per sample.
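
As a rough illustration of the idea of mapping each conditional distribution into a common target, the sketch below treats the simplest possible case: one-dimensional data, a single discrete factor, and the assumption that each conditional distribution is approximately Gaussian. Under that assumption the optimal transport map between two distributions is affine, and the common target can be taken as the Wasserstein barycenter, whose mean and standard deviation are the weighted averages of the group means and standard deviations. This is not the method of the talk, only a minimal sketch; the function name `filter_factor` and the Gaussian simplification are assumptions made for the example.

```python
from statistics import fmean, pstdev


def filter_factor(samples, labels):
    """Remove the effect of one discrete factor from 1D samples.

    Assumption (not from the talk): each conditional distribution is
    roughly Gaussian, so the optimal transport map to the common target
    is the affine map x -> target_mean + (target_std / std_z) * (x - mean_z).
    """
    # Group the samples by the value of the factor.
    groups = {}
    for x, z in zip(samples, labels):
        groups.setdefault(z, []).append(x)

    # Per-group mean, standard deviation and weight (fraction of samples).
    n = len(samples)
    stats = {
        z: (fmean(xs), pstdev(xs), len(xs) / n)
        for z, xs in groups.items()
    }

    # 1D Wasserstein barycenter of Gaussians: weighted average of the
    # means and weighted average of the standard deviations.
    target_mean = sum(w * m for m, s, w in stats.values())
    target_std = sum(w * s for m, s, w in stats.values())

    # Push every sample through the affine map of its own group.
    filtered = []
    for x, z in zip(samples, labels):
        m, s, _ = stats[z]
        scale = target_std / s if s > 0 else 1.0  # degenerate group: shift only
        filtered.append(target_mean + scale * (x - m))
    return filtered


if __name__ == "__main__":
    # Two "labs" measuring the same quantity with different offset and spread.
    samples = [1.0, 2.0, 3.0, 11.0, 13.0, 15.0]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(filter_factor(samples, labels))
```

After the map, both groups share the same mean and spread, so the variability attributable to the factor is gone, while the relative position of each sample within its group is preserved. The data-driven, multi-factor and non-Gaussian settings described in the abstract require considerably more machinery than this affine sketch.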
