Network-guided high-dimensional feature selection in precision medicine

Differences in disease predisposition or in response to treatment can be explained in large part by genomic differences between individuals. This realization has given rise to precision medicine, in which treatment is tailored to each patient's genome. The field depends on collecting large amounts of molecular data for many individuals, something that rapid advances in genome sequencing and other high-throughput experimental technologies are now making possible.

Unfortunately, we still lack effective methods for reliably identifying, from these data, which genomic features determine a phenotype such as disease predisposition or response to treatment. A major obstacle is that the number of features that can be measured (easily tens of millions) is very large relative to the number of samples for which they can be collected (typically hundreds or thousands), which poses both computational and statistical difficulties.

In my talk, I will discuss several ways to address this problem, ranging from reducing the dimensionality of the feature space by imposing on it a structure derived from prior knowledge, such as biological networks, to increasing the effective number of samples through multi-task approaches.
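As a rough illustration of how a network can impose structure on feature selection, the sketch below scores a candidate feature set by its total association with the phenotype, minus a fixed penalty per selected feature and a penalty for every network edge that links a selected feature to an unselected one. This objective, the feature names, the relevance scores, and the parameters eta and lam are hypothetical choices made for this example, not necessarily the method presented in the talk. Exhaustive search is used only because it is easy to verify; it is viable for a handful of features, not tens of millions.

```python
from itertools import combinations


def network_guided_score(selected, relevance, edges, eta, lam):
    """Score a candidate feature subset (hypothetical objective).

    relevance: dict mapping feature -> association score with the phenotype
    edges:     list of (u, v) pairs from a prior-knowledge network
    eta:       penalty per selected feature (favours small subsets)
    lam:       penalty per edge cut by the selection (favours connected subsets)
    """
    gain = sum(relevance[f] for f in selected)
    # An edge is "cut" when exactly one of its endpoints is selected.
    cut = sum(1 for u, v in edges if (u in selected) != (v in selected))
    return gain - eta * len(selected) - lam * cut


def select_features(relevance, edges, eta, lam):
    """Return the best-scoring subset by brute force (toy sizes only)."""
    features = sorted(relevance)
    best, best_score = frozenset(), 0.0  # the empty set scores exactly 0
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            s = frozenset(subset)
            score = network_guided_score(s, relevance, edges, eta, lam)
            if score > best_score:
                best, best_score = s, score
    return best, best_score


# Toy data (made up for illustration): B is only weakly associated,
# but it lies on the network path between A and C.
relevance = {"A": 2.0, "B": 0.3, "C": 1.8, "D": 1.5, "E": 0.0}
edges = [("A", "B"), ("B", "C"), ("D", "E")]

print(sorted(select_features(relevance, edges, eta=0.5, lam=0.4)[0]))  # with network
print(sorted(select_features(relevance, edges, eta=0.5, lam=0.0)[0]))  # without network
```

With the network penalty, the weakly associated feature B is selected because it connects A and C, giving ['A', 'B', 'C', 'D']; without it (lam=0.0), only the features whose relevance exceeds eta survive, giving ['A', 'C', 'D']. Objectives of this form (a per-feature term plus a graph-cut penalty) can be optimized exactly with a minimum-cut algorithm rather than by enumeration, which is what makes this kind of approach tractable at genome scale.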