Analogy to summary statistic imputation

Integrative Statistical Analysis of -omics and GWAS data

I finished my PhD under the main supervision of Zoltán Kutalik in September 2018.

You can download my thesis or check out the slides from my public defense.

Abstract

Increasing our knowledge about biology in humans is essential for advances in medicine, such as early-stage diagnoses of diseases, drug development, public health strategies, and precision medicine. One approach to this task is to collect data on different components of a biological mechanism of interest, link these parts, and try to construct an underlying model that helps us explain the disease. To collect data, DNA is measured and the status of a disease is recorded for each individual in a dedicated group of people. As a first step, an analyst compares, for each genetic variant across the whole genome, the genetic mutations of people with the disease with those of healthy people; this is called a genome-wide association study (GWAS). Such first association screens rarely point right away to the true causal variants, but combined with additional biomedical (-omics) data and additional statistical methods it is possible to narrow down the true cause and gain insight into the biology of a disease. For example, by using GWAS results for two diseases (e.g. cardiovascular disease and obesity) and a statistical method called Mendelian randomisation, we are able to examine the causal effect of obesity on cardiovascular disease, or vice versa. These statistical follow-up investigations often require GWAS results for genetic variants that were unmeasured. During my PhD, I investigated a method called summary statistic imputation, which aims to solve precisely this problem of inferring GWAS results for unmeasured genetic variants. Summary statistic imputation combines GWAS results with publicly available sequencing data. My main findings were that imputation accuracy varies depending on certain characteristics of a genetic variant (e.g. low accuracy for rare mutations), as well as on the size of the publicly available sequencing data (low accuracy for small sequencing datasets). A further finding is that summary statistic imputation can compete with imputation techniques that are based on individual-level data for certain subgroups of genetic variants (e.g. common variants).

With the help of summary statistic imputation, researchers can facilitate follow-up investigations and thus gain more insight into the biology of diseases.

Sina Rüeger
(Genomic) Data Scientist