Our Discoveries

We are a group of highly motivated and creative people, discovering and innovating together to make precision medicine a reality through the catalytic power of big genomic data, statistical genetics, and artificial intelligence.


Disease risk prediction by polygenic risk scores

Genome-wide polygenic scores, which integrate information from many common DNA variants into a single continuous measure of inherited risk, have shown a striking ability to stratify risk for incident and prevalent diseases in recent studies, mostly in individuals of European ancestry. Whether this concept can be effectively extended to other populations, e.g., South Asians, has remained uncertain. We derived and tested a genome-wide polygenic score tuned specifically to individuals of South Asian ancestry. Our results confirm a robust association of this new genome-wide polygenic score with the risk of coronary artery disease, with strikingly consistent results across South Asians living in the United Kingdom, Bangladesh, and India (Wang et al. 2020, J Am Coll Cardiol.). In other studies, our results show that polygenic scores can be used not only to predict a second event among people already at high risk of coronary artery disease (Emdin*, Bhatnagar*, Wang* et al. 2020, Circ Genomic Precis Med.), but also to predict genetically determined levels of disease biomarkers and to help enrich the samples enrolled in clinical trials (Dron*, Wang* et al. 2021, Circ Genomic Precis Med.).
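To make the idea of "a single continuous measure" concrete, the following is a minimal sketch of how a polygenic score is commonly computed: a weighted sum of effect-allele dosages, then standardized within a cohort. The weights, participant IDs, and genotypes are hypothetical illustrations, not values from the studies cited above, and real scores involve far more variants plus ancestry-aware tuning.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Convert raw scores to z-scores within the cohort."""
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    return [(s - mean) / sd for s in scores]


# Hypothetical per-variant weights (e.g., log odds ratios from a GWAS).
weights = [0.10, -0.05, 0.20]

# Hypothetical genotypes: one dosage per variant for each participant.
cohort = {
    "person_a": [0, 1, 2],
    "person_b": [2, 2, 0],
    "person_c": [1, 0, 1],
}

raw = {pid: polygenic_score(d, weights) for pid, d in cohort.items()}
z_scores = dict(zip(raw, standardize(list(raw.values()))))

for pid in cohort:
    print(pid, round(raw[pid], 3), round(z_scores[pid], 3))
```

Standardizing within the cohort is one common design choice; it lets risk be reported per standard deviation of the score or by percentile.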


Genetic modifier discovery for common and rare diseases

Not all people carrying disease-causing mutations or high-risk genetic variants develop the disease. Identifying genetic and non-genetic modifiers is therefore important both for understanding disease mechanisms and for building useful, accurate statistical models for disease risk prediction and prevention. Using genetic admixture mapping and gene expression data, we identified the UBD gene as a genetic modifier of APOL1-related kidney disease (carrying two high-risk APOL1 alleles increases disease risk about tenfold under a recessive model). Follow-up cell-based experiments showed that the UBD and APOL1 proteins physically interact and that UBD expression mitigates APOL1-mediated cell death (Zhang*, Wang* et al. 2018, PNAS). In a separate line of work, using three diseases as examples (coronary artery disease, breast cancer, and colorectal cancer) and modeling rare monogenic disease-causing mutations against the polygenic risk background, we showed that polygenic risk has a significant impact on disease penetrance in individuals carrying monogenic mutations, in some cases bringing risk closer to the population average (Fahed*, Wang*, Homburger*, et al. 2020, Nat. Commun). More importantly, this model is generally applicable to many other conditions. Integrating the risk conferred by monogenic mutations with the polygenic background offers an opportunity to provide more accurate risk prediction and to enable precise patient stratification and management.
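One simple way to picture how polygenic background can modify the penetrance of a monogenic mutation is a logistic model with carrier status and a standardized polygenic score as additive terms on the log-odds scale. The sketch below is an illustration under that assumption only; the intercept and coefficients are hypothetical and are not estimates from the cited study.

```python
import math


def predicted_risk(carrier, prs_z,
                   intercept=-3.0,               # hypothetical baseline log-odds
                   beta_carrier=math.log(3.0),   # hypothetical carrier log odds ratio
                   beta_prs=math.log(1.6)):      # hypothetical log odds ratio per SD of score
    """Disease probability from a logistic model with additive log-odds terms."""
    logit = intercept + beta_carrier * carrier + beta_prs * prs_z
    return 1.0 / (1.0 + math.exp(-logit))


# Compare a non-carrier at average polygenic risk with carriers
# at low, average, and high polygenic risk.
noncarrier_avg = predicted_risk(carrier=0, prs_z=0.0)
carrier_low = predicted_risk(carrier=1, prs_z=-2.0)
carrier_avg = predicted_risk(carrier=1, prs_z=0.0)
carrier_high = predicted_risk(carrier=1, prs_z=2.0)

for label, risk in [("non-carrier, average score", noncarrier_avg),
                    ("carrier, low score", carrier_low),
                    ("carrier, average score", carrier_avg),
                    ("carrier, high score", carrier_high)]:
    print(f"{label}: {risk:.3f}")
```

Under these assumed coefficients, a carrier with a low polygenic score ends up much closer to the non-carrier baseline than a carrier with a high score, which is the qualitative pattern described above.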


Genetic diagnosis, disease risk gene discovery, and drug target discovery and optimization

We studied whole-exome sequencing data from more than 400 families with focal segmental glomerulosclerosis (FSGS, a rare form of kidney disease) and 600 ancestry-matched controls. In this study, we 1) dissected the genetic architecture of FSGS from both the common and rare variant perspectives, 2) cataloged the rare variants in genes known to cause FSGS to estimate the rate of genetic diagnosis, 3) analyzed how disease-causing mutations are enriched in specific protein domains to advance the understanding of disease mechanisms, and 4) developed new statistical models to discover potential new genes that increase the risk of FSGS (Wang et al. 2019, JASN). We also created a novel visualization method for presenting large family pedigrees in a small printable area (Chun*, Wang* et al. 2020, KI Reports). On the drug optimization side, we analyzed genomic sequencing data from about 50,000 people in the UK Biobank and showed that the side effects of Volanesorsen (the only FDA-approved treatment for familial chylomicronemia syndrome) arise from drug-specific effects rather than from effects on the target gene (Khetarpal*, Wang* et al. 2019, N. Engl. J. Med). Clarifying the cause of the drug's side effects is crucial because it lays the theoretical groundwork for drug optimization. We are also analyzing whole-genome/exome sequencing data from about 258,000 participants (41,000 coronary artery disease cases and 217,000 controls) to discover novel coronary artery disease risk genes using rare variant burden tests (results in submission).
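As a rough illustration of what a rare variant burden test does, the sketch below collapses qualifying rare variants to a per-gene carrier indicator and compares carrier counts between cases and controls with a one-sided Fisher's exact test. This is a generic textbook form, not the specific models used in the work above; the gene names and sample data are hypothetical, and real analyses typically adjust for covariates such as ancestry.

```python
from math import comb


def carrier_count(gene, people):
    """Number of people carrying at least one qualifying rare variant in `gene`."""
    return sum(1 for genes in people if gene in genes)


def fisher_one_sided(case_carriers, case_total, control_carriers, control_total):
    """P(at least `case_carriers` carriers among cases) under the hypergeometric null."""
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    upper = min(carriers, case_total)
    tail = sum(comb(carriers, k) * comb(total - carriers, case_total - k)
               for k in range(case_carriers, upper + 1))
    return tail / comb(total, case_total)


# Hypothetical data: each person is the set of genes in which they
# carry a qualifying rare variant.
cases = [{"GENE_A"}, {"GENE_A", "GENE_B"}, set(), {"GENE_A"}, set()]
controls = [set(), {"GENE_B"}, set(), set(), set(), set(), set(), {"GENE_A"}]

p_values = {}
for gene in ("GENE_A", "GENE_B"):
    p_values[gene] = fisher_one_sided(
        carrier_count(gene, cases), len(cases),
        carrier_count(gene, controls), len(controls),
    )
    print(gene, round(p_values[gene], 4))
```

Collapsing to a carrier indicator is the simplest burden design; it gains power when most qualifying variants in a gene act in the same direction.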


Method development and statistical modeling

Identifying genes that have been positively selected during human evolution can help us understand not only the adaptive evolution of human beings but also the mechanisms of disease risk. Thanks to the rapid development of next-generation sequencing and genotyping technology, a large amount of genetic data has been generated in recent years. However, statistical methods for accurately and efficiently identifying genes under positive selection from these data are still very limited, and pinpointing the driver mutation that was selected is even more challenging. We therefore designed a fast and accurate method for detecting positively selected genes and locating the driver mutations. This algorithm is comparable in statistical power to iHS, the most effective and most commonly used algorithm at that time, but runs more than 10,000 times faster. More importantly, the new algorithm locates the selected sites significantly more accurately than iHS. In addition, when selection is relatively weak, its ability to localize the selected site far exceeds that of other traditional algorithms (Wang et al. 2014, Mol Biol Evol, and He, Wang et al. 2015, Genome Research).
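For context on the iHS baseline mentioned above, the sketch below computes extended haplotype homozygosity (EHH), the quantity that iHS integrates and compares between the two alleles at a site. It illustrates the baseline statistic only and does not reproduce the new algorithm, which is not described on this page. The haplotypes are hypothetical 0/1 strings.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, end):
    """EHH among haplotypes carrying `allele` at site `core`, extended to site `end`.

    It is the probability that two randomly chosen carrier haplotypes are
    identical at every site between `core` and `end` (inclusive).
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return 0.0
    lo, hi = min(core, end), max(core, end)
    groups = Counter(h[lo:hi + 1] for h in carriers)
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


# Hypothetical phased haplotypes over four sites.
haps = ["1101", "1101", "1100", "0010", "0110"]

# EHH decays as the window extends away from the core site (index 1).
decay = [ehh(haps, core=1, allele="1", end=e) for e in (1, 2, 3)]
print(decay)
```

A selected allele that rose in frequency quickly tends to sit on long, shared haplotypes, so its EHH decays more slowly with distance than that of the alternative allele.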