Bayesian Deconvolution in Alzheimer's Disease Brain Tissues

Project Overview

This project was completed as part of advanced coursework in Bayesian modeling and computational biology. I led the conceptual development and applied a Bayesian deconvolution method to real-world bulk RNA-seq data from Alzheimer’s disease (AD) brain tissues, with the goal of uncovering cell-type-specific contributions to gene expression patterns and disease risk.

Objectives

Develop a cell-type deconvolution pipeline tailored to AD brain tissues using BayesPrism, a Bayesian deconvolution method.
Construct cell-state-informed references from snRNA-seq data and assess deconvolution performance across reference resolutions.
Apply the model to real-world bulk RNA-seq data to examine relationships among cell-type composition, AD-associated gene expression, and genetic risk (e.g., APOE ε4).

Methods

Reference Construction: Derived multi-resolution cell-state references (3–682 states) from ssREAD single-nucleus RNA-seq data of the human prefrontal cortex.
Bayesian Inference: Used Gibbs sampling for posterior inference with Dirichlet priors on cell-state proportions and latent gene contributions.
Model Diagnostics: Assessed convergence using trace plots, autocorrelation, Gelman-Rubin R̂ values, and effective sample size (ESS).
Benchmarking: Validated the model on pseudo-bulk data with known cell-state proportions and compared its performance against Bisque and DSSC using root-mean-square error (RMSE) and correlation between estimated and true proportions.
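The reference construction step can be sketched roughly as follows: raw snRNA-seq counts are summed over nuclei that share a cell-state label, and each state's summed profile is normalized to sum to one. This is a minimal illustration under stated assumptions, not the project's code or BayesPrism's own reference processing; the helper name `build_reference`, the pseudocount of 1, and the toy matrix are all hypothetical.

```python
import numpy as np


def build_reference(counts, labels, pseudocount=1.0):
    """Aggregate a (cells x genes) count matrix into a (states x genes)
    reference of normalized expression profiles.

    Hypothetical helper: sums counts per cell-state label, adds a small
    pseudocount so no gene has exactly zero probability, and normalizes
    each state's row to sum to 1.
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    states = np.unique(labels)  # sorted unique state names
    summed = np.vstack([counts[labels == s].sum(axis=0) for s in states])
    summed += pseudocount
    profiles = summed / summed.sum(axis=1, keepdims=True)
    return states, profiles


# Toy example: 4 nuclei, 3 genes, 2 hypothetical states.
counts = np.array([[5, 0, 1],
                   [3, 1, 0],
                   [0, 4, 6],
                   [1, 2, 8]])
labels = ["astro", "astro", "neuron", "neuron"]
states, phi = build_reference(counts, labels)
```

Passing coarser or finer labels (for example, broad cell types versus fine-grained subclusters) to the same function yields references at different resolutions, which is the idea behind comparing references spanning 3 to 682 states.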
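To make the Gibbs sampling step concrete, here is a simplified sampler for a single bulk sample with a fixed reference. It assumes each read first picks a cell state with probability θ_k and then a gene with probability φ_kg, places a Dirichlet(α) prior on θ, and alternates between (1) allocating each gene's reads across states with a multinomial draw and (2) drawing θ from its conjugate Dirichlet posterior. The function name, defaults, and toy data are hypothetical; this sketches the general data-augmentation scheme and is not BayesPrism's implementation or the project's custom sampler.

```python
import numpy as np


def gibbs_deconvolve(x, phi, alpha=1.0, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for cell-state proportions of one bulk sample.

    x   : (G,) bulk read counts
    phi : (K, G) reference profiles, each row summing to 1 (held fixed)
    Simplified model:
        theta ~ Dirichlet(alpha)
        Z[g, :] | theta ~ Multinomial(x[g], p[:, g]),  p[k, g] ∝ theta[k] * phi[k, g]
    Returns post-burn-in theta draws with shape (n_iter - burn_in, K).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=np.int64)
    phi = np.asarray(phi, dtype=float)
    K, G = phi.shape
    theta = np.full(K, 1.0 / K)
    draws = np.empty((n_iter - burn_in, K))
    for it in range(n_iter):
        # Step 1: allocate each gene's reads across cell states.
        p = theta[:, None] * phi
        p /= p.sum(axis=0, keepdims=True)  # normalize over states per gene
        Z = np.array([rng.multinomial(x[g], p[:, g]) for g in range(G)])  # (G, K)
        # Step 2: conjugate update of the proportions.
        theta = rng.dirichlet(alpha + Z.sum(axis=0))
        if it >= burn_in:
            draws[it - burn_in] = theta
    return draws


# Toy check on simulated data with two well-separated profiles.
rng = np.random.default_rng(1)
phi = np.array([[0.5, 0.3, 0.1, 0.1],
                [0.1, 0.1, 0.3, 0.5]])
true_theta = np.array([0.3, 0.7])
x = rng.multinomial(20000, true_theta @ phi)
draws = gibbs_deconvolve(x, phi)
post_mean = draws.mean(axis=0)  # should be close to true_theta
```

Holding φ fixed is the main simplification here: genes whose expression differs between the reference and the bulk tissue are not accounted for, which a full deconvolution model would need to address.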
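The R̂ and ESS diagnostics can be computed directly from stored MCMC draws. The sketch below implements split-R̂ for one scalar parameter across several chains and a single-chain ESS that truncates the autocorrelation sum at the first non-positive lag. These are simplified textbook versions chosen for illustration, with hypothetical function names; the exact estimators used in the project are not specified, and libraries such as ArviZ provide rank-normalized variants.

```python
import numpy as np


def split_rhat(chains):
    """Split-R-hat for one scalar parameter.

    chains: array of shape (n_chains, n_draws). Each chain is cut in half
    so that drift within a chain also inflates the statistic.
    """
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1] // 2
    halves = np.concatenate([chains[:, :n], chains[:, n:2 * n]], axis=0)
    W = halves.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * halves.mean(axis=1).var(ddof=1)    # between-chain variance
    var_plus = (n - 1) / n * W + B / n         # pooled variance estimate
    return float(np.sqrt(var_plus / W))


def ess(draws):
    """Effective sample size of a single chain.

    Sums autocorrelations until the first non-positive lag (a simple
    truncation rule; Stan and ArviZ use more refined estimators).
    """
    x = np.asarray(draws, dtype=float)
    n = x.size
    x = x - x.mean()
    acov = np.correlate(x, x, mode="full")[n - 1:] / n  # lags 0..n-1
    rho = acov / acov[0]
    tau = 1.0
    for t in range(1, n):
        if rho[t] <= 0:
            break
        tau += 2.0 * rho[t]
    return float(n / tau)


rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                        # well-mixed chains
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])    # two chains offset
ar = np.zeros(2000)
for t in range(1, ar.size):                               # strongly autocorrelated AR(1)
    ar[t] = 0.9 * ar[t - 1] + rng.normal()

rhat_mixed = split_rhat(mixed)
rhat_stuck = split_rhat(stuck)
ess_ar = ess(ar)
```

R̂ values close to 1 suggest the chains agree (common thresholds range from about 1.01 to 1.1 depending on the guideline), while the offset chains push it well above that. The AR(1) chain, with lag-1 autocorrelation 0.9, yields an ESS far below its 2,000 raw draws.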
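Benchmarking on pseudo-bulk data works because the true composition is known by construction. The sketch below sums counts over randomly sampled nuclei to form one pseudo-bulk sample, records the sampled label frequencies as the ground-truth proportions, and scores estimates with RMSE and Pearson correlation. The function names, the simulated counts, and the placeholder estimates `est_props` are hypothetical; in a real benchmark the estimates would come from each method being compared.

```python
import numpy as np


def make_pseudobulk(counts, labels, n_cells, rng):
    """Sum counts over randomly sampled nuclei; the sampled label
    frequencies serve as the known ('true') proportions."""
    labels = np.asarray(labels)
    idx = rng.choice(labels.size, size=n_cells, replace=False)
    bulk = np.asarray(counts)[idx].sum(axis=0)
    states = np.unique(labels)
    true_props = np.array([(labels[idx] == s).mean() for s in states])
    return bulk, states, true_props


def rmse(est, true):
    """Root-mean-square error between estimated and true proportions."""
    est, true = np.asarray(est, float), np.asarray(true, float)
    return float(np.sqrt(np.mean((est - true) ** 2)))


def pearson_r(est, true):
    """Pearson correlation between estimated and true proportions."""
    return float(np.corrcoef(np.ravel(est), np.ravel(true))[0, 1])


rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(100, 5))  # simulated 100 nuclei x 5 genes
labels = np.array(["astro"] * 50 + ["micro"] * 20 + ["neuron"] * 30)
bulk, states, true_props = make_pseudobulk(counts, labels, n_cells=40, rng=rng)

# Placeholder estimates (ordered like `states`); a deconvolution method would supply these.
est_props = np.array([0.45, 0.20, 0.35])
error = rmse(est_props, true_props)
```

Here the correlation would be computed over the states of a single sample; with many pseudo-bulk samples, one can pool all (sample, state) pairs or report a per-cell-type correlation across samples.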

Real-World Application

Applied the method to bulk RNA-seq data from post-mortem AD brain tissues (GSE174367). Key findings included:

Improved prediction of AD risk after adjusting for deconvolved cell-type proportions (AUC increased from 0.641 to 0.705).
A 14.9% increase in the APOE ε4 odds ratio after cell-type adjustment, suggesting potential confounding by cellular composition.
Strong correlations of estimated astrocyte and neuron proportions with AD expression signatures (e.g., r = 0.82 for excitatory neurons).
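The risk-prediction comparison can be illustrated with two logistic regressions: one with APOE ε4 carrier status alone and one that also includes an estimated cell-type proportion. AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control, and the change in the APOE ε4 odds ratio follows from the two log-odds coefficients as 100 × (exp(β_adj − β_unadj) − 1); a 14.9% increase means the adjusted odds ratio is 1.149 times the unadjusted one. Everything below uses simulated data, and the covariate set, variable names, and in-sample AUC are assumptions rather than the project's actual model specification.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    X must already contain an intercept column."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)               # score vector
        hess = X.T @ (X * w[:, None])      # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def auc(scores, y):
    """ROC AUC: P(random case outscores random control), ties count 1/2."""
    scores, y = np.asarray(scores, float), np.asarray(y).astype(bool)
    diff = scores[y][:, None] - scores[~y][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def or_percent_change(beta_unadj, beta_adj):
    """Percent change in an odds ratio from two log-odds coefficients."""
    return float(100.0 * (np.exp(beta_adj - beta_unadj) - 1.0))


# Simulated stand-in data (hypothetical; not the GSE174367 cohort).
rng = np.random.default_rng(0)
n = 2000
apoe4 = rng.binomial(1, 0.3, size=n)
neuron_prop = rng.beta(4, 6, size=n)
logit = -0.5 + 0.8 * apoe4 - 3.0 * (neuron_prop - 0.4)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X0 = np.column_stack([np.ones(n), apoe4])               # unadjusted
X1 = np.column_stack([np.ones(n), apoe4, neuron_prop])  # adjusted
b0, b1 = fit_logistic(X0, y), fit_logistic(X1, y)
auc0, auc1 = auc(X0 @ b0, y), auc(X1 @ b1, y)
pct_change = or_percent_change(b0[1], b1[1])
```

Because deconvolved proportions sum to one across cell types, including all of them alongside an intercept makes the design matrix collinear, so one type is typically dropped or a log-ratio transform is used. In this simulation APOE ε4 and the proportion are drawn independently, so any shift in the odds ratio here reflects the non-collapsibility of odds ratios (plus sampling noise) rather than confounding.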

Code Repository

The project code is available on GitHub: xc448/BST228_final_proj

Highlights

Robust inference: MCMC-based Bayesian inference yielded stable and interpretable estimates.
Biological insight: Cell-type-specific patterns aligned with known AD pathophysiology.
Translational relevance: Adjusting for cellular composition improved genetic risk prediction for AD.

Acknowledgments

This was a team-based graduate course project. I contributed to the conceptual design, literature review, real-world data curation, adaptation and evaluation of the Bayesian deconvolution model, biological interpretation, visualization, manuscript writing, and presentation.

Collaborators contributed critically to upstream components, including reference construction, data preprocessing, cell-state identification methods, custom Gibbs sampler implementation, benchmarking with existing tools, and refinement of both the manuscript and presentation. This project would not have been possible without a fully collaborative effort, and I am grateful for the shared intellectual and technical contributions.