Long-read sequencing in short tandem repeats
Developing a targeted sequencing-based pipeline that profiles microsatellite instability at single-nucleotide resolution.
Project Overview
MSIanalyzer is a targeted Nanopore sequencing pipeline for profiling microsatellite instability (MSI) at single-nucleotide resolution. By focusing on five Bethesda panel loci, our approach reveals the complexity of repeat diversity in cancer and normal cells.
Objectives
- Develop a high-resolution method to quantify MSI using long-read sequencing.
- Enable robust, read-level comparison of MSI profiles across samples.
- Explore MSI profiles and allelic diversity in MMR-deficient cancer cells.
Methodology
- Sample Design: Targeted amplicons from five microsatellite loci were sequenced in:
  - MSI-high cell lines: HCT15, HCT116
  - MSI-stable cell lines: TK6, U2OS
  - Peripheral blood mononuclear cells (PBMCs) from two healthy donors
- Sequencing Platform: Long-read sequencing was performed on an Oxford Nanopore Technologies (ONT) platform.
- Repeat Calling Algorithm:
  - Custom anchor-extension method to identify repeat motifs, allowing for interrupted patterns.
  - Thresholding informed by ONT-specific error profiles.
- Statistical Analysis:
  - Applied cluster-aware Dirichlet-multinomial and beta-binomial models.
  - Accounted for read-level clustering within samples to support valid between-sample comparisons.
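The repeat-calling and statistical steps above can be illustrated with a minimal sketch. This is not the MSIanalyzer implementation: the function names (`call_repeat`, `beta_binomial_logpmf`), the parameters (`min_anchor_units`, `max_interruptions`), and the definition of an interruption (one to `len(motif)` non-motif bases followed by a perfect motif unit) are all assumptions made for illustration. The fixed interruption budget stands in for the ONT-informed thresholds, which are not specified here. The sketch seeds on the first run of perfect motif copies (the anchor), extends in both directions while tolerating a limited number of interruptions, and separately evaluates a beta-binomial log-probability for a count of reads out of a sample total.

```python
import math


def _extend_right(seq, motif, pos, budget):
    """Extend a tract whose right edge is `pos`; return (new_edge, interruptions_used)."""
    k = len(motif)
    used = 0
    while True:
        if seq.startswith(motif, pos):
            pos += k  # perfect motif unit
            continue
        # Assumed interruption model: 1..k non-motif bases, then a perfect unit.
        for gap in range(1, k + 1):
            if used < budget and seq.startswith(motif, pos + gap):
                pos += gap + k
                used += 1
                break
        else:
            return pos, used


def call_repeat(read, motif, min_anchor_units=3, max_interruptions=1):
    """Anchor on the first run of perfect motif copies, then extend both ways."""
    anchor = motif * min_anchor_units
    pos = read.find(anchor)
    if pos == -1:
        return None
    end, used = _extend_right(read, motif, pos + len(anchor), max_interruptions)
    # Leftward extension is a rightward extension on the reversed read.
    rev_edge, used_left = _extend_right(
        read[::-1], motif[::-1], len(read) - pos, max_interruptions - used
    )
    start = len(read) - rev_edge
    return {
        "start": start,
        "end": end,
        "length": end - start,
        "interruptions": used + used_left,
    }


def _log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k, n, alpha, beta):
    """Log-probability of k 'shifted' reads out of n under a beta-binomial."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + _log_beta(k + alpha, n - k + beta) - _log_beta(alpha, beta)
```

For a hypothetical read `"GGTT" + "CACACA" + "T" + "CACACACA" + "GGA"` and motif `"CA"`, `call_repeat` reports a tract from position 4 to 19 with one interruption, so the interrupted allele is kept as a single call rather than split in two. The beta-binomial adds overdispersion to a plain binomial, which is one way to account for reads within a sample not being independent. The parameterization used in the manuscript may differ.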
Tools
- Python package (MSIanalyzer): See GitHub repository here
- Google Colab notebook demo is available here
Manuscript Now Available on bioRxiv!
- Title: MSIanalyzer: Targeted Nanopore Sequencing Enables Single Nucleotide Resolution Analysis of Microsatellite Instability Diversity
- DOI: 10.1101/2025.06.26.661510
- Online Date: June 28, 2025
Key Findings
- Identified distinct MSI profiles in HCT15 and HCT116 compared to other cell types.
- Detected allelic diversity across MSI loci and sample types.
- Demonstrated the value of preserving read-level resolution for detailed repeat analysis.