Long-read sequencing in short tandem repeats

Developing a targeted sequencing-based pipeline that profiles microsatellite instability at single-nucleotide resolution.

Project Overview

MSIanalyzer is a targeted Nanopore sequencing pipeline for profiling microsatellite instability (MSI) at single-nucleotide resolution. By focusing on five Bethesda panel loci, our approach reveals the complexity of repeat diversity in cancer and normal cells.

Objectives

  • Develop a high-resolution method to quantify MSI using long-read sequencing.
  • Enable robust, read-level comparison of MSI profiles across samples.
  • Explore MSI profiles and allelic diversity in MMR-deficient cancer cells.

Methodology

  • Sample Design: Targeted amplicons from five microsatellite loci were sequenced in:
    • MSI-high cell lines: HCT15, HCT116
    • Microsatellite-stable (MSS) cell lines: TK6, U2OS
    • Peripheral blood mononuclear cells (PBMCs) from two healthy donors
  • Sequencing Platform: Oxford Nanopore Technologies (ONT) long-read sequencing.
  • Repeat Calling Algorithm:
    • Custom anchor-extension method to identify repeat motifs, allowing for interrupted patterns.
    • Thresholding informed by ONT-specific error profiles.
  • Statistical Analysis:
    • Applied cluster-aware Dirichlet-multinomial and beta-binomial models.
    • Accounted for read-level clustering within samples to support valid between-sample comparisons.
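The anchor-extension idea listed under Repeat Calling Algorithm can be illustrated with a minimal sketch. This is not the MSIanalyzer implementation: the function name `call_repeat`, the parameters `min_anchor_units` and `max_gap`, and their default values are hypothetical, and the ONT-informed thresholding is not modeled. The sketch assumes an exact run of motif copies serves as the anchor, and that the tract is then extended in both directions, bridging short interruptions only when another motif copy follows.

```python
def call_repeat(read, motif, min_anchor_units=3, max_gap=1):
    """Return (start, end) of a repeat tract in `read`, or None if no anchor exists.

    Hypothetical illustration: the anchor is the first exact run of
    `min_anchor_units` motif copies; extension tolerates interruptions of
    up to `max_gap` bases, provided a motif copy follows the interruption.
    """
    k = len(motif)
    anchor = motif * min_anchor_units
    start = read.find(anchor)
    if start == -1:
        return None
    end = start + len(anchor)

    # Extend to the right, one motif copy at a time.
    while True:
        if read.startswith(motif, end):
            end += k
            continue
        # Try to bridge an interruption of 1..max_gap bases.
        for gap in range(1, max_gap + 1):
            if read.startswith(motif, end + gap):
                end += gap + k
                break
        else:
            break  # no motif copy within reach: stop extending

    # Extend to the left with the same rule.
    while True:
        if start >= k and read[start - k:start] == motif:
            start -= k
            continue
        for gap in range(1, max_gap + 1):
            s = start - gap - k
            if s >= 0 and read[s:s + k] == motif:
                start = s
                break
        else:
            break

    return start, end


# Toy read: a CA repeat interrupted by a single T (not real data).
read = "GGTTCACACATCACACAGGA"
span = call_repeat(read, "CA")
tract = read[span[0]:span[1]]  # "CACACATCACACA"
print(span, tract, len(tract))
```

Keeping the interruption inside the reported tract, rather than splitting the call in two, is one way to preserve the per-read sequence for later comparison. A real caller would also need to decide which interruptions are likely sequencing errors, which this sketch leaves out.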
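To show why read-level clustering matters for the models named under Statistical Analysis, here is a small standard-library sketch of the beta-binomial distribution, the two-category special case of the Dirichlet-multinomial. The function names, the shape parameters `a` and `b`, and the framing (k of n reads in a sample carrying a given repeat length) are assumptions for illustration. The parameterization MSIanalyzer actually uses is not described here.

```python
from math import lgamma, exp


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_logpmf(k, n, a, b):
    """Log-probability of k successes in n trials under Beta-Binomial(n, a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def betabinom_variance(n, a, b):
    """Variance written as the binomial variance times an overdispersion factor."""
    mu = a / (a + b)           # mean per-read probability
    rho = 1.0 / (a + b + 1.0)  # within-sample (intra-cluster) correlation
    return n * mu * (1 - mu) * (1 + (n - 1) * rho)


# Illustrative numbers only: 100 reads, mean proportion 0.3.
n, a, b = 100, 3.0, 7.0
binomial_var = n * 0.3 * 0.7
print("binomial variance:     ", binomial_var)
print("beta-binomial variance:", betabinom_variance(n, a, b))
print("P(k = 30):             ", exp(betabinom_logpmf(30, n, a, b)))
```

When reads within a sample are correlated, the factor 1 + (n - 1) * rho inflates the variance relative to a plain binomial. A model that ignores it would treat every read as independent evidence and overstate differences between samples.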

Tools

  • Python package (MSIanalyzer): available in the project's GitHub repository
  • Demo: a Google Colab notebook is available

Manuscript Now Available on bioRxiv!

  • Title: MSIanalyzer: Targeted Nanopore Sequencing Enables Single Nucleotide Resolution Analysis of Microsatellite Instability Diversity
  • DOI: 10.1101/2025.06.26.661510
  • Online Date: June 28, 2025

Key Findings

  • Identified distinct MSI profiles in HCT15 and HCT116 compared with the microsatellite-stable cell lines (TK6, U2OS) and PBMCs.
  • Detected allelic diversity across MSI loci and sample types.
  • Demonstrated the value of preserving read-level resolution for detailed repeat analysis.