WASP2 (currently in pre-development): Allele-specific pipeline for unbiased read mapping (WIP), QTL discovery (WIP), and allelic-imbalance analysis

Requirements



  • Python >= 3.7
  • numpy
  • pandas
  • scipy
  • pysam
  • pybedtools



Installation through conda, using the provided environment file, is recommended:

conda env create -f environment.yml


Allelic Imbalance Analysis

The analysis pipeline currently consists of two tools: Count and Analysis.


Count Tool

Counts alleles in ATAC peaks that overlap heterozygous SNPs.


python run_analysis.py count -a [BAM] -g [VCF] -s [VCF Sample] -r [Peaks] {OPTIONS}

Required Arguments

  • -a/--alignment: BAM file containing alignments.
  • -g/--genotypes: VCF file with genotypes.
  • -s/--sample: Sample name in VCF file.
  • -r/--regions: Regions of interest in narrowPeak, GTF, or BED format. (Only narrowPeak is currently supported.)

Single-Cell Additional Requirements

  • -sc/--singlecell: Flag that denotes the data is single-cell.
  • -b/--barcodes: Two-column TSV that contains barcodes and their group/cell mapping.
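
As an illustration of the barcode file described above, the sketch below parses a two-column TSV into a barcode-to-group mapping. It assumes there is no header row and that the barcode is in the first column and the group/cell label in the second. The function name `read_barcode_map` and the example barcodes are hypothetical and are not part of WASP2.

```python
import csv
import io


def read_barcode_map(handle):
    """Parse a 2-column TSV (barcode, group) into a dict.

    Assumption: no header row; column 1 is the barcode, column 2 the group.
    """
    mapping = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        barcode, group = row
        mapping[barcode] = group
    return mapping


# Hypothetical example content; real files would be opened with open(path, newline="").
example = "AAACGAAAGTCGATAA-1\tcluster_0\nAAACGAACAGGCATCC-1\tcluster_1\n"
barcode_map = read_barcode_map(io.StringIO(example))
```

Checking the file this way before running the Count tool can catch rows with a missing or extra column early.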

Optional Arguments

  • -o/--output: Directory to output counts. (Default: CWD)
  • --nofilt: Skip the step that pre-filters reads that overlap regions of interest.
  • --keeptemps: Keep intermediate files from the preprocessing step. Files are written to the directory given with the flag, or to the CWD if none is given.
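
For scripted runs, the arguments listed above can be assembled into a command line programmatically. The following is a minimal sketch that only builds the argument list from the flags documented here; the helper name `build_count_command`, the file names, and the sample name are hypothetical, and the sketch does not check which flag combinations the tool actually accepts.

```python
import shlex


def build_count_command(bam, vcf, sample, regions, barcodes=None,
                        output=None, nofilt=False, keeptemps=False):
    """Build the argv list for the Count tool (hypothetical helper)."""
    cmd = ["python", "run_analysis.py", "count",
           "-a", bam, "-g", vcf, "-s", sample, "-r", regions]
    if barcodes is not None:
        # Single-cell data needs both the flag and the barcode TSV.
        cmd += ["-sc", "-b", barcodes]
    if output is not None:
        cmd += ["-o", output]
    if nofilt:
        cmd.append("--nofilt")
    if keeptemps is True:
        cmd.append("--keeptemps")  # temp files go to the CWD
    elif keeptemps:
        cmd += ["--keeptemps", keeptemps]  # temp files go to this directory
    return cmd


# Hypothetical inputs, for illustration only.
cmd = build_count_command("sample.bam", "genotypes.vcf", "NA12878",
                          "peaks.narrowPeak", output="counts")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `run_analysis.py`.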


Analysis Tool

Analyzes allelic imbalance per ATAC peak, given allelic count data.


python run_analysis.py analysis [COUNTS] {OPTIONS}

Required Arguments

  • COUNTS: First positional argument; the output data from the Count tool.

Single-Cell Additional Requirements

  • -sc/--singlecell: Flag that denotes the data is single-cell.

Optional Arguments

  • --min: Minimum allele count needed for analysis. (Default: 10)
  • -o/--output: Directory to output counts. (Default: CWD)
  • -m/--model: Model used for measuring imbalance. Choice of "single", "linear", or "binomial". (Default: "single")
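
To give a sense of what a binomial test of allelic imbalance involves, here is a generic sketch of an exact two-sided binomial test against a null reference-allele fraction of 0.5, combined with a minimum-count filter. This is not WASP2's implementation of the "binomial" model. The function names, the `counts` structure, and the choice to apply the minimum to the total (ref + alt) count per peak are all assumptions made for illustration.

```python
from math import exp, lgamma, log


def binomial_two_sided_p(ref, alt, p=0.5):
    """Exact two-sided binomial p-value for `ref` successes in ref + alt trials."""
    n = ref + alt

    def pmf(k):
        # Binomial probability computed in log space to avoid overflow for large n.
        log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        return exp(log_comb + k * log(p) + (n - k) * log(1 - p))

    observed = pmf(ref)
    # Sum the probability of every outcome at least as unlikely as the observed one
    # (small relative tolerance so floating-point ties are counted).
    total = sum(q for q in (pmf(k) for k in range(n + 1)) if q <= observed * (1 + 1e-7))
    return min(1.0, total)


def imbalance_pvalues(counts, min_count=10):
    """Test each peak whose total allele count reaches min_count (assumed rule)."""
    results = {}
    for peak, (ref, alt) in counts.items():
        if ref + alt < min_count:
            continue  # too few reads to test
        results[peak] = binomial_two_sided_p(ref, alt)
    return results


# Hypothetical per-peak (ref, alt) counts.
pvals = imbalance_pvalues({"peak_1": (10, 0), "peak_2": (3, 2), "peak_3": (5, 5)})
```

In this example `peak_2` is skipped because it has only 5 reads in total, while a perfectly balanced peak such as `peak_3` gets a p-value of 1.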



  • Unbiased Read Mapping: currently in development

Allelic Imbalance Pipeline

  • Counts

    • RNA-seq and gene support still need to be implemented
    • More robust handling of different inputs for bulk and single-cell data
  • Analysis

    • More specific implementations for single-cell data

