CRISPR-detector
Here we present CRISPR-detector, a tool for analyzing CRISPR-edited amplicon and whole genome sequencing data, with functions that existing tools do not provide.
CRISPR-detector brings the following four key innovations:
- optimized processing time, allowing analysis of hundreds of amplicons or of whole genome sequencing data;
- integrated structural variation calling;
- edited and control sample co-analysis, to remove background variants not induced by gene-editing;
- annotation of the functional and clinical consequences of editing-induced mutations.
System requirements
Sentieon module
Download the Sentieon toolkit from
https://s3.amazonaws.com/sentieon-release/software/sentieon-genomics-202010.03.tar.gz
You may request a license by sending an email to [email protected]
export SENTIEON_LICENSE=PATH_TO_SENTIEON/sentieon-genomics-202010.03/localhost_eval.lic
export PATH=PATH_TO_SENTIEON/sentieon-genomics-202010.03/bin:$PATH
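Before running the pipeline, it can help to confirm that the two exports above took effect. The following is a minimal sketch and not part of CRISPR-detector: the function name `check_sentieon_env` is hypothetical, and it assumes a file-based license as in the export above.

```python
import os
import shutil


def check_sentieon_env(environ=None):
    """Return a list of problems found with the Sentieon setup (empty if none)."""
    environ = os.environ if environ is None else environ
    problems = []

    # SENTIEON_LICENSE is assumed to point at a license file, as exported above.
    license_path = environ.get("SENTIEON_LICENSE")
    if not license_path:
        problems.append("SENTIEON_LICENSE is not set")
    elif not os.path.isfile(license_path):
        problems.append(f"license file not found: {license_path}")

    # The sentieon executable should be reachable through PATH.
    if shutil.which("sentieon", path=environ.get("PATH")) is None:
        problems.append("sentieon executable not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_sentieon_env():
        print("WARNING:", problem)
```

An empty result means both the license file and the `sentieon` executable were found.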
Python packages and command-line tools
pip install biopython
pip install pyfaidx
pip install -U textwrap3
conda install blast
conda install samtools
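After installation, a quick check like the one below can report which dependencies are still missing. This is an illustrative sketch: the function name `missing_dependencies` is hypothetical, and `blastn` is assumed here as a representative executable from the BLAST package.

```python
import importlib.util
import shutil

# Import names can differ from pip package names: biopython is imported as "Bio".
PYTHON_MODULES = {"biopython": "Bio", "pyfaidx": "pyfaidx", "textwrap3": "textwrap3"}
# Executables expected from the conda packages (blastn is an assumption).
EXECUTABLES = {"blast": "blastn", "samtools": "samtools"}


def missing_dependencies(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return the package names whose module or executable cannot be found."""
    missing = []
    for package, module in modules.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    for package, executable in executables.items():
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(executable) is None:
            missing.append(package)
    return missing


if __name__ == "__main__":
    print("Missing:", missing_dependencies() or "none")
```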
ANNOVAR
Download ANNOVAR from
https://www.openbioinformatics.org/annovar/annovar_download_form.php
perl annotate_variation.pl -downdb -webfrom annovar avdblist humandb/ -buildver hg38
perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar refGene humandb/
perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar clinvar_20210501 humandb/
export PATH=PATH_TO_ANNOVAR/annovar:$PATH
For Homo sapiens sequencing data, variant annotations from refGene and ClinVar are supported; other species may only support refGene annotations.
Usage
1. Single amplicon & pooled amplicons sequencing data analysis
python CRISPRdetectorCORE.py
--sample, sample name & output dir
--e1, treated group fq1 path, required=True
--e2, treated group fq2 path, required=False
--c1, control group fq1 path, required=False
--c2, control group fq2 path, required=False
--ref_fasta, path to single or pooled amplicon sequence(s) in fasta format, required=True
--threads, number of threads to run sentieon minimap2 & driver module, default=1
--anno, annotate variants with ANNOVAR [1] or do not run ANNOVAR [2], required=False
--assembly, path to assembly in fasta format : hg38.fa mm9.fa … required=False
--species, species : Homo_sapiens,Mus_musculus… required=False
--db, ANNOVAR database path, required=False
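The options above can be assembled into a command line programmatically, which is convenient when processing many samples. The sketch below is not part of CRISPR-detector: `build_core_command` is a hypothetical helper, all file names are placeholders, and it assumes the options are passed with a double-hyphen prefix.

```python
import shlex


def build_core_command(sample, e1, ref_fasta, e2=None, c1=None, c2=None, threads=1):
    """Assemble the argument list for CRISPRdetectorCORE.py from the options above."""
    cmd = [
        "python", "CRISPRdetectorCORE.py",
        "--sample", sample,
        "--e1", e1,
        "--ref_fasta", ref_fasta,
        "--threads", str(threads),
    ]
    # Optional inputs are only added when provided.
    for flag, value in (("--e2", e2), ("--c1", c1), ("--c2", c2)):
        if value is not None:
            cmd += [flag, value]
    return cmd


# Hypothetical file names, used only for illustration.
cmd = build_core_command(
    "sample1", "treated_R1.fastq.gz", "amplicons.fa",
    e2="treated_R2.fastq.gz",
    c1="control_R1.fastq.gz",
    c2="control_R2.fastq.gz",
    threads=8,
)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the directory containing the script.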
2. Whole genome sequencing (WGS) data analysis
python CRISPRdetectorWGS.py
--e1, treated group fq1 path, required=True
--e2, treated group fq2 path, required=False
--c1, control group fq1 path, required=False
--c2, control group fq2 path, required=False
--sample, sample name & output dir
--threads, number of threads to run sentieon minimap2 & driver module, default=1
--bed, bed format file restricting variant calling to regions of interest, required=False
--anno, annotate variants with ANNOVAR [1] or do not run ANNOVAR [2], required=False
--assembly, path to assembly in fasta format : hg38.fa mm9.fa … required=False
--species, species : Homo_sapiens,Mus_musculus… required=False
--db, ANNOVAR database path, required=False
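To restrict WGS variant calling to regions of interest, the bed option expects a BED file. The sketch below shows one way to produce the three-column BED text and attach it to a WGS command. It is illustrative only: `format_bed` and `build_wgs_command` are hypothetical helpers, the coordinates and file names are placeholders, and the double-hyphen option prefix is an assumption.

```python
def format_bed(regions):
    """Format (chrom, start, end) tuples as 3-column, tab-separated BED text.

    BED coordinates are 0-based and half-open, so start must be smaller than end.
    """
    lines = []
    for chrom, start, end in regions:
        if not 0 <= start < end:
            raise ValueError(f"invalid interval: {chrom}:{start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\n")
    return "".join(lines)


def build_wgs_command(sample, e1, bed=None, threads=1):
    """Assemble a minimal argument list for CRISPRdetectorWGS.py."""
    cmd = [
        "python", "CRISPRdetectorWGS.py",
        "--sample", sample,
        "--e1", e1,
        "--threads", str(threads),
    ]
    # The BED file is optional; without it no region restriction is requested.
    if bed is not None:
        cmd += ["--bed", bed]
    return cmd


# Placeholder regions; save this text to a file and pass its path as bed.
bed_text = format_bed([("chr1", 1000, 2000), ("chr2", 500, 750)])
wgs_cmd = build_wgs_command("sample1", "treated_R1.fastq.gz", bed="regions.bed", threads=8)
```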