inGAP-sv: structural variation detection and visualization
Ji Qi (qij@fudan.edu.cn) and Fangqing Zhao (zhfq@mail.biols.ac.cn)
12/16/2010

We developed an integrative next-generation genome analysis pipeline (inGAP), which employs a Bayesian approach to detect single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels). inGAP has been applied to a number of genome projects, including bacteria, yeast, plants and mammals. Here we extend this pipeline to identify and visualize large structural variations (SVs), including insertions, deletions, inversions and translocations.

1. What can inGAP-sv do?
   - Refine short-read alignments by re-aligning reads around a putative SV.
   - Detect large structural variations using paired-end sequencing reads.
   - Visualize SAM-formatted alignments and SVs.

2. How does inGAP-sv identify SVs?
   - Classify mapped paired-end reads into normal and anomalous mapping types.
   - Detect gapped regions that are not covered by normally mapped read pairs.
   - Detect SVs from combinations of anomalous mapping types. SV quality is calculated from read qualities, mapping qualities, and the ratio of paired-end reads to the average mapping density.
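
The first two steps above can be illustrated with a minimal Python sketch. It is not inGAP-sv's implementation: the `ReadPair` container, the category labels, the forward-reverse library orientation and the cutoff of three standard deviations around the mean insert size are all assumptions made for illustration, and the SV quality calculation is not modelled.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical container for the SAM fields of one mapped read pair."""
    chrom1: str
    pos1: int        # leftmost mapped position of read 1 (1-based)
    reverse1: bool   # True if read 1 maps to the reverse strand
    chrom2: str
    pos2: int
    reverse2: bool
    read_length: int


def classify_pair(pair, mean_insert, sd_insert, n_sd=3.0):
    """Label a pair as normal or as one of several anomalous types.

    Assumes a forward-reverse paired-end library; n_sd is an
    illustrative cutoff, not a value taken from inGAP-sv.
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation_candidate"
    # Order the two reads by mapped position.
    reads = sorted([(pair.pos1, pair.reverse1), (pair.pos2, pair.reverse2)])
    (left_pos, left_rev), (right_pos, right_rev) = reads
    if left_rev == right_rev:
        return "inversion_candidate"      # both reads on the same strand
    if left_rev:
        return "anomalous_orientation"    # reads point away from each other
    span = right_pos + pair.read_length - left_pos
    if span > mean_insert + n_sd * sd_insert:
        return "deletion_candidate"       # pair maps too far apart
    if span < mean_insert - n_sd * sd_insert:
        return "insertion_candidate"      # pair maps too close together
    return "normal"


def uncovered_regions(spans, ref_length):
    """Return 1-based inclusive intervals not covered by any (start, end) span."""
    gaps = []
    next_uncovered = 1
    for start, end in sorted(spans):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_length:
        gaps.append((next_uncovered, ref_length))
    return gaps
```

In this sketch, the spans of pairs labelled "normal" would be passed to `uncovered_regions` to list the gapped regions, and the anomalous labels would then be combined around those regions to propose SVs.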

3. How do I get started?
   - inGAP-sv requires two files: a FASTA-formatted reference sequence and a SAM-formatted alignment.
   - A PTT-formatted annotation file for the reference sequence is optional.
   - A demo application is preloaded in inGAP-sv.
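
Before loading the two required files, it can help to confirm that every reference name used in the SAM alignment also appears in the FASTA file. The Python sketch below is an optional sanity check and not part of inGAP-sv; the function names are hypothetical, and whether inGAP-sv itself requires matching names is an assumption. It relies only on the standard layouts of the two formats: FASTA records start with `>` followed by the sequence name, and SAM alignment lines are tab-separated with the reference name in the third column (`*` for unmapped reads).

```python
def fasta_names(lines):
    """Collect sequence names from FASTA header lines ('>name description')."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def sam_reference_names(lines):
    """Collect reference names (column 3) from SAM alignment lines."""
    names = set()
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        # A SAM alignment line has at least 11 mandatory fields.
        if len(fields) >= 11 and fields[2] != "*":
            names.add(fields[2])
    return names


def missing_references(fasta_lines, sam_lines):
    """Return reference names used in the SAM input but absent from the FASTA."""
    return sorted(sam_reference_names(sam_lines) - fasta_names(fasta_lines))
```

Both functions accept any iterable of lines, so open file handles can be passed directly; an empty result means every aligned read refers to a sequence present in the reference.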

4. What is the difference between inGAP-sv and other SV tools?
   - Most current SV tools can detect only very short indels (e.g. 1-10 bp); inGAP-sv and a few others (e.g. BreakDancer) work well with large SVs (>100 bp).
   - inGAP-sv is a one-stop SV detector: users can identify, visualize, annotate and manually edit SVs within it.
   - Unlike command-line SV tools, inGAP-sv lets users inspect paired reads visually, which can significantly reduce the false discovery rate.

5. What is the performance of inGAP-sv?
   - We first tested inGAP-sv on simulated data with large SVs (100-1000 bp) from the Yoruban genome (NA18507). inGAP-sv identified 75%-90% of large indels and >85% of inversions with high accuracy. Detailed evaluation is in progress.
   - We also applied inGAP-sv to an Arabidopsis thaliana genome re-sequencing project, where it identified 815 insertions and 1000 deletions. We compared these indels with the Monsanto A. thaliana assembly: 78% of the deletions could be covered by Monsanto contigs, and 99% of those were correct; 71% of the insertions could be covered by Monsanto contigs, and 96% of those were correct.
   - inGAP-sv supports parallel computing.
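
To make the A. thaliana comparison above concrete, the percentages can be converted back into approximate counts. The Python sketch below does only that arithmetic; because the reported percentages are rounded, the resulting counts are estimates and not figures reported by the authors. The function name is hypothetical.

```python
def validated_counts(total, covered_fraction, correct_fraction):
    """Estimate how many calls were covered by contigs and how many were correct."""
    covered = total * covered_fraction      # calls overlapping Monsanto contigs
    correct = covered * correct_fraction    # covered calls judged correct
    return round(covered), round(correct)


# 1000 deletions: 78% covered, 99% of the covered calls correct.
deletions = validated_counts(1000, 0.78, 0.99)
# 815 insertions: 71% covered, 96% of the covered calls correct.
insertions = validated_counts(815, 0.71, 0.96)

print(deletions)   # (780, 772)
print(insertions)  # (579, 556)
```

Read this way, roughly 772 of the 1000 deletions and 556 of the 815 insertions were both covered and confirmed; the remaining calls were mostly not assessable against the assembly, which is different from being wrong.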

6. How do I access inGAP-sv?
   - Users can download the latest version of inGAP from http://sourceforge.net/projects/ingap/. We provide binaries for Windows, Linux and Mac OS X.
   - A quick manual is available at http://schuster-33.bx.psu.edu/shared/manual.pdf.

7. Screenshots
   - Main functions and workflow of inGAP-sv
   - Deletions detected by inGAP-sv
   - Insertions detected by inGAP-sv
   - Inversions detected by inGAP-sv
   - Schematic view of SVs