Secondary Analysis Algorithms Provided by Pacific Biosciences
Following are descriptions of the secondary analysis algorithms provided
by Pacific Biosciences.
AHA
- AHA ("A Hybrid Assembler") is the Pacific Biosciences hybrid assembly
algorithm. It is based on the open-source assembly software package AMOS,
with additional software components tailored to Pacific Biosciences' long
reads and error profile.
AmpliconAssembly
- Calls phased consensus sequences from pooled amplicon sequence data. Up
to 20 distinct amplicons can be pooled.
- Reads are clustered into high-level groups, then each group is phased and
consensus is called using the Quiver algorithm.
- If the sample is barcoded, separate calls are made for each barcode.
Base Modification Detection with Motif Finding
- Identifies putative sites of base modification
as well as common bacterial base modifications (6-mA, 4-mC, and optionally
TET-converted 5-mC), and then analyzes the methyltransferase recognition
motifs.
- Detection can use either a control
sample or an in silico control consisting of expected kinetic signals.
BLASR
- Maps reads to genomes by finding the
highest scoring local alignment or set of local alignments between
the read and the genome. The initial set of candidate alignments is
found by querying a rapidly searched pre-computed index of the reference
genome, and then refining until only high scoring alignments are retained.
The base assignment in alignments is optimized and scored using all
available quality information, such as insertion and deletion quality
values.
- Because alignment approximates an exhaustive
search, alignment significance is computed by comparing the optimal alignment
score to the distribution of all other significant alignment scores.
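The comparison of a best score against the distribution of the other candidate scores can be illustrated with a z-score. This is a minimal sketch only, not BLASR's actual significance calculation: the function name `score_z` is hypothetical, and it assumes that higher scores indicate better alignments.

```python
from statistics import mean, pstdev


def score_z(best_score, other_scores):
    """Return how many standard deviations best_score lies above the others.

    Hypothetical helper, not part of BLASR. Assumes higher scores are better.
    Returns None when the other scores give no usable spread.
    """
    if len(other_scores) < 2:
        return None
    sigma = pstdev(other_scores)
    if sigma == 0:
        return None
    return (best_score - mean(other_scores)) / sigma


if __name__ == "__main__":
    # A best hit far above the remaining candidates gets a large z-score.
    print(score_z(100, [40, 50, 60]))
```

A large value suggests that the best alignment stands well apart from the alternatives; a value near zero suggests that the read maps about equally well to several locations.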
BridgeMapper
- Reports when the unmapped portions of reads, above a threshold, have
significant mapping to a specified reference.
- Visualizes split alignments of Pacific Biosciences subreads by displaying
reads with portions mapped to separate locations.
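As an illustration of the thresholding step, the following sketch finds the unmapped ends of a read that are long enough to be worth remapping. It is not the BridgeMapper implementation: the function name `unmapped_segments`, the half-open coordinate convention, and the default threshold of 50 bases are all assumptions made for the example.

```python
def unmapped_segments(read_length, aligned_start, aligned_end, min_length=50):
    """Return unmapped (start, end) flanks of a read that meet the threshold.

    Hypothetical helper. Coordinates are 0-based and half-open, so the
    aligned portion of the read is read[aligned_start:aligned_end].
    """
    segments = []
    # Left flank: bases before the aligned portion.
    if aligned_start >= min_length:
        segments.append((0, aligned_start))
    # Right flank: bases after the aligned portion.
    if read_length - aligned_end >= min_length:
        segments.append((aligned_end, read_length))
    return segments


if __name__ == "__main__":
    # Only the 300-base right flank passes a 200-base threshold.
    print(unmapped_segments(1000, 100, 700, min_length=200))
```

Each returned segment could then be aligned separately against the reference to look for a second mapping location.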
Genomic Consensus (Quiver)
- Identifies haploid SNPs and single-base
indels by comparing a multiple sequence alignment of mapped reads
against a reference sequence.
- Variant calls are made using a simple
plurality algorithm.
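A plurality call takes the most frequent base in each alignment column and reports it as a variant when it differs from the reference. The sketch below shows the idea under simplifying assumptions and is not the Pacific Biosciences implementation: the function name `plurality_variants` and the `min_coverage` parameter are hypothetical, every read is assumed to be already aligned and padded to the reference length, '-' marks a deleted base, and a space marks no coverage. Insertions cannot be represented in this padded layout and are ignored.

```python
from collections import Counter


def plurality_variants(reference, aligned_reads, min_coverage=3):
    """Return (position, ref_base, called_base) where the plurality differs.

    Hypothetical helper. Each read must have the same length as the
    reference; '-' is a deletion and ' ' means the read does not cover
    that position.
    """
    variants = []
    for pos, ref_base in enumerate(reference):
        # Collect the bases observed at this reference position.
        column = [read[pos] for read in aligned_reads if read[pos] != " "]
        if len(column) < min_coverage:
            continue
        # The plurality call is the single most frequent symbol.
        called_base, _ = Counter(column).most_common(1)[0]
        if called_base != ref_base:
            variants.append((pos, ref_base, called_base))
    return variants


if __name__ == "__main__":
    reads = ["ACGTAC", "ACCTAC", "ACCTAC", "ACCT-C", "AC TAC"]
    print(plurality_variants("ACGTAC", reads))
```

In the usage example, three of the four reads covering position 2 carry C where the reference has G, so that position is reported; the single deletion at position 4 is outvoted and is not.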
GMAP
- Third-party application that maps Pacific
Biosciences reads onto a reference as if they were cDNA, allowing
for large insertions corresponding to putative introns.
HGAP (Hierarchical Genome Assembly Process)
- Performs high-quality de novo assembly using a single PacBio library
preparation.
- HGAP consists of pre-assembly, de novo assembly with Celera® Assembler,
and assembly polishing with Quiver.
PacBioToCA/CeleraAssembler
- The Celera® Assembler’s error correction
and assembly programs.
- For full documentation of pacBioToCA
and the Celera® Assembler, see the Celera® Assembler project documentation.
ReadsofInserts
- Computes single-molecule consensus
including Reads of Insert and Circular Consensus Sequences (CCS).
- Provides DNA barcode analysis on these
reads when samples have been multiplexed.