RS_PreAssembler Protocol

Use this protocol to build a set of highly accurate long reads for use in de novo assembly.

The protocol takes each read exceeding a minimum length (a seed read), aligns all reads against it, trims the edges, and then takes the consensus.
It uses the Hierarchical Genome Assembly Process (HGAP), which includes pre-assembly, de novo assembly with Celera® Assembler, and assembly polishing with Quiver.

Filtering Parameters (PreAssembler Filter v1)

Minimum Subread Length: Subreads shorter than this value (in base pairs) are filtered out and excluded from analysis.
Minimum Polymerase Read Quality: Polymerase reads with lower quality than this value are filtered out and excluded from analysis.
Minimum Polymerase Read Length: Polymerase reads shorter than this value (in base pairs) are filtered out and excluded from analysis.
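
The three filtering parameters above can be illustrated with a minimal Python sketch. It is not the PreAssembler Filter implementation: the PolymeraseRead class, the filter_subreads function, and the threshold values in the example are hypothetical and chosen only for illustration. The sketch assumes that a polymerase read is dropped when it falls below either polymerase-read threshold, and that its remaining subreads are then checked against the minimum subread length.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PolymeraseRead:
    """Hypothetical record for one polymerase read and its subreads."""
    name: str
    length: int                 # polymerase read length in base pairs
    quality: float              # polymerase read quality
    subread_lengths: List[int]  # lengths of its subreads in base pairs


def filter_subreads(
    reads: List[PolymeraseRead],
    min_subread_length: int,
    min_read_quality: float,
    min_read_length: int,
) -> List[Tuple[str, int]]:
    """Return (read name, subread length) pairs that pass all three filters."""
    kept = []
    for read in reads:
        # Drop the whole polymerase read if it is too low in quality or too short.
        if read.quality < min_read_quality or read.length < min_read_length:
            continue
        # Keep only the subreads that meet the minimum subread length.
        for n in read.subread_lengths:
            if n >= min_subread_length:
                kept.append((read.name, n))
    return kept


# Illustrative input and thresholds (not product defaults).
example_reads = [
    PolymeraseRead("r1", 5000, 0.85, [2400, 300, 2200]),
    PolymeraseRead("r2", 4000, 0.70, [3900]),  # fails the quality threshold
    PolymeraseRead("r3", 80, 0.90, [80]),      # fails the read length threshold
]
passing = filter_subreads(example_reads, 500, 0.80, 100)
print(passing)  # [('r1', 2400), ('r1', 2200)]
```

In this example only read r1 survives the polymerase read filters, and its 300 bp subread is then excluded by the subread length filter.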

Assembly Parameters (PreAssembler v2)

Compute Minimum Seed Read Length: Specify whether to compute the minimum seed read length automatically. The computed value is the length cutoff at which the longest subreads provide at least 30X coverage of the target genome, based on the Genome Size you specify.
Minimum Seed Read Length: The minimum length of reads (in base pairs) to use as seeds for pre-assembly.
Number of Seed Read Chunks: The number of pieces to split the data files into while running PreAssembler.
Alignment Candidates Per Chunk: The number of alignments to consider for each read for a particular chunk.
Total Alignment Candidates: The number of potential alignments BLASR should consider across all chunks for a particular read.
BLASR Options (Advanced): Advanced options passed to the BLASR aligner. Set the -bestn and -nCandidates values roughly equal to the expected seed read coverage.
Genome Size: The expected size of the genome (in base pairs) after assembly.
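
The relationship between Genome Size and the computed minimum seed read length can be sketched as follows. This is one plausible reading of the description above, not the PreAssembler code: the function name compute_min_seed_length and its arguments are hypothetical. The sketch sorts subreads from longest to shortest and accumulates bases until the total reaches 30 times the genome size; the length of the last subread added is taken as the cutoff. Returning None when the data cannot reach the target coverage is an assumption of this sketch.

```python
from typing import Iterable, Optional


def compute_min_seed_length(
    subread_lengths: Iterable[int],
    genome_size: int,
    target_coverage: int = 30,
) -> Optional[int]:
    """Return the length cutoff at which the longest subreads reach the target coverage.

    Returns None if all subreads together provide less than the target coverage.
    """
    target_bases = genome_size * target_coverage
    total = 0
    # Walk the subreads from longest to shortest, accumulating bases.
    for length in sorted(subread_lengths, reverse=True):
        total += length
        if total >= target_bases:
            # Every subread at least this long is needed to reach the target.
            return length
    return None


# Toy numbers: a 500 bp "genome" needs 15,000 bases for 30X coverage.
lengths = [2000, 10000, 4000, 8000, 6000]
cutoff = compute_min_seed_length(lengths, genome_size=500)
print(cutoff)  # 8000
```

With these toy numbers, the two longest subreads (10,000 and 8,000 bp) already supply 18,000 bases, so the cutoff is 8,000 bp. A larger genome size pushes the cutoff lower, because more subreads are needed to reach the same coverage.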

SMRT® Portal Help