RS_PreAssembler Protocol
Use this protocol to build a set of highly accurate long reads for use
in de novo assembly.
- Takes each read exceeding a minimum length, aligns all reads
against it, trims the edges, and then takes the consensus.
- Uses the Hierarchical Genome Assembly Process (HGAP). HGAP
includes pre-assembly, de novo assembly with Celera® Assembler, and
assembly polishing with Quiver.
Filtering Parameters (PreAssembler Filter v1)
- Minimum Subread Length: Subreads shorter than this value (in base
pairs) are filtered out and excluded from analysis.
- Minimum Polymerase Read Quality: Polymerase reads with lower quality
than this value are filtered out and excluded from analysis.
- Minimum Polymerase Read Length: Polymerase reads shorter than this
value (in base pairs) are filtered out and excluded from analysis.
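The three filters above can be pictured as simple threshold checks. The following is a minimal sketch, not the PreAssembler Filter implementation: the `Subread` record, its field names, the `filter_subreads` function, and the threshold values in the example are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subread:
    # Hypothetical record type; field names are illustrative only.
    name: str
    length: int                # subread length, in base pairs
    polymerase_length: int     # parent polymerase read length, in base pairs
    polymerase_quality: float  # parent polymerase read quality


def filter_subreads(subreads: List[Subread],
                    min_subread_length: int,
                    min_read_quality: float,
                    min_read_length: int) -> List[Subread]:
    """Keep only subreads that pass all three thresholds."""
    kept = []
    for sr in subreads:
        if sr.polymerase_length < min_read_length:
            continue  # polymerase read too short
        if sr.polymerase_quality < min_read_quality:
            continue  # polymerase read quality too low
        if sr.length < min_subread_length:
            continue  # subread too short
        kept.append(sr)
    return kept


# Example thresholds are placeholders, not documented defaults.
subreads = [
    Subread("a", 1200, 3000, 0.86),
    Subread("b", 300, 3000, 0.86),   # fails Minimum Subread Length
    Subread("c", 900, 2000, 0.72),   # fails Minimum Polymerase Read Quality
    Subread("d", 600, 700, 0.90),    # fails Minimum Polymerase Read Length
]
kept = filter_subreads(subreads,
                       min_subread_length=500,
                       min_read_quality=0.80,
                       min_read_length=1000)
print([sr.name for sr in kept])
```

A read at exactly the threshold is kept here, following the wording "shorter than" and "lower quality than" in the parameter descriptions.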
Assembly Parameters (PreAssembler v2)
- Compute Minimum Seed Read Length: Specify whether to compute the
minimum seed read length automatically. The computed value is the
length cutoff at which the longest subreads provide at least 30X
coverage of the target genome, based on the genome size you specify.
- Minimum Seed Read Length: The minimum length of reads (in base pairs)
to use as seeds for pre-assembly.
- Number of Seed Read Chunks: The number of pieces to split the data
files into while running PreAssembler.
- Alignment Candidates Per Chunk: The number of alignments to consider
for each read for a particular chunk.
- Total Alignment Candidates: The number of potential alignments BLASR
should consider across all chunks for a particular read.
- BLASR Options (Advanced): The -bestn and -nCandidates values should be
roughly equal to the expected seed read coverage.
- Genome Size: The expected genome size after assembly.
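To make the relationship between Genome Size, the 30X coverage target, and the seed read length concrete, here is a minimal sketch of one way such a cutoff could be derived: sort subread lengths from longest to shortest and accumulate bases until the target is reached. This is an illustration under stated assumptions, not the calculation PreAssembler actually performs; the function names and the example data are hypothetical.

```python
from typing import List, Optional


def compute_min_seed_length(subread_lengths: List[int],
                            genome_size: int,
                            target_coverage: float = 30.0) -> Optional[int]:
    """Return the length of the shortest read needed so that the longest
    subreads together reach target_coverage x genome_size bases.
    Returns None if the data set never reaches the target."""
    target_bases = target_coverage * genome_size
    total = 0
    for length in sorted(subread_lengths, reverse=True):
        total += length
        if total >= target_bases:
            return length
    return None


def seed_coverage(subread_lengths: List[int],
                  genome_size: int,
                  min_seed_length: int) -> float:
    """Coverage contributed by reads at or above the seed length cutoff."""
    seed_bases = sum(n for n in subread_lengths if n >= min_seed_length)
    return seed_bases / genome_size


# Toy data: a 1,000 bp "genome" needs 30,000 bases for 30X coverage.
lengths = [9000, 8000, 7000, 6000, 5000, 1000]
cutoff = compute_min_seed_length(lengths, genome_size=1000)
print(cutoff)                                # 6000
print(seed_coverage(lengths, 1000, cutoff))  # 30.0
```

In this toy case the four longest reads sum to exactly 30,000 bases, so the cutoff is 6,000 bp and the seed read coverage is 30. Under the guidance above, a coverage figure like this is the kind of value that -bestn and -nCandidates would be set close to.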