RS_Minor_Variant Protocol (Beta)
Use this protocol to call minor variants in a heterogeneous data set
against a user-provided reference sequence.
The protocol includes two main steps:
- Realignment/Data Acquisition: Unambiguously re-map query sequences
to the reference in an unbiased manner, avoiding the reference bias
problem of straight alignment.
- Variant Detection: Call variants above sequencing noise using the
LoFreq algorithm.
Data Prep Parameters (Reads Of Insert)
- Minimum Full Passes: The minimum number of full-length passes over
the insert DNA for the read to be emitted.
- Minimum Predicted Accuracy: The minimum predicted accuracy (in %)
of the reads of insert emitted.
- Minimum Length of Reads of Insert (In Bases): Reads of insert
shorter than this value (in base pairs) are filtered out and excluded
from analysis.
- Maximum Length of Reads of Insert (In Bases): Reads of insert
longer than this value (in base pairs) are filtered out and excluded
from analysis.
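The four filters above can be sketched as a single predicate. This is only an illustration of the filtering logic as described, not the protocol's implementation: the `ReadOfInsert` record, its field names, the `passes_filters` function, and the threshold values used in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadOfInsert:
    # Hypothetical record; field names are illustrative only.
    name: str
    full_passes: int           # full-length passes over the insert
    predicted_accuracy: float  # percent, 0-100
    length: int                # bases


def passes_filters(read: ReadOfInsert,
                   min_full_passes: int,
                   min_accuracy: float,
                   min_length: int,
                   max_length: Optional[int] = None) -> bool:
    """Return True if the read survives all four Data Prep filters."""
    if read.full_passes < min_full_passes:
        return False
    if read.predicted_accuracy < min_accuracy:
        return False
    # Reads shorter than the minimum are excluded; equal length is kept.
    if read.length < min_length:
        return False
    # Reads longer than the maximum are excluded; equal length is kept.
    if max_length is not None and read.length > max_length:
        return False
    return True


reads = [
    ReadOfInsert("r1", 5, 99.2, 1500),
    ReadOfInsert("r2", 0, 99.9, 1500),  # too few full passes
    ReadOfInsert("r3", 8, 85.0, 1500),  # predicted accuracy too low
    ReadOfInsert("r4", 6, 99.0, 40),    # too short
    ReadOfInsert("r5", 6, 99.0, 9000),  # too long
]

# Threshold values below are assumptions chosen for the example.
kept = [r.name for r in reads
        if passes_filters(r, min_full_passes=1, min_accuracy=90.0,
                          min_length=50, max_length=5000)]
print(kept)
```

Only `r1` satisfies all four thresholds in this example; each of the other reads fails exactly one filter.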
Mapping Parameters (BLASR RoI v1)
- Maximum Divergence (%): The maximum allowed divergence of a read
from the reference sequence.
- Minimum Anchor Size: The minimum size of the read (in base pairs)
that must match against the reference sequence.
- Write Output as a BAM File: Specify whether or not to output a BAM
representation of the cmp.h5 file.
- Write BED Coverage File: Specify whether or not to output a BED
representation of the depth of coverage summary.
- Place Repeats Randomly: Specify that if BLASR maps a read to more
than one location with equal probability, it randomly selects which
of those locations to report as the best location. If not set, BLASR
defaults to the first location on the list of matches.
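The tie-breaking behavior described for Place Repeats Randomly can be sketched as follows. This is a simplified illustration, not BLASR's code: the `choose_best_hit` function, the `(location, score)` tuples, and the convention that a higher score is better are assumptions made for the example.

```python
import random
from typing import List, Optional, Tuple


def choose_best_hit(hits: List[Tuple[str, int]],
                    place_randomly: bool = False,
                    rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick one location from candidate hits.

    hits: (location, score) pairs in the order the aligner listed them.
    The score is hypothetical; higher is treated as better here.
    """
    if not hits:
        return None
    best = max(score for _, score in hits)
    # Keep every location tied for the best score, preserving list order.
    tied = [loc for loc, score in hits if score == best]
    if place_randomly and len(tied) > 1:
        # Option set: pick one of the equally good locations at random.
        return (rng or random).choice(tied)
    # Option not set: fall back to the first match on the list.
    return tied[0]


hits = [("ref:100", 50), ("ref:900", 50), ("ref:400", 30)]
print(choose_best_hit(hits))                        # first tied match
print(choose_best_hit(hits, place_randomly=True))   # either tied match
```

With the option off, the result is deterministic and always the first tied location. With it on, reads from a repeat are spread across the tied locations rather than all landing on the first one.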
Variants Parameters (MinorVariants v1)
- Minimum Subread Length: Subreads shorter than this value (in base
pairs) are filtered out and excluded from analysis.
- Minimum Site Coverage: The minimum required site coverage to score
variants at that site.
- Minimum Subread Alignment Accuracy: The minimum alignment accuracy
of a subread.
- Maximum P-value: The threshold for reporting variants. The p-value
is the probability, under the null model, of obtaining a test
statistic at least as extreme as the one observed for this variant.
It indicates whether an observation is sufficiently different from
the null model to warrant further investigation. Lower p-values are
more significant; variants with large p-values are probably a result
of machine error.
- Frequency Confidence Interval: The confidence level of the interval
reported around the variant frequency. (Each variant occurs at a given
frequency within a sample.) Example: For a value of 0.95, there is
roughly 95% confidence that the frequency lies within the reported
interval.
- Call Amino Acid Variants: Specify whether or not to report the name
of the variant in amino-acid space. (Example: E662K for a reference
glutamic acid (E) at position 662 mutated into a lysine (K).) This can
simplify analysis.
Note: You must provide an in-frame reference that is translatable
end-to-end and contains no stop codon other than, at most, a terminal
one.
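The reference requirement in the note, and the amino-acid naming convention from the example, can be sketched as below. This is a minimal pre-check you might run on your own reference before submitting a job, assuming the standard genetic code's stop codons; the function names are hypothetical and this is not the protocol's own validation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def is_translatable_reference(seq: str) -> bool:
    """Check that a DNA reference is in-frame with no internal stop codon.

    A stop codon is tolerated only as the final codon.
    """
    seq = seq.upper()
    # Must be non-empty, contain only A/C/G/T, and be a whole number of codons.
    if not seq or set(seq) - set("ACGT") or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return all(codon not in STOP_CODONS for codon in codons[:-1])


def aa_variant_name(ref_aa: str, position: int, alt_aa: str) -> str:
    """Format a variant in amino-acid space, e.g. E662K."""
    return f"{ref_aa}{position}{alt_aa}"


print(is_translatable_reference("ATGGAATAA"))  # terminal stop only
print(is_translatable_reference("ATGTAAGAA"))  # internal stop
print(is_translatable_reference("ATGGA"))      # not a multiple of three
print(aa_variant_name("E", 662, "K"))
```

The first reference passes; the second and third fail because of an internal stop codon and a length that is not a multiple of three, respectively. The last call reproduces the E662K name from the example above.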