Report Terminology
The following are definitions of the terms displayed in the reports generated
by SMRT Portal.
General - Filtering Report
- Polymerase Read Bases:
The total number of bases in the polymerase reads.
- Polymerase Reads: The number
of polymerases generating high quality reads. Polymerase reads are
trimmed to the high quality region and include bases from adaptors,
as well as potentially multiple passes around a circular template.
- Polymerase Read N50: 50%
of all bases come from polymerase reads longer than this value.
- Polymerase Read Length:
The mean trimmed read length of all polymerase reads. The value includes
bases from adaptors as well as multiple passes around a circular template.
- Polymerase Read Quality:
The mean single-pass read quality of all polymerase reads.
- Post-Filter Polymerase Read Bases:
The number of bases in the polymerase reads after filtering, including
adaptors.
- Post-Filter Polymerase Reads:
The number of polymerases generating trimmed reads after filtering.
Polymerase reads include bases from adaptors and multiple passes around
a circular template.
- Post-Filter Polymerase Read Length:
The mean trimmed read length of all polymerase reads after filtering.
The value includes bases from adaptors as well as multiple passes
around a circular template.
- Post-Filter Polymerase Read Quality:
The mean single-pass read quality of all polymerase reads after filtering.
General - Subread Filtering Report
- Mean Subread length: The
mean length of the subreads that passed filtering.
- Total Number of Bases:
The total number of bases in the subreads that passed filtering.
- N50: 50% of all bases come
from subreads longer than this value.
- Number of Reads: The total
number of reads that passed filtering.
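The Subread Filtering metrics above can be reproduced from a list of subread lengths. The following is a minimal sketch, not SMRT Portal's own code: the function names and the input list are hypothetical, and the N50 here uses the common "at least this long" convention, which may differ from the report at exact ties.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest read down, accumulating bases until half is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


def subread_summary(lengths: Sequence[int]) -> dict:
    """Hypothetical helper mirroring the four Subread Filtering Report fields."""
    return {
        "number_of_reads": len(lengths),
        "total_number_of_bases": sum(lengths),
        "mean_subread_length": sum(lengths) / len(lengths),
        "n50": n50(lengths),
    }


# Made-up subread lengths that already passed filtering.
summary = subread_summary([2, 2, 2, 3, 3, 4, 8, 8])
```

In this example the mean subread length is 4.0 but the N50 is 8, because the two longest subreads alone hold half of the 32 bases. This is why N50 is usually reported alongside the mean.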
General - Barcoding Report
- Barcode Name: The name
of the barcode used.
- Reads: The number of reads
associated with the barcode.
- Bases: The number of bases
associated with the barcode.
General - Reads of Insert Report
- Movie: The name of the
movie file containing the Reads of Insert.
- Reads Of Insert: The number
of Reads of Insert.
- Read Bases Of Insert: The
total number of bases in the Reads of Insert.
- Mean Read Length Of Insert:
The mean read length of the Reads of Insert.
- Read Accuracy Of Insert:
The mean accuracy of the Reads of Insert.
- Mean Read Quality Of Insert:
The mean read quality of the Reads of Insert.
- Mean Number Of Passes:
The mean number of passes used to generate the Reads of Insert.
Diagnostic - Adapters Report
- Adapter Dimers (0-10bp):
The percentage of pre-filter ZMWs that have observed inserts of 0-10bp. These
are likely adapter dimers.
- Short Inserts (11-100bp):
The percentage of pre-filter ZMWs that have observed inserts of 11-100bp.
These are likely short fragment contamination.
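As an illustration of the two Adapters Report percentages, the sketch below bins observed insert lengths using the ranges given above. It assumes one observed insert length per pre-filter ZMW; the function name and input are hypothetical and are not taken from SMRT Portal.

```python
from typing import Sequence


def adapter_report(insert_lengths: Sequence[int]) -> dict:
    """Percentage of ZMWs whose observed insert falls in each diagnostic range."""
    total = len(insert_lengths)
    if total == 0:
        raise ValueError("no ZMWs to summarize")
    # Ranges follow the report labels: 0-10 bp and 11-100 bp, both inclusive.
    dimers = sum(1 for n in insert_lengths if 0 <= n <= 10)
    short = sum(1 for n in insert_lengths if 11 <= n <= 100)
    return {
        "adapter_dimers_pct": 100 * dimers / total,
        "short_inserts_pct": 100 * short / total,
    }


# Made-up insert lengths (bp), one per pre-filter ZMW.
report = adapter_report([0, 5, 10, 11, 100, 101, 500, 2000])
```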
Diagnostic - Loading Report
- SMRT Cell ID: ID number
of the SMRT Cell(s) used in this run.
- Productive ZMWs: The number
of ZMWs for this SMRT Cell that produced results with Productivity
= 1.
- ZMW Loading For Productivity 0:
The percentage of ZMWs that are empty, with no polymerase.
- ZMW Loading For Productivity 1:
The percentage of ZMWs that are productive and sequencing.
- ZMW Loading For Productivity 2:
The percentage of ZMWs that are neither P0 (empty) nor P1 (productive).
This may occur for a variety of reasons; the sequence data is not
usable.
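The Loading Report values are simple proportions over the ZMWs of one SMRT Cell. The sketch below assumes each ZMW has already been assigned a productivity value of 0, 1, or 2; the function name and input format are hypothetical.

```python
from collections import Counter
from typing import Sequence


def zmw_loading(productivity: Sequence[int]) -> dict:
    """Summarize productivity classes (0 = empty, 1 = productive, 2 = other)."""
    total = len(productivity)
    if total == 0:
        raise ValueError("no ZMWs to summarize")
    counts = Counter(productivity)
    return {
        "productive_zmws": counts[1],
        "loading_pct": {p: 100 * counts[p] / total for p in (0, 1, 2)},
    }


# Made-up cell with 10 ZMWs: 3 empty, 5 productive, 2 unusable.
loading = zmw_loading([0] * 3 + [1] * 5 + [2] * 2)
```

Because every ZMW falls into exactly one of the three classes, the three percentages sum to 100.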
Diagnostic - Spike-In Control Report
- Control Sequence: The name
of the control sequence.
- Fraction Control Reads:
The fraction of post-filter polymerase reads that align to the control
reference.
- Control Polymerase Read Length N50: 50% of all bases come from control-sample polymerase reads
longer than this value.
- Control Polymerase Read Length Mean: The mean mapped read length of the polymerase reads from
the control sample.
- Number of Control Reads:
The total number of polymerase reads from the control sample that
passed filtering.
- Control Subread Accuracy:
The mean single-pass accuracy of the mapped polymerase reads from
the control sample.
- Control Polymerase Read Length 95%: The 95th percentile of mapped read length of the polymerase
reads from the control sample.
Resequencing - Mapping Report
- Post-Filter Reads: The
number of reads that passed filtering.
- Mapped Read: The number
of post-filter reads that mapped to the reference sequence.
- Mapped Subreads: The number
of post-filter subreads that mapped to the reference sequence.
- Mapped CCS Reads: The number
of post-filter CCS reads that mapped to the reference sequence.
- Mapped CCS Read Bases:
The number of post-filter CCS read bases that mapped to the reference
sequence. This does not include adapters.
- Mapped CCS Read Length:
The mean read length of post-filter CCS reads that mapped to the reference
sequence.
- Mapped CCS Read Accuracy:
The mean accuracy of post-filter CCS reads that mapped to the reference
sequence.
- Mapped Subread Bases: The
number of post-filter bases from all subreads that mapped to the reference
sequence. This does not include adapters.
- Mapped Subread Accuracy:
The mean accuracy of post-filter subreads that mapped to the reference
sequence.
- Mapped Read Length: The
mean read length of post-filter subreads that mapped to the reference
sequence. This does not include adapters.
- Mapped Read Length of Insert (bp):
The mean read length of all insert sequences, which includes only
mapped sequences. The read length of insert is approximately the longest
subread length per ZMW.
- Mapped Read Length of Insert:
The mean read length of all insert sequences, which includes only
mapped sequences. The read length of insert is approximately the longest
subread length per ZMW.
- Mapped Polymerase Read Length:
The mean read length of post-filter polymerase reads that mapped to
the reference sequence. This includes adapters.
- Mapped Polymerase Read Length 95% (bp): The 95th percentile of read length of
post-filter polymerase reads that mapped to the reference sequence.
- Mapped Polymerase Read Length Max (bp): The maximum read length of post-filter polymerase
reads that mapped to the reference sequence.
- Mapped Polymerase Read Length N50: 50% of the bases in post-filter polymerase reads
that mapped to the reference sequence come from reads longer than this value.
- Mapped Subread Length N50 (bp):
50% of the bases in full subreads that mapped to the reference sequence come from
subreads longer than this value. Full subreads are subreads flanked by two adapters.
- Mapped Subread Length Mean (bp):
The mean length of full subreads that mapped to the reference sequence.
Full subreads are subreads flanked by two adapters.
- Mapped Subread Length:
The mean length of post-filter subreads that mapped to the reference
sequence.
- Mapped ROI: The total number
of Reads of Insert that mapped to the reference for the job.
- Mapped ROI Bases Mean:
The mean length of the Reads of Insert bases that mapped to the reference.
- Mapped ROI Bases N50: 50%
of the Reads of Insert bases that mapped to the reference sequence
come from Reads of Insert longer than this value.
- Mapped ROI Total: The total
number of Reads of Insert base pairs that mapped to the reference
for the job.
- Mapped ROI Concordance:
The mean concordance of the mapped Reads of Inserts compared to the
reference sequence.
- Mean Mapped Subread Concordance:
The mean concordance of the subreads that mapped to the reference
sequence.
Resequencing - Coverage Report
- Mean Coverage: The mean
depth of coverage across the reference sequence.
- Missing Bases (%): The
percentage of the reference sequence that has zero coverage.
- Missing Bases: The percentage
of the reference sequence that has zero coverage.
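Both Coverage Report values can be derived from a per-position depth array. The following is a minimal sketch under that assumption; the function name and the depth values are hypothetical.

```python
from typing import Sequence


def coverage_summary(depths: Sequence[int]) -> dict:
    """Mean depth and percentage of zero-coverage positions for one reference."""
    n = len(depths)
    if n == 0:
        raise ValueError("reference has no positions")
    missing = sum(1 for d in depths if d == 0)
    return {
        "mean_coverage": sum(depths) / n,
        "missing_bases_pct": 100 * missing / n,
    }


# Made-up depth of coverage at each of 8 reference positions.
cov = coverage_summary([0, 0, 4, 6, 10, 10, 5, 5])
```

Note that the mean is taken across all reference positions, including those with zero coverage.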
Analysis - Variants Report
- Reference: The name of
the reference sequence.
- Reference Length: The length
of the reference sequence.
- Bases Called: The percentage
of the reference sequence that has ≥ 1x coverage. % Bases Called + % Missing
Bases should equal 100.
- % Bases Called: The percentage
of the reference sequence that has ≥ 1x coverage. % Bases Called + % Missing
Bases should equal 100.
- Consensus Accuracy: The
accuracy of the consensus sequence compared to the reference.
- Base Coverage: The mean
depth of coverage across the reference sequence.
Analysis - Top Variants Report
- Sequence: The name of the
reference sequence.
- Position: The position
of the variant along the reference sequence.
- Variant: The variant position,
type, and affected nucleotide.
- Type: The variant type:
Insertion, Deletion, or Substitution.
- Coverage: The depth of coverage
at the variant position.
- Confidence: The confidence
of the variant call.
- Genotype: Includes the
full number of chromosomes (diploid) or half the number (haploid).
Assembly - Pre-Assembly Report
- Seed Bases: The number
of bases from seed reads.
- Pre-Assembled Yield: The
percentage of seed read bases that were successfully aligned to generate
pre-assembled reads.
- Pre-Assembled Reads Length:
The average length of the pre-assembled reads.
- Length Cutoff: Reads with
lengths greater than the length cutoff are used as seed reads for
pre-assembly.
- Pre-Assembled Bases: The
number of bases in the pre-assembled reads.
- Pre-Assembled Reads: The
number of reads output by the pre-assembler. Pre-assembled reads are
very long, highly accurate reads that can be used as input to a de
novo assembler.
- Pre-Assembled N50: 50%
of all pre-assembled bases come from pre-assembled reads longer than this value.
- Draft Contigs: The number
of contigs output by Celera Assembler, which may include singleton
and degenerate contigs. After assembly polishing with Quiver, the
final number of contigs may be smaller.
- Reads Assembled (%): The
fraction of all reads that are assembled into contigs in the final
assembly.
Assembly - Polished Assembly Report
- Polished Contigs: The set
of contigs from the de novo assembly that were corrected by Quiver.
- Max Contig Length: The
length of the longest contig in the final assembly.
- Sum of Contig Lengths:
The sum of the lengths of all contigs in the final assembly.
- N50 Contig Length: 50%
of the bases in the final assembly are in contigs longer than this value.
Assembly - Iterations Report
- Assembly Iterations: The
number of iterations of overlap-layout-consensus performed by the
de novo or hybrid assembly algorithm.
Assembly - Top Corrections Report
- Correction: The location
and type of correction.
Assembly - Correction Report
- Consensus Concordance:
The percent identity between the original and the corrected contig.
Hybrid Assembly - Final Assembly Report
- Number: The number of scaffolds,
contigs, or gaps in the initial or final assembly.
- Max Length: The length
of the longest scaffold, contig, or gap in the initial or final assembly.
- N50 Length: 50% of all
bases in the initial or final assembly are in scaffolds, contigs, or gaps longer than
this value.
- Sum Length: The sum of
the lengths of all scaffolds, contigs, or gaps in the initial or final
assembly.
- Initial Scaffolds: The
distribution of the lengths of the scaffold sequences before completing
the AHA algorithm. Scaffolds are composed of contigs optionally separated
by gap sequences.
- Final Scaffolds: The distribution
of the lengths of the scaffold sequences after completing the AHA
algorithm. Scaffolds are composed of contigs optionally separated
by gap sequences.
- Initial Contigs: The distribution
of the lengths of the contig sequences before completing the AHA algorithm.
Contigs are stretches of continuous sequence that do not contain gaps.
- Final Contigs: The distribution
of the lengths of the contig sequences after completing the AHA algorithm.
Contigs are stretches of continuous sequence that do not contain gaps.
- Initial Gaps: The distribution
of the lengths of the gaps between contig sequences before completing
the AHA algorithm.
- Final Gaps: The distribution
of the lengths of the gaps between contig sequences after completing
the AHA algorithm.
Hybrid Assembly - Assembly Iterations Report
- Input Contigs: The number
of contigs used as input to the AHA algorithm.
- Min Align Score: The minimum
alignment score between a read and a contig to use the alignment for
scaffolding.
- Min Link Redundancy: The
minimum number of reads that must link two contigs for those contigs
to be connected in a scaffold.
- Min Subread Length: The
minimum length required for a subread to be used by the AHA algorithm.
- Min Contig Length: The
minimum length required for a contig to be used by the AHA algorithm.
- Scaffolds Across Assembly Iterations:
The number of scaffolds at a particular iteration of the AHA algorithm.
- Linking Reads Across Assembly Iterations: The number of linking reads at a particular iteration
of the AHA algorithm.
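To illustrate how a Min Link Redundancy threshold acts on linking reads, the sketch below counts the reads that link each pair of contigs and keeps only the pairs with enough support. It is not the AHA implementation. The function name and the input format are hypothetical, and the links are assumed to have already passed the Min Align Score, Min Subread Length, and Min Contig Length filters.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def supported_links(
    read_links: Iterable[Tuple[str, str]], min_link_redundancy: int
) -> List[Tuple[str, str]]:
    """Return contig pairs linked by at least `min_link_redundancy` reads.

    `read_links` holds one (contig_a, contig_b) pair per linking read.
    """
    # Sort each pair so (c1, c2) and (c2, c1) count as the same link.
    counts = Counter(tuple(sorted(pair)) for pair in read_links)
    return sorted(pair for pair, n in counts.items() if n >= min_link_redundancy)


# Made-up linking reads: three support c1-c2, one supports c2-c3.
links = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c1", "c2")]
kept = supported_links(links, min_link_redundancy=2)
```

With a threshold of 2, only the c1-c2 link survives. Lowering the threshold to 1 would also connect c2 and c3 on the evidence of a single read.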
Modifications - Motifs Report
- Motifs: The nucleotide
sequence of the methyltransferase recognition motif, using the standard
IUPAC nucleotide alphabet.
- Modified Position: The
position within the motif that is modified. The first base is 1. Example:
The modified adenine in GATC is at position 2.
- Modification Type: The
type of chemical modification most commonly identified at that motif.
These are: 6mA, 4mC, 5mC, or modified_base (modification not recognized
by the software).
- % Motifs Detected: The
percentage of times that this motif was detected as modified across
the entire genome.
- # Of Motifs Detected: The
number of times that this motif was detected as modified across the
entire genome.
- # Of Motifs In Genome:
The number of times this motif occurs in the genome.
- Mean Modification QV: The
mean modification QV for all instances where this motif was detected
as modified.
- Mean Motif Coverage: The
mean coverage for all instances where this motif was detected as modified.
- Partner Motif: For motifs
that are not self-palindromic, this is the complementary sequence.
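Several Motifs Report fields follow directly from the motif string. The sketch below shows the 1-based Modified Position, a Partner Motif computed as the reverse complement over the IUPAC alphabet, and the % Motifs Detected ratio. The function names are hypothetical, and treating the partner as the reverse complement (read 5' to 3') is an assumption about how the report presents it.

```python
from typing import Optional

# IUPAC nucleotide codes and their complements (for example R = A/G pairs with Y = C/T).
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif: str) -> str:
    """Reverse complement of a motif written in the IUPAC alphabet."""
    return motif.upper().translate(_COMPLEMENT)[::-1]


def partner_motif(motif: str) -> Optional[str]:
    """Partner for non-palindromic motifs; None when the motif is self-palindromic."""
    rc = reverse_complement(motif)
    return None if rc == motif.upper() else rc


def modified_base(motif: str, modified_position: int) -> str:
    """Base at the 1-based Modified Position within the motif."""
    if not 1 <= modified_position <= len(motif):
        raise ValueError("modified_position is outside the motif")
    return motif[modified_position - 1]


def percent_motifs_detected(num_detected: int, num_in_genome: int) -> float:
    """# Of Motifs Detected as a percentage of # Of Motifs In Genome."""
    return 100 * num_detected / num_in_genome
```

For example, `modified_base("GATC", 2)` returns `"A"`, matching the GATC example above, and `partner_motif("GATC")` returns `None` because GATC is its own reverse complement. A non-palindromic motif such as the made-up `"GAAGA"` has the partner `"TCTTC"`.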
Amplicons - Input Metrics Report
- Sample: The number of the
sample.
- Chimeric: The number of
consensus sequences flagged as likely coming from PCR cross-over events.
- Chimeric (%): The percentage
of consensus sequences flagged as likely coming from PCR cross-over
events.
- Noise: The number of consensus
sequences that have a very low predicted accuracy (<95%) despite
sufficient coverage (>20 reads and >10% of all sequences in the
current bin) to be called a novel allele.
- Noise (%): The percentage
of consensus sequences that have a very low predicted accuracy (<95%)
despite sufficient coverage (>20 reads and >10% of all sequences
in the current bin) to be called a novel allele.
- Good: The number of consensus
sequences not categorized as Chimeric or Noise.
- Good (%): The percentage
of consensus sequences not categorized as Chimeric or Noise.
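The three Input Metrics categories can be expressed as a small decision rule using the thresholds quoted above. The following is a sketch only: the `Consensus` record and its field names are hypothetical, the chimera flag is taken as given, and checking Chimeric before Noise is an assumption about precedence.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Consensus:
    predicted_accuracy: float  # fraction between 0 and 1
    num_reads: int             # reads supporting this consensus
    bin_size: int              # all sequences in the current bin
    flagged_chimeric: bool = False


def categorize(c: Consensus) -> str:
    if c.flagged_chimeric:
        return "Chimeric"
    # Sufficient coverage: more than 20 reads AND more than 10% of the bin.
    sufficient = c.num_reads > 20 and c.num_reads * 10 > c.bin_size
    if c.predicted_accuracy < 0.95 and sufficient:
        return "Noise"
    return "Good"  # everything not categorized as Chimeric or Noise


def input_metrics(consensuses: Sequence[Consensus]) -> dict:
    """Map each category to (count, percentage of all consensus sequences)."""
    counts = Counter(categorize(c) for c in consensuses)
    total = len(consensuses)
    return {k: (counts[k], 100 * counts[k] / total) for k in ("Good", "Chimeric", "Noise")}


# Made-up consensus sequences for one sample.
metrics = input_metrics([
    Consensus(0.90, 50, 200),        # low accuracy, good coverage -> Noise
    Consensus(0.99, 50, 200),        # -> Good
    Consensus(0.90, 15, 100),        # low accuracy, too few reads -> Good
    Consensus(0.99, 50, 200, True),  # -> Chimeric
])
```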
Amplicons - Consensus Summary Report
- Sequence Cluster: A name
given to the cluster of sequences roughly corresponding to one amplicon.
- Sequence Phase: A name
given to each phased haplotype within a sequence cluster.
- Length (Bp): The length
of the consensus amplicon sequence.
- Estimated Accuracy: The
estimated accuracy of the consensus amplicon sequence.
- Subreads Coverage: The
number of subreads used to call consensus for this sequence.
IsoSeq - Classify Report
- Number of reads of insert:
The number of reads of insert.
- Number of five prime reads:
The number of reads of insert with 5 prime signal detected.
- Number of three prime reads:
The number of reads of insert with 3 prime signal detected.
- Number of poly-A reads:
The number of reads of insert with poly-A and 3 prime signals detected.
- Number of filtered short reads:
The number of reads whose read length is less than the specified Minimum
Sequence Length.
- Number of full-length reads:
The number of full-length reads of insert. (Full-length reads are
reads in which both the 5 prime and 3 prime signals and poly-A are detected.)
- Number of non-full-length reads:
The number of reads of insert that are not full-length. (Full-length reads
are reads in which both the 5 prime and 3 prime signals and poly-A are detected.)
- Number of full-length non-chimeric reads: The number of full-length non-artificial-concatemer
reads of insert. (Full-length reads are reads in which both the 5 prime and 3 prime
signals and poly-A are detected.)
- Average full-length non-chimeric read length: The average length of full-length, non-chimeric
reads of insert.
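The relationship between the Classify Report counts can be sketched as below. This is not the IsoSeq classifier itself. The `ReadOfInsert` record and its flags are hypothetical inputs, and the sketch assumes that short reads are removed before the full-length and non-full-length counts are taken.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ReadOfInsert:
    length: int
    has_five_prime: bool
    has_three_prime: bool
    has_poly_a: bool
    is_chimeric: bool = False  # artificial concatemer


def classify_summary(reads: Sequence[ReadOfInsert], min_seq_length: int) -> dict:
    kept = [r for r in reads if r.length >= min_seq_length]
    # Full-length: 5 prime signal, 3 prime signal, and poly-A all detected.
    full_length = [
        r for r in kept if r.has_five_prime and r.has_three_prime and r.has_poly_a
    ]
    flnc = [r for r in full_length if not r.is_chimeric]
    return {
        "reads_of_insert": len(reads),
        "filtered_short_reads": len(reads) - len(kept),
        "full_length_reads": len(full_length),
        "non_full_length_reads": len(kept) - len(full_length),
        "full_length_non_chimeric_reads": len(flnc),
        "average_flnc_read_length": (
            sum(r.length for r in flnc) / len(flnc) if flnc else 0.0
        ),
    }


# Made-up reads of insert, with a Minimum Sequence Length of 300.
result = classify_summary(
    [
        ReadOfInsert(1000, True, True, True),
        ReadOfInsert(2000, True, True, True, is_chimeric=True),
        ReadOfInsert(1500, True, False, False),
        ReadOfInsert(100, True, True, True),
        ReadOfInsert(3000, True, True, True),
    ],
    min_seq_length=300,
)
```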
IsoSeq - Cluster Report
- Number of consensus isoforms:
The number of consensus isoform reads.
- Average consensus isoforms read length: The average length of isoform reads that match the
reference sequence.
- Number of polished high-quality isoforms: The number of isoforms, polished using Quiver, whose
sum of base-calling error probability at each site is less
than or equal to a threshold value.
- Number of polished low-quality isoforms: The number of isoforms, polished using Quiver, whose
sum of base-calling error probability at each site is greater than
a threshold value.
Site Acceptance Test Report
- Instrument: The name of
the instrument on which the Site Acceptance Test is running.
- Genome Covered: The percent
of genomes in the sample covered by the Site Acceptance Test.
- Mean Mapped Read Length:
The mean length of the post-filter reads that mapped to the reference
sequence.
- Reads in Cell: The total
number of reads generated from the SMRT Cell used in the Site Acceptance
Test.
Generic Overview Reports
- SMRT Cells: The number
of SMRT Cells used for the job.
- Movies: The number of movies
generated by the job.
- Number of Bases: The total
number of bases generated by the job.
- N50 Read Length: 50% of
all bases generated by this job come from reads longer than this value.
- Mean Read Length: The mean
length of all the reads generated by the job.
- Mean Read Score: The mean
Read Score for the job. (The Read Score is a de novo prediction of
the mapped accuracy of subreads from a single ZMW.)
- Mapped Reads: The number
of post-filter reads that mapped to the reference sequence.
- Average Reference Length:
The average length of the reference used for the job.
- Average Reference Bases Called:
The percentage of the reference sequence that has ≥ 1x coverage.
- Average Reference Consensus Concordance:
The average accuracy of the consensus sequence compared to the reference
for the job.
- Average Reference Coverage:
The average depth of coverage across references.
- Longest Reference Contig:
The name of the longest contig in the reference sequence.