Report Terminology

The following are definitions of the terms displayed in the reports generated by SMRT Portal.

General - Filtering Report

Polymerase Read Bases: The number of bases in the polymerase read.
Polymerase Reads: The number of polymerases generating high quality reads. Polymerase reads are trimmed to the high quality region and include bases from adapters, as well as potentially multiple passes around a circular template.
Polymerase Read N50: 50% of all bases come from polymerase reads longer than this value.
Polymerase Read Length: The mean trimmed read length of all polymerase reads. The value includes bases from adapters as well as multiple passes around a circular template.
Polymerase Read Quality: The mean single-pass read quality of all polymerase reads.
Post-Filter Polymerase Read Bases: The number of bases in the polymerase reads after filtering, including adapters.
Post-Filter Polymerase Reads: The number of polymerases generating trimmed reads after filtering. Polymerase reads include bases from adapters and multiple passes around a circular template.
Post-Filter Polymerase Read Length: The mean trimmed read length of all polymerase reads after filtering. The value includes bases from adapters as well as multiple passes around a circular template.
Post-Filter Polymerase Read Quality: The mean single-pass read quality of all polymerase reads after filtering.

General - Subread Filtering Report

Mean Subread length: The mean length of the subreads that passed filtering.
Total Number of Bases: The total number of bases in the subreads that passed filtering.
N50: 50% of all bases come from subreads longer than this value.
Number of Reads: The total number of reads that passed filtering.
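
The N50 definition above can be sketched in a few lines of Python. This is a minimal illustration, not the code SMRT Portal runs: the function name `n50` and the sample lengths are made up, and the sketch assumes the common convention that the read sitting exactly at the halfway point counts toward the 50%.

```python
def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    total_bases = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until half are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total_bases:
            return length
    return 0  # no reads at all


# Made-up subread lengths, for illustration only.
subread_lengths = [2, 3, 4, 5, 6]
median = sorted(subread_lengths)[len(subread_lengths) // 2]
print("N50:", n50(subread_lengths))
print("Median:", median)
```

Because N50 is weighted by bases rather than by read count, it is usually larger than the median read length when the length distribution has a long tail.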

General - Barcoding Report

Barcode Name: The name of the barcode used.
Reads: The number of reads associated with the barcode.
Bases: The number of bases associated with the barcode.

General - Reads of Insert Report

Movie: The name of the movie file containing the Reads of Insert.
Reads Of Insert: The number of Reads of Insert.
Read Bases Of Insert: The total number of bases in the Reads of Insert.
Mean Read Length Of Insert: The mean read length of the Reads of Insert.
Read Accuracy Of Insert: The mean accuracy of the Reads of Insert.
Mean Read Quality Of Insert: The mean read quality of the Reads of Insert.
Mean Number Of Passes: The mean number of passes used to generate the Reads of Insert.

Diagnostic - Adapters Report

Adapter Dimers (0-10bp): The % of pre-filter ZMWs which have observed inserts of 0-10bp. These are likely adapter dimers.
Short Inserts (11-100bp): The % of pre-filter ZMWs which have observed inserts of 11-100bp. These are likely short fragment contamination.

Diagnostic - Loading Report

SMRT Cell ID: ID number of the SMRT Cell(s) used in this run.
Productive ZMWs: The number of ZMWs for this SMRT Cell that produced results with Productivity = 1.
ZMW Loading For Productivity 0: The percentage of ZMWs that are empty, with no polymerase.
ZMW Loading For Productivity 1: The percentage of ZMWs that are productive and sequencing.
ZMW Loading For Productivity 2: The percentage of ZMWs that are neither P0 (empty) nor P1 (productive). This can occur for a variety of reasons; in each case, the sequence data is not usable.
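
As a rough illustration of how the three loading percentages relate, the sketch below computes them from a list of per-ZMW productivity values (0, 1, or 2). The function name `loading_percentages` and the input list are hypothetical; the report's actual input format is not described here.

```python
from collections import Counter


def loading_percentages(productivity_by_zmw):
    """Return the percentage of ZMWs at each productivity level (0, 1, 2)."""
    total = len(productivity_by_zmw)
    if total == 0:
        return {0: 0.0, 1: 0.0, 2: 0.0}
    counts = Counter(productivity_by_zmw)
    return {level: 100.0 * counts[level] / total for level in (0, 1, 2)}


# Hypothetical SMRT Cell with four ZMWs: one empty, two productive, one unusable.
percentages = loading_percentages([0, 1, 1, 2])
print(percentages)
```

Under this reading, the three percentages sum to 100 for a given SMRT Cell, and the Productive ZMWs value is the raw count behind the Productivity 1 percentage.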

Diagnostic - Spike-In Control Report

Control Sequence: The name of the control sequence.
Fraction Control Reads: The fraction of post-filter polymerase reads that align to the control reference.
Control Polymerase Read Length N50: 50% of all bases from the control sample come from polymerase reads longer than this value.
Control Polymerase Read Length Mean: The mean mapped read length of the polymerase reads from the control sample.
Number of Control Reads: The total number of polymerase reads from the control sample that passed filtering.
Control Subread Accuracy: The mean single-pass accuracy of the mapped polymerase reads from the control sample.
Control Polymerase Read Length 95%: The 95th percentile of mapped read length of the polymerase reads from the control sample.

Resequencing - Mapping Report

Post-Filter Reads: The number of reads that passed filtering.
Mapped Read: The number of post-filter reads that mapped to the reference sequence.
Mapped Subreads: The number of post-filter subreads that mapped to the reference sequence.
Mapped CCS Reads: The number of post-filter CCS reads that mapped to the reference sequence.
Mapped CCS Read Bases: The number of post-filter CCS read bases that mapped to the reference sequence. This does not include adapters.
Mapped CCS Read Length: The mean read length of post-filter CCS reads that mapped to the reference sequence.
Mapped CCS Read Accuracy: The mean accuracy of post-filter CCS reads that mapped to the reference sequence.
Mapped Subread Bases: The number of post-filter bases from all subreads that mapped to the reference sequence. This does not include adapters.
Mapped Subread Accuracy: The mean accuracy of post-filter subreads that mapped to the reference sequence.
Mapped Read Length: The mean read length of post-filter subreads that mapped to the reference sequence. This does not include adapters.
Mapped Read Length of Insert (bp): The mean read length of all insert sequences, which includes only mapped sequences. The read length of insert is approximately the longest subread length per ZMW.
Mapped Read Length of Insert: The mean read length of all insert sequences, which includes only mapped sequences. The read length of insert is approximately the longest subread length per ZMW.
Mapped Polymerase Read Length: The mean read length of post-filter polymerase reads that mapped to the reference sequence. This includes adapters.
Mapped Polymerase Read Length 95% (bp): The 95th percentile of read length of post-filter polymerase reads that mapped to the reference sequence.
Mapped Polymerase Read Length Max (bp): The maximum read length of post-filter polymerase reads that mapped to the reference sequence.
Mapped Polymerase Read Length N50: 50% of all bases in post-filter polymerase reads that mapped to the reference sequence come from polymerase reads longer than this value.
Mapped Subread Length N50 (bp): 50% of all bases in full subreads that mapped to the reference sequence come from full subreads longer than this value. Full subreads are subreads flanked by two adapters.
Mapped Subread Length Mean (bp): The mean length of full subreads that mapped to the reference sequence. Full subreads are subreads flanked by two adapters.
Mapped Subread Length: The mean length of post-filter subreads that mapped to the reference sequence.
Mapped ROI: The total number of Reads of Insert that mapped to the reference for the job.
Mapped ROI Bases Mean: The mean length of the Reads of Insert that mapped to the reference.
Mapped ROI Bases N50: 50% of the Reads of Insert bases that mapped to the reference sequence come from Reads of Insert longer than this value.
Mapped ROI Total: The total number of Reads of Insert base pairs that mapped to the reference for the job.
Mapped ROI Concordance: The mean concordance of the mapped Reads of Inserts compared to the reference sequence.
Mean Mapped Subread Concordance: The mean concordance of the subreads that mapped to the reference sequence.

Resequencing - Coverage Report

Mean Coverage: The mean depth of coverage across the reference sequence.
Missing Bases (%): The percentage of the reference sequence that has zero coverage.
Missing Bases: The percentage of the reference sequence that has zero coverage.

Analysis - Variants Report

Reference: The name of the reference sequence.
Reference Length: The length of the reference sequence.
Bases Called: The percentage of reference sequence that has ≥ 1x coverage. % Bases Called + % Missing Bases should equal 100.
% Bases Called: The percentage of reference sequence that has ≥ 1x coverage. % Bases Called + % Missing Bases should equal 100.
Consensus Accuracy: The accuracy of the consensus sequence compared to the reference.
Base Coverage: The mean depth of coverage across the reference sequence.
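
The relationship between Mean Coverage, % Bases Called, and % Missing Bases can be sketched from a per-position depth array. This is an illustrative example only: the function name `coverage_summary` and the depth values are assumptions, and a real coverage report would be derived from the alignments rather than from a hand-written list.

```python
def coverage_summary(depths):
    """Summarize per-position coverage depths for one reference sequence."""
    reference_length = len(depths)
    if reference_length == 0:
        raise ValueError("reference has no positions")
    missing = sum(1 for depth in depths if depth == 0)
    return {
        "mean_coverage": sum(depths) / reference_length,
        # Positions with at least 1x coverage.
        "bases_called_pct": 100.0 * (reference_length - missing) / reference_length,
        # Positions with zero coverage.
        "missing_bases_pct": 100.0 * missing / reference_length,
    }


# Made-up depths for an 8-base reference.
summary = coverage_summary([0, 3, 5, 0, 2, 4, 6, 4])
print(summary)
```

Every reference position is either covered at least once or not covered at all, which is why the two percentages add up to 100.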

Analysis - Top Variants Report

Sequence: The name of the reference sequence.
Position: The position of the variant along the reference sequence.
Variant: The variant position, type, and affected nucleotide.
Type: The variant type: Insertion, Deletion, or Substitution.
Coverage: The depth of coverage at the variant position.
Confidence: The confidence of the variant call.
Genotype: Indicates whether the call covers the full number of chromosomes (diploid) or half the number (haploid).

Assembly - Pre-Assembly Report

Seed Bases: The number of bases from seed reads.
Pre-Assembled Yield: The percentage of seed read bases that were successfully aligned to generate pre-assembled reads.
Pre-Assembled Reads Length: The average length of the pre-assembled reads.
Length Cutoff: Reads with lengths greater than the length cutoff are used as seed reads for pre-assembly.
Pre-Assembled Bases: The number of bases in the pre-assembled reads.
Pre-Assembled Reads: The number of reads output by the pre-assembler. Pre-assembled reads are very long, highly accurate reads that can be used as input to a de novo assembler.
Pre-Assembled N50: 50% of all pre-assembled bases come from pre-assembled reads longer than this value.
Draft Contigs: The number of contigs output by Celera Assembler, which may include singleton and degenerate contigs. After assembly polishing with Quiver, the final number of contigs may be smaller.
Reads Assembled (%): The percentage of all reads that are assembled into contigs in the final assembly.

Assembly - Polished Assembly Report

Polished Contigs: The set of contigs from the de novo assembly that were corrected by Quiver.
Max Contig Length: The length of the longest contig in the final assembly.
Sum of Contig Lengths: The sum of the lengths of all contigs in the final assembly.
N50 Contig Length: 50% of the bases in the final assembly are in contigs longer than this value.

Assembly - Iterations Report

Assembly Iterations: The number of iterations of overlap-layout-consensus performed by the de novo or hybrid assembly algorithm.

Assembly - Top Corrections Report

Correction: The location and type of correction.

Assembly - Correction Report

Consensus Concordance: The percent identity between the original and the corrected contig.

Hybrid Assembly - Final Assembly Report

Number: The number of scaffolds, contigs, or gaps in the initial or final assembly.
Max Length: The length of the longest scaffold, contig, or gap in the initial or final assembly.
N50 Length: 50% of all bases in the initial or final assembly are in scaffolds, contigs, or gaps longer than this value.
Sum Length: The sum of the lengths of all scaffolds, contigs, or gaps in the initial or final assembly.
Initial Scaffolds: The distribution of the lengths of the scaffold sequences before completing the AHA algorithm. Scaffolds are composed of contigs optionally separated by gap sequences.
Final Scaffolds: The distribution of the lengths of the scaffold sequences after completing the AHA algorithm. Scaffolds are composed of contigs optionally separated by gap sequences.
Initial Contigs: The distribution of the lengths of the contig sequences before completing the AHA algorithm. Contigs are stretches of continuous sequence that do not contain gaps.
Final Contigs: The distribution of the lengths of the contig sequences after completing the AHA algorithm. Contigs are stretches of continuous sequence that do not contain gaps.
Initial Gaps: The distribution of the lengths of the gaps between contig sequences before completing the AHA algorithm.
Final Gaps: The distribution of the lengths of the gaps between contig sequences after completing the AHA algorithm.

Hybrid Assembly - Assembly Iterations Report

Input Contigs: The number of contigs used as input to the AHA algorithm.
Min Align Score: The minimum alignment score between a read and a contig to use the alignment for scaffolding.
Min Link Redundancy: The minimum number of reads that must link two contigs for those contigs to be connected in a scaffold.
Min Subread Length: The minimum length required for a subread to be used by the AHA algorithm.
Min Contig Length: The minimum length required for a contig to be used by the AHA algorithm.
Scaffolds Across Assembly Iterations: The number of scaffolds at a particular iteration of the AHA algorithm.
Linking Reads Across Assembly Iterations: The number of linking reads at a particular iteration of the AHA algorithm.

Modifications - Motifs Report

Motifs: The nucleotide sequence of the methyltransferase recognition motif, using the standard IUPAC nucleotide alphabet.
Modified Position: The position within the motif that is modified. The first base is 1. Example: The modified adenine in GATC is at position 2.
Modification Type: The type of chemical modification most commonly identified at that motif. These are: 6mA, 4mC, 5mC, or modified_base (a modification not recognized by the software).
% Motifs Detected: The percentage of times that this motif was detected as modified across the entire genome.
# Of Motifs Detected: The number of times that this motif was detected as modified across the entire genome.
# Of Motifs In Genome: The number of times this motif occurs in the genome.
Mean Modification QV: The mean modification QV for all instances where this motif was detected as modified.
Mean Motif Coverage: The mean coverage for all instances where this motif was detected as modified.
Partner Motif: For motifs that are not self-palindromic, this is the complementary sequence.
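
Two of the motif terms above lend themselves to a short sketch. The example assumes that "complementary sequence" means the reverse complement read 5' to 3' (the usual convention for motifs on the opposite strand) and that % Motifs Detected is the detected count divided by the genome count. The function names `partner_motif` and `percent_motifs_detected` are hypothetical, and the counts are made up.

```python
# IUPAC nucleotide codes mapped to their complements (for example, R = A/G pairs with Y = C/T).
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def partner_motif(motif):
    """Return the reverse complement of a motif, or None if it is self-palindromic."""
    motif = motif.upper()
    reverse_complement = motif.translate(_COMPLEMENT)[::-1]
    return None if reverse_complement == motif else reverse_complement


def percent_motifs_detected(detected, in_genome):
    """Return the percentage of motif instances in the genome detected as modified."""
    return 100.0 * detected / in_genome if in_genome else 0.0


print(partner_motif("GATC"))           # self-palindromic, so there is no partner
print(partner_motif("GCACNNNNNNGTT"))  # non-palindromic motif with degenerate bases
print(percent_motifs_detected(950, 1000))
```

A self-palindromic motif such as GATC reads the same on both strands, so it has no separate partner entry.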

Amplicons - Input Metrics Report

Sample: The number of the sample.
Chimeric: The number of consensus sequences flagged as likely coming from PCR cross-over events.
Chimeric (%): The percentage of consensus sequences flagged as likely coming from PCR cross-over events.
Noise: The number of consensus sequences that have a very low predicted accuracy (<95%) despite sufficient coverage (>20 reads and >10% of all sequences in the current bin) to be called a novel allele.
Noise (%): The percentage of consensus sequences that have a very low predicted accuracy (<95%) despite sufficient coverage (>20 reads and >10% of all sequences in the current bin) to be called a novel allele.
Good: The number of consensus sequences not categorized as Chimeric or Noise.
Good (%): The percentage of consensus sequences not categorized as Chimeric or Noise.

Amplicons - Consensus Summary Report

Sequence Cluster: A name given to the cluster of sequences roughly corresponding to one amplicon.
Sequence Phase: A name given to each phased haplotype within a sequence cluster.
Length (Bp): The length of the consensus amplicon sequence.
Estimated Accuracy: The estimated accuracy of the consensus amplicon sequence.
Subreads Coverage: The number of subreads used to call consensus for this sequence.

IsoSeq - Classify Report

Number of reads of insert: The number of reads of insert.
Number of five prime reads: The number of reads of insert with 5 prime signal detected.
Number of three prime reads: The number of reads of insert with 3 prime signal detected.
Number of poly-A reads: The number of reads of insert with poly-A and 3 prime signals detected.
Number of filtered short reads: The number of reads whose read length is less than the specified Minimum Sequence Length.
Number of full-length reads: The number of full-length reads of insert. (Full-length reads are reads which have both the 5 prime and 3 prime signals and poly-A detected.)
Number of non-full-length reads: The number of non-full-length reads of insert. (Full-length reads are reads which have both the 5 prime and 3 prime signals and poly-A detected.)
Number of full-length non-chimeric reads: The number of full-length non-artificial-concatemer reads of insert. (Full-length reads are reads which have both the 5 prime and 3 prime signals and poly-A detected.)
Average full-length non-chimeric read length: The average length of full-length, non-chimeric reads of insert.
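
The full-length rule used by the counts above can be sketched as follows. This is only an illustration of the stated definition (5 prime signal, 3 prime signal, and poly-A all detected); the `ReadOfInsert` class, its field names, and the sample reads are hypothetical, and the sketch ignores the short-read filter.

```python
from dataclasses import dataclass


@dataclass
class ReadOfInsert:
    has_five_prime: bool
    has_three_prime: bool
    has_poly_a: bool
    is_chimeric: bool = False  # flagged as an artificial concatemer


def is_full_length(read):
    """A read is full-length when both prime signals and poly-A are detected."""
    return read.has_five_prime and read.has_three_prime and read.has_poly_a


def classify_counts(reads):
    """Count full-length, non-full-length, and full-length non-chimeric reads."""
    full_length = [read for read in reads if is_full_length(read)]
    return {
        "full_length": len(full_length),
        "non_full_length": len(reads) - len(full_length),
        "full_length_non_chimeric": sum(1 for read in full_length if not read.is_chimeric),
    }


# Made-up reads of insert.
reads = [
    ReadOfInsert(True, True, True),
    ReadOfInsert(True, True, True, is_chimeric=True),
    ReadOfInsert(True, False, False),
    ReadOfInsert(False, True, True),
]
counts = classify_counts(reads)
print(counts)
```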

IsoSeq - Cluster Report

Number of consensus isoforms: The number of consensus isoform reads.
Average consensus isoforms read length: The average length of isoform reads that match the reference sequence.
Number of polished high-quality isoforms: The number of isoforms, polished using Quiver, whose sum of base-calling error probability at each site is less than or equal to a threshold value.
Number of polished low-quality isoforms: The number of isoforms, polished using Quiver, whose sum of base-calling error probability at each site is greater than a threshold value.

Site Acceptance Test Report

Instrument: The name of the instrument on which the Site Acceptance Test is running.
Genome Covered: The percent of genomes in the sample covered by the Site Acceptance Test.
Mean Mapped Read Length: The mean length of the post-filter reads that mapped to the reference sequence.
Reads in Cell: The total number of reads generated from the SMRT Cell used in the Site Acceptance Test.

Generic Overview Reports

SMRT Cells: The number of SMRT Cells used for the job.
Movies: The number of movies generated by the job.
Number of Bases: The total number of bases generated by the job.
N50 Read Length: 50% of all bases generated by this job come from reads longer than this value.
Mean Read Length: The mean length of all the reads generated by the job.
Mean Read Score: The mean Read Score for the job. (The Read Score is a de novo prediction of the mapped accuracy of subreads from a single ZMW.)
Mapped Reads: The number of post-filter reads that mapped to the reference sequence.
Average Reference Length: The average length of the reference used for the job.
Average Reference Bases Called: The percentage of the reference sequence that has ≥ 1x coverage.
Average Reference Consensus Concordance: The average accuracy of the consensus sequence compared to the reference for the job.
Average Reference Coverage: The average depth of coverage across references.
Longest Reference Contig: The name of the longest contig in the reference sequence.
