Circular consensus sequencing (CCS) calculates consensus sequences from multiple “passes” around a single circularized DNA molecule (SMRTbell). CCS uses the Quiver framework to achieve optimal consensus results given the number of passes available.
The Circular Consensus module reads basecall data directly from bax.h5 files (formerly bas.h5 files) supplied on the command line or via SMRT Portal. Adapter annotations in the bax.h5 file are used to find the subread intervals of the raw read.
The Circular Consensus module emits standard FASTA/Q files and a ccs.h5 file (similar to a bax.h5 file) containing a sequence for each ZMW whose consensus sequence meets the quality filters. The filters ensure that the read made a minimum number of complete passes over the insert sequence (default is a minimum of one full pass), and that the expected accuracy of the consensus sequence exceeds some value (default is 90% accuracy).
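The accuracy filter above can be related to the quality values in the emitted FASTQ file. The following is a minimal sketch, not the module's own calculation: it assumes the FASTQ qualities are Phred-scaled with the common ASCII offset of 33, and the function name `expected_accuracy` is hypothetical.

```python
def expected_accuracy(quality_string, offset=33):
    """Estimate expected accuracy from a FASTQ quality string.

    Assumption: qualities are Phred-scaled with the given ASCII offset.
    """
    if not quality_string:
        raise ValueError("empty quality string")
    # Phred definition: error probability = 10 ** (-Q / 10)
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    # Expected accuracy is one minus the mean per-base error probability.
    return 1.0 - sum(error_probs) / len(error_probs)
```

Under this assumption, a read whose bases all have QV 10 has an expected accuracy of 90%, which matches the default threshold described above.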
If a file named “m130524_055855_sherri_c100525322550000001823081109281363_s1_p0.1.bax.h5” is used as input to P_CCS, the workflow will output files named:
in the job output data directory.
The command-line interface to invoke CCS within a SMRT Analysis installation is:
ConsensusTools.sh CircularConsensus
    -q <outFastq>            # default <outFastq> = <file>.fastq
    -f <outFasta>            # default <outFasta> = <file>.fasta
    --h5 <outH5>             # default <outH5> = <file>.ccs.h5
    -n <numWorkers>          # Number of threads to use when processing ZMWs
    -c <chemistryMapping>    # Chemistry mapping xml file
    --minPredictedAccuracy   # Requested accuracy threshold
    --minFullPasses          # Requested minimum number of passes
    <file>.bax.h5            # Input bax.h5 file
Instead of a single bax.h5, one can also provide a file of file names (“FOFN”) of individual bax.h5 files using the --fofn flag.
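A FOFN is a plain text file listing one path per line. The sketch below shows one way to generate such a file; the helper name `write_fofn` and the example paths are hypothetical and are not part of the CCS tooling.

```python
from pathlib import Path


def write_fofn(bax_paths, fofn_path):
    """Write one bax.h5 path per line (the usual FOFN layout)."""
    text = "".join(f"{path}\n" for path in bax_paths)
    Path(fofn_path).write_text(text)
    return fofn_path


def read_fofn(fofn_path):
    """Read the paths back, skipping blank lines."""
    lines = Path(fofn_path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]
```

The resulting file would then be supplied through the --fofn flag in place of the single bax.h5 argument.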
The chemistry_mapping.xml file is produced by SMRT Pipe and contains information about the sequencing chemistry used to generate the data. The tool will automatically select the appropriate Quiver parameters based on this information.
In this example the bax.h5 file listed will be used as input, and FASTA, FASTQ, and ccs.h5 output files will be produced.
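The default output names listed in the command-line options can be derived from the input name. This sketch assumes that `<file>` denotes the input path with its `.bax.h5` suffix removed; that reading, and the helper name `default_output_names`, are assumptions rather than documented behavior.

```python
def default_output_names(bax_path):
    """Derive default CCS output names from an input bax.h5 path.

    Assumption: <file> is the input path without the '.bax.h5' suffix.
    """
    suffix = ".bax.h5"
    if not bax_path.endswith(suffix):
        raise ValueError(f"expected a path ending in {suffix}: {bax_path}")
    stem = bax_path[: -len(suffix)]
    return {
        "fastq": stem + ".fastq",   # -q default
        "fasta": stem + ".fasta",   # -f default
        "h5": stem + ".ccs.h5",     # --h5 default
    }
```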
The PacBio RS produces sequencing data by reading a circular SMRTbell molecule containing the insert DNA of interest, flanked by hairpin adapters. During primary analysis, the raw read (“polymerase read”) is segmented by identifying the locations of adapter sequences. The segments between adapter hits, which correspond to the insert DNA sequence, are excised as “subreads” and are used as the starting point for CCS analysis.
A subread is sometimes also termed a “pass”, because it represents the sequence read from a single pass of the polymerase across the insert sequence. A subread is called a “full pass” if it is flanked on both ends by adapter sequence. Otherwise it is called a “partial pass.”
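The segmentation and the full/partial distinction can be illustrated with a short sketch. It is a simplified model, not the primary analysis implementation: adapter hits are assumed to be given as non-overlapping half-open `(start, end)` intervals in read coordinates, and the function name `split_subreads` is hypothetical.

```python
def split_subreads(read_length, adapters):
    """Split a raw read into subreads around adapter hits.

    adapters: non-overlapping half-open (start, end) intervals.
    Returns a list of (start, end, is_full_pass) tuples.
    """
    subreads = []
    prev_end = 0
    left_is_adapter = False  # the start of the read is not an adapter
    for a_start, a_end in sorted(adapters):
        if a_start > prev_end:
            # Right flank is this adapter; full pass only if the left
            # flank was an adapter as well.
            subreads.append((prev_end, a_start, left_is_adapter))
        prev_end = a_end
        left_is_adapter = True
    if prev_end < read_length:
        # The last segment runs to the end of the read: partial pass.
        subreads.append((prev_end, read_length, False))
    return subreads


# A read with two adapter hits: one full pass between two partial passes.
print(split_subreads(1000, [(200, 245), (600, 645)]))
# [(0, 200, False), (245, 600, True), (645, 1000, False)]
```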
The subread intervals of the raw read are determined using the adapter annotations stored in the bax.h5 file. The subreads are loaded into the Quiver consensus calling framework, which iteratively refines the consensus sequence using a PacBio-specific error model and the rich per-base QVs (InsertionQV, DeletionQV, MergeQV) contained in the bax.h5 file. For more details on the Quiver method, see our HGAP publication here:
http://www.nature.com/nmeth/journal/v10/n6/full/nmeth.2474.html
The --minFullPasses flag allows control over how many (full pass) subreads are required in order for a consensus read to be output.
Consensus for a single subread is the subread itself, just as the mean of a list containing one item is just the item itself.
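The single-subread case can be illustrated with a toy consensus. Note that this is a column-wise majority vote over already aligned, equal-length sequences, used here only as a stand-in; it is not the Quiver algorithm, and the function name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_subreads):
    """Toy consensus: most common base in each column (NOT Quiver)."""
    if not aligned_subreads:
        raise ValueError("at least one subread is required")
    length = len(aligned_subreads[0])
    if any(len(seq) != length for seq in aligned_subreads):
        raise ValueError("subreads must be aligned to equal length")
    consensus = []
    for column in zip(*aligned_subreads):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# With a single subread, the consensus is the subread itself.
print(majority_consensus(["ACGTTGCA"]))                  # ACGTTGCA
# With several subreads, isolated errors are voted out.
print(majority_consensus(["ACGT", "ACGA", "ACGT"]))      # ACGT
```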
Note that --minFullPasses is just one among many filters available. Using --minFullPasses={0,1} may not result in any single-subread consensus reads if the --minPredictedAccuracy filter is set higher than the average single-pass accuracy of the sequencing chemistry.
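The interaction between the two filters can be sketched as follows. This is an illustration only: accuracy is expressed as a fraction, the comparison is assumed to be inclusive at the threshold, the function name `passes_filters` is hypothetical, and the single-pass accuracy of 0.87 is a made-up value rather than a measured property of any chemistry.

```python
def passes_filters(num_full_passes, predicted_accuracy,
                   min_full_passes=1, min_predicted_accuracy=0.90):
    """Return True if a consensus read would pass both filters.

    Assumptions: accuracy is a fraction in [0, 1] and both comparisons
    are inclusive at the threshold.
    """
    return (num_full_passes >= min_full_passes
            and predicted_accuracy >= min_predicted_accuracy)


# Hypothetical single-subread read with a predicted accuracy of 0.87.
# Lowering the pass requirement alone does not let it through...
print(passes_filters(0, 0.87, min_full_passes=0))   # False
# ...unless the accuracy threshold is lowered as well.
print(passes_filters(0, 0.87, min_full_passes=0,
                     min_predicted_accuracy=0.80))  # True
```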
Users of SMRT Portal can interface with CCS via the RS_ReadsOfInsert.1.xml protocol. Additionally, SMRT Portal provides a protocol called RS_ReadsOfInsert_Mapping.1.xml, which performs a subsequent mapping step.
The following parameters are exposed in SMRT Portal:
The reports generated in SMRT Portal for a CCS analysis include:
Users of SMRT Pipe can interface with CCS via the P_CCS module.