Arguments & Options

Amplicon Analysis

Overview

There are more options available for Long Amplicon Analysis (LAA) than are described in this document, and the available options change with each released version of LAA. The full set of available command-line options can be viewed by invoking the following command from within a SMRT Analysis environment:

~$  ConsensusTools.sh  AmpliconAnalysis  --help

Barcoding is handled by its own module, but the options relevant to AmpliconAnalysis are described here for convenience. A full list of barcode-related options can be viewed by invoking:

~$  pbbarcode  --help

Descriptions

Barcoding Options

  • adapterSidePad - Number of padded bases to expect between SMRTbell adapter and barcode (Default=4)
  • insertSidePad - Number of padded bases to expect between barcode and primer (Default=4)
  • scoreMode (symmetric, paired) - The mode in which barcodes should be scored (Default=symmetric)
  • scoreFirst - Whether to attempt to score the left-most barcode in a ZMW (Default=False)

AmpliconAnalysis Options (Multiplexing / Barcoding)

  • barcodes - File of filenames containing *.bc.h5 files of barcoding data to use
  • doBc - Specify a subset of all barcode pairs to process (Default=All)
  • minBarcodeScore - Minimum average barcode score to require of barcoded subreads (Default=0)
  • whiteList - A list of subreads to use in TXT or FASTA format. NOTE: Incompatible with barcoding.

AmpliconAnalysis Options (General)

  • fofn - File of filenames containing *.bas.h5 files of sequencing data to use
  • minLength - Minimum length to require from all subreads used (Default=3000)
  • minReadScore - Minimum ReadScore to require from all subreads used (Default=0.75)
  • minSnr - Minimum Signal-to-Noise ratio to require from all subreads used (Default=3.0)
  • maxReads - Maximum number of subreads to use per barcoded sample (Default=2000)
  • maxPhasingReads - Maximum number of subreads to use for allelic phasing (Default=500)
  • ignoreEnds - Number of bases to ignore for phasing purposes from each end (Default=0)
  • trimEnds - Number of bases to trim from the ends of each consensus sequence (Default=0)
  • numThreads - Maximum number of processors to use during Amplicon Analysis (Default=1)
  • noClustering - Disable the coarse clustering step and skip to the fine phasing step (Default=False)
  • noPhasing - Disable the fine phasing step and skip to the consensus step (Default=False)
  • takeN - Report only the top N consensus sequences from each barcoded sample (Default=0)
  • fastxByBarcode - Separate consensus sequences into different files by barcode (Default=False)
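As an illustration of how the options above fit together, the following minimal sketch assembles an argument list for AmpliconAnalysis. The helper build_args is hypothetical and not part of SMRT Analysis. The "--name value" flag syntax (including "--fofn") is an assumption based on the option names listed here, so confirm the exact spelling with the --help output before relying on it. The defaults are copied from the lists above.

```python
# Defaults copied from the option lists above. The "--name value" flag
# syntax is an assumption; check `AmpliconAnalysis --help` before use.
DEFAULTS = {
    "minLength": 3000,
    "minReadScore": 0.75,
    "minSnr": 3.0,
    "maxReads": 2000,
    "maxPhasingReads": 500,
    "ignoreEnds": 0,
    "trimEnds": 0,
    "numThreads": 1,
    "takeN": 0,
    "minBarcodeScore": 0,
    "noClustering": False,
    "noPhasing": False,
    "fastxByBarcode": False,
}


def build_args(fofn, barcodes=None, white_list=None, **overrides):
    """Return an argument list for AmpliconAnalysis (hypothetical helper)."""
    # The whiteList option is documented as incompatible with barcoding.
    if barcodes and white_list:
        raise ValueError("whiteList is incompatible with barcoding")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError("unknown options: %s" % ", ".join(sorted(unknown)))

    args = ["ConsensusTools.sh", "AmpliconAnalysis", "--fofn", fofn]
    if barcodes:
        args += ["--barcodes", barcodes]
    if white_list:
        args += ["--whiteList", white_list]
    for name, value in overrides.items():
        if value == DEFAULTS[name]:
            continue  # leave documented defaults implicit
        if isinstance(DEFAULTS[name], bool):
            if value:
                args.append("--" + name)  # boolean options become bare flags
        else:
            args += ["--" + name, str(value)]
    return args


if __name__ == "__main__":
    print(" ".join(build_args("input.fofn", minLength=2500, noPhasing=True)))
```

Returning a list rather than a single string keeps each argument separate, so the result can be passed directly to subprocess.run without shell quoting concerns.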

Technical Note

Most options share the same name whether called from the command line or through SMRTpipe. Options whose names differ, or that are also available through SMRT Portal, are listed in the table below.

Command line     SMRTpipe         SMRT Portal Field
---------------  ---------------  -----------------------------------------
minLength        minLength        Minimum Subread Length
maxReads         maxReads         Maximum Number of Subreads
ignoreEnds       ignoreEnds       Ignore Primer Sequence When Clustering
trimEnds         trimEnds         Trim Ends Of Sequences
takeN            takeN            Provide Only The Most Supported Sequences
noClustering     noClustering     Coarse Cluster Subreads By Gene Family (1)
noPhasing        noPhasing        Phase Alleles (2)
fastxByBarcode   fastxByBarcode   Split Results From Each Barcode Into...
minBarcodeScore  score (3)        Minimum Barcode Score

  1. These options are reversed in SMRT Portal, i.e. un-check the Phase Alleles check-box to add the associated --noPhasing flag to the command line.
  2. This option must be added to the P_Barcode module of the SMRTpipe protocol instead of the P_AmpliconAnalysis module.
  3. This option is displayed under the Barcode tab of the Long Amplicon Analysis protocol.
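The reversed check-box behaviour described in the notes can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, the field names are transcribed from the table, and treating both noClustering and noPhasing as reversed is an assumption based on the wording "These options". The fastxByBarcode row is left out because its SMRT Portal field name is truncated in the table.

```python
# Command-line option -> SMRT Portal field, transcribed from the table.
# fastxByBarcode is omitted because its field name is truncated there.
PORTAL_FIELDS = {
    "minLength": "Minimum Subread Length",
    "maxReads": "Maximum Number of Subreads",
    "ignoreEnds": "Ignore Primer Sequence When Clustering",
    "trimEnds": "Trim Ends Of Sequences",
    "takeN": "Provide Only The Most Supported Sequences",
    "noClustering": "Coarse Cluster Subreads By Gene Family",
    "noPhasing": "Phase Alleles",
    "minBarcodeScore": "Minimum Barcode Score",
}

# Assumption: both "no..." options are the reversed ones, so an un-checked
# box in SMRT Portal corresponds to the flag being present.
INVERTED = {"noClustering", "noPhasing"}


def flags_from_checkboxes(checked):
    """Map {SMRT Portal field: is_checked} to the implied command-line flags."""
    by_field = {PORTAL_FIELDS[name]: name for name in INVERTED}
    flags = []
    for field, is_checked in checked.items():
        name = by_field.get(field)
        if name is not None and not is_checked:
            flags.append("--" + name)
    return flags


if __name__ == "__main__":
    print(flags_from_checkboxes({"Phase Alleles": False}))
```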

Recommendations

Barcoding Recommendations

  1. adapterSidePad, insertSidePad, scoreMode - These options must reflect the experimental design of the sample. If you do not know the correct settings for these options, please ask the person who ran your sample.
  2. scoreFirst - The left-most scored barcode tends to be lower-scoring and less trustworthy. In un- or low-multiplexed samples where coverage is high, this option should be left off. However, in highly-multiplexed samples the increased yield is generally worth the slightly higher error rate.
  3. nProcs - the number of processors to use. Usually set to one less than the number of available processors on the cluster node to be used.

Analysis Recommendations

  1. minLength - The most important setting for Amplicon Analysis is the filter on minimum subread length: for the algorithm to work efficiently, most subreads must cover most of the length of the target amplicons. In general, the minimum length should be between 75% and 95% of the length of the smallest target amplicon. A cut-off that is too low leads to truncated products and problems with clustering, while a cut-off that is too high can lead to missing or reduced coverage of some alleles.
  2. ignoreEnds - This is the other important setting: it reduces the occurrence of spurious or duplicate sequence clusters and thus improves overall output sequence quality. Errors introduced by primers during PCR, particularly due to degenerate bases, show up as allelic differences during Amplicon Analysis. Careful setting of this option will prevent this without sacrificing sensitivity. ignoreEnds should be set to the length of the longest primer (including barcode) used, usually 30-50bp.
  3. minBarcodeScore - If data from a multiplexed, barcoded run of amplicons is still generating spurious amplicons after setting the ignoreEnds option, the cause may be cross-talk between barcodes. The minBarcodeScore option from the command line can be used to reduce this cross-talk at the expense of some yield. minBarcodeScore should be set to approximately the length of the full barcode sequence used (including any padding), or ~14-18 for the recommended 16mer barcodes.
  4. minSnr - This option is the best available overall quality filter for Amplicon Analysis. Higher values will reduce the number of spurious consensus sequences and homopolymer indel errors at the expense of sample yield. These improvements are noticeable up to an SNR cut-off of ~4.0, with diminishing returns past that up to a minimum SNR of ~6.0.
  5. minReadScore - This option is generally not useful as a filter, and should be avoided in favor of using minSnr whenever possible.
  6. maxPhasingReads - Unless working with samples with highly skewed mixtures of alleles, the default setting should work for almost all samples.
  7. maxReads - To ensure the highest quality consensus sequences, maxReads should be set high enough to allow all gene clusters to get the full 500 reads used for phasing. Since no mixture of PCR products is perfectly equal, slightly more reads must be used to ensure sufficient coverage of the rarest product. In general, we recommend a setting of ~700 reads per expected locus: for example, ~2000 reads for a mixture of 3 gene products, and ~3500 reads for a mixture of 5 gene products.
  8. numThreads - The number of processors to use. Usually set to one less than the number of available processors on the cluster node to be used.
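The rules of thumb above for minLength, ignoreEnds, maxReads and numThreads can be turned into a few lines of arithmetic. The sketch below is illustrative only: the function names and the 0.85 default fraction are assumptions chosen for the example, not values taken from LAA, and the results are starting points to be checked against your own data.

```python
import os


def recommend_min_length(smallest_amplicon_bp, fraction=0.85):
    """minLength as 75-95% of the smallest target amplicon length.

    The 0.85 default is an arbitrary mid-range choice for illustration.
    """
    if not 0.75 <= fraction <= 0.95:
        raise ValueError("fraction should lie between 0.75 and 0.95")
    return round(smallest_amplicon_bp * fraction)


def recommend_ignore_ends(primers):
    """ignoreEnds as the length of the longest primer, barcode included."""
    return max(len(primer) for primer in primers)


def recommend_max_reads(expected_loci, reads_per_locus=700):
    """maxReads at roughly 700 subreads per expected locus."""
    return expected_loci * reads_per_locus


def recommend_num_threads(available=None):
    """numThreads as one less than the available processors, never below 1."""
    if available is None:
        available = os.cpu_count() or 1
    return max(1, available - 1)


if __name__ == "__main__":
    # Hypothetical experiment: 5 loci, smallest amplicon 4000 bp.
    print("minLength :", recommend_min_length(4000))
    print("maxReads  :", recommend_max_reads(5))
    print("numThreads:", recommend_num_threads())
```

Note that 700 reads per locus gives 2100 for three loci, which matches the ~2000 quoted above only approximately. The recommendation is a rough guide rather than an exact formula.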
