Secondary analysis data files
SMRT Portal generates a number of secondary analysis data files. You can download these data files and use them as input for further downstream processing, pass them on to collaborators, or upload them to public genome sites.
Available file formats
- H5: Hierarchical Data Format (HDF5); a file-system-like data format that organizes data into groups and datasets. (Click here for further details.)
- SAM: Sequence Alignment Map is a generic nucleotide alignment format that describes the alignment of query sequences or sequencing reads to a reference sequence or assembly. (Click here for further details.)
- BAM: Binary version of the Sequence Alignment Map (SAM) format. (Click here for further details.)
- BAI: The index file for a file generated in the BAM format. (This is a non-standard file type.)
- FASTA: FASTA-formatted sequence files contain either nucleic acid sequence (such as DNA) or protein sequence information. A FASTA file can store multiple sequences in a single file. (Click here for further details.)
- GFF: General Feature Format, used for describing genes and other features associated with DNA, RNA, and protein sequences. (Click here for further details.)
- VCF: Variant Call Format, a text format for storing sequence variations, such as SNPs, insertions, and deletions, relative to a reference sequence. (Click here for further details.)
- BED: Format that defines the data lines displayed in an annotation track. (Click here for further details.)
- CSV: Comma-Separated Values file. Can be viewed using Microsoft Excel or a text editor.
- GML: An XML representation of the scaffold graph that results from scaffolding contigs using the AHA hybrid assembly algorithm.
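
As an illustration of downstream processing, the following is a minimal sketch of reading a multi-sequence FASTA file with the Python standard library only. The function name parse_fasta and the record names contig_1 and contig_2 are hypothetical, chosen for this example; they are not part of SMRT Portal. The sketch assumes the usual FASTA layout, in which each record starts with a ">" header line and its sequence may wrap across several lines.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return a mapping from record ID to sequence for FASTA-formatted text."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the ID.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[current_id] += line
    return records


# Hypothetical two-record example; a real file would be opened with open(path).
example = """>contig_1 hypothetical example record
ACGTACGT
TTGA
>contig_2
GGCC
"""
sequences = parse_fasta(example.splitlines())
for name, seq in sequences.items():
    print(name, len(seq))
```

To read a downloaded file instead of the inline example, pass an open file handle to parse_fasta, since a text file handle iterates line by line. For large files or production use, an established library is likely a better choice than this sketch.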