When working with long-read sequencing (LRS), you will typically obtain full-length transcripts whose IDs are specific to your experiment and are therefore not supported by tappAS. You can still use tappAS to study your LRS dataset, but additional steps are required.
To make LRS data compatible with tappAS, we make use of the SQANTI and IsoAnnotLite tools.
(The current version of IsoAnnotLite is 1.1; make sure you use the latest version of the script.)
SQANTI is a pipeline for the structural characterization of isoforms obtained by full-length transcript sequencing. SQANTI takes full-length transcript sequences in FASTA format, which can be obtained after Iso-seq3 (PacBio) or FLAIR (Nanopore) processing. The only additional requirement is that genome and transcriptome files are available for the species, so SQANTI is currently restricted to sequenced species. SQANTI provides a wide range of descriptors of transcript quality and generates a graphical report to aid in the interpretation of LRS results. More information on SQANTI can be obtained here.
There are two ways to transform your SQANTI output to tappAS:
- Transform structural information. In this scenario, the SQANTI transcript types (FSM, ISM, NIC, etc.) are processed into a tappAS GFF3 file containing this information. This file can be fed to tappAS to visualize transcript models and to study differential expression, isoform usage and UTRs. However, no functional analysis is possible here.
- Transform structural information and add functional data. This is the recommended option if your species is supported by tappAS. In this case you use the tappAS species-specific GFF3 to map functional elements to your LRS dataset. Please bear in mind that in this case you must run SQANTI with the same reference genome annotation as used in tappAS.
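Before choosing between the two options, it can help to see how your transcripts are distributed across the SQANTI structural categories. The following is a minimal sketch, not part of SQANTI or IsoAnnotLite. It assumes the classification file is tab-separated and has a column named `structural_category` (adjust the `column` argument if your SQANTI version uses a different header). The function name `count_structural_categories` is hypothetical.

```python
import csv
from collections import Counter


def count_structural_categories(classification_path, column="structural_category"):
    """Count transcripts per structural category in a SQANTI classification file.

    Assumption: the file is tab-separated with a header row that contains
    `column`. This is an illustrative helper, not part of any tool.
    """
    with open(classification_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"column {column!r} not found in {classification_path}")
        # Counter consumes the reader fully before the file is closed.
        return Counter(row[column] for row in reader)
```

Calling `count_structural_categories("my_classification.txt").most_common()` would list the categories from most to least frequent.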
IsoAnnotLite is a Python 3 script that takes the SQANTI output files, and optionally a precomputed tappAS GFF3 file, and returns a new GFF3 file fully compatible with the tappAS software.
How to proceed
There are four basic steps you need to follow:
- Run SQANTI on your LRS FASTA file to obtain the «_corrected.gtf», «_classification.txt» and «_junctions.txt» files.
- If your species is supported by tappAS, download the corresponding GFF3 here.
- Download the IsoAnnotLite script and unzip it.
- Run the basic IsoAnnotLite command as indicated below (note that the -gff3 parameter is optional).
As an example, the following command calls the script with a GFF3 reference to use as annotation:
python IsoAnnotLite.py my_corrected.gtf my_classification.txt my_junctions.txt -gff3 tappAS.gff3
For example, for a mouse LRS dataset obtained with PacBio, the command would be:
python IsoAnnotLite.py PacBio_corrected.gtf PacBio_classification.txt PacBio_junctions.txt -gff3 Mus_musculus_Ensembl_86.gff3
Note that the SQANTI files must be provided in the order shown above. If you run into problems, use the «-h» argument to get help.
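Because the order of the three SQANTI files matters, the call above can be wrapped in a small helper that checks the files exist and always places them in the required order. This is a minimal sketch, not part of IsoAnnotLite. The function names `build_isoannot_command` and `run_isoannot` and the `prefix` naming convention are hypothetical. Only the positional order and the optional `-gff3` argument are taken from the commands shown above.

```python
import subprocess
import sys
from pathlib import Path

# Required positional order: corrected GTF, classification, junctions.
SQANTI_SUFFIXES = ("_corrected.gtf", "_classification.txt", "_junctions.txt")


def build_isoannot_command(prefix, gff3=None, script="IsoAnnotLite.py",
                           python=sys.executable):
    """Return the argument list for an IsoAnnotLite call.

    Assumption: the SQANTI outputs are named <prefix>_corrected.gtf,
    <prefix>_classification.txt and <prefix>_junctions.txt.
    """
    inputs = [Path(f"{prefix}{suffix}") for suffix in SQANTI_SUFFIXES]
    to_check = inputs + ([Path(gff3)] if gff3 is not None else [])
    missing = [str(p) for p in to_check if not p.exists()]
    if missing:
        raise FileNotFoundError("missing input file(s): " + ", ".join(missing))

    cmd = [python, script] + [str(p) for p in inputs]
    if gff3 is not None:
        # -gff3 is optional; without it only structural information is used.
        cmd += ["-gff3", str(gff3)]
    return cmd


def run_isoannot(prefix, gff3=None, **kwargs):
    """Build the command and run it, raising if the script exits with an error."""
    return subprocess.run(build_isoannot_command(prefix, gff3, **kwargs), check=True)
```

With this sketch, `run_isoannot("PacBio", gff3="Mus_musculus_Ensembl_86.gff3")` would issue the same call as the mouse example, provided the four files are in the working directory.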