The tappAS website contains an Overview section that describes the inputs needed to create a project, the application’s computational requirements, the graphical interface, and some of tappAS’ main features. We are also working on a series of Tutorials demonstrating different aspects of the application and how to run and interpret Functional Iso-Transcriptomics analyses. You can see all currently published tutorials here.
In addition, our team gave a live tutorial on Functional Iso-Transcriptomics (FIT) analysis at the ISMB 2020 online conference. If you are interested in watching the lecture and/or doing the exercises, you can find the part of the tutorial addressing tappAS in this video: https://www.youtube.com/watch?reload=9&v=yUYlLOQmO1A. The files used during the hands-on session, as well as slides containing all hands-on exercise descriptions, are available here. Note that all files used for this purpose are the same as those provided in the Demo project (see FAQ right below).
Finally, we recommend using the “Help” buttons available in all tappAS windows and panels. These buttons work as built-in documentation and provide extra information about whatever the user is viewing at the moment: the parameters window before running an analysis, a results table, a plot or a summary graphic.
TappAS includes a Demo dataset that users can load by selecting the option “Demo” upon project creation (see image below). The Installation PDF contains the exact steps to follow to create a Demo project, and this is the dataset that we generally recommend for trying out the application. It is also the dataset used in our recently published tappAS paper, where users may find more examples of analyses, how to couple them, and how to interpret the results, graphics and plots generated by the application.
TappAS uses a GFF3-like format for its transcriptome annotation, but this is not the same as simply providing a GFF3 file for the transcriptome used for isoform quantification.
When we refer to a tappAS annotation file, we mean not only traditional transcript annotation, but also functional annotation. Our GFF3 files are formatted to combine traditional annotation features (transcripts, exons, etc.) with functional labels, including, but not limited to, UTR lengths, CDS positions, predicted PFAM domains and sequence motifs. More details about annotated functional categories and their source databases are available in our tappAS paper. Details about this GFF3-based formatting can be found in the “Projects” section of our website’s Overview page.
Users can potentially use any annotation file with tappAS. However, it is required that users functionally annotate the file and re-format it as a tappAS-compatible GFF3. Since this is not a trivial task, we currently provide pre-annotated reference GFF3 files for ENSEMBL and RefSeq for mouse, human and a few other model organisms, but only for some versions of the annotation. We recommend checking the tappAS paper for further information on transcriptome versions and the functional label categories they contain.
I’m working with a species that is already available in tappAS, but I want to use a newer version of the transcriptome. How do I get a functionally-annotated, tappAS-compatible GFF3 file?
If you want to use your own version of the transcriptome instead of one of tappAS’ available annotation versions, the only current way around this is to use our IsoAnnotLite tool to transfer functional labels from our GFF3 files to your own GTF annotation file. This will create a GFF3 file that is compatible with tappAS and that contains at least part of our reference transcriptome’s functional labels.
However, this transfer is based on genomic positions, so some functional annotations may be lost in the process and/or some transcripts may remain unannotated because they were not in the transcriptome version that we are using. More information on how to use IsoAnnotLite with any annotation file can be found here. Of note, you will require SQANTI3 output files to use IsoAnnotLite, but the tool can also be run jointly with SQANTI3 via the --isoAnnotLite argument (see SQANTI3 README for details).
We are aware that the transcriptomes currently available in tappAS were annotated a few years ago. However, annotating a transcriptome de novo is a very time-consuming, computationally costly task. At the moment, we are working on updating our transcriptomes, but this will not be immediate. We recommend checking the version of the organism’s transcriptome that is available in tappAS before analyzing your data, and using a matching release for quantification.
I am working with a non-model organism, or with an organism that is not included in tappAS’ reference GFF3 files. What should I do?
If your organism is not already annotated in tappAS, you can use IsoAnnotLite to re-format your GTF file and make a tappAS-compatible GFF3 file. However, bear in mind that this file will not contain functional labels, and therefore you will only be able to run the tappAS Differential Module.
While any organism can potentially be annotated at the functional level, adding functional labels is quite complex. At the moment, we are working on a robust pipeline for de novo transcriptome annotation; unfortunately, it is not ready for public use yet. Of note, IsoAnnotLite contains an option to positionally transfer functional labels from an already-annotated GFF3 file, but cross-species usage is very likely to fail.
Only edgeR and NOISeq were implemented for DE analysis in tappAS, while DESeq2 was excluded from the DE module. Is there any rationale behind this choice?
NOISeq is an R package for QC of RNA-Seq data and DE analysis that was developed by our group, and we routinely use it to perform our own analyses. We also included edgeR because it is one of the reference tools in the field, and even though it performs very similarly to DESeq2 (see this blog post by Mike Love for a more thorough description), it includes the TMM normalization method. TMM is one of the most suitable methods for normalization prior to DE analysis, outperforming global scaling methods such as TPM or RPKM, which remove important differences between samples and should be used only for within-sample comparisons (see our group’s review paper on best practices for RNA-Seq analysis for more info). NOISeq also includes the possibility to run edgeR’s TMM function using a wrapper.
Finally, it has been shown that DESeq2 is less optimal in terms of runtime/memory efficiency in comparison to edgeR (see this paper), but we’re considering implementing it in future releases of the application.
I would like to know more about how tappAS’ analyses are conducted. Where can I find a more detailed description?
I processed my long read samples separately, but tappAS requires one transcriptome for all samples. How do I generate one?
The best way to use tappAS is to generate one transcriptome for all sequenced samples. To do this, users should pre-process all SMRT cells together using IsoSeq3, and then run SQANTI3 on the output of this joint run. The IsoSeq3 documentation explains how to merge SMRT cells; typically, all users need to do is merge the output of the refine command and then repeat the clustering step.
WARNING: note that the QC report produced by SQANTI3 can help you make informed decisions about isoforms that might be false positives or low quality, and we strongly advise you to remove them from your transcriptome before you continue your analysis. We recommend reading the SQANTI paper to get a better idea of how to produce a high-quality, curated transcriptome.
tappAS, as well as SQANTI3 and IsoAnnotLite, is fully compatible with both Nanopore and PacBio data.
To get a transcript counts matrix, we recommend finding a compatible short-read dataset (in case you did not generate Illumina data for your samples) and mapping these Illumina reads to your species’ genome using the curated GTF from SQANTI3. To get transcript counts quickly, you can use a pseudoaligner such as kallisto; alternatively, you can use a splice-aware mapper such as STAR followed by a quantification tool like RSEM.
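As a rough illustration of the kallisto route, the sketch below indexes the curated transcriptome, quantifies each paired-end sample, and merges the per-sample estimated counts into one tab-separated matrix. The file names (curated_transcriptome.fasta, the reads/ and quant/ folders), the sample names and the helper functions build_index, quantify_sample and merge_counts are placeholders for illustration, not something tappAS prescribes. kallisto works on transcript sequences, so we assume a FASTA file matching the curated GTF is available. The matrix header written here is also an assumption; check the Overview page for the exact layout tappAS expects.

```shell
#!/bin/sh
# Sketch only: all file and sample names below are hypothetical placeholders.

# Build the kallisto index once from the curated transcript sequences.
build_index() {
    kallisto index -i transcriptome.idx curated_transcriptome.fasta
}

# Quantify one paired-end sample; results go to quant/<sample>/abundance.tsv.
quantify_sample() {
    sample=$1
    kallisto quant -i transcriptome.idx -o "quant/$sample" \
        "reads/${sample}_R1.fastq.gz" "reads/${sample}_R2.fastq.gz"
}

# Merge the est_counts column (4th column of abundance.tsv) from several
# kallisto output folders into one matrix on standard output.
# Usage: merge_counts DIR [DIR ...]   (folder names without trailing slash)
merge_counts() {
    awk 'BEGIN {
        for (i = 1; i < ARGC; i++) {
            file = ARGV[i] "/abundance.tsv"
            k = split(ARGV[i], parts, "/")
            name[i] = parts[k]                    # sample name = folder name
            if ((getline line < file) <= 0) {     # read and discard the header
                print "cannot read " file | "cat 1>&2"
                exit 1
            }
            while ((getline line < file) > 0) {
                split(line, f, "\t")
                if (!(f[1] in seen)) { seen[f[1]] = 1; ids[++m] = f[1] }
                est[f[1], i] = f[4]
            }
            close(file)
        }
        printf "transcript"
        for (i = 1; i < ARGC; i++) printf "\t%s", name[i]
        print ""
        for (j = 1; j <= m; j++) {
            printf "%s", ids[j]
            # Transcripts missing from a sample are reported as 0
            for (i = 1; i < ARGC; i++)
                printf "\t%s", (((ids[j], i) in est) ? est[ids[j], i] : 0)
            print ""
        }
    }' "$@"
}

# Example run (commented out so that sourcing this file has no side effects):
# build_index
# for s in ctrl_1 ctrl_2 treat_1 treat_2; do quantify_sample "$s"; done
# merge_counts quant/ctrl_1 quant/ctrl_2 quant/treat_1 quant/treat_2 > counts_matrix.tsv
```

For single-end reads, kallisto quant additionally needs the --single option together with the fragment length (-l) and its standard deviation (-s).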
Should I use the FL counts file output by long-read processing tools, such as Cupcake and TAMA, as my count matrix?
Long-read data currently provide less sequencing depth than short-read datasets. To make the most of Functional Iso-Transcriptomics analysis, we recommend using short reads for quantification.
Users will need an RNA-Seq dataset with at least 2 replicates in order for tappAS to work. However, the more replicates, the better.
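Before creating a project, it can help to confirm that the experimental design meets this minimum. The sketch below assumes a hypothetical two-column, tab-separated table with a header line (sample name, then condition) and assumes that the two-replicate minimum applies to each condition. Both the table layout and the helper name check_replicates are illustrative assumptions, not tappAS’ documented design-file format.

```shell
#!/bin/sh
# Hypothetical helper: report conditions with fewer than 2 samples in a
# tab-separated table laid out as sample<TAB>condition, with one header line.
# Returns a non-zero exit status if any condition has too few replicates.
check_replicates() {
    awk -F '\t' '
        NR == 1 { next }        # skip the header line
        NF >= 2 { n[$2]++ }     # count samples per condition
        END {
            bad = 0
            for (c in n) {
                if (n[c] < 2) {
                    printf "%s has only %d replicate(s)\n", c, n[c]
                    bad = 1
                }
            }
            exit bad
        }
    ' "$1"
}

# Example (commented out): check_replicates design.tsv
```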
In tappAS 0.99.15, we updated some sources and the way tappAS works with annotation files. If you do not have the latest version of tappAS, new projects that download a new annotation will fail. Conversely, because of the same change, your old projects will not work with the new annotations.
If you want to update tappAS, you have to delete your old projects and the References folder in your tappasWorkspace before running the new version. You can then create your old projects again, and they will work with the new changes.
Remember that tappAS is still a beta version and changes like this could happen in the future. If you have an older version and want to update tappAS to version 0.99.15 or later, you can follow this link.
By default, tappAS uses 1.7 GB of RAM if your computer has at least 8 GB, but in some cases this is not enough to run all analyses. If you run into memory problems, you can run tappAS with more RAM, provided it is available.
To do that, follow these steps:
1. Go to the tappAS default folder (usually tappas.0.99.xx, where xx is your release version)
2. Run the tappas.jar file with the following command line:
java -XmsAM -XmxBM -jar tappas.jar
Where A and B are the minimum and maximum RAM (in MB) that you want to give to the app. For example, to give 2 GB as minimum and 6 GB as maximum, you need to run:
java -Xms2000M -Xmx6000M -jar tappas.jar
(RAM must be specified in MB.)
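If you prefer to think in GB, the conversion can be scripted. The sketch below is a hypothetical helper (tappas_java_cmd is not part of tappAS) that takes the minimum and maximum RAM as whole numbers of GB, converts them using the same 1 GB = 1000 MB convention as the example above, and prints the resulting command line.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with tappAS: print the launch command
# for a given minimum and maximum amount of RAM, both in whole GB.
tappas_java_cmd() {
    min_gb=$1
    max_gb=$2
    # Accept only non-empty strings made of digits
    for v in "$min_gb" "$max_gb"; do
        case "$v" in
            ''|*[!0-9]*)
                echo "usage: tappas_java_cmd MIN_GB MAX_GB (whole numbers)" >&2
                return 1 ;;
        esac
    done
    if [ "$min_gb" -gt "$max_gb" ]; then
        echo "minimum RAM cannot exceed maximum RAM" >&2
        return 1
    fi
    # Same convention as the example above: 1 GB is written as 1000M
    echo "java -Xms$((min_gb * 1000))M -Xmx$((max_gb * 1000))M -jar tappas.jar"
}

# Example (commented out): print the command for 2 GB minimum, 6 GB maximum
# tappas_java_cmd 2 6
```

Run the printed command from the tappAS folder, and make sure the maximum stays below the RAM physically available on your machine.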
tappAS: a comprehensive computational framework for the analysis of the functional impact of differential splicing
Lorena de la Fuente, Ángeles Arzalluz-Luque, Manuel Tardáguila, Héctor del Risco, Cristina Martí, Sonia Tarazona, Pedro Salguero, Raymond Scott, Ana Alastrue-Agudo, Pablo Bonilla, Jeremy Newman, Lauren McIntyre, Victoria Moreno-Manzano, Ana Conesa
bioRxiv 690743; doi: https://doi.org/10.1101/690743