How many replicates per sample do I need to…
Users will need an RNA-Seq dataset with at least 2 replicates in order for tappAS to work. However, the more replicates, the better.
Long-read data currently provides less sequencing depth in comparison to short-read datasets. To make the most of Functional Iso-Transcriptomics analysis, we recommend using short-reads for quantification.
To get a transcript counts matrix, we recommend finding a compatible short-read dataset (in case you did not generate Illumina data for your samples) and quantifying these Illumina reads against your species' genome using the curated GTF from SQANTI3. To get transcript counts quickly, you can use a pseudoaligner such as kallisto (which quantifies against the transcript sequences of the curated transcriptome rather than the genome), or a splice-aware aligner such as STAR followed by a quantification tool such as RSEM.
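If you quantify with kallisto, each sample's output directory contains an abundance.tsv file with (among others) target_id and est_counts columns. The sketch below, using only the Python standard library, merges those files into one transcript-by-sample matrix. The function names are hypothetical, and the tab-separated layout is an assumption: check the tappAS documentation for the exact expression matrix format it expects.

```python
import csv
from pathlib import Path


def build_counts_matrix(sample_dirs):
    """Merge per-sample kallisto abundance.tsv files into one counts matrix.

    sample_dirs maps a sample name to that sample's kallisto output directory.
    Returns (transcript_ids, sample_names, matrix), where matrix[i][j] is the
    estimated count of transcript i in sample j.
    """
    counts = {}  # sample name -> {transcript id: est_counts}
    for sample, out_dir in sample_dirs.items():
        with open(Path(out_dir) / "abundance.tsv", newline="") as fh:
            reader = csv.DictReader(fh, delimiter="\t")
            counts[sample] = {row["target_id"]: float(row["est_counts"]) for row in reader}
    samples = list(sample_dirs)
    # Union of IDs; runs against the same index normally report identical IDs.
    transcripts = sorted(set().union(*(c.keys() for c in counts.values())))
    matrix = [[counts[s].get(t, 0.0) for s in samples] for t in transcripts]
    return transcripts, samples, matrix


def write_counts_matrix(path, transcripts, samples, matrix):
    """Write a tab-separated matrix: one row per transcript, one column per sample."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["transcript_id", *samples])
        for tid, row in zip(transcripts, matrix):
            writer.writerow([tid, *row])
```

kallisto's est_counts are not necessarily integers; if the downstream step expects integer counts, round them before writing the matrix.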
Yes, tappAS (as well as SQANTI3 and IsoAnnotLite) are fully compatible with Nanopore data as well as PacBio data.
The best way to use tappAS is to generate a single transcriptome for all sequenced samples. To do this, pre-process all SMRT cells together using IsoSeq3, and then run SQANTI3 on the output of this joint run. Typically, all you need to do is merge the output of the refine command for every SMRT cell and then repeat the clustering step. See the IsoSeq3 documentation for details on how to merge SMRT cells.
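The merge step can be sketched as collecting every cell's refine output into one file-of-filenames (fofn) and clustering that once. This Python sketch assumes each cell's refine output is a BAM named like *.flnc.bam in one directory; the helper names are hypothetical, and the binary name and accepted inputs vary between IsoSeq versions (newer releases ship the tool as isoseq), so check the IsoSeq3 documentation for your version.

```python
from pathlib import Path


def write_flnc_fofn(refine_dir, fofn_path, pattern="*.flnc.bam"):
    """Collect the refine output of every SMRT cell into one file-of-filenames.

    ASSUMPTION: each cell's refine output is a BAM in refine_dir whose name
    matches `pattern`; adapt the glob to your own file naming.
    """
    bams = sorted(Path(refine_dir).glob(pattern))
    if not bams:
        raise FileNotFoundError(f"no files matching {pattern!r} in {refine_dir}")
    Path(fofn_path).write_text("".join(f"{bam.resolve()}\n" for bam in bams))
    return bams


def cluster_command(fofn_path, out_bam):
    """Command to re-run clustering once on the pooled reads.

    Only positional input/output are given; add the options recommended for
    your IsoSeq version.
    """
    return ["isoseq3", "cluster", str(fofn_path), str(out_bam)]
```

The returned command can then be run with subprocess.run(cmd, check=True), and SQANTI3 is run on the resulting joint transcriptome.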
WARNING: note that the QC report produced by SQANTI3 can help you make informed decisions about isoforms that might be false positives or low quality, and we strongly advise you to remove them from your transcriptome before you continue your analysis. We recommend reading the SQANTI paper to get a better idea of how to produce a high-quality, curated transcriptome.
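Once you have decided which isoforms to discard, removing them amounts to filtering the transcriptome GTF by transcript ID. The sketch below reads a SQANTI3 classification table (assuming its isoform column holds the transcript IDs used in the GTF) and applies a filter you supply. The example criteria based on the RTS_stage and all_canonical columns are hypothetical and only illustrate the mechanics; SQANTI3 also provides its own filtering utilities, which are likely the better choice in practice.

```python
import csv
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def isoforms_to_keep(classification_path, is_artifact):
    """Return IDs of isoforms in a SQANTI3 classification table that pass a filter.

    is_artifact(row) receives one row as a dict keyed by column name and
    returns True for isoforms that should be discarded.
    """
    with open(classification_path, newline="") as fh:
        rows = csv.DictReader(fh, delimiter="\t")
        return {row["isoform"] for row in rows if not is_artifact(row)}


def filter_gtf(gtf_in, gtf_out, keep):
    """Copy a GTF, dropping records whose transcript_id is not in `keep`.

    Comment lines and records without a transcript_id are copied unchanged.
    Returns (lines kept, lines dropped).
    """
    kept = dropped = 0
    with open(gtf_in) as src, open(gtf_out, "w") as dst:
        for line in src:
            match = TRANSCRIPT_ID.search(line)
            if line.startswith("#") or match is None or match.group(1) in keep:
                dst.write(line)
                kept += 1
            else:
                dropped += 1
    return kept, dropped


# HYPOTHETICAL criteria, for illustration only: discard isoforms flagged as
# possible RT-switching artifacts or carrying non-canonical junctions.
def example_is_artifact(row):
    return row.get("RTS_stage") == "TRUE" or row.get("all_canonical") == "non_canonical"
```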
We have included a detailed description of the methods implemented in tappAS in our tappAS paper (see Methods section).
NOISeq is an R package for QC of RNA-Seq data and DE analysis that was developed by our group, and we routinely use it to perform our own analyses. We also included edgeR because it is one of the reference tools in the field, and even though it performs very similarly to DESeq2 (see this blog post by Mike Love for a more thorough description), it includes the TMM normalization method. TMM is one of the most suitable methods for normalization prior to DE analysis, outperforming global scaling methods such as TPM or RPKM, which remove important differences between samples and should be used only for within-sample comparisons (see our group’s review paper on best practices for RNA-Seq analysis for more info). NOISeq also includes the possibility to run edgeR’s TMM function using a wrapper.
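To make the TMM idea more concrete, here is a simplified pure-Python sketch of trimmed mean of M-values normalization as described by Robinson and Oshlack (2010). It is not edgeR's calcNormFactors code: tie handling and some edge cases differ, and the rule used here for picking the reference sample (upper quartile closest to the mean) is an assumption that may not match recent edgeR versions. The trimming fractions mirror edgeR's documented defaults (logratioTrim = 0.3, sumTrim = 0.05). Every sample must have a nonzero total count.

```python
import math
from statistics import mean


def upper_quartile(values):
    """75th percentile with linear interpolation (the default method in R's quantile())."""
    s = sorted(values)
    pos = 0.75 * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def tmm_factor(obs, ref, log_ratio_trim=0.3, sum_trim=0.05):
    """TMM scaling factor of sample `obs` relative to sample `ref` (raw counts)."""
    n_obs, n_ref = sum(obs), sum(ref)
    stats = []  # (M, A, weight) for features with nonzero counts in both samples
    for y_o, y_r in zip(obs, ref):
        if y_o == 0 or y_r == 0:
            continue  # M and A are undefined when either count is zero
        p_o, p_r = y_o / n_obs, y_r / n_ref
        var = (n_obs - y_o) / (n_obs * y_o) + (n_ref - y_r) / (n_ref * y_r)
        if var > 0:
            m = math.log2(p_o / p_r)  # M: log-ratio of proportions
            a = 0.5 * math.log2(p_o * p_r)  # A: average log proportion
            stats.append((m, a, 1.0 / var))  # weight: inverse approximate variance
    n = len(stats)
    # Drop the top and bottom 30% of M values and the top and bottom 5% of A values.
    lo_m, lo_a = math.floor(n * log_ratio_trim), math.floor(n * sum_trim)
    by_m = sorted(range(n), key=lambda i: stats[i][0])
    by_a = sorted(range(n), key=lambda i: stats[i][1])
    keep = set(by_m[lo_m:n - lo_m]) & set(by_a[lo_a:n - lo_a])
    if not keep:
        return 1.0
    total_w = sum(stats[i][2] for i in keep)
    weighted_m = sum(stats[i][2] * stats[i][0] for i in keep) / total_w
    return 2.0 ** weighted_m


def tmm_factors(samples):
    """Normalization factors for a list of samples, each a list of raw counts."""
    uq = [upper_quartile([c / sum(s) for c in s]) for s in samples]
    target = mean(uq)
    # Reference: the sample whose upper quartile is closest to the mean one.
    ref = samples[min(range(len(samples)), key=lambda k: abs(uq[k] - target))]
    raw = [tmm_factor(s, ref) for s in samples]
    # Rescale so that the factors multiply to 1.
    geo_mean = math.exp(mean(math.log(f) for f in raw))
    return [f / geo_mean for f in raw]
```

The effective library size of each sample is its total count times its factor. For instance, with one sample of ten transcripts at 10 reads each and a second sample where nine transcripts have 10 reads and one has 910, the two factors differ ten-fold, so the effective library sizes come out equal and the nine shared transcripts get the same normalized value in both samples. Dividing by raw totals alone, as global scaling does, would make them look ten times lower in the second sample.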
Finally, it has been shown that DESeq2 is less efficient than edgeR in terms of runtime and memory usage (see this paper), but we're considering implementing it in future releases of the application.
If your organism is not already annotated in tappAS, you can use IsoAnnotLite to re-format your GTF file into a tappAS-compatible GFF3 file. However, bear in mind that this file will not contain functional labels, so you will only be able to run the tappAS Differential Module.
While any organism can potentially be annotated at the functional level, adding functional labels is quite complex. We are currently working on a robust pipeline for de novo transcriptome annotation, but unfortunately it is not yet ready for public use. Note that IsoAnnotLite includes an option to transfer functional labels positionally from an already-annotated GFF3 file, but cross-species usage is very likely to fail.
We are aware that the transcriptomes currently available in tappAS were annotated a few years ago; however, annotating a transcriptome de novo is a very time-consuming and computationally costly task. We are working on updating our transcriptomes, but this will not happen immediately. We recommend checking which version of your organism's transcriptome is available in tappAS before analyzing your data, and using a matching release for quantification.
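A quick way to spot a release mismatch is to compare the transcript IDs in your quantification with those in the annotation. The Python sketch below assumes you have already extracted both ID lists (how depends on your files and is not shown); the function names are hypothetical. It also reports matches after stripping a trailing version number such as the ".2" in "ENST00000456328.2", since IDs that only match without their version are a common symptom of mixing releases.

```python
import re

# Matches IDs like ENST00000456328.2 or NM_000014.6 (prefix, digits, ".version").
VERSIONED = re.compile(r"^([A-Za-z_]+\d+)\.\d+$")


def strip_version(tid):
    """Drop a trailing version number; IDs such as PB.1.1 are left untouched."""
    match = VERSIONED.match(tid)
    return match.group(1) if match else tid


def compare_transcript_ids(quant_ids, annotation_ids):
    """Summarize how well quantified transcript IDs match the annotation's IDs."""
    quant, annot = set(quant_ids), set(annotation_ids)
    exact = quant & annot
    loose = {strip_version(t) for t in quant} & {strip_version(t) for t in annot}
    return {
        "quantified": len(quant),
        "exact_matches": len(exact),
        "matches_ignoring_version": len(loose),
        "unmatched_examples": sorted(quant - annot)[:5],
    }
```

If matches_ignoring_version is much larger than exact_matches, the two files most likely come from different releases.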
If you want to use your own version of the transcriptome instead of one of tappAS’ available annotation versions, the only current way around this is to use our IsoAnnotLite tool to transfer functional labels from our GFF3 files to your own GTF annotation file. This will create a GFF3 file that is compatible with tappAS and that contains at least part of our reference transcriptome’s functional labels.
However, this transfer is based on genomic positions, so some functional annotations may be lost in the process and some transcripts may remain unannotated because they were not present in the transcriptome version that we use. More information on how to use IsoAnnotLite with any annotation file can be found in the IsoAnnotLite documentation. Note that IsoAnnotLite requires SQANTI3 output files, but it can also be run jointly with SQANTI3 via the --isoAnnotLite argument (see the SQANTI3 README for details).
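To see why a position-based transfer can leave some of your transcripts unannotated, consider the deliberately simplified toy below. It is not IsoAnnotLite's algorithm: it only copies labels between whole transcripts that share the same intron chain (or, for mono-exonic transcripts, the same exon) on the same chromosome and strand. The data layout and function names are hypothetical.

```python
def structure_key(chrom, strand, exons):
    """Positional key for a transcript: its intron chain, or its exon if mono-exonic."""
    ordered = sorted(exons)
    if len(ordered) == 1:
        return (chrom, strand, "mono", tuple(ordered))
    introns = tuple((prev_end + 1, next_start - 1)
                    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]))
    return (chrom, strand, "introns", introns)


def transfer_labels(query, reference, labels):
    """Copy labels from reference transcripts to query transcripts with the same key.

    query/reference: {transcript_id: (chrom, strand, [(exon_start, exon_end), ...])}
    labels: {reference_transcript_id: [label, ...]}
    Returns (annotated, unannotated).
    """
    index = {}
    for rid, (chrom, strand, exons) in reference.items():
        index.setdefault(structure_key(chrom, strand, exons), rid)
    annotated, unannotated = {}, []
    for qid, (chrom, strand, exons) in query.items():
        rid = index.get(structure_key(chrom, strand, exons))
        if rid is None:
            unannotated.append(qid)  # no positional match in the reference version
        else:
            annotated[qid] = list(labels.get(rid, []))
    return annotated, sorted(unannotated)
```

Any transcript whose structure is absent from the reference version, such as a novel isoform with a new splice site, ends up in the unannotated list.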