Chapter 6 - RNA-seq Analysis#
Authors: Nasheath Ahmed
Maintainers: Nasheath
Version: 0.1
License: CC-BY-NC-SA 4.0
Bulk RNA Sequence Analysis#
What is RNA Sequencing?#
RNA sequencing (RNA-seq) is the process of using high-throughput sequencing methods to quantify gene expression and provide insight into the transcriptome of cells and tissues. It tells us which genes are active and at what transcription levels. RNA-seq can also reveal novel transcripts and their associated genes.
How are samples prepared and sequenced?#
High-quality RNA samples are essential to successful RNA-seq sample preparation. A simple pipeline includes: isolating the RNA (here, mRNA) from the tissue of interest in each sample, breaking the RNA into smaller fragments of about 200 base pairs by a process called sonication, converting the RNA fragments into complementary DNA (cDNA), adding sequencing adaptors to the ends of the DNA, and amplifying the fragments using PCR.
Paired-End Sequencing vs Single-Read Sequencing#
In paired-end sequencing, both ends of each DNA fragment are sequenced. It produces twice the number of reads in the same amount of time, allows for more accurate read alignment, and can be used to detect insertion-deletion (indel) variants (Nakazato et al., 2013). Single-read sequencing reads each DNA fragment from only one end. It is a good choice for methods such as small RNA-Seq.
The fragments of DNA are then sequenced using a high-throughput sequencer, such as an Illumina instrument. The sequencer's output is a FASTQ file, which stores the nucleotide sequences and the quality scores for those sequences.
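To make the FASTQ layout concrete, here is a minimal sketch of a reader written for this chapter (it is not taken from any particular library). It assumes the common four-line record layout (header, sequence, `+` separator, quality string) and Phred+33 quality encoding; the function name `read_fastq` and the example read are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) for each 4-line FASTQ record.

    Assumes one record per four lines and Phred+33 quality encoding.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        # Phred+33: quality score = ASCII code of the character minus 33
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores  # drop the leading '@'


# A made-up single-record file held in memory for illustration
example = io.StringIO("@read1\nGATTACA\n+\nIIIIII#\n")
records = list(read_fastq(example))
for read_id, sequence, scores in records:
    print(read_id, sequence, scores)
```

Each base in the sequence has one quality character, so the sequence and the score list always have the same length in a well-formed record.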
Sequence Alignment and Aligners#
Alignment is the process of correlating the fragments of DNA that were sequenced through the
Quality Control#
Quality control is a critical step in ensuring that the reads generated from the collected samples yield accurate transcript measurements and can be used in downstream analysis. The quality of RNA sequencing data can be divided into RNA quality, raw read quality, alignment quality, and expression data quality.
Quality control tools: FastQC, FastQ Screen, FASTX-Toolkit
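As an illustration of what a raw read quality check looks at, the sketch below computes the mean Phred score at each read position and flags reads whose mean quality falls below a threshold. It is a hand-rolled example, not the implementation used by any of the tools listed above; the function names, the Phred+33 offset, and the threshold of 20 are assumptions chosen for the example.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Return the mean Phred score at each position across all reads.

    Reads may differ in length; each position is averaged over the
    reads that are long enough to cover it.
    """
    totals, counts = [], []
    for quality in quality_strings:
        for i, ch in enumerate(quality):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


def passes_mean_quality(quality_string, threshold=20, offset=33):
    """Return True if the read's mean Phred score is at least `threshold`."""
    if not quality_string:
        return False
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores) >= threshold


# Made-up quality strings: 'I' = 40, '5' = 20, '#' = 2 under Phred+33
example_qualities = ["IIII", "II5#", "I5"]
per_position = mean_quality_per_position(example_qualities)
kept = [q for q in example_qualities if passes_mean_quality(q)]
print(per_position)
print(kept)
```

A drop in mean quality toward the end of reads is a common pattern in this kind of summary, and it is one reason reads are often trimmed or filtered before alignment.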
Data Normalization#
Data normalization is an essential component of the RNA sequencing analysis pipeline. In gene expression data, normalization is used to correct for differences in the amount of total RNA that was extracted from each sample. This ensures that gene expression levels are comparable across samples, allowing for more accurate comparison and interpretation of the results.
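One simple way to make counts comparable across samples is to scale each sample by its total read count, giving counts per million (CPM). The sketch below shows that calculation on a made-up count matrix; it is only one of several normalization strategies, and the function name `counts_per_million` and the gene names are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw counts so that each sample (column) sums to one million.

    `counts` maps a gene name to a list of raw counts, one per sample.
    """
    n_samples = len(next(iter(counts.values())))
    # Library size = total number of counted reads in each sample
    library_sizes = [
        sum(row[j] for row in counts.values()) for j in range(n_samples)
    ]
    return {
        gene: [row[j] * 1e6 / library_sizes[j] for j in range(n_samples)]
        for gene, row in counts.items()
    }


# Made-up counts: sample 2 was sequenced twice as deeply as sample 1
raw_counts = {
    "geneA": [10, 20],
    "geneB": [30, 60],
    "geneC": [60, 120],
}
cpm = counts_per_million(raw_counts)
for gene, values in cpm.items():
    print(gene, values)
```

In this example the raw counts in the second sample are all doubled, but after scaling both samples show the same CPM for every gene, which is the intended effect of correcting for library size.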
After aligning the raw fastq files and obtaining a count matrix of genes/transcripts per sample,
Dimensionality reduction and Data Visualizations#
Differential Gene Expression Analysis Algorithms and Tools#
Differential gene expression analysis compares the distribution of a gene's expression in one group of samples with its distribution in another group. A gene is considered differentially expressed if the two distributions are statistically different.
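The idea of comparing two distributions can be sketched for a single gene with an exact permutation test on the difference in group means, plus a log2 fold change. This is a toy illustration under stated assumptions (normalized expression values, small groups, made-up numbers), not the statistical model used by dedicated differential expression packages; the function names are hypothetical.

```python
import itertools
import math


def mean(values):
    return sum(values) / len(values)


def log2_fold_change(group_a, group_b):
    """log2 of mean(group_b) / mean(group_a); assumes both means are > 0."""
    return math.log2(mean(group_b) / mean(group_a))


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Enumerates every way of splitting the pooled values into groups of
    the original sizes, so it is only practical for small sample sizes.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_b) - mean(group_a))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in itertools.combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point rounding
        if abs(mean(second) - mean(first)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Made-up normalized expression values for one gene in two groups
control = [5.1, 4.9, 5.3, 5.0]
treated = [8.2, 7.9, 8.5, 8.1]
lfc = log2_fold_change(control, treated)
p_value = permutation_p_value(control, treated)
print(f"log2 fold change: {lfc:.2f}, permutation p-value: {p_value:.4f}")
```

In a real analysis the same kind of test is run for thousands of genes, so the resulting p-values also need to be corrected for multiple testing.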
Enrichment analysis#
Small Molecule Predictions#
Analyzing Large Patient Cohorts#
Single Cell RNA Sequence Analysis#
Technologies used to collect scRNA-seq data#
File formats for Single Cell Data#
Tools and workflows#
Alignment#
Quality control Metrics#
Data normalization and Imputation#
Dimensionality reduction and data visualization#
Automatic identification of clusters#
Differential gene expression analysis#
Cell type identification#
Trajectory analysis#
Spatial single cell transcriptomics#
Nakazato T, Ohta T, Bono H. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive. PLoS One. 2013;8(10):e77910.