RNA-seq Data Analysis Course
From Raw Reads to Biological Insights — A Hands-on Training Program
Learning Outcomes
Understand the biological and technical principles of RNA sequencing, library preparation, and experimental design
Perform quality control, trimming, and pre-processing of raw RNA-seq reads using industry-standard tools
Align reads to reference genomes and transcriptomes; quantify gene and transcript expression
Identify differentially expressed genes using DESeq2 and edgeR and interpret statistical results
Produce publication-quality figures, including heatmaps and volcano plots, and comprehensive analysis reports
Perform gene ontology enrichment and pathway analysis to translate DEG lists into biological insight
Run a fully reproducible, scalable RNA-seq analysis of real datasets on an HPC cluster using the nf-core/rnaseq pipeline
Program Overview
Biological Background
RNA-seq overview, applications, library preparation, and protocols. Experimental design: replication, batch effects, and statistical power. Sequencing technologies: Illumina, Nanopore, and PacBio SMRT.
Computational Overview & Data Access
Linux command-line refresher and Ibex HPC orientation. Genomic file formats (FASTA, FASTQ, BAM, GTF). Retrieving reference genomes and public RNA-seq datasets from SRA using sra-toolkit and the NCBI datasets tool.
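As a small illustration of the FASTQ format mentioned above, the following sketch reads four-line FASTQ records and computes the mean Phred quality of each read. It assumes Phred+33 quality encoding and unwrapped (single-line) sequences; the function names `read_fastq` and `mean_phred` and the toy record are made up for this example and are not part of any course tool.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line)
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        # The read ID is the first whitespace-delimited token after '@'
        yield header[1:].split()[0], sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


# Toy record invented for illustration only
example = io.StringIO("@read1 sample=demo\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
for read_id, seq, qual in records:
    print(read_id, len(seq), round(mean_phred(qual), 1))
```

For real data, dedicated tools such as FastQC report these per-read and per-base quality summaries at scale.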
QC & Preprocessing
Raw data quality evaluation with FastQC, adapter trimming and quality filtering with fastp, ribosomal and contaminant read removal, and aggregated QC reporting with MultiQC.
Read Alignment
Splice-aware alignment to GRCh38 with STAR, BAM post-processing with samtools and Picard, alignment QC with RSeQC and QualiMap, and visualization in IGV.
Quantification
Transcript quantification with Salmon, post-alignment QC with RSeQC and dupRadar, exploratory analysis and sample-level QC (PCA, sample-distance heatmap) in Python and R.
Standardized Analysis I: nf-core/rnaseq
Introduction to Nextflow and nf-core. Configure and run the nf-core/rnaseq pipeline on Ibex with the KAUST institutional profile — samplesheet setup, key parameters, and interpreting the MultiQC output.
Differential Expression Analysis
Normalization strategies, statistical testing, experimental design, and contrasts. Differential expression analysis in R using DESeq2 — PCA, volcano plots, MA plots, and heatmaps with ggplot2 and pheatmap.
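To make one normalization strategy concrete, here is a minimal pure-Python sketch of median-of-ratios size-factor estimation, the idea behind DESeq2's default normalization. It is an illustration under simplifying assumptions, not DESeq2's implementation: the function name `size_factors` and the toy count matrix are hypothetical, and genes with a zero count in any sample are simply excluded from the estimate.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample.

    counts: list of gene rows, each a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    # Use only genes with non-zero counts in every sample;
    # the log geometric mean is undefined otherwise.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in all samples")
    # Per-gene log geometric mean across samples (the pseudo-reference)
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count to the pseudo-reference, for sample j
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy matrix invented for illustration: sample 2 is sequenced twice as deeply
toy_counts = [
    [100, 200],
    [50, 100],
    [30, 60],
    [0, 5],  # excluded from the estimate because of the zero
]
factors = size_factors(toy_counts)
normalized = [[c / f for c, f in zip(row, factors)] for row in toy_counts]
print(factors)
```

Dividing each sample's counts by its size factor puts the samples on a common scale, so the first three genes in the toy matrix end up with equal normalized values in both samples.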
Functional Enrichment
Gene Ontology enrichment with clusterProfiler, KEGG pathway analysis, and Gene Set Enrichment Analysis (GSEA) with fgsea — interpreting enrichment results and producing publication-quality dot plots.
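Over-representation analysis of a gene set is commonly based on a one-sided hypergeometric test. The sketch below shows that calculation with the standard library only. The function name `enrichment_p_value` and the toy numbers are hypothetical, and real tools add steps not shown here, such as multiple-testing correction and careful choice of the background gene universe.

```python
from math import comb


def enrichment_p_value(universe_size, term_size, deg_count, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe_size: number of background genes
    term_size:     background genes annotated to the term
    deg_count:     number of differentially expressed genes (DEGs)
    overlap:       DEGs annotated to the term
    """
    max_overlap = min(term_size, deg_count)
    total = comb(universe_size, deg_count)
    # Sum the probabilities of observing `overlap` or more annotated DEGs
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, deg_count - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers invented for illustration:
# 20 background genes, 5 in the term, 4 DEGs, 3 of which are in the term
p = enrichment_p_value(universe_size=20, term_size=5, deg_count=4, overlap=3)
print(p)
```

A small p-value suggests the term is annotated to more DEGs than expected by chance, given the chosen background.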
Standardized Analysis II: nf-core/differentialabundance
End-to-end automated differential expression using nf-core/differentialabundance — from count matrices to an interactive Shiny report, covering contrasts configuration and output interpretation.
Real-world Analysis Capstone
End-to-end analysis of GSE136366 using nf-core/rnaseq and nf-core/differentialabundance. Interpret MultiQC reports, perform DEA, run enrichment analysis, and present results to the group.
Practical Information
Schedule
- Dates: 5-9 April 2026
- Format: Lectures and hands-on lab sessions
- Venue: Building 9 Room 2120 — KAUST, Thuwal, Saudi Arabia
- Duration: One Week
Prerequisites
- Linux/command-line experience (or completion of Introduction to Applied Bioinformatics)
- Understanding of molecular biology (DNA, RNA, gene expression)
- Ibex HPC account
- Laptop with terminal access
- Complete the instructions on the setup page