Environment Setup
Complete these setup instructions before the program starts. If you encounter issues, attend one of our pre-program support sessions.
- Sunday, February 1 at 16:15 (Zoom links sent by email)
- Monday, February 2 at 14:00 (Zoom links sent by email)
Step 1: Command Line Terminal Setup
Critical step: make sure your terminal works before moving on.
For macOS Users
Terminal
macOS includes a built-in Terminal application with a Unix-based shell.
# Open Terminal
# Applications > Utilities > Terminal
# Or use Spotlight: Cmd + Space, type "Terminal"
# Verify your shell
echo $SHELL
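The shell name printed above decides which startup file conda init edits in Step 2, and therefore which file you reload afterwards. The sketch below records that mapping. shell_rc_file is a hypothetical helper name, and the ~/.bash_profile entry for bash on macOS is an assumption based on how conda init usually behaves there, so confirm it on your machine.

```shell
# Hypothetical helper: print the startup file that conda init typically edits.
# Usage: shell_rc_file [shell-name] [os-name]; defaults to your current shell and OS.
shell_rc_file() {
  local shell_name os_name
  shell_name=${1:-$(basename "${SHELL:-unknown}")}
  os_name=${2:-$(uname -s)}
  case "$shell_name" in
    zsh) echo "$HOME/.zshrc" ;;
    bash)
      # Assumption: on macOS, conda init writes its bash setup to ~/.bash_profile
      if [ "$os_name" = "Darwin" ]; then
        echo "$HOME/.bash_profile"
      else
        echo "$HOME/.bashrc"
      fi
      ;;
    *) echo "unrecognized shell: $shell_name" ;;
  esac
}

echo "Your shell startup file is: $(shell_rc_file)"
```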
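Install Essential Tools
# Install Apple's Command Line Tools first (provides git, make, and compilers)
xcode-select --install
# Then install extra utilities with Homebrew (if brew is not found, install Homebrew from https://brew.sh first)
brew install wget curl git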
For Windows Users (WSL)
Windows Subsystem for Linux (WSL) provides a Linux environment on Windows.
Enable WSL
# Open PowerShell as Administrator and run:
wsl --install
# This installs WSL with Ubuntu by default
# Restart your computer when prompted
Complete Ubuntu Setup
# After restart, Ubuntu will launch automatically
# Create a username and password when prompted
# Update package lists
sudo apt update
sudo apt upgrade -y
Install Essential Tools
# Install basic utilities
sudo apt install -y wget curl git build-essential
Your Windows drives are mounted under /mnt/. For example, C:\Users\YourName\Documents is accessible at /mnt/c/Users/YourName/Documents.
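The mapping follows a simple rule: the drive letter becomes a lowercase folder under /mnt/ and backslashes become forward slashes. Inside WSL, the built-in wslpath utility does this conversion for you (for example, wslpath 'C:\Users\YourName\Documents'). The hypothetical win_to_wsl function below only sketches the rule for plain drive-letter paths and does not handle network paths.

```shell
# Hypothetical helper: convert a simple Windows path like C:\Users\Me to /mnt/c/Users/Me.
# Only handles drive-letter paths; prefer the built-in wslpath inside WSL.
win_to_wsl() {
  local drive rest
  # First character is the drive letter; lowercase it
  drive=$(printf '%s' "$1" | cut -c1 | tr '[:upper:]' '[:lower:]')
  # Drop "X:" and turn every backslash into a forward slash
  rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
  printf '/mnt/%s%s\n' "$drive" "$rest"
}

win_to_wsl 'C:\Users\YourName\Documents'
# prints /mnt/c/Users/YourName/Documents
```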
For Linux Users
You likely already have a terminal. Ensure you have essential tools:
# Ubuntu/Debian
sudo apt update
sudo apt install -y wget curl git build-essential
# Fedora/CentOS
sudo dnf install -y wget curl git gcc gcc-c++ make
Step 2: Conda/Mamba Setup
We use Miniforge with mamba for faster and more reliable package installation. Miniforge comes with mamba pre-installed and works well on all platforms including Apple Silicon Macs.
Download and Install Miniforge
# Navigate to your home directory
cd ~
# For Linux and Windows (run these inside your WSL Ubuntu terminal, not PowerShell):
curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh
# For macOS (Apple Silicon M1/M2/M3) - RECOMMENDED:
curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
# For macOS (Intel):
curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-x86_64.sh
bash Miniforge3-MacOSX-x86_64.sh
# Follow the prompts:
# - Press Enter to review license, then type 'yes' to accept
# - Press Enter to accept default installation location
# - Type 'yes' when asked to run 'conda init'
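If you are unsure which installer applies, the file name can be derived from uname. The sketch below encodes the three installers listed above plus Miniforge3-Linux-aarch64.sh for 64-bit ARM Linux; that last name is an assumption based on the Miniforge release page, and miniforge_installer is a hypothetical helper. The download lines are left commented out so that pasting the block does nothing until you uncomment them.

```shell
# Hypothetical helper: print the Miniforge installer file name for an OS/CPU pair.
# Usage: miniforge_installer [os] [arch]; defaults to the output of uname.
miniforge_installer() {
  local os arch
  os=${1:-$(uname -s)}
  arch=${2:-$(uname -m)}
  case "$os/$arch" in
    Linux/x86_64)  echo "Miniforge3-Linux-x86_64.sh" ;;
    Linux/aarch64) echo "Miniforge3-Linux-aarch64.sh" ;;  # assumption: ARM Linux build
    Darwin/arm64)  echo "Miniforge3-MacOSX-arm64.sh" ;;
    Darwin/x86_64) echo "Miniforge3-MacOSX-x86_64.sh" ;;
    *) echo "unsupported platform: $os/$arch" >&2; return 1 ;;
  esac
}

# Uncomment to download and run the installer for this machine:
# installer=$(miniforge_installer) && \
#   curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/$installer" && \
#   bash "$installer"
```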
Initialize and Verify
# Close and reopen your terminal, OR run:
source ~/.bashrc # For bash
# source ~/.zshrc # For zsh
# Verify conda and mamba are installed
conda --version
mamba --version
# macOS users: check that your computer's CPU architecture matches conda's
uname -m                     # arm64 (Apple Silicon) or x86_64 (Intel)
conda info | grep platform   # should show osx-arm64 or osx-64 to match
# You should see (base) in your prompt
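On a Mac, the CPU architecture and conda's platform should pair up as arm64 with osx-arm64 and x86_64 with osx-64. If an Apple Silicon Mac reports osx-64, the Intel installer was used and conda is running under Rosetta emulation; reinstalling with the arm64 installer is the usual fix. The hypothetical expected_conda_platform helper below writes that pairing down (the Linux entries are included for completeness).

```shell
# Hypothetical helper: print the conda platform that matches an OS/CPU pair.
# Usage: expected_conda_platform [os] [arch]; defaults to the output of uname.
expected_conda_platform() {
  local os arch
  os=${1:-$(uname -s)}
  arch=${2:-$(uname -m)}
  case "$os/$arch" in
    Darwin/arm64)  echo "osx-arm64" ;;
    Darwin/x86_64) echo "osx-64" ;;
    Linux/x86_64)  echo "linux-64" ;;
    Linux/aarch64) echo "linux-aarch64" ;;
    *) echo "unknown" ;;
  esac
}

echo "Expected conda platform: $(expected_conda_platform)"
# Compare this with the platform that conda info reports
```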
Configure Channels
Add bioconda and conda-forge channels for bioinformatics packages.
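# Each --add puts a channel at the top, so conda-forge ends up with the highest priority
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict   # Recommended by Bioconda
conda config --show channels                 # conda-forge should be listed above bioconda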
Step 3: Create Course Environment
Create a dedicated environment with all required tools using mamba for faster installation.
Create the Environment
# Create environment with Python 3.11 using mamba
mamba create -n genomics python=3.11 -y
# Activate the environment
conda activate genomics
# Your prompt should now show (genomics)
Install Bioinformatics Tools
# Install core bioinformatics tools using mamba (faster than conda)
mamba install -c conda-forge -c bioconda sra-tools fastqc fastp seqkit multiqc star salmon samtools igv wget tree -y
Install Python Packages
# Install Python data science packages
mamba install -c conda-forge -c bioconda pandas numpy matplotlib seaborn scikit-learn openpyxl gprofiler-official -y
Verify Installation
# Test that tools are accessible
fastqc --version
fastp --version
STAR --version
samtools --version
salmon --version
multiqc --version
# Test Python packages
python -c "import pandas, numpy, matplotlib, seaborn, sklearn, openpyxl; print('Python packages OK')"
Step 4: R and Bioconductor Setup
R is needed for the DESeq2 (differential expression) and clusterProfiler (enrichment) analyses.
Install R via Mamba
# Make sure your environment is activated
conda activate genomics
# Install R and its packages using mamba
mamba install -c conda-forge -c bioconda \
r-base=4.4 \
r-tidyverse \
r-biocmanager \
bioconductor-deseq2 \
bioconductor-tximport \
bioconductor-clusterprofiler \
bioconductor-org.hs.eg.db \
bioconductor-enrichplot \
r-pheatmap \
r-ggrepel \
r-ggplot2 \
r-rcolorbrewer \
-y
Verify R Installation
# Test if R packages installed successfully
R -q -e "suppressWarnings({library(ggplot2); library(pheatmap); library(RColorBrewer); library(DESeq2); library(tximport); library(clusterProfiler)}); cat('All R packages loaded successfully\n')"
Step 5: Verify Complete Setup
Run this verification script, with the genomics environment activated, to confirm that everything is installed.
echo "Checking bioinformatics tools..."
tools=("fastqc" "fastp" "STAR" "salmon" "samtools" "multiqc" "seqkit")
for tool in "${tools[@]}"; do
    if command -v "$tool" &> /dev/null; then
        echo "✓ $tool found"
    else
        echo "✗ $tool NOT found"
    fi
done
echo "Checking Python packages..."
python -c "
packages = ['pandas', 'numpy', 'matplotlib', 'seaborn']
for pkg in packages:
    try:
        __import__(pkg)
        print(f'✓ {pkg}')
    except ImportError:
        print(f'✗ {pkg} NOT found')
"
echo "Checking R packages..."
Rscript -e "
packages <- c('DESeq2', 'clusterProfiler', 'tximport', 'ggplot2')
for (pkg in packages) {
    if (requireNamespace(pkg, quietly = TRUE)) {
        cat(sprintf('✓ %s\n', pkg))
    } else {
        cat(sprintf('✗ %s NOT found\n', pkg))
    }
}"
Troubleshooting
Common Issues
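If conda or mamba is not found after installation: run source ~/.bashrc (source ~/.zshrc for zsh) or restart your terminal. If it is still not found, re-run the initialization using the full path, for example ~/miniforge3/bin/conda init bash (use zsh instead of bash for zsh; the path assumes the default install location).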
If package installation fails or reports conflicts, create a fresh environment and install packages one at a time to find the one causing the conflict:
conda create -n bioinfo_fresh python=3.11 -y
conda activate bioinfo_fresh
mamba install -c conda-forge -c bioconda fastqc -y
# Continue with other packages...
If apt install fails on Ubuntu/WSL, update your package lists first:
sudo apt update
If R packages fail to install or compile on Ubuntu/WSL, install these system libraries first:
# In terminal (not R)
sudo apt install -y libcurl4-openssl-dev libssl-dev libxml2-dev
Getting Help
- Attend pre-program support sessions (Zoom links will be provided)
- Email us with your error messages and system information
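A short report like the one below covers the system information we usually need. It is a sketch: setup_report is a hypothetical helper, and each tool is checked with command -v first so the report still runs when conda, mamba, Python, or R is missing. Paste its output into your email together with the full error message.

```shell
# Hypothetical helper: print system details worth pasting into a help request.
setup_report() {
  echo "OS: $(uname -s) $(uname -r) ($(uname -m))"
  echo "Shell: ${SHELL:-unknown}"
  for cmd in conda mamba python R; do
    if command -v "$cmd" >/dev/null 2>&1; then
      # Keep only the first line of each version string
      echo "$cmd: $("$cmd" --version 2>&1 | head -n 1)"
    else
      echo "$cmd: not found"
    fi
  done
  echo "Active conda environment: ${CONDA_DEFAULT_ENV:-none}"
}

setup_report
# To save it to a file instead: setup_report > setup_report.txt
```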
Quick Reference: Daily Commands
# Activate your environment (run this every time you open a new terminal)
conda activate genomics
# Navigate to your project directory
mkdir -p ~/genomics # Creates the folder the first time; does nothing if it already exists
cd ~/genomics
# Deactivate when done
conda deactivate
Linux Command Line Cheatsheet
Navigation
# Print current directory
pwd
# List files and directories
ls # Basic listing
ls -l # Detailed listing (permissions, size, date)
ls -la # Include hidden files (starting with .)
ls -lh # Human-readable file sizes
# Change directory
cd /path/to/dir # Go to specific directory
cd ~ # Go to home directory
cd .. # Go up one level
cd - # Go to previous directory
File Operations
# Create files and directories
mkdir mydir # Create directory
mkdir -p dir1/dir2 # Create nested directories
touch file.txt # Create empty file
# Copy files
cp file.txt backup.txt # Copy file
cp -r mydir mydir_backup # Copy directory recursively
# Move/rename files
mv oldname.txt newname.txt # Rename file
mv file.txt /path/to/dest/ # Move file
# Remove files
rm file.txt # Delete file
rm -r mydir/ # Delete directory and its contents (cannot be undone)
rm -i file.txt # Ask before deleting
Viewing Files
# Display file contents
cat file.txt # Show entire file
head file.txt # Show first 10 lines
head -n 20 file.txt # Show first 20 lines
tail file.txt # Show last 10 lines
tail -f logfile.txt # Follow file updates in real time
less file.txt # View with scrolling (q to quit)
# View compressed files
gzip -dc file.gz # View gzipped file (works on macOS and Linux)
zless file.gz # View gzipped file with scrolling
File Information
# Count lines, words, bytes
wc file.txt # Lines, words, bytes
wc -l file.txt # Lines only
# File size
du -h file.txt # Size of file
du -sh mydir/ # Total size of directory
# Disk space
df -h # Disk usage summary
Searching
# Find files
find . -name "*.fastq" # Find by name pattern
find . -type f -size +100M # Find files larger than 100MB
# Search file contents
grep "pattern" file.txt # Lines containing pattern
grep -r "pattern" mydir/ # Search recursively in directory
grep -i "pattern" file.txt # Case-insensitive search
grep -c "pattern" file.txt # Count matching lines
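grep -c is a quick way to count records in sequence files. In FASTA format every record begins with a header line starting with >, so counting lines that match ^> counts sequences. The file example.fa is created here just for illustration.

```shell
# Create a tiny example FASTA file with two sequences
printf '>seq1\nACGTACGT\n>seq2\nGGCCTTAA\n' > example.fa

# Count sequences: each record begins with a header line starting with ">"
grep -c '^>' example.fa
# prints 2

# List just the sequence names (header lines without the leading ">")
grep '^>' example.fa | cut -c2-
```

Avoid the same trick on FASTQ files: quality lines can also begin with @, so counting lines that start with @ can overcount. Divide the line count by 4 instead.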
Pipes and Redirection
# Redirect output
command > file.txt # Write output to file (overwrite)
command >> file.txt # Append output to file
command 2> error.txt # Redirect errors to file
# Pipes (send output to another command)
cat file.txt | head # First 10 lines
cat file.txt | grep "pattern" # Filter lines
cat file.txt | wc -l # Count lines
cat file.txt | sort | uniq # Sort and remove duplicates
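Pipes combine naturally with gzip -dc from Viewing Files. In the four-line FASTQ layout produced by modern sequencers, each read spans exactly four lines, so decompressing to standard output and counting lines gives the number of reads without writing an uncompressed copy to disk. The file reads.fastq.gz is a tiny example created on the spot.

```shell
# Build a tiny example: two FASTQ records (4 lines each), gzip-compressed
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGG\n+\nIIII\n' | gzip > reads.fastq.gz

# Decompress to stdout (-dc), count lines, then divide by 4 lines per record
lines=$(gzip -dc reads.fastq.gz | wc -l)
echo "Reads: $(( lines / 4 ))"
# prints Reads: 2
```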
Compression
# Gzip
gzip file.txt # Compress (creates file.txt.gz)
gunzip file.txt.gz # Decompress
# Tar archives
tar -cvf archive.tar dir/ # Create archive
tar -xvf archive.tar # Extract archive
tar -czvf archive.tar.gz dir/ # Create compressed archive
tar -xzvf archive.tar.gz # Extract compressed archive
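Listing an archive before extracting it shows what it will create. The -t flag lists contents without extracting, and -C extracts into a chosen directory. The names demo_dir, demo.tar.gz, and restored are illustrative; -v is dropped here only to keep the output short.

```shell
# Create a small directory to archive
mkdir -p demo_dir
echo "hello" > demo_dir/notes.txt

# Create a gzip-compressed archive (-c create, -z gzip, -f archive file name)
tar -czf demo.tar.gz demo_dir/

# List the contents without extracting (-t)
tar -tzf demo.tar.gz

# Extract into a separate folder (-C changes into that folder first)
mkdir -p restored
tar -xzf demo.tar.gz -C restored
cat restored/demo_dir/notes.txt
# prints hello
```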
Permissions
# View permissions
ls -l file.txt # Shows -rw-r--r-- format
# Change permissions
chmod +x script.sh # Make executable
chmod 755 script.sh # rwx for owner, r-x for group and others
# Change ownership
chown user:group file.txt
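Each digit of a numeric mode is a sum of read (4), write (2), and execute (1), applied to owner, group, and others in that order. So 7 = 4 + 2 + 1 gives rwx and 5 = 4 + 1 gives r-x, which makes 755 equal to rwxr-xr-x. The hypothetical octal_to_rwx helper below spells out that arithmetic for three-digit modes.

```shell
# Hypothetical helper: translate a 3-digit numeric mode (e.g. 755) into rwx notation.
octal_to_rwx() {
  local mode out i digit
  mode=$1
  out=""
  for i in 1 2 3; do
    digit=$(printf '%s' "$mode" | cut -c"$i")
    # Test each bit: 4 = read, 2 = write, 1 = execute
    [ $(( digit & 4 )) -ne 0 ] && out="${out}r" || out="${out}-"
    [ $(( digit & 2 )) -ne 0 ] && out="${out}w" || out="${out}-"
    [ $(( digit & 1 )) -ne 0 ] && out="${out}x" || out="${out}-"
  done
  echo "$out"
}

octal_to_rwx 755   # prints rwxr-xr-x
octal_to_rwx 644   # prints rw-r--r--
```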
Process Management
# View running processes
ps # Your processes
ps aux # All processes
top # Interactive process viewer (q to quit)
htop # Better interactive viewer (if installed)
# Control processes
command & # Run in background
Ctrl+C # Stop current process
Ctrl+Z # Suspend current process
bg # Resume suspended process in background
kill PID # Terminate process by ID
killall processname # Terminate by name
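Long runs such as STAR alignment are easier to manage in the background with their output sent to a log file. The special variable $! holds the PID of the most recent background command, and kill -0 checks whether that process still exists without stopping it. Here sleep 60 stands in for a real long-running command.

```shell
# Start a long-running command in the background, sending all output to a log file
sleep 60 > job.log 2>&1 &
pid=$!
echo "Started background job with PID $pid"

# kill -0 sends no signal; it only checks whether the process is still alive
if kill -0 "$pid" 2>/dev/null; then
  echo "Job $pid is still running"
fi

# Stop the job, then wait so the shell reaps it (wait returns non-zero for a killed job)
kill "$pid"
wait "$pid" 2>/dev/null || true
```

A background job usually still ends when you close the terminal window; to keep it running after you log out, nohup is the usual tool (nohup command > job.log 2>&1 &).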
Useful Shortcuts
# Keyboard shortcuts
Ctrl+C # Cancel current command
Ctrl+L # Clear screen
Ctrl+A # Move cursor to start of line
Ctrl+E # Move cursor to end of line
Ctrl+R # Search command history
Tab # Auto-complete file/directory names
Up/Down # Navigate command history
# History
history # Show command history
!123 # Run command #123 from history
!! # Repeat last command