Environment Setup

Complete these setup instructions before the program starts. If you encounter issues, attend one of our pre-program support sessions.

Pre-Program Support Sessions

Step 1: Command Line Terminal Setup

Critical step: confirm that your terminal works before moving on to Step 2.

For macOS Users

Terminal

macOS includes a built-in Terminal application with a Unix-based shell.

# Open Terminal
# Applications > Utilities > Terminal
# Or use Spotlight: Cmd + Space, type "Terminal"

# Verify your shell
echo $SHELL

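Install Essential Tools

# Install the Xcode Command Line Tools first (provides git and compilers)
xcode-select --install

# Requires Homebrew (https://brew.sh); install it first if the brew command is not found
brew install wget curl git

For Windows Users (WSL)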

Windows Subsystem for Linux (WSL) provides a Linux environment on Windows.

Enable WSL

# Open PowerShell as Administrator and run:
wsl --install

# This installs WSL with Ubuntu by default
# Restart your computer when prompted

Complete Ubuntu Setup

# After restart, Ubuntu will launch automatically
# Create a username and password when prompted

# Update package lists
sudo apt update
sudo apt upgrade -y

Install Essential Tools

# Install basic utilities
sudo apt install -y wget curl git build-essential

Accessing Windows Files from WSL

Your Windows drives are mounted under /mnt/. For example, C:\Users\YourName\Documents is accessible at /mnt/c/Users/YourName/Documents.
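The mapping above can be sketched as a small shell function. The name win_to_wsl_path is a hypothetical helper written for illustration; inside WSL itself, the built-in wslpath utility performs this conversion for you.

```shell
# Convert a Windows path such as C:\Users\YourName\Documents to its /mnt/ form.
# win_to_wsl_path is a hypothetical helper name, not part of WSL.
win_to_wsl_path() {
    win_path=$1
    # The drive letter is the first character; lower-case it for /mnt/<drive>
    drive=$(printf '%s' "$win_path" | cut -c1 | tr '[:upper:]' '[:lower:]')
    # Drop the "C:" prefix and turn backslashes into forward slashes
    rest=$(printf '%s' "$win_path" | cut -c3- | tr '\\' '/')
    printf '/mnt/%s%s\n' "$drive" "$rest"
}

win_to_wsl_path 'C:\Users\YourName\Documents'
```

Keep the Windows path in single quotes so the shell does not interpret the backslashes.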

For Linux Users

You likely already have a terminal. Ensure you have essential tools:

# Ubuntu/Debian
sudo apt update
sudo apt install -y wget curl git build-essential

# Fedora/CentOS
sudo dnf install -y wget curl git gcc gcc-c++ make

Step 2: Conda/Mamba Setup

We use Miniforge with mamba for faster and more reliable package installation. Miniforge comes with mamba pre-installed and works well on all platforms including Apple Silicon Macs.

Download Miniforge and Install

# Navigate to your home directory
cd ~

# For Linux (and Windows via WSL), x86_64:
curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh

# For macOS (Apple Silicon M1/M2/M3) - RECOMMENDED:
curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh

# For macOS (Intel):
curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-x86_64.sh
bash Miniforge3-MacOSX-x86_64.sh


# Follow the prompts:
# - Press Enter to review license, then type 'yes' to accept
# - Press Enter to accept default installation location
# - Type 'yes' when asked to run 'conda init'
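If you are unsure which of the three installers applies to your machine, the choice can be sketched as a lookup on the output of uname. The function name miniforge_installer is hypothetical, and it only knows the three installer files listed in this guide.

```shell
# Map operating system and CPU architecture to one of the installers listed above.
# miniforge_installer is a hypothetical helper, not part of Miniforge.
miniforge_installer() {
    os=$1      # output of: uname -s
    arch=$2    # output of: uname -m
    case "$os-$arch" in
        Linux-x86_64)  echo "Miniforge3-Linux-x86_64.sh" ;;
        Darwin-arm64)  echo "Miniforge3-MacOSX-arm64.sh" ;;
        Darwin-x86_64) echo "Miniforge3-MacOSX-x86_64.sh" ;;
        *) echo "No installer listed in this guide for $os $arch" >&2
           return 1 ;;
    esac
}

# Report the installer for the current machine, if it is one of the three above
installer=$(miniforge_installer "$(uname -s)" "$(uname -m)") && echo "Download: $installer"
```

On WSL, uname -s reports Linux, so the Linux installer is the one selected.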

Initialize and Verify

# Close and reopen your terminal, OR run:
source ~/.bashrc  # For bash
# source ~/.zshrc  # For zsh

# Verify conda and mamba are installed
conda --version
mamba --version

# For macOS users: check that your machine's CPU architecture matches conda's platform
# arm64 should pair with osx-arm64; x86_64 should pair with osx-64
uname -m
conda config --show subdir

# You should see (base) in your prompt
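The architecture comparison for macOS can be sketched as a simple pairing check. The function name arch_matches_subdir is hypothetical; it takes the two values you read off by hand (the machine architecture and the conda subdir) rather than calling conda itself.

```shell
# Return success when the machine architecture and the conda platform agree.
# arch_matches_subdir is a hypothetical helper covering the two macOS cases only.
arch_matches_subdir() {
    machine=$1   # value printed by: uname -m
    subdir=$2    # subdir value reported by conda
    case "$machine/$subdir" in
        arm64/osx-arm64|x86_64/osx-64) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: an Apple Silicon Mac on which the Intel installer was used by mistake
if arch_matches_subdir arm64 osx-64; then
    echo "Architectures match"
else
    echo "Mismatch: reinstall Miniforge with the installer for your CPU"
fi
```

A mismatch usually means the wrong installer from the previous step was downloaded.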

Configure Channels

Add bioconda and conda-forge channels for bioinformatics packages.

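# Add channels in priority order: --add puts each channel at the top of the list,
# so conda-forge (added last) ends up with the highest priority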
conda config --add channels bioconda
conda config --add channels conda-forge

Step 3: Create Course Environment

Create a dedicated environment with all required tools using mamba for faster installation.

Create the Environment

# Create environment with Python 3.11 using mamba
mamba create -n genomics python=3.11 -y

# Activate the environment
conda activate genomics

# Your prompt should now show (genomics)
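Besides reading the prompt, you can check the active environment from a script. This is a minimal sketch that relies on the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated; the function name require_env is hypothetical.

```shell
# Succeed only if the named conda environment is currently active.
# require_env is a hypothetical helper; CONDA_DEFAULT_ENV is set by "conda activate".
require_env() {
    expected=$1
    if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
        return 0
    fi
    echo "Active environment is '${CONDA_DEFAULT_ENV:-none}', expected '$expected'." >&2
    echo "Run: conda activate $expected" >&2
    return 1
}

require_env genomics && echo "Environment OK"
```

Placing a check like this at the top of an analysis script helps catch the common mistake of running tools from the base environment.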

Install Bioinformatics Tools

# Install core bioinformatics tools using mamba (faster than conda)
mamba install -c conda-forge -c bioconda sra-tools fastqc fastp seqkit multiqc star salmon samtools igv wget tree -y

Install Python Packages

# Install Python data science packages
mamba install -c conda-forge -c bioconda pandas numpy matplotlib seaborn scikit-learn openpyxl gprofiler-official -y

Verify Installation

# Test that tools are accessible
fastqc --version
fastp --version
STAR --version
samtools --version
salmon --version
multiqc --version

# Test Python packages
python -c "import pandas; import seaborn; print('Python packages OK')"

Step 4: R and Bioconductor Setup

R is needed for DESeq2 and clusterProfiler analysis.

Install R via Mamba

# Make sure your environment is activated
conda activate genomics

# Install R and its packages using mamba
mamba install -c conda-forge -c bioconda \
    r-base=4.4 \
    r-tidyverse \
    r-biocmanager \
    bioconductor-deseq2 \
    bioconductor-tximport \
    bioconductor-clusterprofiler \
    bioconductor-org.hs.eg.db \
    bioconductor-enrichplot \
    r-pheatmap \
    r-ggrepel \
    r-ggplot2 \
    r-rcolorbrewer \
    -y

Verify R Installation


# Test if R packages installed successfully
R -q -e "suppressWarnings({library(ggplot2); library(pheatmap); library(RColorBrewer); library(DESeq2); library(tximport); library(clusterProfiler)}); cat('All R packages loaded successfully\n')"

Step 5: Verify Complete Setup

Run this verification script to ensure everything is properly installed.



echo "Checking bioinformatics tools..."
tools=("fastqc" "fastp" "STAR" "salmon" "samtools" "multiqc" "seqkit")
for tool in "${tools[@]}"; do
    if command -v "$tool" &> /dev/null; then
        echo "✓ $tool found"
    else
        echo "✗ $tool NOT found"
    fi
done

echo "Checking Python packages..."
python -c "
import sys
packages = ['pandas', 'numpy', 'matplotlib', 'seaborn']
for pkg in packages:
    try:
        __import__(pkg)
        print(f'✓ {pkg}')
    except ImportError:
        print(f'✗ {pkg} NOT found')
"

echo "Checking R packages..."
Rscript -e "
packages <- c('DESeq2', 'clusterProfiler', 'tximport', 'ggplot2')
for (pkg in packages) {
    if (require(pkg, character.only=TRUE, quietly=TRUE)) {
        cat(paste('✓', pkg, '\n'))
    } else {
        cat(paste('✗', pkg, 'NOT found\n'))
    }
}"
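The tool check above prints a line per tool but does not give a single pass or fail result. A variant that counts missing tools could look like the sketch below; count_missing is a hypothetical helper name, and the tool list is the one used in this guide.

```shell
# Print the number of tools that are not on the PATH; names of missing tools go to stderr.
# count_missing is a hypothetical helper, not part of any package installed above.
count_missing() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "✗ $tool NOT found" >&2
            missing=$((missing + 1))
        fi
    done
    echo "$missing"
}

n=$(count_missing fastqc fastp STAR salmon samtools multiqc seqkit)
if [ "$n" -eq 0 ]; then
    echo "All tools found"
else
    echo "$n tool(s) missing: activate the genomics environment or repeat Step 3"
fi
```

In a standalone script you could exit with a non-zero status when the count is above zero, so that the failure is visible to whatever runs the script.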

Troubleshooting

Common Issues

Conda command not found

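Run source ~/.bashrc (bash) or source ~/.zshrc (zsh, the default on macOS), or restart your terminal. If conda is still not found, re-run conda init bash (or conda init zsh); because conda is not on your PATH yet, call it by its full path, for example ~/miniforge3/bin/conda init bash if you accepted the default installation location.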
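Which startup file to reload depends on your shell. The sketch below maps the value of $SHELL (from Step 1) to the matching file; rc_file_for_shell is a hypothetical helper and covers only bash and zsh.

```shell
# Print the startup file that conda init modifies for the given shell.
# rc_file_for_shell is a hypothetical helper covering bash and zsh only.
rc_file_for_shell() {
    case "$(basename "$1")" in
        zsh)  echo "$HOME/.zshrc" ;;
        bash) echo "$HOME/.bashrc" ;;
        *)    echo "Unsupported shell: $1" >&2
              return 1 ;;
    esac
}

rc=$(rc_file_for_shell "${SHELL:-}") && echo "Run: source $rc"
```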

Package conflicts during installation

Try creating a fresh environment or install packages one at a time to identify conflicts:

conda create -n bioinfo_fresh python=3.11 -y
conda activate bioinfo_fresh
conda install -c bioconda fastqc -y
# Continue with other packages...

WSL: "Unable to locate package"

Update your package lists:

sudo apt update

R package installation fails

Try installing dependencies first:

# In terminal (not R)
sudo apt install -y libcurl4-openssl-dev libssl-dev libxml2-dev

Getting Help

Quick Reference: Daily Commands

# Activate your environment (run this every time you open a new terminal)
conda activate genomics

# Navigate to your project directory
mkdir -p ~/genomics   # Creates the folder the first time; does nothing if it already exists
cd ~/genomics

# Deactivate when done
conda deactivate

Linux Command Line Cheatsheet

Navigation

# Print current directory
pwd

# List files and directories
ls              # Basic listing
ls -l           # Detailed listing (permissions, size, date)
ls -la          # Include hidden files (starting with .)
ls -lh          # Human-readable file sizes

# Change directory
cd /path/to/dir     # Go to specific directory
cd ~                # Go to home directory
cd ..               # Go up one level
cd -                # Go to previous directory

File Operations

# Create files and directories
mkdir mydir             # Create directory
mkdir -p dir1/dir2      # Create nested directories
touch file.txt          # Create empty file

# Copy files
cp file.txt backup.txt          # Copy file
cp -r mydir/ mydir_backup/      # Copy directory recursively

# Move/rename files
mv oldname.txt newname.txt      # Rename file
mv file.txt /path/to/dest/      # Move file

# Remove files
rm file.txt             # Delete file
rm -r mydir/            # Delete directory recursively
rm -i file.txt          # Ask before deleting

Viewing Files

# Display file contents
cat file.txt            # Show entire file
head file.txt           # Show first 10 lines
head -n 20 file.txt     # Show first 20 lines
tail file.txt           # Show last 10 lines
tail -f logfile.txt     # Follow file updates in real-time
less file.txt           # View with scrolling (q to quit)

# View compressed files
gzip -dc file.gz        # View gzipped file (works on macOS and Linux)
zless file.gz           # View gzipped file with scrolling

File Information

# Count lines, words, bytes
wc file.txt             # Lines, words, bytes
wc -l file.txt          # Lines only

# File size
du -h file.txt          # Size of file
du -sh mydir/           # Total size of directory

# Disk space
df -h                   # Disk usage summary
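Sequencing data fills disks quickly, so it can help to check free space before a large download. This is a minimal sketch built on df; the function name free_gb is hypothetical, and the parsing assumes the filesystem name contains no spaces.

```shell
# Print the free space, in whole gigabytes, on the filesystem holding the given directory.
# free_gb is a hypothetical helper. df -Pk prints POSIX-format output in 1024-byte
# blocks, and column 4 of the second line is the available space.
free_gb() {
    df -Pk "$1" | awk 'NR==2 { printf "%d\n", $4 / 1048576 }'
}

echo "Free space here: $(free_gb .) GB"
```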

Searching

# Find files
find . -name "*.fastq"          # Find by name pattern
find . -type f -size +100M      # Find files larger than 100MB

# Search file contents
grep "pattern" file.txt         # Lines containing pattern
grep -r "pattern" mydir/        # Search recursively in directory
grep -i "pattern" file.txt      # Case-insensitive search
grep -c "pattern" file.txt      # Count matching lines

Pipes and Redirection

# Redirect output
command > file.txt      # Write output to file (overwrite)
command >> file.txt     # Append output to file
command 2> error.txt    # Redirect errors to file

# Pipes (send output to another command)
cat file.txt | head             # First 10 lines
cat file.txt | grep "pattern"   # Filter lines
cat file.txt | wc -l            # Count lines
cat file.txt | sort | uniq      # Sort and remove duplicates
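As a genomics example of pipes, the number of reads in a FASTQ file can be estimated by counting lines and dividing by four. This sketch assumes the standard four-lines-per-record layout; count_reads is a hypothetical helper name.

```shell
# Print the number of reads in a FASTQ file, plain or gzipped.
# count_reads is a hypothetical helper; it assumes exactly four lines per read.
count_reads() {
    file=$1
    case "$file" in
        *.gz) lines=$(gzip -dc "$file" | wc -l | tr -d '[:space:]') ;;
        *)    lines=$(wc -l < "$file" | tr -d '[:space:]') ;;
    esac
    echo $((lines / 4))
}

# Usage (the file name is a placeholder):
# count_reads sample.fastq.gz
```

The tr step removes the padding that some versions of wc add around the number.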

Compression

# Gzip
gzip file.txt           # Compress (creates file.txt.gz)
gunzip file.txt.gz      # Decompress

# Tar archives
tar -cvf archive.tar dir/       # Create archive
tar -xvf archive.tar            # Extract archive
tar -czvf archive.tar.gz dir/   # Create compressed archive
tar -xzvf archive.tar.gz        # Extract compressed archive
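Before extracting an archive you received from someone else, it is often worth listing its contents with the -t flag. The sketch below builds a tiny archive in a temporary directory to show the round trip; the directory and file names are placeholders.

```shell
# Build a small compressed archive in a temporary directory, then list its contents.
workdir=$(mktemp -d)
mkdir -p "$workdir/results"
echo "gene_id,count" > "$workdir/results/counts.csv"

# -C changes into $workdir first, so the archive stores the relative path results/
tar -czf "$workdir/results.tar.gz" -C "$workdir" results

# -t lists the archive contents without extracting anything
tar -tzf "$workdir/results.tar.gz"
```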

Permissions

# View permissions
ls -l file.txt          # Shows -rw-r--r-- format

# Change permissions
chmod +x script.sh      # Make executable
chmod 755 script.sh     # rwx for owner, rx for group and others

# Change ownership
chown user:group file.txt
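The numeric modes used with chmod are three octal digits, one each for owner, group, and others, where read is 4, write is 2, and execute is 1. The sketch below translates a mode into the letters that ls -l shows; both function names are hypothetical.

```shell
# Translate one octal digit into its rwx letters (read=4, write=2, execute=1).
# digit_to_rwx and mode_to_rwx are hypothetical helper names.
digit_to_rwx() {
    case "$1" in
        0) echo "---" ;; 1) echo "--x" ;; 2) echo "-w-" ;; 3) echo "-wx" ;;
        4) echo "r--" ;; 5) echo "r-x" ;; 6) echo "rw-" ;; 7) echo "rwx" ;;
        *) return 1 ;;
    esac
}

# Translate a three-digit mode such as 755 into owner, group, and others letters.
mode_to_rwx() {
    mode=$1
    owner=$(digit_to_rwx "$(printf '%s' "$mode" | cut -c1)") || return 1
    group=$(digit_to_rwx "$(printf '%s' "$mode" | cut -c2)") || return 1
    other=$(digit_to_rwx "$(printf '%s' "$mode" | cut -c3)") || return 1
    echo "$owner$group$other"
}

mode_to_rwx 755   # prints rwxr-xr-x
mode_to_rwx 644   # prints rw-r--r--
```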

Process Management

# View running processes
ps                      # Your processes
ps aux                  # All processes
top                     # Interactive process viewer (q to quit)
htop                    # Better interactive viewer (if installed)

# Control processes
command &               # Run in background
Ctrl+C                  # Stop current process
Ctrl+Z                  # Suspend current process
bg                      # Resume suspended process in background
kill PID                # Terminate process by ID
killall processname     # Terminate by name
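The background and kill commands above fit together as in the following sketch, which uses sleep as a stand-in for a long-running job. The special variable $! holds the process ID of the most recent background command.

```shell
# Start a long-running command in the background and remember its process ID
sleep 30 &
pid=$!

# kill -0 sends no signal; it only tests whether the process exists
kill -0 "$pid" && echo "Process $pid is running"

# Terminate the process, then wait for it so the shell collects its exit status
kill "$pid"
status=0
wait "$pid" 2>/dev/null || status=$?
echo "Process $pid ended with status $status"
```

A non-zero status here is expected, because the process was terminated rather than finishing on its own.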

Useful Shortcuts

# Keyboard shortcuts
Ctrl+C          # Cancel current command
Ctrl+L          # Clear screen
Ctrl+A          # Move cursor to start of line
Ctrl+E          # Move cursor to end of line
Ctrl+R          # Search command history
Tab             # Auto-complete file/directory names
Up/Down         # Navigate command history

# History
history                 # Show command history
!123                    # Run command #123 from history
!!                      # Repeat last command