Introduction
Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular heterogeneity. This comprehensive guide walks through the complete analysis workflow from raw data to biological insights.
1. Quality Control
The first critical step is filtering low-quality cells and genes.
Cell-Level QC Metrics:
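- Number of genes detected: Filter cells with < 200 genes (likely empty droplets or low-quality cells)
- Total UMI counts: Remove cells with very low counts (empty droplets or damaged cells) or very high counts (possible doublets)
- Mitochondrial percentage: A high percentage indicates dying or damaged cells (a typical filter is > 20%, but the right cutoff depends on the tissue)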
- Doublet detection: Use Scrublet or DoubletFinder to identify multiplets
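The cell-level thresholds above can be sketched as a simple filter. This is a minimal illustration in plain Python, not a replacement for the QC functions in Seurat or Scanpy. The `CellQC` record and `passes_qc` function are hypothetical names, and the count bounds (500 and 50,000) are illustrative assumptions rather than values from this guide. Doublet detection is not covered here; the upper count bound is only a crude proxy for it.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC metrics (hypothetical record used for illustration)."""
    barcode: str
    n_genes: int        # number of genes detected in the cell
    total_counts: int   # total UMI counts
    pct_mito: float     # percentage of counts from mitochondrial genes


def passes_qc(cell, min_genes=200, min_counts=500,
              max_counts=50_000, max_pct_mito=20.0):
    """Return True if the cell passes every cell-level threshold.

    min_genes and max_pct_mito follow the text; the count bounds are
    assumptions and should be tuned per dataset.
    """
    return (
        cell.n_genes >= min_genes
        and min_counts <= cell.total_counts <= max_counts
        and cell.pct_mito <= max_pct_mito
    )


# Toy values, made up for illustration.
cells = [
    CellQC("AAAC", n_genes=1500, total_counts=6000, pct_mito=4.2),    # healthy cell
    CellQC("AAAG", n_genes=120, total_counts=300, pct_mito=2.0),      # likely empty droplet
    CellQC("AACT", n_genes=900, total_counts=2500, pct_mito=35.0),    # likely dying cell
    CellQC("AAGT", n_genes=6000, total_counts=90_000, pct_mito=3.0),  # possible doublet
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```

Plotting the distribution of each metric before fixing the cutoffs is usually worthwhile, since sensible thresholds differ between tissues and platforms.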
Gene-Level QC:
- Remove genes detected in < 3 cells
- Filter mitochondrial and ribosomal genes if needed
- Remove genes with zero variance
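As a rough sketch of the first gene-level rule, the function below keeps only genes detected in at least three cells. It assumes the count matrix is a small cells x genes list of lists; `filter_genes` and the gene names are hypothetical, and real datasets would use a sparse matrix instead.

```python
def filter_genes(counts, gene_names, min_cells=3):
    """Keep genes with a nonzero count in at least min_cells cells.

    counts is a cells x genes matrix given as a list of rows.
    """
    n_genes = len(gene_names)
    # For each gene, count the cells in which it was detected.
    cells_detected = [
        sum(1 for row in counts if row[g] > 0) for g in range(n_genes)
    ]
    keep = [g for g in range(n_genes) if cells_detected[g] >= min_cells]
    filtered = [[row[g] for g in keep] for row in counts]
    return filtered, [gene_names[g] for g in keep]


# Toy matrix: 4 cells x 4 genes (made-up values).
counts = [
    [5, 0, 1, 0],
    [3, 0, 0, 0],
    [0, 2, 4, 0],
    [7, 0, 2, 0],
]
genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
filtered_counts, kept_genes = filter_genes(counts, genes)
print(kept_genes)
```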
2. Normalization
Normalize for sequencing depth and technical variation.
Common Normalization Methods:
- Log-normalization: Simple, fast, works well for most datasets
- SCTransform: Variance-stabilizing transformation, handles technical noise better
- scran: Deconvolution-based, good for datasets with many cell types
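To make log-normalization concrete, here is a minimal sketch: each cell's counts are divided by the cell's total, multiplied by a scale factor, and transformed with log(1 + x). The scale factor of 10,000 is a common convention and is an assumption here, as is the function name `log_normalize`.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Scale each cell to scale_factor total counts, then apply log(1 + x).

    counts is a cells x genes matrix given as a list of rows.
    """
    normalized = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Cells with no counts should already have been removed by QC.
            normalized.append([0.0] * len(row))
            continue
        normalized.append(
            [math.log1p(x / total * scale_factor) for x in row]
        )
    return normalized


# Two toy cells with the same composition but different sequencing depth.
counts = [
    [10, 0, 90],     # 100 total counts
    [100, 0, 900],   # 1,000 total counts
]
norm = log_normalize(counts)
print(norm[0])
print(norm[1])
```

Both cells end up with the same normalized profile, which is the point of correcting for sequencing depth.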
3. Feature Selection
Identify highly variable genes that drive biological variation.
- Select top 2000-3000 highly variable genes
- Exclude cell cycle genes if needed
- Use variance-stabilized features for downstream analysis
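The idea of ranking genes by variability can be sketched with a simple dispersion score (variance divided by mean). This is a deliberately simplified stand-in: the methods in Seurat and Scanpy model the mean-variance relationship more carefully. The function name and toy values are assumptions for illustration.

```python
from statistics import mean, pvariance


def top_variable_genes(expr, gene_names, n_top=2000):
    """Rank genes by dispersion (variance / mean) and return the top n_top.

    expr is a cells x genes matrix given as a list of rows.
    """
    scored = []
    for g, name in enumerate(gene_names):
        values = [row[g] for row in expr]
        mu = mean(values)
        if mu == 0:
            continue  # undetected genes carry no information
        scored.append((pvariance(values) / mu, name))
    scored.sort(reverse=True)  # highest dispersion first
    return [name for _, name in scored[:n_top]]


# Toy matrix: 4 cells x 3 genes (made-up values).
expr = [
    [1.0, 0.0, 2.0],
    [1.0, 4.0, 2.1],
    [1.0, 0.0, 1.9],
    [1.0, 4.0, 2.0],
]
genes = ["Housekeeping", "Marker", "Stable"]
hvgs = top_variable_genes(expr, genes, n_top=2)
print(hvgs)
```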
4. Dimensionality Reduction
PCA (Principal Component Analysis):
- Reduce to the top 30-50 principal components
- Use an elbow plot to choose how many PCs to retain
- PCs capture major sources of variation
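Reading an elbow plot is usually done by eye, but the idea can be approximated in code. The sketch below uses one possible heuristic, which is an assumption and not a standard from any particular package: it picks the PC whose point lies farthest from the straight line joining the first and last points of the variance curve. The variance values are made up.

```python
def find_elbow(variances):
    """Return the number of PCs at the elbow of a variance-per-PC curve.

    Heuristic: choose the point with the largest distance from the line
    through the first and last points of the curve.
    """
    n = len(variances)
    if n < 3:
        return n
    x1, y1 = 0, variances[0]
    x2, y2 = n - 1, variances[-1]
    best_index, best_distance = 0, -1.0
    for i, v in enumerate(variances):
        # Numerator of the point-to-line distance; the denominator is the
        # same for every point, so it can be skipped when comparing.
        distance = abs((y2 - y1) * i - (x2 - x1) * v + x2 * y1 - y2 * x1)
        if distance > best_distance:
            best_index, best_distance = i, distance
    return best_index + 1  # convert 0-based index to a PC count


# Toy percentages of variance explained by the first 10 PCs.
variance_per_pc = [30.0, 18.0, 10.0, 4.0, 3.5, 3.2, 3.0, 2.9, 2.8, 2.7]
n_pcs = find_elbow(variance_per_pc)
print(n_pcs)
```

A heuristic like this is best treated as a starting point; retaining a few PCs more than the elbow suggests is generally safer than retaining too few.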
UMAP/t-SNE Visualization:
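- UMAP: Often preserves more of the global structure, though distances between clusters should still be interpreted with caution
- t-SNE: Emphasizes local structure (neighborhoods of similar cells)
- Both are stochastic, so set a random seed for reproducibility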
5. Clustering
Group cells with similar expression profiles.
Graph-Based Clustering:
- Louvain: Fast, widely used
- Leiden: Refinement of Louvain that guarantees well-connected communities
- Resolution parameter: Controls cluster granularity
Evaluating Clusters:
- Check cluster stability with different resolutions
- Validate with known marker genes
- Assess cluster quality with silhouette scores
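For reference, the silhouette score compares each cell's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), giving (b - a) / max(a, b). The sketch below is a plain-Python version using Euclidean distance; it assumes at least two clusters and small inputs, and in practice the points would be cells in PCA space. Libraries such as scikit-learn provide an optimized equivalent.

```python
import math
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance.

    Assumes at least two distinct labels. Points in singleton clusters
    get a score of 0 by convention.
    """
    clusters = set(labels)
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [
            math.dist(p, q)
            for j, (q, lab) in enumerate(zip(points, labels))
            if lab == own and j != i
        ]
        if not same:
            scores.append(0.0)
            continue
        a = mean(same)  # mean distance to the point's own cluster
        b = min(        # mean distance to the nearest other cluster
            mean(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            for c in clusters
            if c != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy 2D embedding with two well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])   # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])    # labels split the groups
print(round(good, 3), round(bad, 3))
```

Scores close to 1 indicate compact, well-separated clusters, while scores near or below 0 suggest that cells are assigned to the wrong cluster or that the clusters overlap.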
6. Cell Type Annotation
Marker Gene Identification:
- Find differentially expressed genes for each cluster
- Use Wilcoxon rank-sum test or t-test
- Filter by log fold-change and adjusted p-value
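The filtering step can be illustrated as follows. The sketch adjusts p-values with the Benjamini-Hochberg procedure (one common choice of adjustment, assumed here) and then keeps genes that pass both a log fold-change and an adjusted p-value cutoff. The cutoffs (log2 fold-change of at least 1, adjusted p-value below 0.05), the gene names, and the test results are all made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# (gene, log2 fold-change, raw p-value): hypothetical test results.
results = [
    ("GeneA", 2.5, 0.001),
    ("GeneB", 1.8, 0.04),
    ("GeneC", 0.2, 0.03),
    ("GeneD", 1.5, 0.5),
]
padj = benjamini_hochberg([p for _, _, p in results])

markers = [
    gene
    for (gene, lfc, _), q in zip(results, padj)
    if lfc >= 1.0 and q < 0.05
]
print(markers)
```

Note that GeneB has a raw p-value below 0.05 but no longer passes after adjustment, which is why filtering on adjusted rather than raw p-values matters when thousands of genes are tested.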
Automated Annotation:
- SingleR: Reference-based annotation
- CellTypist: Machine learning-based
- Azimuth: Reference mapping
7. Differential Expression Analysis
Compare gene expression between conditions or cell types.
Methods:
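- Pseudobulk: Aggregate cells per sample and cell type, then use DESeq2/edgeR (recommended, because samples rather than individual cells are treated as replicates)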
- MAST: Hurdle model for scRNA-seq
- Wilcoxon: Non-parametric, fast
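The aggregation step behind the pseudobulk approach can be sketched as below: raw counts are summed over all cells that share a sample and a cell type. The function name, sample labels, and counts are hypothetical. The resulting profiles would then be passed to DESeq2 or edgeR, which are R packages and are not shown here.

```python
from collections import defaultdict


def pseudobulk(counts, samples, cell_types):
    """Sum raw counts over cells sharing a (sample, cell type) pair.

    counts is a cells x genes matrix of raw (unnormalized) counts;
    samples and cell_types give one label per cell.
    """
    n_genes = len(counts[0])
    sums = defaultdict(lambda: [0] * n_genes)
    for row, sample, cell_type in zip(counts, samples, cell_types):
        profile = sums[(sample, cell_type)]
        for g, value in enumerate(row):
            profile[g] += value
    return dict(sums)


# Toy data: 4 cells x 3 genes (made-up values).
counts = [
    [1, 0, 3],
    [2, 1, 0],
    [0, 5, 1],
    [4, 0, 2],
]
samples = ["ctrl_1", "ctrl_1", "treat_1", "treat_1"]
cell_types = ["T cell", "T cell", "T cell", "B cell"]

bulk = pseudobulk(counts, samples, cell_types)
for key, profile in bulk.items():
    print(key, profile)
```

Summing raw counts, rather than averaging normalized values, keeps the data in the form that count-based bulk methods expect.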
8. Trajectory Analysis
Infer developmental trajectories and pseudotime.
- Monocle3: Graph-based trajectories
- Slingshot: Cluster-based lineages
- PAGA: Partition-based graph abstraction
9. Visualization
Essential Plots:
- UMAP/t-SNE: Overall structure
- Violin plots: Gene expression distributions
- Dot plots: Marker gene expression across clusters
- Heatmaps: Top marker genes
- Feature plots: Individual gene expression
Best Practices
- Always perform rigorous QC - garbage in, garbage out
- Use multiple normalization methods and compare results
- Validate findings with known biology
- Check for batch effects and correct if necessary
- Document all parameters and software versions
Need Help with Single-Cell Analysis?
Our team has extensive experience analyzing scRNA-seq data from various platforms and tissues.