Single-Cell RNA-seq Analysis Guide

Introduction
1. Quality Control
2. Normalization
3. Feature Selection
4. Dimensionality Reduction
5. Clustering
6. Cell Type Annotation
7. Differential Expression Analysis
8. Trajectory Analysis
9. Visualization
Best Practices

Introduction

Single-cell RNA sequencing (scRNA-seq) measures gene expression in individual cells, exposing cellular heterogeneity that bulk RNA-seq averages away. This guide walks through the analysis workflow step by step, from a raw count matrix to biological interpretation.

1. Quality Control

The first critical step is filtering low-quality cells and genes.

Cell-Level QC Metrics:

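Number of genes detected: Remove cells with fewer than 200 detected genes (likely empty droplets or low-quality cells)
Total UMI counts: Remove cells with extremely low counts (empty droplets or damaged cells) or extremely high counts (possible doublets)
Mitochondrial percentage: A high fraction of mitochondrial reads suggests dying or damaged cells (a common cutoff is > 20%, but the appropriate threshold depends on the tissue)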
Doublet detection: Use Scrublet or DoubletFinder to identify multiplets
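
The cell-level metrics above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production pipeline: it assumes a small cells x genes matrix of raw UMI counts stored as a list of lists, and it assumes mitochondrial genes can be recognized by an "MT-" name prefix (true for human and mouse gene symbols, not universal). The function names and the default thresholds are illustrative, and doublet detection is not covered.

```python
def cell_qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Compute per-cell QC metrics from a cells x genes matrix of raw counts.

    mito_prefix is an assumption: it matches human ("MT-") and mouse ("mt-")
    symbols after upper-casing, but other annotations may differ.
    """
    mito_idx = [j for j, g in enumerate(gene_names)
                if g.upper().startswith(mito_prefix)]
    metrics = []
    for row in counts:
        total = sum(row)
        n_genes = sum(1 for c in row if c > 0)
        mito = sum(row[j] for j in mito_idx)
        # Guard against empty cells to avoid dividing by zero.
        pct_mito = 100.0 * mito / total if total > 0 else 0.0
        metrics.append({"n_genes": n_genes,
                        "total_counts": total,
                        "pct_mito": pct_mito})
    return metrics


def passes_qc(m, min_genes=200, max_pct_mito=20.0,
              min_counts=None, max_counts=None):
    """Return True if a cell's metrics pass the (dataset-dependent) thresholds."""
    if m["n_genes"] < min_genes:
        return False
    if m["pct_mito"] > max_pct_mito:
        return False
    if min_counts is not None and m["total_counts"] < min_counts:
        return False
    if max_counts is not None and m["total_counts"] > max_counts:
        return False
    return True
```

Plotting the distribution of each metric before fixing the thresholds is usually more informative than applying the defaults blindly.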

Gene-Level QC:

Remove genes detected in < 3 cells
Filter mitochondrial and ribosomal genes if needed
Remove genes with zero variance
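
As a rough sketch of the gene-level filter, the function below keeps only genes detected (count > 0) in at least a minimum number of cells. It assumes the same cells x genes list-of-lists layout of raw counts; the function name is hypothetical.

```python
def filter_genes(counts, gene_names, min_cells=3):
    """Drop genes detected in fewer than min_cells cells.

    Returns the filtered cells x genes matrix and the kept gene names.
    """
    n_detected = [sum(1 for row in counts if row[j] > 0)
                  for j in range(len(gene_names))]
    keep = [j for j, n in enumerate(n_detected) if n >= min_cells]
    filtered = [[row[j] for j in keep] for row in counts]
    return filtered, [gene_names[j] for j in keep]
```

A gene detected in no cells has zero variance, so with any min_cells of 1 or more this filter also removes all-zero genes.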

2. Normalization

Normalize for sequencing depth and technical variation.

Common Normalization Methods:

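Log-normalization: Scale each cell to a common total count (for example 10,000), then log-transform; simple, fast, works well for most datasets
SCTransform: Regularized negative binomial regression that returns variance-stabilized residuals; handles technical noise better
scran: Estimates size factors by pooling cells and deconvolving; good for datasets with many cell types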
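
To make the simplest of these concrete, here is a minimal sketch of log-normalization: each cell is scaled to a common total and then transformed with log(1 + x). The target of 10,000 counts per cell is a common convention and is an assumption here, as is the function name. SCTransform and scran are not sketched.

```python
import math


def log_normalize(counts, target_sum=10_000):
    """Scale each cell to target_sum total counts, then apply log(1 + x)."""
    normalized = []
    for row in counts:
        total = sum(row)
        # Cells with no counts stay all-zero instead of raising an error.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(c * scale) for c in row])
    return normalized
```

After this step, two cells with the same relative expression but different sequencing depth end up with the same normalized values.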

3. Feature Selection

Identify highly variable genes that drive biological variation.

Select top 2000-3000 highly variable genes
Exclude cell cycle genes if needed
Use variance-stabilized features for downstream analysis
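
The idea behind highly variable gene selection can be illustrated by ranking genes on dispersion (variance divided by mean). This is a deliberately simplified stand-in: established tools additionally model the mean-variance trend before ranking, which this sketch does not do. The input is assumed to be a cells x genes matrix of expression values, and the function name is hypothetical.

```python
from statistics import mean, pvariance


def top_variable_genes(expr, gene_names, n_top=2000):
    """Return the names of the n_top genes with the highest dispersion."""
    scored = []
    for j, name in enumerate(gene_names):
        values = [row[j] for row in expr]
        mu = mean(values)
        if mu <= 0:
            continue  # undetected genes carry no information
        dispersion = pvariance(values, mu) / mu
        scored.append((dispersion, name))
    # Sort on the dispersion only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:n_top]]
```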

4. Dimensionality Reduction

PCA (Principal Component Analysis):

Reduce to the top 30-50 principal components
Use an elbow plot (variance explained per PC) to choose how many PCs to keep
The leading PCs capture the major sources of variation
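
Reading an elbow plot is a visual judgment. One simple numeric complement, shown here as a sketch, is to keep the smallest number of PCs whose cumulative explained variance reaches a chosen fraction. This is a different heuristic from the elbow itself; the 0.9 threshold and the function name are assumptions, and the fraction is computed relative to the PCs supplied, not the total variance of the dataset.

```python
def n_components_for_variance(explained_variance, threshold=0.9):
    """Smallest k such that the first k PCs reach `threshold` of the
    variance summed over the supplied PCs (assumed sorted, largest first)."""
    total = sum(explained_variance)
    if total <= 0:
        raise ValueError("explained_variance must sum to a positive value")
    cumulative = 0.0
    for k, variance in enumerate(explained_variance, start=1):
        cumulative += variance
        if cumulative / total >= threshold:
            return k
    return len(explained_variance)
```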

UMAP/t-SNE Visualization:

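UMAP: Tends to preserve more of the global structure and scales well to large datasets
t-SNE: Emphasizes local neighborhoods
In both, distances between well-separated clusters should not be over-interpreted
Both are stochastic - set a random seed for reproducibility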

5. Clustering

Group cells with similar expression profiles.

Graph-Based Clustering:

Louvain: Fast, widely used
Leiden: Refinement of Louvain that guarantees well-connected communities
Resolution parameter: Controls cluster granularity (higher values yield more, smaller clusters)

Evaluating Clusters:

Check cluster stability with different resolutions
Validate with known marker genes
Assess cluster quality with silhouette scores
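
For reference, the silhouette score mentioned above can be computed as follows. For each cell, a is the mean distance to the other cells in its own cluster and b is the mean distance to the nearest other cluster; the cell's score is (b - a) / max(a, b), and the overall score is the average over cells. This sketch assumes Euclidean distance on low-dimensional coordinates (for example PCA coordinates) and is quadratic in the number of cells, so it only suits small inputs or subsamples.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j])
                for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the closest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters; values near 0 or below suggest overlapping or misassigned cells.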

6. Cell Type Annotation

Marker Gene Identification:

Find differentially expressed genes for each cluster
Use Wilcoxon rank-sum test or t-test
Filter by log fold-change and adjusted p-value
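
The filtering step can be sketched as follows, assuming the per-gene test has already produced a raw p-value and a log2 fold-change for one cluster. P-values are adjusted with the Benjamini-Hochberg procedure, a common choice for controlling the false discovery rate. The record layout (dicts with "gene", "pvalue" and "log2fc" keys), the function names, and the cutoffs are illustrative assumptions.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_markers(results, min_log2fc=1.0, max_padj=0.05):
    """Keep genes that are up-regulated and significant after adjustment."""
    padj = benjamini_hochberg([r["pvalue"] for r in results])
    return [r["gene"] for r, q in zip(results, padj)
            if r["log2fc"] >= min_log2fc and q <= max_padj]
```

The adjustment should be applied across all genes tested, not only those that already pass the fold-change cutoff.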

Automated Annotation:

SingleR: Reference-based annotation
CellTypist: Machine learning-based
Azimuth: Reference mapping

7. Differential Expression Analysis

Compare gene expression between conditions or cell types.

Methods:

Pseudobulk: Sum raw counts per sample and cell type, then test with DESeq2/edgeR (recommended when comparing conditions across biological replicates)
MAST: Hurdle model for scRNA-seq
Wilcoxon: Non-parametric, fast
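
The aggregation step of the pseudobulk approach can be sketched as below: raw counts are summed over all cells that share a sample and a cell type, giving one count profile per group. The sketch covers only the aggregation; the resulting profiles would then go to a bulk method such as DESeq2 or edgeR. The function name and the parallel-list inputs are assumptions made for illustration.

```python
from collections import defaultdict


def pseudobulk(counts, sample_ids, cell_types):
    """Sum raw counts per (sample, cell type) group.

    counts is a cells x genes matrix; sample_ids and cell_types hold one
    entry per cell. Returns {(sample, cell_type): summed gene counts}.
    """
    n_genes = len(counts[0]) if counts else 0
    sums = defaultdict(lambda: [0] * n_genes)
    for row, sample, cell_type in zip(counts, sample_ids, cell_types):
        profile = sums[(sample, cell_type)]
        for j, c in enumerate(row):
            profile[j] += c
    return dict(sums)
```

Aggregating raw counts, not normalized values, matters here because the bulk methods apply their own normalization.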

8. Trajectory Analysis

Infer developmental trajectories and pseudotime.

Monocle3: Graph-based trajectories
Slingshot: Cluster-based lineages
PAGA: Partition-based graph abstraction

9. Visualization

Essential Plots:

UMAP/t-SNE: Overall structure
Violin plots: Gene expression distributions
Dot plots: Marker gene expression across clusters
Heatmaps: Top marker genes
Feature plots: Individual gene expression

Best Practices

Always perform rigorous QC - garbage in, garbage out
Use multiple normalization methods and compare results
Validate findings with known biology
Check for batch effects and correct if necessary
Document all parameters and software versions

