Single-Cell RNA-seq Analysis: A Complete Guide from QC to Visualization

Introduction

Single-cell RNA sequencing (scRNA-seq) measures gene expression in individual cells, revealing cellular heterogeneity that bulk sequencing averages out. This guide walks through a standard analysis workflow, from quality control of the count matrix to visualization and biological interpretation.

1. Quality Control

The first critical step is filtering low-quality cells and genes.

Cell-Level QC Metrics:

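  • Number of genes detected: Remove cells with < 200 detected genes (likely empty droplets or damaged cells)
  • Total UMI counts: Remove cells with very low counts (empty droplets, damaged cells) or very high counts (possible doublets)
  • Mitochondrial percentage: A high percentage suggests dying or damaged cells (a common cutoff is > 20%, but the right threshold depends on tissue and protocol)
  • Doublet detection: Use Scrublet (Python) or DoubletFinder (R) to identify multiplets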
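
As a rough illustration of the cell-level metrics above, the following sketch computes them on a tiny made-up count matrix using only the Python standard library. The function names (cell_qc_metrics, passes_qc), the toy counts, and the scaled-down thresholds are assumptions for illustration. The "MT-" prefix follows the human gene symbol convention.

```python
# Toy count matrix: rows are cells, columns are genes (hypothetical values).
gene_names = ["MT-CO1", "MT-ND1", "ACTB", "GAPDH", "CD3E"]
counts = [
    [1, 0, 5, 3, 2],   # looks healthy
    [9, 8, 1, 1, 0],   # dominated by mitochondrial reads
    [0, 0, 1, 0, 0],   # nearly empty droplet
]


def cell_qc_metrics(row, gene_names, mito_prefix="MT-"):
    """Return genes detected, total counts, and mitochondrial % for one cell."""
    total = sum(row)
    n_genes = sum(1 for c in row if c > 0)
    mito = sum(c for c, g in zip(row, gene_names) if g.startswith(mito_prefix))
    pct_mito = 100.0 * mito / total if total > 0 else 0.0
    return {"n_genes": n_genes, "total_counts": total, "pct_mito": pct_mito}


def passes_qc(m, min_genes, min_counts, max_counts, max_pct_mito):
    """Apply all cell-level thresholds to one cell's metrics."""
    return (m["n_genes"] >= min_genes
            and min_counts <= m["total_counts"] <= max_counts
            and m["pct_mito"] <= max_pct_mito)


metrics = [cell_qc_metrics(row, gene_names) for row in counts]

# Thresholds are scaled down for the toy data; real data would use values
# closer to min_genes=200 and max_pct_mito=20.
kept_cells = [
    i for i, m in enumerate(metrics)
    if passes_qc(m, min_genes=3, min_counts=5, max_counts=1000, max_pct_mito=20.0)
]
print(kept_cells)  # only the first cell survives: [0]
```

In practice these metrics are usually computed by the analysis toolkit itself, and thresholds are chosen after inspecting their distributions rather than fixed in advance.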

Gene-Level QC:

  • Remove genes detected in < 3 cells
  • Optionally exclude mitochondrial and ribosomal genes from downstream analysis if they dominate the signal
  • Remove genes with zero variance
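
The gene-level filter can be sketched in a few lines. This is a minimal example on made-up data. filter_genes and the gene names are hypothetical, and the matrix is assumed to be cells in rows and genes in columns.

```python
def filter_genes(counts, gene_names, min_cells=3):
    """Keep genes with a nonzero count in at least `min_cells` cells."""
    n_genes = len(gene_names)
    cells_detected = [
        sum(1 for row in counts if row[j] > 0) for j in range(n_genes)
    ]
    keep = [j for j in range(n_genes) if cells_detected[j] >= min_cells]
    filtered = [[row[j] for j in keep] for row in counts]
    return filtered, [gene_names[j] for j in keep]


# Four cells, three genes: geneA is seen in 4 cells, geneB in 2, geneC in none.
toy_counts = [
    [2, 0, 0],
    [1, 3, 0],
    [4, 0, 0],
    [1, 1, 0],
]
toy_genes = ["geneA", "geneB", "geneC"]

filtered_counts, kept_genes = filter_genes(toy_counts, toy_genes, min_cells=3)
print(kept_genes)
```

A gene that is never detected, like geneC here, also has zero variance, so this filter removes it as well.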

2. Normalization

Normalization corrects for differences in sequencing depth and other technical variation so that expression values are comparable across cells.

Common Normalization Methods:

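  • Log-normalization: Divides each count by the cell's total, rescales, and log-transforms; simple, fast, works well for most datasets
  • SCTransform: Variance-stabilizing transformation based on regularized negative binomial regression; handles technical noise better
  • scran: Deconvolution-based; estimates size factors from pools of cells, suited to heterogeneous datasets with many cell types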
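
To make log-normalization concrete, here is a minimal sketch. It assumes the common formulation log(1 + count / total_counts * scale_factor) with a scale factor of 10,000 and the natural logarithm. The function name log_normalize and the toy counts are illustrative.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Depth-normalize each cell (row), then apply log(1 + x)."""
    normalized = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # A cell with no counts stays all-zero (it should not survive QC).
            normalized.append([0.0] * len(row))
            continue
        normalized.append([math.log1p(c / total * scale_factor) for c in row])
    return normalized


# The second cell was sequenced 10x deeper but has the same composition.
toy_counts = [
    [1, 1, 2],
    [10, 10, 20],
]
normalized = log_normalize(toy_counts)
print(normalized[0] == normalized[1])
```

The two toy cells end up with identical values, which is the point of depth normalization: differences in sequencing depth alone should not look like differences in expression.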

3. Feature Selection

Identify highly variable genes that drive biological variation.

  • Select top 2000-3000 highly variable genes
  • Exclude cell cycle genes if needed
  • Use the selected features, on variance-stabilized values, as input to PCA and other downstream steps
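
The idea behind highly variable gene selection can be sketched by ranking genes by dispersion (variance divided by mean). This is a deliberately simplified stand-in: established tools fit a mean-variance trend before ranking, which this example does not do. The function name and toy values are hypothetical.

```python
from statistics import mean, variance


def top_variable_genes(expr, gene_names, n_top=2000):
    """Rank genes by dispersion (sample variance / mean) and keep the top ones."""
    scored = []
    for j, gene in enumerate(gene_names):
        column = [row[j] for row in expr]
        mu = mean(column)
        if mu == 0:
            continue  # undetected genes carry no information
        scored.append((variance(column) / mu, gene))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [gene for _, gene in scored[:n_top]]


# Four cells, three genes with the same mean but different spread.
toy_expr = [
    [5, 0, 4],
    [5, 10, 6],
    [5, 0, 4],
    [5, 10, 6],
]
toy_genes = ["flat", "variable", "mild"]

hvgs = top_variable_genes(toy_expr, toy_genes, n_top=2)
print(hvgs)
```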

4. Dimensionality Reduction

PCA (Principal Component Analysis):

  • Compute the top 30-50 principal components from the scaled highly variable genes
  • Use an elbow plot to decide how many components to keep
  • The leading PCs capture the major sources of variation
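
Reading an elbow plot is usually done by eye, but one possible heuristic can be sketched in code: rescale both axes to [0, 1] and pick the point farthest from the straight line joining the first and last components. This particular heuristic, the function name elbow_n_pcs, and the variance values are assumptions for illustration, not a standard prescribed by any package.

```python
def elbow_n_pcs(variance_explained):
    """Return the 1-based number of PCs at the elbow of a decreasing curve."""
    n = len(variance_explained)
    top, bottom = variance_explained[0], variance_explained[-1]
    if n < 3 or top == bottom:
        return n  # no meaningful elbow to find

    best_index, best_gap = 0, -1.0
    for i, value in enumerate(variance_explained):
        x = i / (n - 1)                       # component rank scaled to [0, 1]
        y = (value - bottom) / (top - bottom) # variance scaled to [0, 1]
        # Distance to the line x + y = 1, up to a constant factor.
        gap = abs(x + y - 1.0)
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index + 1


# Hypothetical percent variance explained by the first ten PCs.
toy_variance = [30, 20, 12, 3, 2.5, 2, 1.8, 1.5, 1.2, 1.0]
print(elbow_n_pcs(toy_variance))
```

On the toy values the curve flattens after the fourth component, which is where the heuristic places the elbow. It is safer to keep a few more PCs than the elbow suggests than to keep too few.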

UMAP/t-SNE Visualization:

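  • UMAP: Tends to preserve more of the global structure and scales well to large datasets
  • t-SNE: Emphasizes local neighborhoods; distances between clusters should not be interpreted
  • Both are stochastic - set a random seed for reproducibility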

5. Clustering

Group cells with similar expression profiles.

Graph-Based Clustering:

  • Louvain: Fast, widely used
  • Leiden: Refinement of Louvain that guarantees well-connected communities
  • Resolution parameter: Controls cluster granularity (higher values give more, smaller clusters)

Evaluating Clusters:

  • Check cluster stability with different resolutions
  • Validate with known marker genes
  • Assess cluster quality with silhouette scores
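
For reference, the silhouette score can be sketched as follows: for each cell, a is the mean distance to other cells in its own cluster and b is the smallest mean distance to any other cluster, and the score is (b - a) / max(a, b). This minimal version uses Euclidean distance on made-up 2D coordinates, where real data would use the PCA embedding. The function name is illustrative.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Collect distances from point i to every other point, per cluster.
        dists = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(other, []).append(math.dist(p, q))
        own = dists.pop(label, [])
        if not own or not dists:
            scores.append(0.0)  # singleton cluster, or only one cluster
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Two tight, well-separated groups of cells (hypothetical coordinates).
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(toy_points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # labels cut across them
print(good > bad)
```

Scores near 1 indicate compact, well-separated clusters, and scores near or below 0 suggest that cells sit closer to another cluster than to their own.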

6. Cell Type Annotation

Marker Gene Identification:

  • Find differentially expressed genes for each cluster, typically comparing it against all other cells
  • Use the Wilcoxon rank-sum test or a t-test
  • Filter by log fold-change and adjusted p-value (for example, Benjamini-Hochberg)
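
The filtering step can be sketched as below, assuming the per-gene log2 fold-changes and raw p-values have already been computed by a test such as Wilcoxon. The Benjamini-Hochberg adjustment is written out by hand. The cutoffs, function names, and result values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_markers(results, min_log2fc=1.0, max_padj=0.05):
    """Keep upregulated genes passing both fold-change and adjusted p cutoffs."""
    padj = benjamini_hochberg([p for _, _, p in results])
    return [
        gene
        for (gene, log2fc, _), q in zip(results, padj)
        if log2fc >= min_log2fc and q < max_padj
    ]


# (gene, log2 fold-change vs. all other cells, raw p-value); hypothetical values.
toy_results = [
    ("CD3E", 2.5, 0.005),
    ("ACTB", 0.1, 0.01),
    ("MS4A1", -1.8, 0.03),
    ("GAPDH", 0.2, 0.04),
]
markers = select_markers(toy_results)
print(markers)
```

Only upregulated genes are kept here because positive markers are easier to interpret for annotation. Using the absolute fold-change instead would also return genes depleted in the cluster.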

Automated Annotation:

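  • SingleR: Reference-based annotation; correlates each cell with labeled reference expression profiles
  • CellTypist: Machine learning-based; applies logistic regression classifiers trained on curated references
  • Azimuth: Reference mapping; projects query cells onto an annotated reference atlas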

7. Differential Expression Analysis

Compare gene expression between conditions or cell types.

Methods:

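  • Pseudobulk: Sum counts per sample and cell type, then use DESeq2/edgeR (recommended, because it treats samples rather than cells as replicates)
  • MAST: Hurdle model that accounts for the many zeros in scRNA-seq data
  • Wilcoxon: Non-parametric and fast, but treats cells as independent replicates, which can inflate significance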
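
The pseudobulk aggregation step can be sketched as a simple group-and-sum over raw counts. The sample names, cell types, and counts are hypothetical, and the resulting profiles would then be passed to a bulk tool such as DESeq2 or edgeR, which is not shown here.

```python
from collections import defaultdict


def pseudobulk(cells, n_genes):
    """Sum raw counts over all cells sharing a (sample, cell_type) pair."""
    sums = defaultdict(lambda: [0] * n_genes)
    for sample, cell_type, counts in cells:
        profile = sums[(sample, cell_type)]
        for j, c in enumerate(counts):
            profile[j] += c
    return dict(sums)


# (sample, cell type, raw counts for three genes); made-up values.
toy_cells = [
    ("s1", "T", [1, 0, 2]),
    ("s1", "T", [0, 1, 1]),
    ("s1", "B", [3, 0, 0]),
    ("s2", "T", [2, 2, 0]),
]
profiles = pseudobulk(toy_cells, n_genes=3)
print(profiles[("s1", "T")])
```

Raw counts are summed rather than normalized values, since count-based bulk methods expect integer counts and apply their own normalization.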

8. Trajectory Analysis

Infer developmental trajectories and pseudotime.

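  • Monocle3: Graph-based trajectories; learns a principal graph through the low-dimensional embedding
  • Slingshot: Cluster-based lineages; connects clusters with a minimum spanning tree and fits smooth curves
  • PAGA: Partition-based graph abstraction; summarizes connectivity between clusters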

9. Visualization

Essential Plots:

  • UMAP/t-SNE: Overall structure
  • Violin plots: Gene expression distributions
  • Dot plots: Marker gene expression across clusters
  • Heatmaps: Top marker genes
  • Feature plots: Individual gene expression
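
As an example of what sits behind one of these plots, a dot plot typically encodes two numbers per cluster and gene: the fraction of cells expressing the gene (dot size) and the mean expression (dot color). The sketch below computes both from a toy matrix. The function name and values are illustrative, and the plotting itself is left to the visualization library.

```python
from collections import defaultdict


def dotplot_stats(expr, clusters, gene_names):
    """Return {(cluster, gene): (fraction_expressing, mean_expression)}."""
    rows_by_cluster = defaultdict(list)
    for row, cluster in zip(expr, clusters):
        rows_by_cluster[cluster].append(row)

    stats = {}
    for cluster, rows in rows_by_cluster.items():
        for j, gene in enumerate(gene_names):
            values = [row[j] for row in rows]
            fraction = sum(1 for v in values if v > 0) / len(values)
            stats[(cluster, gene)] = (fraction, sum(values) / len(values))
    return stats


# Four cells in two clusters, two genes (hypothetical expression values).
toy_expr = [
    [0, 2],
    [4, 1],
    [0, 0],
    [0, 3],
]
toy_clusters = ["A", "A", "B", "B"]
stats = dotplot_stats(toy_expr, toy_clusters, ["g1", "g2"])
print(stats[("A", "g1")])
```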

Best Practices

  • Always perform rigorous QC - garbage in, garbage out
  • Try more than one normalization method and check that key results are robust to the choice
  • Validate findings with known biology
  • Check for batch effects and correct if necessary
  • Document all parameters and software versions
