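# How to Plot GO Enrichment Results as a Heatmap with ggplot2 in R
## Abstract
Gene Ontology (GO) enrichment analysis is a core method for interpreting high-throughput data in bioinformatics. This article walks through how to turn GO enrichment results into a readable heatmap with the ggplot2 package in R, covering data preprocessing, plot customization, and interpretation of the result. Complete code examples and parameter explanations are given for each step.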
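## 1. Overview of GO Enrichment Analysis and Visualization
### 1.1 Principles of GO Enrichment Analysis
Gene Ontology (GO) describes gene function along three aspects:
- Molecular Function (MF)
- Biological Process (BP)
- Cellular Component (CC)
Enrichment analysis uses a statistical test to identify GO terms that are significantly over-represented among differentially expressed genes. Common methods include:
- The hypergeometric test
- Fisher's exact test
- GSEA (a rank-based method that scores the whole ranked gene list rather than a cutoff-defined gene set)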
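To make the over-representation test concrete, here is a minimal sketch in base R. The counts are illustrative assumptions, not values from a real dataset: 10,000 background genes, 200 of them annotated to a term, and 1,000 differentially expressed genes of which 45 carry the annotation. The sketch computes the hypergeometric upper-tail probability and compares it with a one-sided Fisher's exact test on the equivalent 2x2 table, which should give the same value.

```r
# Illustrative counts (assumed for this sketch)
N_bg   <- 10000  # genes in the background
M_term <- 200    # background genes annotated to the GO term
n_de   <- 1000   # differentially expressed (DE) genes
k_hit  <- 45     # DE genes annotated to the GO term

# Hypergeometric test: P(X >= k_hit) when n_de genes are drawn at random
p_hyper <- phyper(k_hit - 1, M_term, N_bg - M_term, n_de, lower.tail = FALSE)

# The same question as a 2x2 table for a one-sided Fisher's exact test
#            in term           not in term
# DE         k_hit             n_de - k_hit
# not DE     M_term - k_hit    all remaining genes
tab <- matrix(c(k_hit, M_term - k_hit,
                n_de - k_hit, N_bg - M_term - (n_de - k_hit)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_hyper, fisher = p_fisher))
```

Subtracting 1 from `k_hit` matters: with `lower.tail = FALSE`, `phyper()` returns P(X > q), so `q = k_hit - 1` gives the probability of seeing at least `k_hit` annotated genes.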
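### 1.2 Visualization Requirements
Raw enrichment results usually contain:
- Term name
- P-value / q-value
- Enrichment factor
- Gene count
A heatmap-style plot that encodes information in both color and size can show at once:
- Significance (-log10(p-value))
- Degree of enrichment (gene ratio)
- Relationships among terms (for example, grouping by ontology)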
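The quantities listed above can be derived from a typical result table before plotting. The sketch below assumes clusterProfiler-style columns in which `GeneRatio` and `BgRatio` are stored as `"k/n"` strings, and takes the enrichment factor to be the gene ratio divided by the background ratio. The two-row table and the helper `parse_ratio()` are hypothetical and exist only for illustration.

```r
# Hypothetical two-row result table (values made up for illustration)
res <- data.frame(
  Description = c("immune response", "response to stress"),
  GeneRatio   = c("30/1000", "25/1000"),
  BgRatio     = c("150/10000", "100/10000"),
  pvalue      = c(1e-3, 1e-2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "k/n" strings into numeric proportions
parse_ratio <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

res$log_p         <- -log10(res$pvalue)          # significance on a -log10 scale
res$gene_ratio    <- parse_ratio(res$GeneRatio)  # fraction of input genes in the term
res$enrich_factor <- res$gene_ratio / parse_ratio(res$BgRatio)  # observed / expected

print(res[, c("Description", "log_p", "gene_ratio", "enrich_factor")])
```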
## 2. Data Preparation and Preprocessing
### 2.1 Loading Example Data
```r
# Simulated GO enrichment results (columns follow a clusterProfiler result table)
go_terms <- data.frame(
  ID = c("GO:0008152", "GO:0009987", "GO:0002376",
         "GO:0006955", "GO:0006950"),
  Description = c("metabolic process", "cellular process",
                  "immune system process", "immune response",
                  "response to stress"),
  # Ratios are stored as "k/n" strings so they can be parsed in the next step
  GeneRatio = c("120/1000", "85/1000", "45/1000", "30/1000", "25/1000"),
  BgRatio = c("500/10000", "600/10000", "200/10000", "150/10000", "100/10000"),
  pvalue = c(1e-12, 1e-8, 1e-5, 0.001, 0.01),
  # Simulated adjusted values, kept at or above the raw p-values
  p.adjust = c(1e-10, 1e-6, 1e-4, 0.002, 0.012),
  qvalue = c(1e-10, 1e-6, 1e-4, 0.0015, 0.01),
  Count = c(120, 85, 45, 30, 25),
  Category = rep("BP", 5)
)
```
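### 2.2 Data Transformation

```r
library(dplyr)

plot_data <- go_terms %>%
  mutate(
    log_p = -log10(pvalue),  # put p-values on a -log10 scale
    # Convert GeneRatio to a number; accepts "k/n" strings as well as plain numbers
    GeneRatio_num = vapply(
      strsplit(as.character(GeneRatio), "/"),
      function(x) if (length(x) == 2) as.numeric(x[1]) / as.numeric(x[2]) else as.numeric(x[1]),
      numeric(1)
    ),
    # Reverse the levels so the first term is drawn at the top of the y axis
    Description = factor(Description, levels = rev(unique(Description)))
  )
```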
## 3. Basic Heatmap
### 3.1 Minimal Implementation

```r
library(ggplot2)

ggplot(plot_data, aes(x = Category, y = Description)) +
  geom_tile(aes(fill = log_p), color = "white") +
  scale_fill_gradient(low = "blue", high = "red") +
  theme_minimal()
```
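### 3.2 Key Parameters
- `geom_tile()`: draws the grid of heatmap cells
- `aes(fill)`: the variable mapped to cell color
- `color`: the color of the cell borders
- `scale_fill_gradient()`: a continuous color scale

## 4. Advanced Customization
### 4.1 Bubble Plot (Color and Size Encoding)

```r
ggplot(plot_data, aes(x = Category, y = Description)) +
  geom_point(aes(size = Count, color = log_p)) +
  scale_color_gradientn(colors = c("blue", "yellow", "red")) +
  scale_size(range = c(3, 10)) +
  theme_bw(base_size = 12) +
  labs(x = "", y = "",
       color = "-log10(p-value)",
       size = "Gene Count")
```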
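### 4.2 Faceting Across Comparison Groups

```r
# When there are several comparison groups
# (the group labels are assigned arbitrarily here, for demonstration only)
plot_data$Group <- rep(c("Treatment", "Control"), each = 3)[1:5]

ggplot(plot_data, aes(x = Group, y = Description)) +
  geom_tile(aes(fill = log_p)) +
  facet_grid(. ~ Category, scales = "free") +
  scale_fill_viridis_c(option = "magma")
```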
### 4.3 Adding Value Labels

```r
ggplot(plot_data, aes(x = Category, y = Description)) +
  geom_tile(aes(fill = log_p), alpha = 0.8) +
  # Print the -log10(p-value) inside each cell, rounded to one decimal
  geom_text(aes(label = sprintf("%.1f", log_p)),
            color = "white", size = 3) +
  scale_fill_distiller(palette = "Spectral") +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(hjust = 0.5))
```
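## 5. Color Schemes
### 5.1 Recommended Palettes

| Scenario | Recommended palettes |
|---|---|
| Single continuous variable | viridis, magma, inferno |
| Diverging data | RdBu, PiYG, PRGn |
| Categorical data | Set1, Paired, Dark2 |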
### 5.2 Custom Palette

```r
# Interpolate 10 colors between two hex endpoints
my_palette <- colorRampPalette(c("#2E86AB", "#F24236"))(10)

ggplot(plot_data) +
  geom_tile(aes(x = Category, y = Description, fill = log_p)) +
  scale_fill_gradientn(colors = my_palette)
```
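Diverging palettes suit values with a meaningful midpoint. The following is a minimal sketch, assuming a hypothetical signed score (positive for terms enriched among up-regulated genes, negative for down-regulated ones). The data frame, its column names, and its values are invented for illustration. Symmetric limits keep zero at the center of the palette.

```r
library(ggplot2)

# Invented example values; signed_logp is a hypothetical signed -log10(p)
signed_df <- data.frame(
  Description = c("immune response", "response to stress", "metabolic process"),
  Group       = "Treatment vs Control",
  signed_logp = c(4.2, 1.5, -3.1)
)

# Symmetric limits so that 0 maps to the middle of the diverging palette
lim <- max(abs(signed_df$signed_logp))

p_div <- ggplot(signed_df, aes(x = Group, y = Description, fill = signed_logp)) +
  geom_tile(color = "white") +
  scale_fill_distiller(palette = "RdBu", limits = c(-lim, lim)) +
  labs(x = NULL, y = NULL, fill = "signed -log10(p)") +
  theme_minimal()

print(p_div)
```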
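## 6. Exporting the Figure

```r
# Vector PDF (dpi has no effect on vector output; cairo_pdf needs Cairo support in R)
ggsave("GO_heatmap.pdf",
       width = 10, height = 6,
       device = cairo_pdf)

# High-resolution TIFF with LZW compression
ggsave("GO_heatmap.tiff",
       compression = "lzw", dpi = 300,
       units = "in", width = 8, height = 5)
```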
## 7. Complete Example

```r
library(ggplot2)
library(dplyr)

# Data preparation
# `...` stands for your own enrichGO() arguments. The ONTOLOGY column used
# for faceting below is present when enrichGO() is run with ont = "ALL".
go_df <- clusterProfiler::enrichGO(...) %>%
  as.data.frame() %>%
  filter(p.adjust < 0.05) %>%
  arrange(pvalue) %>%
  head(20) %>%
  mutate(
    log_p = -log10(p.adjust),
    # GeneRatio is a "k/n" string; convert it to a number for the x axis
    GeneRatio_num = sapply(strsplit(GeneRatio, "/"),
                           function(x) as.numeric(x[1]) / as.numeric(x[2])),
    Description = stringr::str_wrap(Description, width = 40))

# Faceted bubble plot
ggplot(go_df, aes(x = GeneRatio_num, y = reorder(Description, log_p))) +
  geom_point(aes(size = Count, color = log_p)) +
  scale_color_gradientn(
    colors = rev(RColorBrewer::brewer.pal(11, "Spectral")),
    limits = c(0, max(go_df$log_p))) +
  scale_size_continuous(range = c(3, 8)) +
  facet_grid(ONTOLOGY ~ ., scales = "free", space = "free") +
  labs(x = "Gene Ratio", y = "",
       color = "-log10(adj.p)",
       size = "Gene Count",
       title = "GO Enrichment Analysis") +
  theme_classic(base_size = 12) +
  theme(
    strip.background = element_rect(fill = "grey90"),
    panel.spacing = unit(0.2, "lines"),
    axis.text.y = element_text(lineheight = 0.8))
```
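## 8. Common Problems
### 8.1 Term Names Are Too Long

```r
# Option 1: wrap long names onto several lines
plot_data %>%
  mutate(Description = stringr::str_wrap(Description, width = 30)) %>%
  ggplot(aes(...)) + ...

# Option 2: reduce the font size of the y-axis labels
theme(axis.text.y = element_text(size = 8))

# Option 3: truncate labels to their first 20 characters
scale_y_discrete(labels = function(x) substr(x, 1, 20))
```

### 8.2 Missing Values

```r
# Draw cells with missing values in a neutral gray
scale_fill_gradient(na.value = "gray90")
```

### 8.3 Combining Several Enrichment Results

```r
# Stack GO and KEGG result tables and tell them apart with a Type column
bind_rows(
  mutate(go_data, Type = "GO"),
  mutate(kegg_data, Type = "KEGG")) %>%
  ggplot(aes(x = Type, ...)) + ...
```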
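For the missing-value case above, gaps typically appear when several groups are compared and a term is significant in only some of them. The following is a minimal sketch in base R plus ggplot2, with made-up values, that fills in the absent term-group combinations as `NA` so that they are drawn in the `na.value` color instead of being left as holes. The object names are hypothetical.

```r
library(ggplot2)

# Made-up long-format results: "immune response" has no row for Control
obs <- data.frame(
  Description = c("immune response", "response to stress", "response to stress"),
  Group       = c("Treatment", "Treatment", "Control"),
  log_p       = c(5.0, 3.0, 2.0),
  stringsAsFactors = FALSE
)

# Build every term x group combination, then left-join the observed values;
# combinations without a match keep log_p = NA
all_pairs <- expand.grid(Description = unique(obs$Description),
                         Group = unique(obs$Group),
                         stringsAsFactors = FALSE)
full <- merge(all_pairs, obs, by = c("Description", "Group"), all.x = TRUE)

p_na <- ggplot(full, aes(x = Group, y = Description, fill = log_p)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "blue", high = "red", na.value = "gray90") +
  theme_minimal()

print(p_na)
```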
## References
1. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer, 2016.
2. Yu G. et al. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology, 2012.
3. RStudio ggplot2 Cheat Sheet