2022年10月11日
显著性符号的意义
Symbol | Meaning |
ns | P > 0.05 |
* | P ≤ 0.05 |
** | P ≤ 0.01 |
*** | P ≤ 0.001 |
**** | P ≤ 0.0001 |
参考:What is the meaning of * or ** or *** in reports of statistical significance from Prism or InStat?
文章里的描述:Expression of regional markers TOX3 (r5) and PRDX6 (r6) between HCO and HBSO. Wilcoxon rank sum test was applied. P-values < 0.0001 is marked by ****.
2022年10月07日
越高分的文章,对数据的严格的统计处理就越重要,存在一些经典必用的图:
- boxplot,自带四分位信息,最好加上jitter让人看到你的数据点
- violin plot,在单细胞里很火,可以直接看到数据的分布,可以叠加boxplot使用
- 线性拟合回归,lm,我们目前绝对无法handle非线性的回归
这些经典分析必须搭配显著性测试,必须在图里显示P-value,或者P-value对应的符号(*、**、***、NS)。
目前在ggplot里添加显著性的主要方法:
- ggpubr的stat_compare_means,也是最为主流的方法。
- 优点:可以应对多组比对,理论上可以无限标注分组,快速
- 缺点:数据处理整合在ggplot函数里,无法自定义处理数据
- ggpubr的第二个函数stat_pvalue_manual,适合有一个固定的ref
- 优点:可以自定义处理数据
- 缺点:没有comparisons参数,多组比较不够自如
- 自己检验,然后用geom_text手动标注
- 优点:完全自定义,随心所欲,超批量作图
- 缺点:不适合多组比对,适合两组对比
难点:
- 在数据零值过多时,所有scale都失效了,此时只能用geom_jitter来显示差异【2022-10-08】
- 两个control,HSCR paper里,必须后期用inkscape来修改,也算方便。
- facet_wrap可以做到真正的free_y,但stat_compare_means跟不上,所以得用stat_pvalue_manual(参考:Data_center/analysis/HBSO_7Ala_Elly_2022_Jul/neuronal_subset.ipynb)
- facet_grid里无法做到真正的free_y,可以用facet_wrap和cowplot的拼图来实现
- 也可以将数据scale到0-1,这样label时就可以使用固定参数(参考:EllyLab/mouse/singleCell/case/Vcl_ENCC/Vcl_ENCCs_aggregate_analysis.ipynb)
方法一的方法参考下面的文章
方法二:
- Add Manually P-values to a ggplot
- How to Add p-values onto ggplot
- T-test() does not add adjusted p-values
参考:Data_center/analysis/HBSO_7Ala_Elly_2022_Jul/Seurat_integration_all.ipynb
方法三:
现实案例代码:EllyLab/mouse/singleCell/case/Vcl_ENCC/Vcl_ENCCs_aggregate_analysis.ipynb【会封装为函数】
参考:Add P-values and Significance Levels to ggplots
ggpubr的包比较局限,能用的test也比较局限,但是做起来快速简单。
当情况特殊时ggpubr就不能用了,可以自己做了显著性test之后再显示在图上。
# show lable in facet grid plot
dat_text <- data.frame()
for (i in names(paired_list)) {
# Compute t-test
res <- t.test(value ~ group, data = paired_list[[i]], paired = TRUE)
dat_text <- rbind(dat_text, data.frame(variable=i, pvalue=res$p.value))
}
dat_text$label <- paste("P", round(dat_text$pvalue, 3), sep="=")
dat_text[dat_text$pvalue<0.05 & dat_text$pvalue>0.01,]$label <- paste("*",
dat_text[dat_text$pvalue<0.05 & dat_text$pvalue>0.01,]$label, sep=" ")
dat_text[dat_text$pvalue<0.01,]$label <- paste("**", "P<0.01", sep=" ")
library(ggplot2)
options(repr.plot.width=8, repr.plot.height=12) # 8x8
g2 <- ggplot(data=genes_expr_melt, aes(x=pseudotime, y=value, fill=group, color=group)) +
geom_point(size=0.01, alpha=0.5, aes(color=group, fill=group)) +
labs(x = "Pseudotime", y = "Relative expression", title = "Neuronal lineage") +
geom_smooth(method = 'loess',se=F,size=0.15,span = 0.7) + # ,alpha=0.05, weight=0.1,
facet_wrap( ~ variable, ncol=3, labeller = label_context, scales = "free_y") + #
geom_text(size = 5, data = dat_text, mapping = aes(x = Inf, y = Inf, label = label),
hjust = 1.05,vjust = 1.5, color=ifelse(dat_text$pvalue < 0.05,'red','black')) +
# themes
theme(strip.background = element_blank(),
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.spacing=unit(.4, "lines"),panel.border = element_rect(color = "black", fill = NA, size = 0.5))+
theme(axis.text.x = element_text(face="plain", angle=0, size = 10, color = "black", vjust=0.5),
axis.text.y = element_text(face="plain", size = 10, color = "black"),
axis.title =element_text(size = 15)) +
theme(strip.background = element_rect(fill = "gray90", color = NA))+
# theme(legend.position = "none") + # must remove legend
theme(strip.placement = "outside", strip.text.x = element_text(face="plain", size = 13),
strip.text.y = element_text(face="plain", size = 11)) +
theme(strip.text.x = element_text(margin = margin(1,0,1,0, "mm"))) +
scale_color_manual(values=c("deepskyblue","red","gray50")) +
scale_fill_manual(values=c("deepskyblue","red","gray50"))
plot(g2)
# }
多组比较,挑选感兴趣的显示显著性。
data("ToothGrowth")
head(ToothGrowth)
library(ggpubr)
my_comparisons <- list( c("0.5", "1"), c("1", "2"), c("0.5", "2") )
options(repr.plot.width=4, repr.plot.height=4)
ggplot(ToothGrowth, aes(x=as.character(dose), y=len, fill=dose)) +
geom_boxplot(outlier.size=NA, size=0.01, outlier.shape = NA) +
geom_jitter(width = 0.3, size=0.01) +# , aes(color=supp) +
stat_compare_means(comparisons = my_comparisons)+ # Add pairwise comparisons p-value
stat_compare_means(label.y = 50, label.x = 1.5) # Add global p-value
还可以设定一个ref group来显示显著性差异,只需要改一下设定。
stat_compare_means(method = "anova", label.y = 1.3, label.x = 3)+ # Add pairwise comparisons p-value
# # Add global p-value
stat_compare_means(label = "p.signif", method = "t.test", ref.group = "hNP-D20", label.y = 1.1) +
生物学的强烈推荐看看Y叔的公众号里的统计相关的文章,非常的基础和实用。
统计
- Five things biologists should know about statistics
- 什么是T检验
- 富集基因之注释缺失
- 落入窠臼
- 你昨天才做的分析,可能是几年前的结果!
- 掐架的额外收获
- boxplot
- 如何告别单身
- 主成分分析
- 一文解决RT-PCR的统计分析
代码例子:
options(repr.plot.width=7, repr.plot.height=6)
# facet boxplot
bp <- ggplot(expr_data2, aes(x=group, y=expression, fill=NA)) +
geom_boxplot(outlier.size=NA, size=0.01, outlier.shape = NA) +
geom_jitter(width = 0.3, size=0.01, aes(color=cluster)) +
# + geom_boxplot( +
facet_grid( cluster ~ gene, switch="y") + # , scales = "free"
theme_bw() +
stat_compare_means(aes(group = group, label = ..p.signif..), label.x = 1.3,label.y = 1.3,
method = "wilcox.test", hide.ns = T) + # label = "p.format",
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
labs(x = "", y = "", title = "") +
theme(panel.spacing=unit(.3, "lines"),panel.border = element_rect(color = "black", fill = NA, size = 0.2)) +
theme(axis.ticks.x = element_blank(), axis.ticks = element_line(size = 0.1),
axis.text.x = element_text(face="plain", angle=90, size = 8, color = "black", vjust=0.5),
axis.text.y = element_text(face="plain", size = 4, color = "black"),
axis.title =element_text(size = 12)) +
theme(strip.background = element_rect(fill = "gray97", color = NA))+
theme(legend.position = "none") +
theme(strip.placement = "outside", strip.text.x = element_text(face="italic", size = 11),
strip.text.y = element_text(face="plain", size = 11)) +
scale_y_continuous(position="right", limits = c(-0.5,1.5)) +
scale_fill_manual(values=brewer.pal(8,"Set2")[c(2,3,7,1,5,6)]) +
scale_color_manual(values=brewer.pal(8,"Set2")[c(2,3,7,1,5,6)])
bp