Medicine: Multicollinearity, Pearson, R Language

Reposted

hackernew 2024-11-03 13:12:12


In an article titled "A Paradox in the Interpretation of Group Comparisons," published in Psychological Bulletin, Lord (1967) popularized the following controversial story:

A university wants to investigate the effects of the diet its students consume in the campus restaurant. Various types of data were collected, including each student's weight in January and in June of the same year. The university wants to know whether the diet affects men more than women. The data are analyzed by two statisticians.

The first statistician observes that at the end of the semester (June), the average weight of the men is identical to their average weight at the beginning of the semester (January). The same holds for the women. The only difference is that the women started the year with a lower average weight, as expected. On average, neither men nor women gained or lost weight over the semester. The first statistician concludes that there is no evidence of any effect of diet (or any other factor) on student weight. In particular, there is no evidence of a differential effect between the sexes, since neither group shows a systematic change.

The second statistician examines the data more closely. He notes that some men and women started the semester with the same weight; this subgroup consists of relatively thin men and relatively heavy women. Among them, the men on average gained weight and the women on average lost weight. The second statistician concludes that, after controlling for initial weight, the university diet has a positive differential effect on men relative to women: among men and women with the same initial weight, the men end the semester heavier on average than the women.
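One way to see how both statisticians can be correct about their own question is to write the two comparisons down. The notation below is not Lord's. It is a sketch under stated assumptions: $X$ is the January weight, $Y$ the June weight, $g \in \{M, F\}$ the sex, $\mu_g$ the group mean (assumed identical in January and June), and $\beta$ a common within-group regression slope of $Y$ on $X$, with the conditional mean assumed linear.

```latex
\begin{aligned}
\text{Statistician 1:}\quad
  & \mathbb{E}[\,Y - X \mid g\,] = \mu_g - \mu_g = 0
    \quad\text{for } g = M, F, \\[4pt]
\text{Statistician 2:}\quad
  & \mathbb{E}[\,Y \mid X = x,\ g\,] = \mu_g + \beta\,(x - \mu_g)
    = (1 - \beta)\,\mu_g + \beta x, \\[4pt]
  & \mathbb{E}[\,Y \mid x,\ M\,] - \mathbb{E}[\,Y \mid x,\ F\,]
    = (1 - \beta)\,(\mu_M - \mu_F).
\end{aligned}
```

Under these assumptions, whenever $0 < \beta < 1$ and $\mu_M > \mu_F$, the adjusted difference is positive even though neither group's mean weight changes. The first statistician compares mean changes; the second compares June weights at a fixed January weight. These are different quantities, so the two answers do not contradict each other arithmetically.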

Input 1:

df <- read.csv("df.csv", header = TRUE)

Output 1:

id

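Input 2:

library(dplyr)

# Sample size, mean, and SD of the initial and final measurements by group
df %>%
  group_by(group) %>%
  summarize_at(vars(initial, final),
               list(n = ~n(), mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE)))

Output 2: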

group

Input 3:

# Shapiro-Wilk normality test of the change score within each group
t(sapply(split(df$change, df$group), shapiro.test))

Output 3:

statistic

Input 4:

library(car)
leveneTest(change ~ group, data = df)

Output 4:

Levene's Test for Homogeneity of Variance (center = median)

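Input 5:

# Equal-variance two-sample t-test on the change score
t.test(change ~ group, data = df, var.equal = TRUE)

Output 5: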

Two Sample t-test

data:  change by group
t = -0.2, df = 198, p-value = 0.9
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.0523  0.0442
sample estimates:
mean in group Female   mean in group Male 
               0.039                0.043

Input 6:

data=df)

Output 6:

Call:
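Because the model call and its output are truncated here, the following is only a minimal sketch of how the two statisticians' analyses can be compared on simulated data. It does not use the article's df.csv. The group means, standard deviation, correlation `rho`, and sample size are all assumed values, and `final ~ initial + group` is one plausible ANCOVA formula, not necessarily the one the author fitted.

```r
# Hypothetical simulation: none of these numbers come from the article's df.csv.
set.seed(1967)
n_per_group <- 100                          # assumed group size
rho <- 0.5                                  # assumed within-group January/June correlation
group_means <- c(Female = 55, Male = 70)    # assumed mean weights (kg)
weight_sd <- 8                              # assumed within-group SD in both months

simulate_group <- function(label, mu) {
  initial <- rnorm(n_per_group, mean = mu, sd = weight_sd)
  # June weight regresses toward the group's own mean, so the group mean stays put
  final <- mu + rho * (initial - mu) +
    rnorm(n_per_group, mean = 0, sd = weight_sd * sqrt(1 - rho^2))
  data.frame(group = label, initial = initial, final = final)
}

sim <- rbind(simulate_group("Female", group_means[["Female"]]),
             simulate_group("Male", group_means[["Male"]]))
sim$group <- factor(sim$group, levels = c("Female", "Male"))
sim$change <- sim$final - sim$initial

# First statistician: compare the mean change score between groups
change_test <- t.test(change ~ group, data = sim, var.equal = TRUE)

# Second statistician: compare final weight, adjusting for initial weight
ancova_fit <- lm(final ~ initial + group, data = sim)
group_effect <- coef(summary(ancova_fit))["groupMale", ]

print(change_test$estimate)   # mean change in each group
print(group_effect)           # adjusted Male-vs-Female difference
```

With these assumed parameters, the true difference in mean change is zero, while the adjusted group coefficient is expected to land near (1 - 0.5) × (70 - 55) = 7.5. The same simulated data therefore support both conclusions, depending on which comparison is made.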

Input 7:

library(ggplot2)
library(cowplot)

# Left panel: final vs. initial, with a separate regression line per group
p1 <- ggplot(df, aes(x = initial, y = final, color = group)) +
  geom_point(aes(shape = group), size = 3) +
  geom_smooth(method = "lm", aes(fill = group), alpha = 0.1) +
  labs(x = "initial score", y = "final score") +
  theme_bw(base_size = 18) +
  theme(legend.position = "top")

# Right panel: distribution of the change score by group
p2 <- ggplot(df, aes(x = group, y = change, color = group, fill = group)) +
  geom_boxplot(color = "black", alpha = 0, width = 0.5) +
  stat_boxplot(geom = "errorbar", color = "black", width = 0.5) +
  geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 0.02) +
  theme_bw(base_size = 18) +
  theme(legend.position = "none")

plot_grid(p1, p2, align = "h")