R语言的cox回归分析 r cox回归

转载

mob64ca140bbb8b 2024-01-21 08:08:49

文章标签 R语言的cox回归分析拟合 ci 线性回归 文章分类 R语言后端开发

#简单线性回归：
 ##常用绘图：
 fit<-lm(weight~height,data=women)
 summary(fit)
 plot(women$height,women$weight,xlab="Height (in inches)",ylab="Weight (in pounds)")
 abline(fit)
 fit2<-lm(mpg~wt+I(wt^2),data=mtcars)
 summary(fit2) 
 plot(mtcars$wt,mtcars$mpg) 
 lines(mtcars$wt,fitted(fit2)) 
 
 ##最好用car包中的scatterplot（）函数，显示的内容更全面
 library(car)
 scatterplot(weight~height,data=women,spread=F,smoother.args=list(lty=2),pch=19,main="Women Age 30 -39",xlab="Height (in inches)",ylab="Weight (in pounds)")
  ###该图提供了散点图、线性拟合图和平滑拟合（loess）曲线，还展示了两个变量的箱线图。spread=F删除了残差正负均方根在平滑曲线的展开和非对称信息。smoother.args设置loess拟合曲线为虚线。pch=19设置为实心圆。
 
#多元线性回归
 ##lm函数需要数据框，这里把数据转化为数据框
 ##多元回归分析第一步总是检查变量之间的相关性：
 states<-as.data.frame(state.x77[,c("Murder","Population","Illiteracy","Income","Frost")])
 ##检查变量相关性,用car包的函数可视化比较强
 cor(states)
 library(car)
 scatterplotMatrix(states,spread=F,smoother.args=list(lty=2),main="Scatter Plot Matrix")
 
 ##线性回归
 fit<-lm(Murder~Population+Illiteracy+Income+Frost,data=states)
 summary(fit) 

 ##考虑交互项：
 ###交互项显著说明：相应变量与其中一个预测变量的关系依赖于另一个变量的水平
 fit<-lm(mpg~hp+wt+hp:wt,data=mtcars)
 ###展示交互项时可用effects包中的effect函数
 install.packages("effects")
 library(effects) 
 plot(effect("hp:wt",fit,,list(wt=c(2.2,3.2,4.2))),multiline=T) 
 
#回归诊断：
 ##可用于查看置信区间
 fit<-lm(Murder~Population+Illiteracy+Income+Frost,data=states)
 confint(fit)
 
 
 ##标准方法：可查看fit的四幅图形
 par(mfrow=c(2,2))
 plot(fit) 
 ###解决方法：二次拟合或者删除异常点  
 
 ##改进方法：
 ###（1）正态性：
 library(car)
 qqPlot(fit,labels=row.names(states),id.method="identify",simulate=T,main="Q-Q Plot") 
 ###id.method可以交互式绘图，simulat=T时，95%置信区间会用参数自助法生成。

 ###（2）误差的独立性：car包中的函数可做Duibin-Watson检验
 library(car)
 durbinWatsonTest(fit)

 ###（3）线性:通过成分残差图，查看自变量与因变量是否呈现非线性关系
 library(car)
 crPlots(fit) 

 ###（4）同方差性：
 library(car)
 ncvTest(fit) 
 spreadLevelPlot(fit) 
 ###若代码建议幂次转换为0.5，则用sqrt（y）代替y，如果幂次转换建议为0，则使用对数变换，如果接近1，说明异方差性不是很明显，不需要做变换。
 
 ###（5）线性模型假设的综合验证
 ####多重共线性:会导致模型参数的置信区间过大，使单个系数解释起来很困难
 library(car)
 vif(fit) 
 sqrt(vif(fit))>2 
 ####一般情况下，sqrt（vif）大于2就说明存在多重共线性问题。 
 
 ####异常观测点:
 #####离群点：
 library(car)
 outlierTest(fit) 
 #####高杠杠值点:即与其他预测变量有关的离群点。可通过帽子统计量判断。
 hat.plot<-function(fit){
   p<-length(coefficients(fit))
   n<-length(fitted(fit))
   plot(hatvalues(fit),main="Index Plot of Hat Value")
   abline(h=c(2,3)*p/n,col="red",lty=2)
   identify(1:n,hatvalues(fit),names(hatvalues(fit)))
 }
 hat.plot(fit) 
 ####强影响点：对模型参数估计值影响有些比例失衡的点。用Cook距离，即D统计量。
 cutoff<-4/(nrow(states)-length(fit$coefficients)-2)
 plot(fit,which=4,cook.levels=cutoff) 
 abline(h=cutoff,lty=2,col="red") 
 #####这部分没怎么看懂
 
#改进措施：
 ##删除观测点
 ##变量变换：
  ###当模型违反正态假设时，可以对响应变量尝试某种变换。
  library(car)
  summary(powerTransform(states$Murder))
  ###违背线性假设时，对预测变量进行变换常常比较有用。
  library(car)
  boxTidwell(Murder~Population+Illiteracy,data=states)
 ##增删变量
  
#选择最佳的回归模型：
 ##模型比较
 fit1<-lm(Murder~Population+Illiteracy,data="states")
 fit2<-lm(Murder~Population+Illiteracy+Income+Frost,data="states")
 anova(fit1,fit2)
 ##也可以用AIC原则
 AIC(fit1,fit2)
 
 ##逐步回归：
 library(MASS)
 stepAIC(fit,direction = "backward")
 ##全子集回归可以用leaps包中的regsubsets（）函数实现
 
#深层次分析：交叉验证和相对重要性（以后再研究）

本文章为转载内容，我们尊重原作者对文章享有的著作权。如有内容错误或侵权问题，欢迎原作者联系我们进行内容更正或删除文章。