Iterating over files in R: reading files in a loop

Reposted

Python数据分析 2023-08-25 16:20:53


Goal

Learn the ways to read files in R

Study notes

Data Input in the utils package
readLines in the base package
stri_read_lines in the stringi package
Data Input in the xlsx package
Read a delimited file in the readr package

1. Data Input in the utils package

We take read.table as the example.

For a detailed description of the read.table parameters, see http://www.360doc.com/showweb/0/0/1029326103.aspx

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = FALSE,
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)

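The file parameter

Form 1: "file name". If no path is given, the file is looked up in the current working directory. getwd() returns the current working directory and setwd("path") changes it.
Form 2: absolute path\file name, for example "D: \…\test.xlsx".
Form 3: "clipboard", which reads whatever has just been copied to the clipboard.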

getwd()
setwd("....\\...")  # enter the path you want to set
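The first two forms can be compared with a minimal sketch. The temporary directory and the file name demo.txt are assumptions made for illustration (they are not part of the original test setup), chosen so that the snippet runs on any machine.

```r
# Create a small tab-separated file in a temporary directory (hypothetical name).
demo_dir <- tempfile("demo_dir_")
dir.create(demo_dir)
demo_path <- file.path(demo_dir, "demo.txt")
writeLines(c("1\t2\t3", "4\t5\t6"), demo_path)

# Form 2: full path; works regardless of the working directory.
by_full_path <- read.table(demo_path)

# Form 1: bare file name; resolved against the current working directory.
old_wd <- setwd(demo_dir)   # setwd() returns the previous directory invisibly
by_name <- read.table("demo.txt")
setwd(old_wd)               # restore the original working directory

# Both calls read the same file, so the results match.
identical(by_full_path, by_name)
```

Building the path with file.path() avoids having to escape backslashes by hand on Windows.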

Create a table in the working directory for testing and name it test.xlsx.

[Figure: the test table in test.xlsx]

x1<-read.table('test.xlsx')
View(x1)

x1 looks like this:

[Figure: x1 as displayed by View()]

x1<-read.table('test.xlsx')
Warning messages:
1: In read.table("test.xlsx") : line 1 appears to contain embedded nulls
2: In read.table("test.xlsx") :
  incomplete final line found by readTableHeader on 'test.xlsx'

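The "incomplete final line" message is a warning rather than an error. It appears because a .xlsx file is not plain text: it is a zip-compressed archive of XML files, so read.table() finds no proper line endings in it. The "embedded nulls" warning has the same cause. Nothing can be changed inside Excel to fix this, so I recommend not reading Excel files directly with read.table().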
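To see why read.table() struggles with such a file, one can look at its first bytes. The sketch below uses a hypothetical helper looks_like_zip() and two stand-in temporary files (not the real test.xlsx), under the assumption that a .xlsx workbook starts with the usual ZIP signature bytes 50 4B 03 04.

```r
# Return TRUE when a file starts with the ZIP local-file signature "PK\003\004".
looks_like_zip <- function(path) {
  header <- readBin(path, what = "raw", n = 4L)
  identical(header, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Stand-ins for real files: one with a ZIP header, one plain-text table.
fake_xlsx <- tempfile(fileext = ".xlsx")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x00)), fake_xlsx)
plain_txt <- tempfile(fileext = ".txt")
writeLines("1\t2\t3", plain_txt)

looks_like_zip(fake_xlsx)  # TRUE
looks_like_zip(plain_txt)  # FALSE
```

A file that passes this check is binary data and needs a dedicated reader instead of read.table().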

Workaround: copy the data into a txt file and name it test.txt

x2<-read.table('test.txt')
print(x2)

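x2 looks like this:

[Figure: x2 printed in the console; the first line of the file is missing]

The first line was not read. Why? The answer lies in the comment.char parameter.

The comment.char parameter

This parameter sets the character that starts a comment. Its default value is "#", so the line in my txt file that begins with # is treated as a comment and is not read. Let's set comment.char = "" and try again.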

x2<-read.table('test.txt',comment.char = "")

x2 looks like this:

[Figure: x2 with the line starting with # now read as the first data row]
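The effect of comment.char can be reproduced without any file by passing inline data through the text argument. The data below is a small stand-in for test.txt, not its actual contents.

```r
# Inline stand-in for test.txt: the first line starts with "#".
txt <- "#\tA\tB\n1\t2\t3\n4\t5\t6"

# Default comment.char = "#": the first line is treated as a comment and dropped.
with_default <- read.table(text = txt, sep = "\t")

# comment.char = "": comment handling is turned off, so all three lines are read.
no_comments <- read.table(text = txt, sep = "\t", comment.char = "")

nrow(with_default)  # 2
nrow(no_comments)   # 3
```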

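Now I want the first line to be used as the header, which is what the header parameter is for.

The header parameter

The default is FALSE, meaning the first line is not used as the header. To use the first line as the header, set it to TRUE.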

x2<-read.table('test.txt',comment.char = "",header = TRUE)

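x2 is shown below. The first header cell was "#", which is not a valid R name; because check.names = TRUE by default, it is converted to "X.".

[Figure: x2 with the first line used as the header and the first column named X.]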
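The renaming comes from check.names, which passes header cells through make.names(). A short sketch with inline stand-in data shows both settings side by side.

```r
txt <- "#\tA\tB\n1\t2\t3\n4\t5\t6"

# check.names = TRUE (default): header cells are made syntactically valid.
checked <- read.table(text = txt, sep = "\t", comment.char = "", header = TRUE)

# check.names = FALSE: header text is kept exactly as written.
unchecked <- read.table(text = txt, sep = "\t", comment.char = "",
                        header = TRUE, check.names = FALSE)

names(checked)    # "X." "A" "B"
names(unchecked)  # "#" "A" "B"
```

Columns with non-syntactic names such as "#" then have to be accessed with backticks or with [["#"]].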

To specify column names and row names, use the row.names and col.names parameters.

The row.names and col.names parameters

Taking column names as the example:

x2<-read.table('test.txt',comment.char = "",header = TRUE,
               col.names=c("a","b","c"))

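x2 looks like this:

[Figure: x2 with the columns named a, b and c]

The column names were changed successfully.

Why is c() used here? c() combines its arguments into a vector, and col.names expects a character vector with one name per column.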
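The example above only changes column names. The sketch below, again on inline stand-in data rather than the real test.txt, also shows one common use of row.names: giving a column number so that this column supplies the row names.

```r
txt <- "#\tA\tB\n1\t2\t3\n4\t5\t6"

# col.names takes a character vector, one name per column; c() builds it.
renamed <- read.table(text = txt, sep = "\t", comment.char = "", header = TRUE,
                      col.names = c("a", "b", "c"))

# row.names = 1 uses the first column as row names instead of as data.
with_rows <- read.table(text = txt, sep = "\t", comment.char = "", header = TRUE,
                        row.names = 1)

names(renamed)       # "a" "b" "c"
rownames(with_rows)  # "1" "4"
names(with_rows)     # "A" "B"
```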

class() shows the type of the object that was read in

class(x2)
[1] "data.frame"

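So read.table() is mainly used to read tabular data, and the result is an object of class "data.frame".

That concludes the look at read.table.

Besides read.table, the utils package has the following functions for reading files. Their parameters are similar, but the default values differ.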
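Since the goal is to read files in a loop, here is a minimal sketch of applying read.table() to every file in a folder. The directory, the file names part1.txt to part3.txt and their contents are all made up for illustration; list.files(), lapply() and do.call(rbind, ...) are standard base R.

```r
# Write three small tables into a temporary directory (hypothetical file names).
data_dir <- tempfile("tables_")
dir.create(data_dir)
for (i in 1:3) {
  writeLines(c("a\tb", paste(i, i * 10, sep = "\t")),
             file.path(data_dir, sprintf("part%d.txt", i)))
}

# List the files, then read each one with the same read.table() settings.
paths <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)
tables <- lapply(paths, read.table, header = TRUE, sep = "\t")
names(tables) <- basename(paths)

# Stack the pieces into one data frame (this assumes identical columns).
combined <- do.call(rbind, tables)
```

lapply() keeps one data frame per file in a named list, which is convenient when the files should stay separate; the rbind step is only appropriate when every file has the same columns.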

read.csv(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "", ...)

read.csv2(file, header = TRUE, sep = ";", quote = "\"",
          dec = ",", fill = TRUE, comment.char = "", ...)

read.delim(file, header = TRUE, sep = "\t", quote = "\"",
           dec = ".", fill = TRUE, comment.char = "", ...)

read.delim2(file, header = TRUE, sep = "\t", quote = "\"",
            dec = ",", fill = TRUE, comment.char = "", ...)
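The differing defaults are easiest to see on the same numbers written in two conventions. The inline data below is invented for the comparison.

```r
# read.csv: comma as separator, "." as decimal mark.
csv_dot <- read.csv(text = "x,y\n1.5,2.5")

# read.csv2: semicolon as separator, "," as decimal mark.
csv_comma <- read.csv2(text = "x;y\n1,5;2,5")

# read.delim: tab as separator, "." as decimal mark.
tsv <- read.delim(text = "x\ty\n1.5\t2.5")

csv_dot$x    # 1.5
csv_comma$x  # 1.5
tsv$y        # 2.5
```

All four functions set header = TRUE by default, unlike read.table().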

2. readLines in the base package

readLines(con = stdin(), n = -1L, ok = TRUE, warn = TRUE,
          encoding = "unknown", skipNul = FALSE)

x3<-readLines('test.txt')
x3
[1] "#\t中文\tEnglish" "1\t2\t3"
[3] "4\t5\t6"          "中文\t8\t9"
[5] "13\tEnglish\t9"   "13\t14\t%"
[7] "16\t17\t18"       "19\t20\t21"
class(x3)
[1] "character"

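For tabular data, readLines returns each line as a single string. The tab characters between fields are kept and are printed as the escape sequence "\t".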
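Because each line comes back as one string, the fields have to be split by hand if they are needed. A small sketch on a temporary stand-in file (not the real test.txt):

```r
path <- tempfile(fileext = ".txt")
writeLines(c("1\t2\t3", "4\t5\t6"), path)

lines <- readLines(path)

# Printing shows the escape sequence; the string holds real tab characters.
nchar(lines[1])  # 5: three digits plus two tabs

# Split every line on the tab character to recover the fields.
fields <- strsplit(lines, "\t", fixed = TRUE)
fields[[2]]      # "4" "5" "6"

# n limits how many lines are read.
first_line <- readLines(path, n = 1)
```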

3. stri_read_lines in the stringi package

stri_read_lines(con, encoding = NULL, fname = con, fallback_encoding = NULL)

First install the stringi package

install.packages("stringi")
library(stringi)

x3<-stri_read_lines('test.txt')
x3
[1] "#\t中文\tEnglish" "1\t2\t3"
[3] "4\t5\t6"          "中文\t8\t9"
[5] "13\tEnglish\t9"   "13\t14\t%"
[7] "16\t17\t18"       "19\t20\t21"
class(x3)
[1] "character"

For tabular data, stri_read_lines behaves the same way: each line is returned as a single string, with tab characters printed as "\t".
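The two line readers can be compared directly. This sketch uses a temporary stand-in file with ASCII content and only calls stringi when the package is installed; the variable names are illustrative.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("1\t2\t3", "4\t5\t6"), path)

base_lines <- readLines(path)

# Only run the stringi part when the package is available.
if (requireNamespace("stringi", quietly = TRUE)) {
  stri_lines <- stringi::stri_read_lines(path)
  same <- identical(stri_lines, base_lines)
}
```

For files containing non-ASCII text such as 中文, passing encoding = "UTF-8" to both functions is usually the safer choice, assuming the file really is stored as UTF-8.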

4. Data Input in the xlsx package

First install the xlsx package with install.packages(), then load it with library().

install.packages("xlsx")
library(xlsx)

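If Java is not installed on the computer, loading the package fails at this point:

Error: package or namespace load failed for 'xlsx':
 .onLoad failed in loadNamespace() for 'rJava', details:
  call: fun(libname, pkgname)
  error: JAVA_HOME cannot be determined from the Registry

So Java has to be installed from the official site https://www.oracle.com/java/technologies/javase-downloads.html.

After that, however, an error was still reported. I will continue with this part once the problem is solved.

[Figure: screenshot of the error]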
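While the cause of the remaining error is unknown, a first diagnostic step is to check whether R can see a Java installation at all. The sketch below is only that, a check, under the assumption that rJava relies on the JAVA_HOME environment variable when it is set; the read.xlsx() call with sheetIndex = 1 is the usual entry point of the xlsx package and runs only if everything needed is present.

```r
# rJava (used by xlsx) needs to locate a Java installation.
java_home <- Sys.getenv("JAVA_HOME")

if (!nzchar(java_home)) {
  message("JAVA_HOME is not set; install Java or set it with Sys.setenv().")
} else if (!dir.exists(java_home)) {
  message("JAVA_HOME points to a missing directory: ", java_home)
} else if (requireNamespace("xlsx", quietly = TRUE) && file.exists("test.xlsx")) {
  # read.xlsx() needs a sheet index or a sheet name.
  x4 <- xlsx::read.xlsx("test.xlsx", sheetIndex = 1)
}
```

A frequent reason for this kind of failure on Windows is a mismatch between 32-bit and 64-bit versions of R and Java, so that is worth checking as well.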

5. Read a delimited file in the readr package

read_delim(
  file,
  delim = NULL,
  quote = "\"",
  escape_backslash = FALSE,
  escape_double = TRUE,
  col_names = TRUE,
  col_types = NULL,
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  comment = "",
  trim_ws = FALSE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  name_repair = "unique",
  num_threads = readr_threads(),
  progress = show_progress(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)

read_csv(
  file,
  col_names = TRUE,
  col_types = NULL,
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  name_repair = "unique",
  num_threads = readr_threads(),
  progress = show_progress(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)

read_csv2(
  file,
  col_names = TRUE,
  col_types = NULL,
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = show_progress(),
  name_repair = "unique",
  num_threads = readr_threads(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)

read_tsv(
  file,
  col_names = TRUE,
  col_types = NULL,
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = show_progress(),
  name_repair = "unique",
  num_threads = readr_threads(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)
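The signatures above can be tried out with a short sketch. It uses a temporary stand-in file with made-up ASCII content and only calls readr when the package is installed; show_col_types = FALSE, taken from the signatures above, just silences the column-type message.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5\t6"), path)

if (requireNamespace("readr", quietly = TRUE)) {
  # read_tsv() fixes the delimiter to a tab; read_delim() takes it explicitly.
  via_tsv   <- readr::read_tsv(path, show_col_types = FALSE)
  via_delim <- readr::read_delim(path, delim = "\t", show_col_types = FALSE)

  # Both return a tibble, which is also a data.frame.
  class(via_tsv)
}
```

Unlike read.table(), these functions use col_names = TRUE by default, so the first line becomes the header, and comment = "" means a leading # is not treated as a comment.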

