Goal
- Learn how to read files in R
Study notes
- Data Input functions in the utils package
- readLines in the base package
- stri_read_lines in the stringi package
- Data input with the xlsx package
- Read a delimited file with the readr package
1. Data Input functions in the utils package
I use read.table as the example.
For a detailed description of the read.table parameters, see http://www.360doc.com/showweb/0/0/1029326103.aspx
read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = FALSE,
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
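Parameter file
Option 1: "file name". If no path is given, the file is read from the current working directory. Use getwd() to get the current working directory and setwd("path") to change it.
Option 2: an absolute path plus the file name, for example "D: \…\test.xlsx". In R code, write each backslash as \\ or use / instead.
Option 3: "clipboard". Copy the data first, then read it from the clipboard.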
getwd()
setwd("....\\...")  # enter the path you want to set
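As a minimal sketch of the first two options: the temporary directory and the file contents below are made up for illustration and are not the test file used in these notes. The same file is read once by absolute path and once by bare file name after changing the working directory.

```r
# Write a small tab-separated file into a temporary directory (hypothetical data)
tmp_dir  <- tempdir()
tmp_file <- file.path(tmp_dir, "test.txt")
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5\t6"), tmp_file)

# Option 2: absolute path
by_abs <- read.table(tmp_file, header = TRUE)

# Option 1: bare file name, resolved against the working directory
old_wd  <- setwd(tmp_dir)          # setwd() returns the previous directory
by_name <- read.table("test.txt", header = TRUE)
setwd(old_wd)                      # restore the previous working directory

# Both calls should give the same data frame
identical(by_abs, by_name)
```

Saving the value returned by setwd() makes it easy to switch back afterwards, so the rest of the session is not affected.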
Create a table in the working directory for testing and name it test.xlsx.
x1<-read.table('test.xlsx')
View(x1)
The result for x1:
x1<-read.table('test.xlsx')
Warning messages:
1: In read.table("test.xlsx") : line 1 appears to contain embedded nulls
2: In read.table("test.xlsx") :
incomplete final line found by readTableHeader on ‘test.xlsx’
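The "incomplete final line" warning means that read.table could not find a proper end of the last line. The underlying cause is that an .xlsx file is a compressed binary format rather than plain text, so read.table() cannot parse it; the "embedded nulls" warning points to the same problem. I therefore recommend not reading Excel files directly with read.table().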
Workaround: copy the data into a txt file and name it test.txt
x2<-read.table('test.txt')
print(x2)
In x2, the first line of the file has not been read. Why? This is where the comment.char parameter comes in.
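Parameter comment.char
This parameter sets the character that starts a comment. The default is "#", so the line in my txt file that begins with # is treated as a comment and skipped. Let's set comment.char = "" and try again: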
x2<-read.table('test.txt',comment.char = "")
x2 now includes the first line, read as an ordinary data row.
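The effect of comment.char can be reproduced without a file by using the text argument of read.table. This is a sketch with made-up ASCII data standing in for test.txt.

```r
# Hypothetical stand-in for test.txt: the first line starts with "#"
txt <- "#\tname\tscore\n1\tx\t3\n4\ty\t6"

# Default comment.char = "#": the whole first line is treated as a comment
with_default <- read.table(text = txt)
nrow(with_default)   # the "#" line is dropped

# comment.char = "" turns comment handling off, so the line is kept as data
no_comment <- read.table(text = txt, comment.char = "")
nrow(no_comment)
no_comment$V1        # the first value is the literal "#"
```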
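Now I want to use the first line as the header, which requires the header parameter.
Parameter header
The default is FALSE, meaning the first line is not used as the header. To use the first line as the header, set it to TRUE.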
x2<-read.table('test.txt',comment.char = "",header = TRUE)
In x2, the header cell that was originally # is not a valid R name, so it is stored as X. instead.
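The renaming of # comes from the check.names parameter, which defaults to TRUE and converts headers into syntactically valid R names. A small sketch with made-up data (not the original test.txt) compares the two settings.

```r
# Hypothetical stand-in for test.txt with "#" as the first header cell
txt <- "#\tname\tscore\n1\tx\t3\n4\ty\t6"

# check.names = TRUE (the default): "#" is rewritten to a valid name
checked <- read.table(text = txt, comment.char = "", header = TRUE)
names(checked)

# check.names = FALSE keeps the header exactly as written in the file
unchecked <- read.table(text = txt, comment.char = "", header = TRUE,
                        check.names = FALSE)
names(unchecked)
```

Keeping the raw header means the column must be accessed with backticks or [["#"]], so the default renaming is often the more convenient choice.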
To specify column names and row names, use the row.names and col.names parameters.
Parameters row.names and col.names
Take changing the column names as an example:
x2<-read.table('test.txt',comment.char = "",header = TRUE,
col.names=c("a","b","c"))
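In x2, the column names are now a, b, and c, so the change worked.
Why use c() here? col.names expects a vector of names, and c() combines its arguments into a vector (or a list), which is how I usually build one.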
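The notes only change the column names; as a sketch of how row.names could be used as well, the example below passes a column number so that this column becomes the row names. The data and the names id, name, and score are made up for illustration.

```r
# Hypothetical stand-in for test.txt
txt <- "#\tname\tscore\n1\tx\t3\n4\ty\t6"

# col.names takes a character vector with one name per column
renamed <- read.table(text = txt, comment.char = "", header = TRUE,
                      col.names = c("a", "b", "c"))
names(renamed)

# row.names = 1 uses the first column as row names and removes it
# from the regular columns
keyed <- read.table(text = txt, comment.char = "", header = TRUE,
                    col.names = c("id", "name", "score"), row.names = 1)
rownames(keyed)
names(keyed)
```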
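You can use class() to check the type of the data that was read:
class(x2)
[1] "data.frame"
So read.table() is mainly meant for tabular data, and the result is a "data.frame".
That concludes the look at read.table.
Besides read.table, the utils package has the following file-reading functions. Their parameters are similar, but the default values differ.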
read.csv(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "", ...)
read.csv2(file, header = TRUE, sep = ";", quote = "\"",
          dec = ",", fill = TRUE, comment.char = "", ...)
read.delim(file, header = TRUE, sep = "\t", quote = "\"",
           dec = ".", fill = TRUE, comment.char = "", ...)
read.delim2(file, header = TRUE, sep = "\t", quote = "\"",
            dec = ",", fill = TRUE, comment.char = "", ...)
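To make the different defaults concrete, here is a short sketch that reads the same made-up table in three encodings: comma-separated with a decimal point, semicolon-separated with a decimal comma, and tab-separated. The extra text argument is passed through to read.table.

```r
# The same hypothetical table written in three conventions
csv_txt  <- "x,y\n1.5,a\n2.5,b"     # sep = ",",  dec = "."
csv2_txt <- "x;y\n1,5;a\n2,5;b"     # sep = ";",  dec = ","
tsv_txt  <- "x\ty\n1.5\ta\n2.5\tb"  # sep = "\t", dec = "."

d1 <- read.csv(text = csv_txt)
d2 <- read.csv2(text = csv2_txt)
d3 <- read.delim(text = tsv_txt)

# All three should parse to the same data frame, with x read as numeric
identical(d1, d2)
identical(d1, d3)
```

Note that header = TRUE and comment.char = "" are the defaults here, unlike read.table.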
2. readLines in the base package
readLines(con = stdin(), n = -1L, ok = TRUE, warn = TRUE,
          encoding = "unknown", skipNul = FALSE)
x3<-readLines('test.txt')
x3
[1] "#\t中文\tEnglish" "1\t2\t3"
[3] "4\t5\t6" "中文\t8\t9"
[5] "13\tEnglish\t9" "13\t14\t%"
[7] "16\t17\t18" "19\t20\t21"
class(x3)
[1] "character"
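For tabular data, readLines returns each line as a single string and does not split it into columns; the tab characters stay in the string and are printed as "\t".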
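If the fields are needed, the lines can be split on the tab character afterwards. This is a minimal sketch using a temporary file with made-up contents rather than the original test.txt.

```r
# Write a small hypothetical tab-separated file
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tscore", "1\tx\t3", "4\ty\t6"), tmp)

# One string per line; "\t" is a single tab character inside each string
lines <- readLines(tmp)
nchar(lines[2])      # counts the tab as one character

# Split every line on the literal tab to get the individual fields
fields <- strsplit(lines, "\t", fixed = TRUE)
fields[[2]]

# n limits how many lines are read
first_two <- readLines(tmp, n = 2)
length(first_two)
```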
3. stri_read_lines in the stringi package
stri_read_lines(con, encoding = NULL, fname = con, fallback_encoding = NULL)
First install the stringi package:
install.packages("stringi")
library(stringi)
x3<-stri_read_lines('test.txt')
x3
[1] "#\t中文\tEnglish" "1\t2\t3"
[3] "4\t5\t6" "中文\t8\t9"
[5] "13\tEnglish\t9" "13\t14\t%"
[7] "16\t17\t18" "19\t20\t21"
class(x3)
[1] "character"
For tabular data, stri_read_lines behaves the same way: each line is one string, and the tab characters are printed as "\t".
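A sketch of stri_read_lines with an explicit encoding, assuming the stringi package is installed. The temporary file and its contents are made up, and stri_split_fixed is used here only as one possible way to split the fields.

```r
library(stringi)

# Write a small hypothetical tab-separated file
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname", "1\tx"), tmp)

# Read all lines, stating the file encoding explicitly
x <- stri_read_lines(tmp, encoding = "UTF-8")
class(x)

# Split each line on the literal tab character
parts <- stri_split_fixed(x, "\t")
parts[[2]]
```

Naming the encoding can help with files containing non-ASCII text, such as the Chinese cells in test.txt, when the automatic detection guesses wrong.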
4. Data input with the xlsx package
First install the xlsx package with install.packages(), then load it with library().
install.packages("xlsx")
library(xlsx)
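If Java is not installed on the computer, loading the package fails at this point:
Error: package or namespace load failed for ‘xlsx’:
.onLoad failed in loadNamespace() for 'rJava', details: call: fun(libname, pkgname)
error: JAVA_HOME cannot be determined from the Registry
So Java has to be installed from the official site, https://www.oracle.com/java/technologies/javase-downloads.html
However, the error persists. I will continue this section once I have solved the problem.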
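A small diagnostic sketch, not a confirmed fix: this error often appears when R cannot find a Java installation that matches its own architecture, for example 64-bit R with only 32-bit Java installed. The snippet only inspects the environment. Setting JAVA_HOME by hand is an assumption about this setup, and the path is deliberately left as a placeholder.

```r
# Which architecture is this R session? (for example "x86_64" for 64-bit)
r_arch <- R.version$arch

# Is JAVA_HOME visible to R? An empty string means it is not set.
java_home <- Sys.getenv("JAVA_HOME")

if (nzchar(java_home)) {
  message("JAVA_HOME = ", java_home, " (R architecture: ", r_arch, ")")
} else {
  message("JAVA_HOME is not set (R architecture: ", r_arch, ")")
  # Assumption: pointing JAVA_HOME at a matching Java installation before
  # calling library(xlsx) may help. The path below is a placeholder.
  # Sys.setenv(JAVA_HOME = "<path to your Java installation>")
}
```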
5. Read a delimited file with the readr package
read_delim(
  file,
  delim = NULL,
  quote = "\"",
  escape_backslash = FALSE,
  escape_double = TRUE,
  col_names = TRUE,
  col_types = NULL,
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  comment = "",
  trim_ws = FALSE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  name_repair = "unique",
  num_threads = readr_threads(),
  progress = show_progress(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)
read_csv(
  file,
  col_names = TRUE,
  col_types = NULL,
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  name_repair = "unique",
  num_threads = readr_threads(),
  progress = show_progress(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)
read_csv2(
  file,
  col_names = TRUE,
  col_types = NULL,
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = show_progress(),
  name_repair = "unique",
  num_threads = readr_threads(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)
read_tsv(
  file,
  col_names = TRUE,
  col_types = NULL,
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = show_progress(),
  name_repair = "unique",
  num_threads = readr_threads(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)
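A short usage sketch for these functions, assuming readr 2.x is installed. The temporary file and its contents are made up to resemble test.txt. Two differences from read.table show up: the first line is used as the header by default, and with comment = "" a line starting with # is not skipped.

```r
library(readr)

# Write a small hypothetical tab-separated file
tmp <- tempfile(fileext = ".tsv")
writeLines(c("#\tname\tscore", "1\tx\t3", "4\ty\t6"), tmp)

# read_delim with an explicit delimiter; the first line becomes the header
d <- read_delim(tmp, delim = "\t", show_col_types = FALSE)
names(d)    # name_repair = "unique" leaves "#" unchanged
class(d)    # a tibble, which is also a data.frame

# read_tsv with custom column names: skip the header line in the file
e <- read_tsv(tmp, col_names = c("a", "b", "c"), skip = 1,
              show_col_types = FALSE)
names(e)
```

show_col_types = FALSE only silences the column specification message; it does not change how the columns are parsed.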