DML (Data Manipulation Language)
1 LOAD (loading data)
- Loading files into tables
- The syntax given in the official documentation:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
- When loading data into a table, Hive performs no transformation. A load is simply a copy or move operation that places the data file(s) into the location corresponding to the Hive table.
- The load target can be a table or a partition. If the table is partitioned, a specific partition must be selected by supplying values for all partition columns.
- filepath can be a single file or a directory. In either case, filepath is treated as a set of files.
LOCAL: the input file is on the local filesystem (Linux). Without LOCAL, Hive looks for the file on HDFS (see the sketch after this list).
OVERWRITE: if the table already contains data, it is deleted first and then the new data is loaded; without this keyword, the data is appended to the table.
PARTITION: if the table is partitioned, data can be loaded into a specific partition.
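For example, a minimal sketch of loading from HDFS instead of the local filesystem, assuming emp.txt has already been uploaded to a hypothetical HDFS path /user/hadoop/emp.txt (a load from HDFS moves the source file into the table's directory rather than copying it):
LOAD DATA INPATH '/user/hadoop/emp.txt' OVERWRITE INTO TABLE emp;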
1. Create an employee table:
hive> CREATE TABLE emp (
> empno int,
> ename string,
> job string,
> mgr int,
> hiredate string,
> salary double,
> comm double,
> deptno int
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";
OK
Time taken: 0.54 seconds
2. Load emp.txt from the local filesystem:
LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt' OVERWRITE INTO TABLE emp;
hive> select * from emp;
OK
7369 SMITH CLERK 7902 1980/12/17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981/2/20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981/2/22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981/9/28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981/5/1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981/6/9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987/4/19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981/11/17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981/9/8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987/5/23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981/12/3 950.0 NULL 30
7902 FORD ANALYST 7566 1981/12/3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982/1/23 1300.0 NULL 10
Time taken: 0.938 seconds, Fetched: 14 row(s)
- The emp table now holds 14 rows. Load the same file again, this time without the OVERWRITE keyword (command sketched below):
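The append load is the same statement as before minus OVERWRITE, reusing the same local path:
hive> LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt' INTO TABLE emp;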
hive> select * from emp;
OK
7369 SMITH CLERK 7902 1980/12/17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981/2/20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981/2/22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981/9/28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981/5/1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981/6/9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987/4/19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981/11/17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981/9/8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987/5/23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981/12/3 950.0 NULL 30
7902 FORD ANALYST 7566 1981/12/3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982/1/23 1300.0 NULL 10
7369 SMITH CLERK 7902 1980/12/17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981/2/20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981/2/22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981/9/28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981/5/1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981/6/9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987/4/19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981/11/17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981/9/8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987/5/23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981/12/3 950.0 NULL 30
7902 FORD ANALYST 7566 1981/12/3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982/1/23 1300.0 NULL 10
Time taken: 0.259 seconds, Fetched: 28 row(s)
The query returns 28 rows, so the second load appended the file's contents.
- The emp table now holds 28 rows. Load the same file again, this time with the OVERWRITE keyword (command sketched below):
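The overwrite load is the same statement as the very first load:
hive> LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt' OVERWRITE INTO TABLE emp;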
hive> select * from emp;
OK
7369 SMITH CLERK 7902 1980/12/17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981/2/20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981/2/22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981/9/28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981/5/1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981/6/9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987/4/19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981/11/17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981/9/8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987/5/23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981/12/3 950.0 NULL 30
7902 FORD ANALYST 7566 1981/12/3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982/1/23 1300.0 NULL 10
Time taken: 0.104 seconds, Fetched: 14 row(s)
The query returns 14 rows again: the OVERWRITE load replaced the existing data instead of appending to it.
- Loading data into a partition (PARTITION)
hive> load data local inpath '/home/hadoop/dept.txt' into table dept partition (dt='2018-09-09');
Loading data to table default.dept partition (dt=2018-09-09)
Partition default.dept{dt=2018-09-09} stats: [numFiles=1, totalSize=84]
OK
Time taken: 10.631 seconds
hive> select * from dept;
OK
10 ACCOUNTING NEW YORK 2018-08-08
20 RESEARCH DALLAS 2018-08-08
30 SALES CHICAGO 2018-08-08
40 OPERATIONS BOSTON 2018-08-08
10 ACCOUNTING NEW YORK 2018-09-09
20 RESEARCH DALLAS 2018-09-09
30 SALES CHICAGO 2018-09-09
40 OPERATIONS BOSTON 2018-09-09
Time taken: 1.385 seconds, Fetched: 8 row(s)
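Since dt is a partition column, filtering on it prunes the scan to that partition's directory; a minimal sketch, which should return only the four rows loaded above:
hive> select * from dept where dt='2018-09-09';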
2 INSERT (inserting data into tables)
- The syntax given in the official documentation:
Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
The documentation again lists a pile of syntax that looks intimidating, but once you sort through it there is not much to it. A breakdown:
1. Standard syntax: INSERT OVERWRITE TABLE tablename1 select_statement1 FROM from_statement; is just a plain insert-from-select statement.
2. The PARTITION keyword can be used to insert into a specific partition.
3. OVERWRITE decides whether existing data is replaced.
4. An insert-from-select statement launches a MapReduce job.
5. Multiple inserts: the FROM source is scanned once and the results are written to several tables or partitions in a single statement (see the sketch after this note).
Note: there are two insert forms, and the only difference is whether the OVERWRITE keyword is present (replace vs. append).
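A minimal multi-insert sketch, assuming emp1 and emp2 have the same schema as emp (both tables are used in the examples below); emp is scanned once and the result is written to both targets:
FROM emp
INSERT OVERWRITE TABLE emp1 SELECT *
INSERT INTO TABLE emp2 SELECT *;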
- Insert the data from the emp table into emp1
hive> insert overwrite table emp1 select * from emp;
Query ID = hadoop_20180109081212_d62e58f3-946c-465e-999d-2ddf0d76d807
Total jobs = 3
Launching Job 1 out of 3
hive> select * from emp1;
OK
7369 SMITH CLERK 7902 1980/12/17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981/2/20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981/2/22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981/9/28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981/5/1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981/6/9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987/4/19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981/11/17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981/9/8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987/5/23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981/12/3 950.0 NULL 30
7902 FORD ANALYST 7566 1981/12/3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982/1/23 1300.0 NULL 10
Time taken: 0.211 seconds, Fetched: 14 row(s)
- Inserting with an explicit column list (there is a pitfall here, pay attention!)
Here I swapped job and ename in the select list, and the insert still succeeds: Hive maps columns by position, not by name.
hive> insert overwrite table emp2 select empno,job,ename,mgr,hiredate,salary,comm,deptno from emp;
Then I inserted the emp data into emp2 a second time, this time with select *:
hive> insert into table emp2 select * from emp;
hive> select * from emp2;
OK
7369 CLERK SMITH 7902 1980/12/17 800.0 NULL 20
7499 SALESMAN ALLEN 7698 1981/2/20 1600.0 300.0 30
7521 SALESMAN WARD 7698 1981/2/22 1250.0 500.0 30
7566 MANAGER JONES 7839 1981/4/2 2975.0 NULL 20
7654 SALESMAN MARTIN 7698 1981/9/28 1250.0 1400.0 30
7698 MANAGER BLAKE 7839 1981/5/1 2850.0 NULL 30
7782 MANAGER CLARK 7839 1981/6/9 2450.0 NULL 10
7788 ANALYST SCOTT 7566 1987/4/19 3000.0 NULL 20
7839 PRESIDENT KING NULL 1981/11/17 5000.0 NULL 10
7844 SALESMAN TURNER 7698 1981/9/8 1500.0 0.0 30
7876 CLERK ADAMS 7788 1987/5/23 1100.0 NULL 20
7900 CLERK JAMES 7698 1981/12/3 950.0 NULL 30
7902 ANALYST FORD 7566 1981/12/3 3000.0 NULL 20
7934 CLERK MILLER 7782 1982/1/23 1300.0 NULL 10
7369 SMITH CLERK 7902 1980/12/17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981/2/20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981/2/22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981/9/28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981/5/1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981/6/9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987/4/19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981/11/17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981/9/8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987/5/23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981/12/3 950.0 NULL 30
7902 FORD ANALYST 7566 1981/12/3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982/1/23 1300.0 NULL 10
Time taken: 2.363 seconds, Fetched: 28 row(s)
In the first 14 rows of the result, job and ename are swapped: each value landed in whichever column occupied the same position, with no error or warning.
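To avoid this, keep the select list in the same order as the target table's columns; a sketch replaying the first insert with the correct order:
hive> insert overwrite table emp2 select empno,ename,job,mgr,hiredate,salary,comm,deptno from emp;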
- Inserting values into tables (manually insert one or more rows; this also runs a MapReduce job)
- Official syntax
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
hive> create table stu(
> id int,
> name string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.405 seconds
hive> select * from stu;
OK
hive> insert into table stu values(1,'zhangsan'),(2,'lisi');
hive> select * from stu;
OK
1 zhangsan
2 lisi
3 Exporting data (Writing data into the filesystem from queries)
Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
SELECT ... FROM ...
Hive extension (multiple inserts): export to multiple directories in one statement
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
LOCAL: with the LOCAL keyword the output is written to the local filesystem; without it, the output goes to HDFS by default;
STORED AS: specifies the output file format (a sketch that writes to HDFS with STORED AS follows the local example below).
hive> insert overwrite local directory '/home/hadoop/data' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> select * from stu;
View the result on the local filesystem (the export above used LOCAL, so the files are under /home/hadoop/data):
[hadoop@zydatahadoop001 data]$ pwd
/home/hadoop/data
[hadoop@zydatahadoop001 data]$ cat 000000_0
1 zhangsan
2 lisi
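To write to HDFS instead and control the file format, drop LOCAL and add STORED AS (available since Hive 0.11.0); a minimal sketch assuming a hypothetical HDFS output directory:
hive> INSERT OVERWRITE DIRECTORY '/user/hadoop/stu_export' STORED AS ORC SELECT * FROM stu;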
- Export to multiple directories (multiple inserts)
hive> from emp
> INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp1'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
> select empno, ename
> INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp2'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
> select ename;
View the results:
[hadoop@zydatahadoop001 tmp]$ pwd
/home/hadoop/tmp
[hadoop@zydatahadoop001 tmp]$ cat hivetmp1/000000_0
7369 SMITH
7499 ALLEN
7521 WARD
7566 JONES
7654 MARTIN
7698 BLAKE
7782 CLARK
7788 SCOTT
7839 KING
7844 TURNER
7876 ADAMS
7900 JAMES
7902 FORD
7934 MILLER
[hadoop@zydatahadoop001 tmp]$ cat hivetmp2/000000_0
SMITH
ALLEN
WARD
JONES
MARTIN
BLAKE
CLARK
SCOTT
KING
TURNER
ADAMS
JAMES
FORD
MILLER
4 Queries (SELECT)
4.1 WHERE
- Query the employees with deptno=10
hive> select * from emp where deptno=10;
OK
7782 CLARK MANAGER 7839 1981/6/9 2450.0 NULL 10
7839 KING PRESIDENT NULL 1981/11/17 5000.0 NULL 10
7934 MILLER CLERK 7782 1982/1/23 1300.0 NULL 10
Time taken: 1.144 seconds, Fetched: 3 row(s)
- Query the employees with an employee number (empno) less than or equal to 7800
hive> select * from emp where empno <= 7800;
OK
7369 SMITH CLERK 7902 1980/12/17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981/2/20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981/2/22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981/9/28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981/5/1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981/6/9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987/4/19 3000.0 NULL 20
Time taken: 0.449 seconds, Fetched: 8 row(s)
- Query the employees with a salary between 1000 and 1500 (BETWEEN is inclusive on both ends, so 1500.0 is returned)
hive> select * from emp where salary between 1000 and 1500;
OK
7521 WARD SALESMAN 7698 1981/2/22 1250.0 500.0 30
7654 MARTIN SALESMAN 7698 1981/9/28 1250.0 1400.0 30
7844 TURNER SALESMAN 7698 1981/9/8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987/5/23 1100.0 NULL 20
7934 MILLER CLERK 7782 1982/1/23 1300.0 NULL 10
Time taken: 0.178 seconds, Fetched: 5 row(s)
- Query the first 5 records
hive> select * from emp limit 5;
OK
7369 SMITH CLERK 7902 1980/12/17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981/2/20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981/2/22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981/9/28 1250.0 1400.0 30
Time taken: 0.47 seconds, Fetched: 5 row(s)
- Query the employees with empno 7566 or 7499
hive> select * from emp where empno in(7566,7499);
OK
7499 ALLEN SALESMAN 7698 1981/2/20 1600.0 300.0 30
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 20
Time taken: 0.4 seconds, Fetched: 2 row(s)
- Query the employees whose commission (comm) is not NULL
hive> select * from emp where comm is not null;
OK
7499 ALLEN SALESMAN 7698 1981/2/20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981/2/22 1250.0 500.0 30
7654 MARTIN SALESMAN 7698 1981/9/28 1250.0 1400.0 30
7844 TURNER SALESMAN 7698 1981/9/8 1500.0 0.0 30
Time taken: 0.262 seconds, Fetched: 4 row(s)
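These predicates can also be combined with AND/OR; a minimal sketch (query only):
hive> select * from emp where deptno=30 and salary >= 1500;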