数据的操作语言DML(Data Manipulation Language)

1 LOAD(加载数据)

  • 加载文件到表中(Loading files into tables)
    -下面是官网上为我们列出的语法:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
  1. 加载数据到表中时,Hive不做任何转换。加载操作只是把数据拷贝或移动操作,即移动数据文件到Hive表相应的位置。
  2. 加载的目标可以是一个表,也可以是一个分区。如果表是分区的,则必须通过指定所有分区列的值来指定一个表的分区。
  3. filepath可以是一个文件,也可以是一个目录。不管什么情况下,filepath被认为是一个文件集合。

LOCAL:表示输入文件在本地文件系统(Linux),如果没有加LOCAL,hive则会去HDFS上查找该文件。
OVERWRITE:表示如果表中有数据,则先删除数据,再插入新数据,如果没有这个关键词,则直接附加数据到表中。
PARTITION:如果表中存在分区,可以按照分区进行导入。

1.创建一张员工表 :

hive> CREATE TABLE emp (
    > empno int,
    > ename string,
    > job string,
    > mgr int,
    > hiredate string,
    > salary double,
    > comm double,
    > deptno int
    > ) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\n";
OK
Time taken: 0.54 seconds

2.把本地文件系统中emp.txt导入:

LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt' OVERWRITE INTO TABLE emp;

hive> select * from emp;
OK
7369    SMITH   CLERK   7902    1980/12/17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981/2/20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981/2/22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981/4/2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981/9/28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981/5/1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981/6/9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987/4/19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981/11/17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981/9/8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987/5/23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981/12/3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981/12/3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982/1/23       1300.0  NULL    10
Time taken: 0.938 seconds, Fetched: 14 row(s)
  • 现在emp表中有14条数据,不使用OVERWRITE关键字
hive> select * from emp;
OK
7369    SMITH   CLERK   7902    1980/12/17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981/2/20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981/2/22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981/4/2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981/9/28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981/5/1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981/6/9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987/4/19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981/11/17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981/9/8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987/5/23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981/12/3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981/12/3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982/1/23       1300.0  NULL    10
7369    SMITH   CLERK   7902    1980/12/17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981/2/20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981/2/22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981/4/2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981/9/28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981/5/1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981/6/9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987/4/19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981/11/17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981/9/8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987/5/23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981/12/3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981/12/3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982/1/23       1300.0  NULL    10
Time taken: 0.259 seconds, Fetched: 28 row(s)

查询出28条数据
- 现在emp表中有28条数据,使用OVERWRITE关键字

hive> select * from emp;
OK
7369    SMITH   CLERK   7902    1980/12/17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981/2/20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981/2/22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981/4/2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981/9/28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981/5/1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981/6/9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987/4/19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981/11/17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981/9/8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987/5/23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981/12/3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981/12/3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982/1/23       1300.0  NULL    10
Time taken: 0.104 seconds, Fetched: 14 row(s)

查询出14条数据

  • 使用分区加载数据(PARTITION)
hive> load data local inpath '/home/hadoop/dept.txt' into table dept partition (dt='2018-09-09');
Loading data to table default.dept partition (dt=2018-09-09)
Partition default.dept{dt=2018-09-09} stats: [numFiles=1, totalSize=84]
OK
Time taken: 10.631 seconds
hive> select * form dept;
FAILED: ParseException line 1:9 missing EOF at 'form' near '*'
hive> select * from dept;
OK
10      ACCOUNTING      NEW YORK        2018-08-08
20      RESEARCH        DALLAS  2018-08-08
30      SALES           CHICAGO 2018-08-08
40      OPERATIONS      BOSTON  2018-08-08
10      ACCOUNTING      NEW YORK        2018-09-09
20      RESEARCH        DALLAS  2018-09-09
30      SALES           CHICAGO 2018-09-09
40      OPERATIONS      BOSTON  2018-09-09
Time taken: 1.385 seconds, Fetched: 8 row(s)

2 Inserting into (插入数据)

-下面是官网给出的语法

Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;

官网又给我们列出一大堆语法,看着就很可怕,但是仔细整理后再来看看你会发现并没有什么,下面对其进行分析:
1. 标准语法(Standard syntax):INSERT OVERWRITE TABLE tablename1 select_statement1 FROM from_statement; 其实就是一个简单的插入语句。
2. 可以使用PARTITION 关键字,进行分区插入。
3. OVERWRITE是否选择覆盖。
4. 使用插入语法会跑mr作业。
5. multiple inserts:代表多行插入。
注:这里有两种插语法,也就是加上OVERWRITE关键字和不加的区别。
- 向emp1表中插入emp表中的数据

hive> insert overwrite table emp1 select * from emp;
Query ID = hadoop_20180109081212_d62e58f3-946c-465e-999d-2ddf0d76d807
Total jobs = 3
Launching Job 1 out of 3

hive> select * from emp1;
OK
7369    SMITH   CLERK   7902    1980/12/17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981/2/20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981/2/22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981/4/2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981/9/28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981/5/1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981/6/9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987/4/19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981/11/17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981/9/8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987/5/23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981/12/3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981/12/3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982/1/23       1300.0  NULL    10
Time taken: 0.211 seconds, Fetched: 14 row(s)
  • 按字段插入数据(这里有个坑,大家要注意了!!!)
这里我把job,ename,依然可以插入数据,是不是很神奇。。。
hive> insert overwrite table emp2 select empno,job,ename,mgr,hiredate,salary,comm,deptno from emp;

这是我又向emp2表中插入了数据:
hive> insert into table emp2 select * from emp;
hive> select * from emp2;
OK
7369    CLERK   SMITH   7902    1980/12/17      800.0   NULL    20
7499    SALESMAN        ALLEN   7698    1981/2/20       1600.0  300.0   30
7521    SALESMAN        WARD    7698    1981/2/22       1250.0  500.0   30
7566    MANAGER JONES   7839    1981/4/2        2975.0  NULL    20
7654    SALESMAN        MARTIN  7698    1981/9/28       1250.0  1400.0  30
7698    MANAGER BLAKE   7839    1981/5/1        2850.0  NULL    30
7782    MANAGER CLARK   7839    1981/6/9        2450.0  NULL    10
7788    ANALYST SCOTT   7566    1987/4/19       3000.0  NULL    20
7839    PRESIDENT       KING    NULL    1981/11/17      5000.0  NULL    10
7844    SALESMAN        TURNER  7698    1981/9/8        1500.0  0.0     30
7876    CLERK   ADAMS   7788    1987/5/23       1100.0  NULL    20
7900    CLERK   JAMES   7698    1981/12/3       950.0   NULL    30
7902    ANALYST FORD    7566    1981/12/3       3000.0  NULL    20
7934    CLERK   MILLER  7782    1982/1/23       1300.0  NULL    10
7369    SMITH   CLERK   7902    1980/12/17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981/2/20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981/2/22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981/4/2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981/9/28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981/5/1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981/6/9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987/4/19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981/11/17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981/9/8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987/5/23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981/12/3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981/12/3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982/1/23       1300.0  NULL    10
Time taken: 2.363 seconds, Fetched: 28 row(s)

查询结果前14条记录job,ename是反的。。。。。
  • Inserting values into tables(手动插入一条或多条记录,会跑mr作业)
  • 官方语法
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
hive>  create table stu(
    > id int,
    > name string
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.405 seconds
hive> select * from stu;
OK

hive> insert into table stu values(1,'zhangsan'),(2,'lisi);

hive> select * from stu;
OK
1       zhangsan
2       lisi

3 数据导出(Writing data into the filesystem from queries)

Standard syntax:(标准语法)
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
  SELECT ... FROM ...

Hive extension (multiple inserts):(导出多条记录)
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)

LOCAL:加上LOCAL关键字代表导入本地系统,不加默认导入HDFS;
STORED AS:可以指定存储格式。

hive> insert overwrite local directory '/home/hadoop/data' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
select * from stu;

HDFS上查看结果:
[hadoop@zydatahadoop001 data]$ pwd
/home/hadoop/data

[hadoop@zydatahadoop001 data]$ cat 000000_0 
1       zhangsan
2       lisi
  • 导出多条记录
hive> from emp
    > INSERT OVERWRITE  LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp1'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    > select empno, ename  
    > INSERT OVERWRITE  LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp2'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    > select ename;

查询结果
[hadoop@zydatahadoop001 tmp]$ pwd
/home/hadoop/tmp
[hadoop@zydatahadoop001 tmp]$ cat hivetmp1/000000_0 
7369    SMITH
7499    ALLEN
7521    WARD
7566    JONES
7654    MARTIN
7698    BLAKE
7782    CLARK
7788    SCOTT
7839    KING
7844    TURNER
7876    ADAMS
7900    JAMES
7902    FORD
7934    MILLER
[hadoop@zydatahadoop001 tmp]$ cat hivetmp2/000000_0 
SMITH
ALLEN
WARD
JONES
MARTIN
BLAKE
CLARK
SCOTT
KING
TURNER
ADAMS
JAMES
FORD
MILLER

4 查询操作(Select)

4.1 where

  • 查询员工表deptno=10的员工
hive> select * from emp where deptno=10;
OK
7782    CLARK   MANAGER 7839    1981/6/9        2450.0  NULL    10
7839    KING    PRESIDENT       NULL    1981/11/17      5000.0  NULL    10
7934    MILLER  CLERK   7782    1982/1/23       1300.0  NULL    10
Time taken: 1.144 seconds, Fetched: 3 row(s)
  • 查询员工编号小于等于7800的员工
hive> select * from emp where empno <= 7800;
OK
7369    SMITH   CLERK   7902    1980/12/17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981/2/20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981/2/22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981/4/2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981/9/28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981/5/1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981/6/9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987/4/19       3000.0  NULL    20
Time taken: 0.449 seconds, Fetched: 8 row(s)
  • 查询员工工资大于1000小于1500的员工
hive> select * from emp where salary between 1000 and 1500;
OK
7521    WARD    SALESMAN        7698    1981/2/22       1250.0  500.0   30
7654    MARTIN  SALESMAN        7698    1981/9/28       1250.0  1400.0  30
7844    TURNER  SALESMAN        7698    1981/9/8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987/5/23       1100.0  NULL    20
7934    MILLER  CLERK   7782    1982/1/23       1300.0  NULL    10
Time taken: 0.178 seconds, Fetched: 5 row(s)
  • 查询前5条记录
hive> select * from emp limit 5;
OK
7369    SMITH   CLERK   7902    1980/12/17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981/2/20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981/2/22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981/4/2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981/9/28       1250.0  1400.0  30
Time taken: 0.47 seconds, Fetched: 5 row(s)
  • 查询编号为7566或7499的员工
hive> select * from emp where empno in(7566,7499);
OK
7499    ALLEN   SALESMAN        7698    1981/2/20       1600.0  300.0   30
7566    JONES   MANAGER 7839    1981/4/2        2975.0  NULL    20
Time taken: 0.4 seconds, Fetched: 2 row(s)
  • 查询有津贴不为空的员工
hive> select * from emp where comm is not null;
OK
7499    ALLEN   SALESMAN        7698    1981/2/20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981/2/22       1250.0  500.0   30
7654    MARTIN  SALESMAN        7698    1981/9/28       1250.0  1400.0  30
7844    TURNER  SALESMAN        7698    1981/9/8        1500.0  0.0     30
Time taken: 0.262 seconds, Fetched: 4 row(s)