1. Source data

hive> select * from word; 
OK
1	MSN  
10	QQ  
100	Gtalk  
1000	Skype

 

 

2. Create a table stored in Parquet format

 

hive> CREATE TABLE parquet_table(id INT, name STRING) STORED AS PARQUET;

 

 

3. Describe the table

 

hive> describe parquet_table;                                          
OK
id                  	int                 	                    
name                	string              	                    
Time taken: 0.099 seconds, Fetched: 2 row(s)

 

 

4. Insert data

 

hive> INSERT OVERWRITE TABLE parquet_table SELECT * FROM word;

 

 

5. Query

hive> select * from parquet_table;
OK
1	MSN  
10	QQ  
100	Gtalk  
1000	Skype

 

6. File contents on HDFS (Parquet binary format)



[Screenshot: binary contents of the Parquet file on HDFS]
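
Because the file is binary, it is not readable as text, but it can still be sanity-checked. A standard Parquet file starts and ends with the 4-byte marker PAR1, and the 4 bytes just before the trailing marker hold the footer metadata length as a little-endian integer. The following is a minimal sketch in Python, assuming the data file has first been copied from HDFS to the local disk (for example with hdfs dfs -get). The function names and the local file name are hypothetical and chosen only for illustration.

```python
import struct
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at both the start and the end of the file


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    data_path = Path(path)
    # Smallest possible layout: leading magic + 4-byte footer length + trailing magic.
    if data_path.stat().st_size < 12:
        return False
    with data_path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # whence=2 seeks relative to the end of the file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def footer_length(path):
    """Return the footer metadata size in bytes (little-endian uint32 before the trailing magic)."""
    with Path(path).open("rb") as f:
        f.seek(-8, 2)
        (length,) = struct.unpack("<I", f.read(4))
    return length
```

For a local copy named, say, 000000_0 (an assumed name), looks_like_parquet("000000_0") should return True for the table created above, while a plain tab-separated text file would return False. This check only inspects the markers. It does not decode the schema or the rows.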


 


7. References

https://cwiki.apache.org/confluence/display/Hive/Parquet#Parquet-HiveQLSyntax