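Common Linux commands
pwd
Print the absolute path of the current working directory
cat input.txt
Print the contents of input.txt
ls
List the files and subdirectories in the current directory (hidden entries are shown only with ls -a)
rm recommender-dm-1.0-SNAPSHOT-lib.jar
Delete the file recommender-dm-1.0-SNAPSHOT-lib.jar from the current directory
cp /home/deploy/pctr/recommender-dm_fat.jar ./
Copy recommender-dm_fat.jar from /home/deploy/pctr/ into the current directory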
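rm -rf 0000*
Force-delete, without any prompt, every file and directory in the current directory whose name starts with 0000
rm -rf <directory>
Delete a directory and everything in it (<directory> is a placeholder for the directory name)
rm
-r: recurse downward and delete the directory together with everything beneath it, however many levels deep
-f: force the deletion without prompting for confirmation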
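Because rm -rf with a wildcard never asks for confirmation, it can help to preview what the pattern matches before deleting. The following is a minimal sketch run in a throwaway directory; the file names 00001, 00002, 0000_dir and keep.txt are made up for the demonstration.

```shell
# Work in a temporary directory so the demo cannot touch real data.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 00001 00002 keep.txt
mkdir 0000_dir

# Step 1: preview exactly what the pattern expands to.
# -d lists matching directories themselves instead of their contents.
ls -d 0000*

# Step 2: delete only after the preview looks right.
rm -rf 0000*

# Only keep.txt should be left.
remaining=$(ls)
echo "$remaining"

# Clean up the temporary directory.
cd / && rm -rf "$demo_dir"
```

Replacing rm -rf with ls -d in the same command line is a cheap habit: both commands see the same shell expansion of 0000*.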
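rz
Upload a file from the client to the server (rz = receive: the server receives the file)
sz
Download a file from the server to the client (sz = send: the server sends the file)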
Common Hadoop HDFS commands
hadoop fs -ls /user/deploy/recsys/workspace/ouyangyewei
List the contents of the ouyangyewei directory
hadoop fs -mkdir /user/deploy/recsys/workspace/ouyangyewei/input
Create the directory input under ouyangyewei
hadoop fs -rm /user/deploy/recsys/workspace/ouyangyewei/input/input.txt
Delete the file input.txt
hadoop fs -rmr /user/deploy/recsys/workspace/ouyangyewei/input
Delete the input directory and every file under it (newer Hadoop releases deprecate -rmr in favor of -rm -r)
hadoop fs -put ./input.txt /home/deploy/recsys/workspace/ouyangyewei/input
Upload input.txt from the current local directory to the input directory on HDFS
hadoop fs -dus /data/share/trackinfo/ds=2014-05-12
Show the total size, in bytes, of /data/share/trackinfo/ds=2014-05-12 (newer Hadoop releases deprecate -dus in favor of -du -s)
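Since the size is reported as a raw byte count, large partitions are hard to read at a glance. Below is a minimal sketch of a helper that converts a byte count into binary units. The name human_bytes is hypothetical and not part of Hadoop, and the byte count is passed in by hand rather than parsed from the command output, because the column layout of that output may differ between Hadoop versions.

```shell
# human_bytes BYTES
# Print a byte count using binary units (1 KiB = 1024 B), one decimal place.
human_bytes() {
  awk -v b="$1" 'BEGIN {
    split("B KiB MiB GiB TiB PiB", unit, " ")
    i = 1
    # Divide by 1024 until the value fits the current unit.
    while (b >= 1024 && i < 6) { b /= 1024; i++ }
    printf "%.1f %s\n", b, unit[i]
  }'
}

human_bytes 1536          # prints "1.5 KiB"
human_bytes 1073741824    # prints "1.0 GiB"
```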
hadoop jar recommender-dm-1.0-SNAPSHOT-lib.jar com.yhd.ml.statistics.click.WordCount /home/deploy/recsys/workspace/ouyangyewei/input /home/deploy/recsys/workspace/ouyangyewei/output
Run a job: the jar is recommender-dm-1.0-SNAPSHOT-lib.jar, the main class is com.yhd.ml.statistics.click.WordCount, the input directory is input, and the output directory is output
hadoop job -kill job_201403291618_274044
Kill the Hadoop job with the given job ID
Common HBase commands
/usr/local/cloud/hbase/bin/hbase shell
Connect to HBase through the HBase shell
exit
Leave the HBase shell
version
Show the HBase version
hbase(main):045:0> is_enabled 't1'
true
0 row(s) in 0.0020 seconds
Check whether table t1 is enabled
hbase(main):046:0> is_disabled 't1'
false
0 row(s) in 0.0010 seconds
Check whether table t1 is disabled
hbase(main):044:0> exists 't1'
Table t1 does exist
0 row(s) in 0.0270 seconds
Check whether table t1 exists
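scan 'full_user_profile', {LIMIT=>1}
Print one row of the table full_user_profile (LIMIT=>1 stops the scan after the first row key and shows all of that row's cells)
list
List all tables
describe 'full_user_profile'
Show the schema of the table full_user_profile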
hbase(main):003:0> disable 'score'
0 row(s) in 2.1080 seconds
Disable the table score
hbase(main):004:0> drop 'score'
0 row(s) in 10.6740 seconds
Drop the table score (note that a table must be disabled before it can be dropped)
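The disable-then-drop order can be captured in a small generator so the two commands always travel together. This is only a sketch: drop_table_cmds is a hypothetical helper name, and it merely prints HBase shell commands rather than running them.

```shell
# drop_table_cmds TABLE
# Print the two HBase shell commands needed to remove TABLE, in the required order.
drop_table_cmds() {
  printf "disable '%s'\ndrop '%s'\n" "$1" "$1"
}

drop_table_cmds score
# prints:
# disable 'score'
# drop 'score'
```

The printed lines can be pasted into an HBase shell session, or piped into it (for example drop_table_cmds score | hbase shell, assuming hbase is on the PATH).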
-------------------------------------------------------------------------
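hbase(main):013:0> create 'score', 'name', 'course'
0 row(s) in 5.1050 seconds
Create the table score with two column families, name and course. The row key is not declared when the table is created; it is supplied with each put (xiaowen in the commands that follow).
hbase(main):014:0> put 'score', 'xiaowen', 'course:China', '95'
0 row(s) in 33.4270 seconds
For row xiaowen, write the value 95 to column China in column family course
hbase(main):015:0> put 'score', 'xiaowen', 'course:Math', '99'
0 row(s) in 0.0130 seconds
For row xiaowen, write the value 99 to column Math in column family course
hbase(main):016:0> put 'score', 'xiaowen', 'course:English', '98'
0 row(s) in 0.0040 seconds
For row xiaowen, write the value 98 to column English in column family course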
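Each put has the same shape: table, row key, family:qualifier, value. The scans later in these notes also show a row xiaoye with a score of 85 in each course, although its put commands are not listed. The sketch below generates puts of that shape from tab-separated input; gen_puts is a hypothetical helper name, and it assumes that no field contains a single quote.

```shell
# gen_puts TABLE
# Read lines of the form  rowkey<TAB>family:qualifier<TAB>value  from stdin
# and print one HBase shell put command per line.
# \047 is the octal escape for a single quote inside the awk format string.
gen_puts() {
  awk -v table="$1" -F '\t' '{
    printf "put \047%s\047, \047%s\047, \047%s\047, \047%s\047\n", table, $1, $2, $3
  }'
}

printf 'xiaoye\tcourse:China\t85\nxiaoye\tcourse:Math\t85\nxiaoye\tcourse:English\t85\n' |
  gen_puts score
# first line printed: put 'score', 'xiaoye', 'course:China', '85'
```

The generated commands can be pasted into the shell or piped into hbase shell, assuming hbase is on the PATH.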
hbase(main):017:0> scan 'score'
ROW        COLUMN+CELL
 xiaowen   column=course:China, timestamp=1400141524101, value=95
 xiaowen   column=course:English, timestamp=1400141591123, value=98
 xiaowen   column=course:Math, timestamp=1400141579107, value=99
1 row(s) in 0.0250 seconds
Show all the data in the table score
hbase(main):018:0> get 'score', 'xiaowen'
COLUMN           CELL
 course:China    timestamp=1400141524101, value=95
 course:English  timestamp=1400141591123, value=98
 course:Math     timestamp=1400141579107, value=99
3 row(s) in 0.0110 seconds
Show the data of row xiaowen in the table score
hbase(main):019:0> get 'score', 'xiaowen', 'course:Math'
COLUMN           CELL
 course:Math     timestamp=1400141579107, value=99
1 row(s) in 0.0070 seconds
Show the Math column of column family course for row xiaowen in the table score
hbase(main):008:0> scan 'score'
ROW        COLUMN+CELL
 xiaowen   column=course:China, timestamp=1400141524101, value=95
 xiaowen   column=course:English, timestamp=1400141591123, value=98
 xiaowen   column=course:Math, timestamp=1400141579107, value=99
 xiaoye    column=course:China, timestamp=1400143888087, value=85
 xiaoye    column=course:English, timestamp=1400143921395, value=85
 xiaoye    column=course:Math, timestamp=1400143907407, value=85
2 row(s) in 0.0240 seconds
Show every value in the table score
hbase(main):013:0> scan 'score', {COLUMNS=>'course'}
ROW        COLUMN+CELL
 xiaowen   column=course:China, timestamp=1400141524101, value=95
 xiaowen   column=course:English, timestamp=1400141591123, value=98
 xiaowen   column=course:Math, timestamp=1400141579107, value=99
 xiaoye    column=course:China, timestamp=1400143888087, value=85
 xiaoye    column=course:English, timestamp=1400143921395, value=85
 xiaoye    column=course:Math, timestamp=1400143907407, value=85
2 row(s) in 0.0230 seconds
Show every value in column family course of the table score
hbase(main):014:0> scan 'score', {COLUMNS=>'course:Math'}
ROW        COLUMN+CELL
 xiaowen   column=course:Math, timestamp=1400141579107, value=99
 xiaoye    column=course:Math, timestamp=1400143907407, value=85
2 row(s) in 0.0270 seconds
Show every value in column course:Math of the table score
hbase(main):021:0> count 'score'
2 row(s) in 0.1880 seconds
Count the rows in the table score
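The count is 2 even though a full scan prints six cell lines, because a row is identified by its row key and each row holds several cells. A minimal sketch that reproduces this from saved scan output; count_rows is a hypothetical helper name, and it assumes that row keys contain no whitespace.

```shell
# count_rows
# Read HBase shell scan output from stdin and print the number of distinct row keys.
count_rows() {
  # Only cell lines contain "column="; their first field is the row key.
  awk '/column=/ { seen[$1] = 1 }
       END { n = 0; for (k in seen) n++; print n }'
}

count_rows <<'EOF'
 xiaowen   column=course:China, timestamp=1400141524101, value=95
 xiaowen   column=course:English, timestamp=1400141591123, value=98
 xiaowen   column=course:Math, timestamp=1400141579107, value=99
 xiaoye    column=course:China, timestamp=1400143888087, value=85
 xiaoye    column=course:English, timestamp=1400143921395, value=85
 xiaoye    column=course:Math, timestamp=1400143907407, value=85
EOF
# prints 2
```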
--------------------------------------------------------------------
Practice with the test table
hbase(main):022:0> create 'test', 'c1', 'c2'
0 row(s) in 1.1260 seconds
hbase(main):023:0> put 'test', 'r1', 'c1:1', 'value1-1/1'
0 row(s) in 0.0360 seconds
hbase(main):024:0> put 'test', 'r1', 'c1:2', 'value1-1/2'
0 row(s) in 0.0210 seconds
hbase(main):025:0> put 'test', 'r1', 'c1:3', 'value1-1/3'
0 row(s) in 0.0170 seconds
hbase(main):026:0> put 'test', 'r1', 'c2:1', 'value1-2/1'
0 row(s) in 0.0100 seconds
hbase(main):027:0> put 'test', 'r1', 'c2:2', 'value1-2/2'
0 row(s) in 0.0060 seconds
hbase(main):028:0> put 'test', 'r2', 'c1:1', 'value2-1/1'
0 row(s) in 0.0110 seconds
hbase(main):029:0> put 'test', 'r2', 'c2:1', 'value2-2/1'
0 row(s) in 0.0080 seconds
hbase(main):030:0> scan 'test'
ROW   COLUMN+CELL
 r1   column=c1:1, timestamp=1400152716678, value=value1-1/1
 r1   column=c1:2, timestamp=1400152749600, value=value1-1/2
 r1   column=c1:3, timestamp=1400152770555, value=value1-1/3
 r1   column=c2:1, timestamp=1400152793839, value=value1-2/1
 r1   column=c2:2, timestamp=1400152811436, value=value1-2/2
 r2   column=c1:1, timestamp=1400152843148, value=value2-1/1
 r2   column=c2:1, timestamp=1400152858073, value=value2-2/1
2 row(s) in 0.0490 seconds
hbase(main):031:0> describe 'test'
DESCRIPTION                                                          ENABLED
 {NAME => 'test', FAMILIES => [                                      true
  {NAME => 'c1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
   REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
   MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
   BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
   BLOCKCACHE => 'true'},
  {NAME => 'c2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
   REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
   MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
   BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
   BLOCKCACHE => 'true'}]}
1 row(s) in 0.2560 seconds
The describe output shows that the test table has two column families, c1 and c2
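Common Hive commands
show tables;
List the names of all tables in Hive
desc userProfile;
Show the columns of the table userProfile and their types
desc extended trackinfo;
Show detailed information about the table trackinfo, including column comments and table-level metadata
/usr/local/cloud/hive/bin/hive
Start the Hive command-line interface
select attribute_name from pms_attribute where attribute_id=21000 and attribute_value_id=105991;
A Hive SELECT query with a WHERE filter
select user_id, category_id, catgory_pref, attribute_id, attribute_pref, attribute_value_id, attribute_value_pref from userProfile limit 10;
A Hive SELECT query that returns only 10 rows
/usr/local/cloud/hive/bin/hive -e "select category_id, attribute_id, count(user_id) from userProfile group by category_id, attribute_id" >> /home/deploy/recsys/workspace/ouyangyewei/statistics_data/number_attention_of_attribute_for_mobilePhone.csv;
Export the result of a SQL query to a file. Note that >> appends to the file if it already exists, and that the Hive CLI separates columns with tabs, so the file is tab-separated despite its .csv extension.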
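The Hive CLI prints result columns separated by tabs, so a file produced this way is not yet comma-separated. A minimal sketch of a filter that turns tab-separated lines into CSV follows; tsv_to_csv is a hypothetical helper name, and the sample rows are made up for illustration.

```shell
# tsv_to_csv
# Convert tab-separated lines on stdin to comma-separated lines on stdout.
# Fields containing a comma or a double quote are wrapped in double quotes,
# with embedded double quotes doubled.
tsv_to_csv() {
  awk 'BEGIN { FS = "\t"; OFS = "," }
  {
    for (i = 1; i <= NF; i++) {
      if ($i ~ /[",]/) {
        gsub(/"/, "\"\"", $i)
        $i = "\"" $i "\""
      }
    }
    $1 = $1   # force the record to be rebuilt with OFS
    print
  }'
}

printf '4\t21000\t17\n5\t21001\t3\n' | tsv_to_csv
# prints:
# 4,21000,17
# 5,21001,3
```

In the export command, the filter would sit between the query and the redirection, for example hive -e "..." | tsv_to_csv > result.csv, where result.csv is a placeholder file name.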