Hive基本查询
需求1:某次经营活动中,商家发起了"异性拼团购",试着针对某个地区的用户进行推广,找出匹配用户。
参考实现:选出城市在北京,性别为女的10个用户名
select user_name
from user_info
where city='beijing' and sex='female'
limit 10;
注意:如果该表是一个分区表,则WHERE条件中必须对分区字段进行限制。
需求2:某天,发现食物类的商品卖的很好,你能找出几个资深吃货吗?参考实现:选出在2019年6月18日,购买的商品品类是food的用户名、支付金额总和
select user_name,sum(pay_amount) as pay_sum
from user_trade
where dt='2019-06-18' and goods_category='food'
group by user_name
order by pay_sum desc
limit 10;
需求3:试着对本公司2019年第一季度商品的热度与价值度进行分析。参考实现:2019年一月到三月,每个品类有多少人购买,累计金额是多少
select goods_category,
count(distinct user_name) as user_num,
sum(pay_amount) as total_amount
from user_trade
where dt between '2019-01-01' and '2019-03-31'
group by goods_category;
需求4:2019年4月,支付金额超过5万元的用户。给VIP用户赠送优惠劵。
select user_name,sum(pay_amount) as pay_sum
from user_trade
where dt between '2019-04-01' and '2019-04-30'
group by user_name
having sum(pay_amount)>50000;
需求5:去年的劳动节新用户推广活动价值分析。即拉新分析。
参考实现:用户的首次激活时间,与2019年5月1日的日期间隔
select user_name,
datediff('2019-05-01',to_date(firstactivetime))
from user_info
limit 10;
需求6:对用户的年龄段进行分析,观察分布情况。参考实现:统计以下四个年龄段20岁以下、20-30岁、30-40岁、40岁以上的用户数
select case when age<20 then '20岁以下'
when age>=20 and age<30 then '20-30岁'
when age>=30 and age<40 then '30-40岁'
else '40岁以上' end as age_type,
count(distinct(user_id)) as user_num
from user_info
GROUP BY case when age<20 then '20岁以下'
when age>=20 and age<30 then '20-30岁'
when age>=30 and age<40 then '30-40岁'
else '40岁以上' end;
需求7:去年王思聪的微博抽奖活动引起争议,我们想要观察用户等级随性别的分布情况。
参考实现:统计每个性别用户等级高低的分布情况(level大于5为高级)
select sex,
if(level>5,'高','低') as user_level,
count(distinct(user_id)) as user_num
from user_info
group by sex,
if(level>5,'高','低');
需求8:分析每个月的拉新情况,可以倒推回运营效果。
select substr(firstactivetime,1,7) as month,
count(distinct(user_id)) as user_num
from user_info
group by substr(firstactivetime,1,7);
需求9:找出不同手机品牌的用户分布情况。
#方法一
SELECT get_json_object(extra1,'$.phonebrand') as phone_brand,
count(distinct(user_id)) as user_num
from user_info
group by get_json_object(extra1,'$.phonebrand');
#方法二
select extra2['phonebrand'] as phone_brand,
count(distinct(user_id)) as user_num
from user_info
group by extra2['phonebrand'];
需求10:找出在2018年具有VIP潜质的用户,发送VIP试用劵。
参考实现:2018年购买商品种类超过两个的用户
select user_name,
count(distinct(goods_category)) as gooods_count
from user_trade
where year(dt)=2018
group by user_name
having count(distinct(goods_category))>2;
需求:用户激活时间在2018年,年龄段在20-30岁和30-40岁的婚姻状况 分布
select a.age_type,
if(a.marriage_status=1,'已婚','未婚'),
count(distinct(user_id)) as user_num
from
(
select user_id,
case when age<20 then '20以下'
when age>=20 and age<30 then '20-30'
when age>=30 and age<40 then '30-40'
else '40以上' end as age_type,
get_json_object(extra1,'$.marriage_status') as marriage_status
from user_info
where to_date(firstactivetime) between '2018-01-01' and '2018-12-31'
) as a
where a.age_type in ('20-30','30-40')
group by a.age_type,
if(a.marriage_status=1,'已婚','未婚');
需求:2019年1月1日到2019年4月30日,每个时段的不同品类购买金额分布
#HH表示24小时制
select substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12),
goods_category,
sum(pay_amount) as total_amount
from user_trade
where dt between '2019-01-01' and '2019-04-30'
group by substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12),
goods_category;