Hive基本查询

需求1:某次经营活动中,商家发起了"异性拼团购",试着针对某个地区的用户进行推广,找出匹配用户。
参考实现:选出城市在北京,性别为女的10个用户名

select user_name
from user_info
where city='beijing' and sex='female'
limit 10;

注意:如果该表是一个分区表,则WHERE条件中必须对分区字段进行限制。

需求2:某天,发现食物类的商品卖的很好,你能找出几个资深吃货吗?参考实现:选出在2019年6月18日,购买的商品品类是food的用户名、支付金额总和

select user_name,sum(pay_amount) as pay_sum
from user_trade
where dt='2019-06-18' and goods_category='food'
group by user_name
order by pay_sum desc
limit 10;

需求3:试着对本公司2019年第一季度商品的热度与价值度进行分析。参考实现:2019年一月到三月,每个品类有多少人购买,累计金额是多少

select goods_category,
count(distinct user_name) as user_num,
sum(pay_amount) as total_amount
from user_trade
where dt between '2019-01-01'  and '2019-03-31'
group by goods_category;

需求4:2019年4月,支付金额超过5万元的用户。给VIP用户赠送优惠劵。

select user_name,sum(pay_amount) as pay_sum
from user_trade
where dt between '2019-04-01' and '2019-04-30'
group by user_name
having sum(pay_amount)>50000;

需求5:去年的劳动节新用户推广活动价值分析。即拉新分析。
参考实现:用户的首次激活时间,与2019年5月1日的日期间隔

select user_name,
datediff('2019-05-01',to_date(firstactivetime))
from user_info
limit 10;

需求6:对用户的年龄段进行分析,观察分布情况。参考实现:统计以下四个年龄段20岁以下、20-30岁、30-40岁、40岁以上的用户数

select case when age<20 then '20岁以下'
            when age>=20 and age<30 then '20-30岁' 
            when age>=30 and age<40 then '30-40岁'
            else '40岁以上' end as age_type,
            count(distinct(user_id)) as user_num
from user_info
GROUP BY case when age<20 then '20岁以下'
              when age>=20 and age<30 then '20-30岁'
              when age>=30 and age<40 then '30-40岁' 
              else '40岁以上' end;

需求7:去年王思聪的微博抽奖活动引起争议,我们想要观察用户等级随性别的分布情况。
参考实现:统计每个性别用户等级高低的分布情况(level大于5为高级)

select sex,
       if(level>5,'高','低')  as user_level,
       count(distinct(user_id)) as user_num
from user_info
group by sex,
         if(level>5,'高','低');

需求8:分析每个月的拉新情况,可以倒推回运营效果。

select substr(firstactivetime,1,7) as month,
       count(distinct(user_id)) as user_num
from user_info
group by substr(firstactivetime,1,7);

需求9:找出不同手机品牌的用户分布情况。

#方法一
SELECT get_json_object(extra1,'$.phonebrand') as phone_brand,
       count(distinct(user_id)) as user_num
from user_info
group by get_json_object(extra1,'$.phonebrand');
#方法二
select extra2['phonebrand'] as phone_brand,
       count(distinct(user_id)) as user_num
from user_info
group by extra2['phonebrand'];

需求10:找出在2018年具有VIP潜质的用户,发送VIP试用劵。
参考实现:2018年购买商品种类超过两个的用户

select user_name,
       count(distinct(goods_category)) as gooods_count
from user_trade
where year(dt)=2018
group by user_name
having count(distinct(goods_category))>2;

需求:用户激活时间在2018年,年龄段在20-30岁和30-40岁的婚姻状况 分布

select a.age_type,
       if(a.marriage_status=1,'已婚','未婚'),
       count(distinct(user_id)) as user_num
from
(
select user_id,
case when age<20 then '20以下'
     when age>=20 and age<30 then '20-30'
     when age>=30 and age<40 then '30-40'
     else '40以上' end as age_type,
get_json_object(extra1,'$.marriage_status') as marriage_status
from user_info
where to_date(firstactivetime) between '2018-01-01' and '2018-12-31'
) as a
where a.age_type in ('20-30','30-40')
group by a.age_type,
         if(a.marriage_status=1,'已婚','未婚');

需求:2019年1月1日到2019年4月30日,每个时段的不同品类购买金额分布

#HH表示24小时制
select substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12),
       goods_category,
       sum(pay_amount) as total_amount
from user_trade
where dt between '2019-01-01' and '2019-04-30'
group by substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12),
         goods_category;