文章目录

  • 某音短视频
  • SQL1 各个视频的平均完播率
  • SQL2 平均播放进度大于60%的视频类别
  • SQL3 每类视频近一个月的转发量/率
  • SQL4 每个创作者每月的涨粉率及截止当前的总粉丝量
  • 用户增长场景(某度信息流)
  • SQL7 2021年11月每天的人均浏览文章时长
  • SQL8 每篇文章同一时刻最大在看人数
  • SQL9 2021年11月每天新用户的次日留存率
  • 电商场景(某东商城)
  • SQL13 计算商城中2021年每月的GMV
  • SQL14 统计2021年10月每个退货率不大于0.5的商品的各项指标
  • SQL15 某店铺的各商品毛利率及店铺整体毛利率
  • SQL16 零食类商品中复购率top3高的商品
  • 出行场景(某滴打车)
  • SQL19 2021年国庆在北京接单3次及以上的司机统计
  • SQL20 有取消订单记录的司机平均评分
  • 某宝店铺分析(电商模式)
  • SQL25 某宝店铺的SPU数量
  • SQL26 某宝店铺的实际销售额与客单价
  • 牛客直播课分析(在线教育行业)
  • SQL30 牛客直播转换率
  • 某乎问答(内容行业)
  • SQL35 某乎问答11月份日人均回答量


某音短视频

SQL1 各个视频的平均完播率

先理解题目意思:有播放记录的视频的id号和其完播率。条件是:2021年和根据其完播率降序排序。
拆解复杂指标:这里需要拆解的是完播率这个指标

mysql大作业总结与体会 mysql大题_ide

注意:记得加上题目的限制条件

SELECT a.video_id ,
	   round(sum(if(end_time - start_time >= duration, 1, 0))/count(start_time ),3) as avg_comp_play_rate
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
on a.video_id = b. video_id
WHERE year(start_time) = 2021
GROUP BY a.video_id 
ORDER BY avg_comp_play_rate DESC;

SQL2 平均播放进度大于60%的视频类别

平均播放进度大于60%的视频类别
明确题意:
计算各类视频的平均播放进度,将进度大于60%的类别输出

问题分解:
关联用户-视频互动记录和短视频信息表:JOIN tb_video_info USING(video_id);
按视频类别分组:GROUP BY tag
计算每个类别的平均播放进度:
播放进度=播放时长÷视频时长*100%
播放时长=TIMESTAMPDIFF(SECOND, start_time, end_time);特殊情况:播放时长大于视频时长时,播放进度为100%(加个IF判断)
平均进度=AVG(每个进度)
结果保留2位小数:ROUND(x, 2)
百分比格式化:CONCAT(x, ‘%’)
计算公式:
ROUND(AVG(
IF(TIMESTAMPDIFF(SECOND, start_time, end_time) > duration, 1,
TIMESTAMPDIFF(SECOND, start_time, end_time) / duration)
) * 100, 2) as avg_play_progress
筛选播放进度>60%的视频类别:HAVING avg_play_progress > 60
细节问题:
表头重命名:as
按播放进度倒序排序:ORDER BY avg_play_progress DESC;

select tag,concat(avg_play_progress,"%") as avg_play_progress
from(
select tag,
round(avg(if(timestampdiff(second,start_time,end_time)>duration,1,
            timestampdiff(second,start_time,end_time)/duration))*100,
     2) as avg_play_progress
from tb_user_video_log
    join tb_video_info using(video_id)
    group by tag
    having avg_play_progress>60
    order by avg_play_progress desc
)as t_progress

SQL3 每类视频近一个月的转发量/率

首先是题意,视频播放日期start_time和整体的日期最大值之差小于30的,叫做“视频在有用户互动的最近一个月”。 第二,mark一下,上面要写成(SELECT MAX(start_time) FROM tb_user_video_log),而不能直接写MAX(start_time)。where后面不能跟集函数。顺便说一句,having后面倒是常跟集函数。

1.有交互的视频的最近一个月(就是有播放量的视频的最大日期 减去 30天)
2.每类视频在有用户互动的最近一个月(并不是对于各类视频计算最大值,而是整体的日期最大值)

select tag,sum(if_retweet),round(sum(if_retweet)/count(*),3) retweet_rate
from tb_user_video_log a
left join tb_video_info b
on a.video_id =b.video_id
where timestampdiff(day,start_time,(select max(start_time) from tb_user_video_log))<30
group by tag
order by retweet_rate desc

SQL4 每个创作者每月的涨粉率及截止当前的总粉丝量

第一个需要解决的地方是每个月状态2相当于掉粉,状态1相当于涨粉,这里不能用if去做,因为有多个状态,所以用case when去处理。

第二个需要解决的就是,我们需要计算每个月截止当前的粉丝量,这里自然地想到用窗口函数去处理即可,这里窗口函数需要去partition by author如果这样处理的话,如果有多个作者就会混乱。

SELECT 
    author, 
    date_format(start_time,'%Y-%m') month,
    round(sum(case when if_follow=1 then 1
             when if_follow=2 then -1
             else 0 end)/count(author),3) fans_growth_rate,
     sum(sum(case when if_follow=1 then 1
             when if_follow=2 then -1
             else 0 end)) over(partition by author order by date_format(start_time,'%Y-%m')) total_fans
FROM tb_user_video_log log 
left join tb_video_info info on log.video_id=info.video_id
where year(start_time)=2021
group by author,month
order by author,total_fans

请问这个为啥要sum两次呢?只在窗口函数中放sum为啥不行?

因为一个是单月统计 一个是截止到当前的汇总
窗口函数是把单月新增累加一起

用户增长场景(某度信息流)

SQL7 2021年11月每天的人均浏览文章时长

select 
date(in_time) dt,
round(sum(timestampdiff(second,in_time,out_time))/count(distinct uid),1) avg_lensec
from tb_user_log
where date_format(in_time,"%Y-%m") ="2021-11" and artical_id!=0
group by dt
order by avg_lensec

SQL8 每篇文章同一时刻最大在看人数

参考题解

select artical_id, max(cnt) as ax
from
    (select artical_id,
    sum(Mark) over (partition by artical_id order by Time, Mark desc) as cnt
    from
    (select artical_id, in_time as Time , 1 Mark from tb_user_log where artical_id != 0
    union all
    select artical_id, out_time as Time , -1 Mark from tb_user_log where artical_id != 0) as a
)as b
group by artical_id
order by ax desc

SQL9 2021年11月每天新用户的次日留存率

流程:

先查询出每个用户第一次登陆时间(最小登陆时间)–每天新用户表
因为涉及到跨天活跃,所以要进行并集操作,将登录时间和登出时间取并集,这里union会去重–用户活跃表
将每天新用户表和用户活跃表左连接,只有是同一用户并且该用户第2天依旧登陆才会保留整个记录,否则右表记录为空
得到每天新用户第二天是否登陆表后,开始计算每天的次日留存率:根据日期分组计算,次日活跃用户个数/当天新用户个数

select t1.dt,round(count(t2.uid)/count(t1.uid),2) uv_left_rate
from(
select uid,
min(date(in_time)) dt
from tb_user_log
group by uid) t1 -- 每天用户表
left join
(select uid,date(in_time) dt
from tb_user_log
union
select uid,date(out_time) dt
from tb_user_log) t2 -- 用户活跃表
on t1.uid = t2.uid and t1.dt = date_sub(t2.dt,INTERVAL 1 day) 
where date_format(t1.dt,"%Y-%m") ='2021-11'
group by t1.dt
order by t1.dt

电商场景(某东商城)

SQL13 计算商城中2021年每月的GMV

明确题意:
统计GMV大于10w的每月GMV

问题分解:
筛选满足条件的记录:
退款的金额不算(付款的记录还在,已算过一次):where status != 2
2021年的记录:and YEAR(event_time) = 2021
按月份分组:group by DATE_FORMAT(event_time, “%Y-%m”)
计算GMV:(sum(total_amount) as GMV
保留整数:ROUND(x, 0)
筛选GMV大于10w的分组:having GMV > 100000

另外有一个问题
group by 执行顺序不是在select 之前吗,为什么能直接用month啊
答:
mysql做了查询增强,别的不可以

select date_format(event_time,"%Y-%m")  as month,
sum(total_amount) as GMV
from tb_order_overall
where year(event_time) =2021 and (status!=2)
group by month
having GMV>100000
order by GMV

SQL14 统计2021年10月每个退货率不大于0.5的商品的各项指标

明确题意:
统计2021年10月每个有展示记录的退货率不大于0.5商品的各项指标:

商品点展比=点击数÷展示数;加购率=加购数÷点击数;成单率=付款数÷加购数;退货率=退款数÷付款数,当分母为0时整体结果记为0。

结果中各项指标保留3位小数,并按商品ID升序排序。

问题分解:
计算各个维度的计数(生成子表t_product_index_cnt)
筛选时间窗内的记录:where DATE_FORMAT(event_time, ‘%Y%m’) = ‘202110’
按商品ID分组:group by product_id
统计各种计数:
展示数(每条记录就是一次展示):COUNT(1) as show_cnt
点击数:sum(if_click) as click_cnt
加购数:sum(if_cart) as cart_cnt
付款数:sum(if_payment) as payment_cnt
退款数:sum(if_refund) as refund_cnt
计算各种指标率(除了展示数其他均可能为0,要特殊处理!):
点击率:click_cnt/show_cnt as ctr
加购率:IF(click_cnt>0, cart_cnt/click_cnt, 0) as cart_rate
付款率:IF(cart_cnt>0, payment_cnt/cart_cnt, 0) as payment_rate
退款率:IF(payment_cnt>0, refund_cnt/payment_cnt, 0) as refund_rate
都保留3位小数:ROUND(x, 3)
筛选退款率不大于0.5的商品,需注意分母可能为0:where payment_cnt = 0 or refund_rate <= 0.5

select product_id,round(click_cnt/show_cnt,3) as ctr,
round(if(click_cnt>0,cart_cnt/click_cnt,0),3) as cart_rate,
round(if(cart_cnt>0,payment_cnt/cart_cnt,0),3) as payment_rate,
round(if(payment_cnt>0,refund_cnt/payment_cnt,0),3) as refund_rate
from(
select product_id,count(1) as show_cnt,
sum(if_click) as click_cnt,
sum(if_cart) as cart_cnt,
sum(if_payment) as payment_cnt,
sum(if_refund) as refund_cnt
    from tb_user_event
    where date_format(event_time,'%Y-%m') ='2021-10'
    group by product_id
)as tb_user_e 
where payment_cnt =0 or refund_cnt/payment_cnt<=0.5
order by product_id

SQL15 某店铺的各商品毛利率及店铺整体毛利率

参考题解链接with rollup用法

select product_id,concat(profit_rate,"%") as profit_rate
from(
    select ifnull(product_id,'店铺汇总') as product_id,
    round(100*(1-sum(in_price*cnt)/sum(price*cnt)),1) as profit_rate
    from(
        select product_id,price,cnt,in_price
        from tb_order_detail
        join tb_order_overall using(order_id)
        join tb_product_info using(product_id)
        where shop_id =901 and date(event_time) >="2021-10-01"
    )as t_product_in_each_order
    group by product_id
    with rollup
    having profit_rate>24.9 or product_id is null
    order by product_id
)as t1

SQL16 零食类商品中复购率top3高的商品

参考链接date_sub用法

select product_id,
round(sum(repurchase)/count(repurchase),3) as repurchase_rate
from(
    select uid,product_id,if(count(event_time)>1,1,0) as repurchase
    from tb_order_detail
    join tb_order_overall using(order_id)
    join tb_product_info using(product_id)
    where tag = "零食" and event_time>= (
    select date_sub(max(event_time),INTERVAL 89 DAY)
        from tb_order_overall
    )
    group by uid,product_id
)as tb 
group by product_id
order by repurchase_rate desc,product_id
limit 3

出行场景(某滴打车)

SQL19 2021年国庆在北京接单3次及以上的司机统计

select "北京" as city,round(avg(order_num),3) as avg_order_num,
round(avg(income),3) as avg_income
from( 
select driver_id,count(order_id) as order_num,
    sum(fare) as income
    from tb_get_car_order
    join tb_get_car_record using(order_id)
    where city ="北京" 
    and date_format(order_time,"%Y-%m-%d") between '2021-10-01' and '2021-10-07'
    group by driver_id
    having count(order_id)>=3
) as t

SQL20 有取消订单记录的司机平均评分

select ifnull(driver_id,"总体") as driver_id,
round(avg(grade),1) as avg_grade
from tb_get_car_order
where driver_id in (
select driver_id
from tb_get_car_order
where date_format(order_time,"%Y-%m")='2021-10' and isnull(fare)
) and not isnull(grade)
group by driver_id
with rollup

某宝店铺分析(电商模式)

SQL25 某宝店铺的SPU数量

select style_id,
count(distinct item_id) SPU_num
from product_tb
group by style_id
order by SPU_num desc

SQL26 某宝店铺的实际销售额与客单价

销售额:SUM(sales_price)
客单价:SUM(sales_price)/COUNT(DISTINCT user_id)

select sum(sales_price) sales_total,
round(sum(sales_price)/count(distinct user_id),2) pre_trans from sales_tb

牛客直播课分析(在线教育行业)

SQL30 牛客直播转换率

SELECT course_id,
course_name,
round(sum(if_sign)/ sum(if_vw)*100,2) as "sign_rate(%)"
from course_tb 
join behavior_tb 
using(course_id)
group by course_id,course_name
order by course_id

某乎问答(内容行业)

SQL35 某乎问答11月份日人均回答量

select answer_date,
round(count(issue_id)/count(distinct author_id),2) per_num
from answer_tb
where month(answer_date) = 11
group by answer_date