游戏玩法分析
表: Activity
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id, event_date) 是这个表的两个主键
这个表显示的是某些游戏玩家的游戏活动情况
每一行是在某天使用某个设备登出之前登录并玩多个游戏(可能为0)的玩家的记录
表示例数据如下
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
1.查询每一个玩家首次登陆的设备名称,输出playerid,deviceid
解题思路:
(1) 先用一个子查询得到每个用户首次登陆的时间
(2)可以用where in 筛选出一个玩家首次登陆的设备名称或者利用join进行匹配
-- where in 解法
select a.player_id ,a.device_id
from Activity a
where (a.player_id ,a.event_date) in (
select player_id,min(event_date) as first_login_time
from Activity
group by player_id
)t
注:不能直接用event_date in(子查询)。这是因为没有把玩家playerid和时间event_date关联起来,会产生这样一个错误。例如玩家A首次登陆的时间是2020-03-01,而玩家B在2020-03-01也登陆了,所以最后查询结果必然多了那些不是首次登陆的玩家结果。
-- join解法
select a.player_id,a.device_id
from Activity a inner join (
select player_id,min(event_date) as first_login_date
from Activity
group by player_id
)t on a.player_id = t.player_id and a.event_date = t.first_login_date
注:列名在多表同时出现时,要加上表别名,eg:a.player_id,不然会报错。
2.编写一个 SQL 查询,同时输出每组玩家和日期,以及玩家到目前为止玩了多少游戏。也就是说,在此日期之前玩家所玩的游戏总数。输出playerid,eventdate,games_played_so_far
解题思路:
是个累积求和的问题,利用窗口函数或者自连接解决
-- 窗口函数解法:leetcode sqlserver可以实现窗口函数,mysql不能
select player_id,event_date,
sum(games_played)over(partition by player_id order by event_date) games_played_so_far
from Activity
注:不加order by会计算分组内全部数的和,加order by才能实现累加和
--自连接解法
select a1.player_id,a1.event_date,sum(a2.games_played) as games_played_so_far
from Activity a1,Activity a2
where a1.player_id = a2.player_id and a1.event_date >= a2.event_date
group by a1.player_id,a1.event_date
order by a1.player_id,a1.event_date
3.编写一个 SQL 查询,输出在首次登录的第二天再次登录的玩家的分数,四舍五入到小数点后两位。这个分数等于从首次登录日期开始至少连续两天登录的玩家的数量除以玩家总数。
示例:
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-03-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Result table:
+-----------+
| fraction |
+-----------+
| 0.33 |
+-----------+
只有 ID 为 1 的玩家在第一天登录后才重新登录,所以答案是 1/3 = 0.33
解题思路:
(1)求次日留存用户数second_back_player_num,将每个用户首次登陆时间左连接到a表,然后使用datediff来判断
(2)求总人数
select round(
(
select count(distinct a.player_id) as second_back_player_num
from Activity a left join
(
select player_id,min(event_date) as first_time
from Activity
group by player_id
)t on a.player_id = t.player_id
where datediff(event_date,first_time) = 1
) / (
select count(distinct a.player_id) as total_player_num
from Activity a
),2
) as fraction
4.计算次日留存率
次日留存率 = 第一天新增用户数在第二天活跃的用户数 / 当天新增用户数
解题思路:
在上一题的基础上,要同时保存当天新增用户数和次日留存用户数,所以不能使用where过滤,改成count(case when)
select first_date as install_dt,
count(distinct a.player_id) as installs,
round(
count(distinct case when datediff(event_date,first_date) = 1 then a.player_id else null end) /
count(distinct a.player_id)
,
2
) as Day1_retention
from Activity a left join
(
select player_id,
min(event_date) as first_date
from Activity
group by player_id
) t on a.player_id = t.player_id
group by first_date