对于使用doris的去实现漏斗分析的用户,doris提供了windows_funnel漏斗函数。

但目前发现漏斗分析函数window_funnel最多只能根据user_id进行GROUP BY获取总体转化率,但我希望能增加每日的UV,例如8月份每天的UV,功能如下:
窗口期为3天,
用户在2023-08-26点击了第一步,没有点击第二步和第三步
用户在2023-08-28点击了第一步和第二步以及第三步
那么计算出来的漏斗数据为
2023-08-26 step1:1 step2:1 step3:1 (因为窗口期为3天,滑动至08-28)
2023-08-28 step1:1 step2:1 step3:1

如果窗口期为当天
那么计算出来的漏斗数据为
2023-08-26 step1:1 step2:0 step3:0 (因为当天没有点第二步和第三步)
2023-08-28 step1:1 step2:1 step3:1

Use case

以下是我的sql:下面的sql无法实现我说的功能,因为进行了event_date的group by,只能当天的滑动窗口

select
funnel.event_date,
sum(if(step >= 1 , 1, 0)) as step1,
sum(if(step >= 2 , 1, 0)) as step2,
sum(if(step >= 3 , 1, 0)) as step3
from
(
SELECT
user_id,DATE(event_time) as event_date,
window_funnel(3600 * 24 * 3,'defalut',event_time,
event_key = 'App_Quanmaidian_Page_Exp' and (p__user_type = 1 or p__user_type = 2),
event_key = 'Passenger_Xunjiaye_Gouquanrukou_Popbanner_Exp',
event_key = 'Passenger_Trip_Youhuiquantuijiantanchuang_Popbanner_Exp'
) as step
FROM ruqi_realtime.dwd_user_event_track_normal
WHERE event_time>='2023-08-01 00:00:00' and event_time<='2023-08-24 00:00:00' and user_id is not null
GROUP BY user_id,event_date
) as funnel
group by funnel.event_date
order by funnel.event_date

解决方案:

可以用EXPLODE和ARRAY_RANGE函数实现这个功能

select
funnel.event_date,
sum(if(step >= 1 , 1, 0)) as step1,
sum(if(step >= 2 , 1, 0)) as step2,
sum(if(step >= 3 , 1, 0)) as step3
from
(
SELECT
user_id,FROM_DAYS(tmp.event_date) as event_date,
window_funnel(3600 * 24 * 3,'defalut',event_time,
event_key = 'App_Quanmaidian_Page_Exp' and (p__user_type = 1 or p__user_type = 2),
event_key = 'Passenger_Xunjiaye_Gouquanrukou_Popbanner_Exp',
event_key = 'Passenger_Trip_Youhuiquantuijiantanchuang_Popbanner_Exp'
) as step
FROM ruqi_realtime.dwd_user_event_track_normal 
LATERAL VIEW EXPLODE(
            ARRAY_RANGE(
                IF(
                    event_key = 'App_Quanmaidian_Page_Exp' and (p__user_type = 1 or p__user_type = 2),
                    TO_DAYS(event_time),
                    TO_DAYS(SECONDS_SUB(event_time, 3600 * 24 * 3))
                ),
                TO_DAYS(event_time) + 1
            )
        ) tmp AS event_date
WHERE event_time>='2023-08-01 00:00:00' and event_time<='2023-08-24 00:00:00' and user_id is not null
GROUP BY user_id,tmp.event_date
) as funnel
group by funnel.event_date
order by funnel.event_date

LATERAL VIEW EXPLODE的作用为列转行,ARRAY_RANGE的作用是生成数组,下面sql的含义是,如果是第一步,则数组为【TO_DAYS(event_time),TO_DAYS(event_time) + 1),如果是第二步或者第三步,则数组【TO_DAYS(SECONDS_SUB(event_time, 3600 * 24 * 3)),TO_DAYS(event_time) + 1),其实就是将第二步和第三步的数据都往前挪动3天加入数据,然后就可以按天group by了,比如第二步是2023-10-18的,往前挪动后变成数组【2023-10-15,2023-10-16,2023-10-17,2023-10-18】,最后将日期列转行,计算可得出按日期的具有窗口期的漏斗分析

ARRAY_RANGE(

                IF(

                    event_key = 'App_Quanmaidian_Page_Exp' and (p__user_type = 1 or p__user_type = 2),

                    TO_DAYS(event_time),

                    TO_DAYS(SECONDS_SUB(event_time, 3600 * 24 * 3))

                ),

                TO_DAYS(event_time) + 1

            )