使用saiku的过程中发现一个重要问题,速度慢!下面是跟踪和优化过程

一、首先抓包,发现ajax请求:http://l-tdata2.tkt.cn6.qunar.com:8080/saiku/rest/saiku/api/query/execute

里面的参数不少,下面是截屏

saiku执行速度慢_数据

二、看日志:发现了mdx语句

WITH
SET [~ROWS_create_date_create_date] AS
    {[create_date].[create_date].[2016-04-12]}
SET [~ROWS_dimPartner_dimPartner] AS
    Hierarchize({{[dimPartner].[dimPartner].[All dimPartners]}, {[dimPartner].[dimPartner].[name].Members}})
SET [~ROWS_in_track_in_track] AS
    {[in_track].[in_track].[All in_tracks]}
SET [~ROWS_product_product] AS
    {[product].[product].[All products]}
SET [~ROWS_self_self] AS
    {[self].[self].[All selfs]}
SET [~ROWS_sight_sight] AS
    {[sight].[sight].[All sights]}
SET [~ROWS_ticket_type_ticket_type] AS
    {[ticket_type].[ticket_type].[All ticket_types]}
SET [~ROWS_order_status_order_status] AS
    {[order_status].[order_status].[All order_statuss]}
SET [~ROWS_refund_status_refund_status] AS
    {[refund_status].[refund_status].[All refund_statuss]}
SELECT
NON EMPTY {[Measures].[money], [Measures].[quantity], [Measures].[qunar_income], [Measures].[order_num]} ON COLUMNS,
NON EMPTY Order(NonEmptyCrossJoin([~ROWS_create_date_create_date], NonEmptyCrossJoin([~ROWS_dimPartner_dimPartner], NonEmptyCrossJoin([~ROWS_in_track_in_track], NonEmptyCrossJoin([~ROWS_product_product], NonEmptyCrossJoin([~ROWS_self_self], NonEmptyCrossJoin([~ROWS_sight_sight], NonEmptyCrossJoin([~ROWS_ticket_type_ticket_type], NonEmptyCrossJoin([~ROWS_order_status_order_status], [~ROWS_refund_status_refund_status])))))))), [Measures].[money], BDESC) ON ROWS
FROM [com_order_detail_cube]
2016-04-15 17:02:55,948 INFO  [org.saiku.datasources.connection.SaikuOlapConnection] Clearing cache
2016-04-15 17:03:26,603 WARN  [mondrian.rolap.RolapSchema] Model is in legacy format
2016-04-15 17:04:09,714 INFO  [org.saiku.datasources.connection.SaikuOlapConnection] Catalogs:1
2016-04-15 17:06:03,660 DEBUG [org.saiku.service.olap.ThinQueryService] Query End
2016-04-15 17:06:03,661 INFO  [org.saiku.service.olap.ThinQueryService] RUN#:84 Size: 14/7      Execute:        190420ms        Format: 0ms     Totals: 0ms      Total: 190420ms

观察日志,发现前端一直执行不返回。分析主要原因是执行mdx需要很长时间,190秒

3、找代码:org.saiku.web.rest.resources.Query2Resource的execute方法

继续追踪代码:org.saiku.service.olap.ThinQueryService的execute方法()。下面是核心重点:

    private CellDataSet execute(ThinQuery tq, ICellSetFormatter formatter) {
        try {

            Long start = (new Date()).getTime();
            log.debug("Query Start");
            CellSet cellSet =  executeInternalQuery(tq); //这是执行mdx语句的地方,需要较长时间
            log.debug("Query End");
            String runId = "RUN#:" + ID_GENERATOR.get();
            Long exec = (new Date()).getTime();

            CellDataSet result = OlapResultSetUtil.cellSet2Matrix(cellSet,formatter);
            Long format = (new Date()).getTime();

            if (ThinQuery.Type.QUERYMODEL.equals(tq.getType()) && formatter instanceof FlattenedCellSetFormatter && tq.hasAggregators()) {
                calculateTotals(tq, result, cellSet, formatter);
            }
            Long totals = (new Date()).getTime();
            log.info(runId + "\tSize: " + result.getWidth() + "/" + result.getHeight() + "\tExecute:\t" + (exec - start)
                    + "ms\tFormat:\t" + (format - exec) + "ms\tTotals:\t" + (totals - format) + "ms\t Total: " + (totals - start) + "ms");

            result.setRuntime(new Double(format - start).intValue());
            return result;
        } catch (Exception | Error e) {
            throw new SaikuServiceException("Can't execute query: " + tq.getName(),e);
        }
    }

4、查看数据执行的sql,看看为什么执行的很慢

4.1 选择情况

首先任何的筛选都是对立方体内的字段进行全表的扫描,比如我的立方体对应的数据表是:com_order_detail_view,时间对应的字段是create_date,那么选择时间的时候,捕获执行的sql如下:

 select "com_order_detail_view"."create_date" as "c0" from "com_order_detail_view" as "com_order_detail_view" group by "com_order
_detail_view"."create_date" order by "com_order_detail_view"."create_date" ASC NULLS LAST

发现根本没有where条件。好吧,这个可以理解!

4.2 执行情况

筛选的时候,为了提升效率,选择了一个日期,并且只是选择了name字段作为区分。执行时间:190s

saiku执行速度慢_字段_02

抓取的sql如下:

4.2.1

select "dim_partner"."name" as "c0", sum("com_order_detail_view"."money") as "m0", sum("com_order_detail_view"."quantity") as "m1", sum("com_order_detail_view"."qunar_income") as "m2", count(distinct "com_order_detail_view"."display_id") as "m3" from "com_order_detail_view" as "com_order_detail_view", "dim_partner" as "dim_partner" where "com_order_detail_view"."partner" = "dim_partner"."code" group by "dim_partner"."name"

4.2.2

 select sum("com_order_detail_view"."money") as "m0", sum("com_order_detail_view"."quantity") as "m1", sum("com_order_detail_view"."qunar_income") as "m2", count(distinct "com_order_detail_view"."display_id") as "m3" from "com_order_detail_view" as "com_order_detail_view"

没有发现where条件。猜测可能是选择日期没有在过滤条件里面,所以全表扫描,那么将日期放入过滤条件,mdx被修改为:

WITH
SET [~FILTER] AS
    {[create_date].[create_date].[2016-04-01]}
SET [~ROWS_dimPartner_dimPartner] AS
    Hierarchize({{[dimPartner].[dimPartner].[All dimPartners]}, {[dimPartner].[dimPartner].[name].Members}})
SET [~ROWS_in_track_in_track] AS
    {[in_track].[in_track].[All in_tracks]}
SET [~ROWS_product_product] AS
    {[product].[product].[All products]}
SET [~ROWS_self_self] AS
    {[self].[self].[All selfs]}
SET [~ROWS_sight_sight] AS
    {[sight].[sight].[All sights]}
SET [~ROWS_ticket_type_ticket_type] AS
    {[ticket_type].[ticket_type].[All ticket_types]}
SET [~ROWS_order_status_order_status] AS
    {[order_status].[order_status].[All order_statuss]}
SET [~ROWS_refund_status_refund_status] AS
    {[refund_status].[refund_status].[All refund_statuss]}
SELECT
NON EMPTY {[Measures].[money], [Measures].[quantity], [Measures].[qunar_income], [Measures].[order_num]} ON COLUMNS,
NON EMPTY Order(NonEmptyCrossJoin([~ROWS_dimPartner_dimPartner], NonEmptyCrossJoin([~ROWS_in_track_in_track], NonEmptyCrossJoin([~ROWS_product_product], NonEmptyCrossJoin([~ROWS_self_self], NonEmptyCrossJoin([~ROWS_sight_sight], NonEmptyCrossJoin([~ROWS_ticket_type_ticket_type], NonEmptyCrossJoin([~ROWS_order_status_order_status], [~ROWS_refund_status_refund_status]))))))), [Measures].[money], BDESC) ON ROWS
FROM [com_order_detail_cube]
WHERE [~FILTER]
2016-04-15 17:13:02,448 DEBUG [org.saiku.service.olap.ThinQueryService] Query End
2016-04-15 17:13:02,449 INFO  [org.saiku.service.olap.ThinQueryService] RUN#:86 Size: 13/8      Execute:        20679ms Format: 1ms     Totals: 0ms      Total: 20680ms

发现有了效果,执行时间:20s。下面是抓取的sql

4.2.3

select "com_order_detail_view"."create_date" as "c0", "dim_partner"."name" as "c1", sum("com_order_detail_view"."money") as "m0", sum("com_order_detail_view"."quantity") as "m1", sum("com_order_detail_view"."qunar_income") as "m2", count(distinct "com_order_detail_view"."display_id") as "m3" from "com_order_detail_view" as "com_order_detail_view", "dim_partner" as "dim_partner" where "com_order_detail_view"."create_date" = DATE '2016-04-01' and "com_order_detail_view"."partner" = "dim_partner"."code" group by "com_order_detail_view"."create_date", "dim_partner"."name"

4.2.4

select "com_order_detail_view"."create_date" as "c0", sum("com_order_detail_view"."money") as "m0", sum("com_order_detail_view"."quantity") as "m1", sum("com_order_detail_view"."qunar_income") as "m2", count(distinct "com_order_detail_view"."display_id") as "m3" from "com_order_detail_view" as "com_order_detail_view" where "com_order_detail_view"."create_date" = DATE '2016-04-01' group by "com_order_detail_view"."create_date"

 

总结:使用saiku的时候,将时间条件放在《行》或者《列》里面,基本不起作用。最好放入在《过滤》里面