Aggregation Functions 聚合函数 (page 246)
Aggregation functions can operate in analytic mode or conventional non-analytic mode. Aggregation
functions in non-analytic mode reduce the result set to fewer rows. However, in analytic mode,
aggregation functions do not reduce the result set. Further, the aggregation functions can fetch both
aggregated and non-aggregated columns in the same row. Aggregation functions in analytic mode
provide the ability to aggregate data at different levels without any need for a self-join.
聚合函数能运作于分析模式或者传统的非分析模式。非分析模式的聚合函数将结果集缩减成几行。然而,分析模式,聚合函数不会缩减结果集。再者,聚合函数能在同一行取出聚合和非聚合的列。 分析模式的聚合函数提供了在不同层次聚合数据而不需要自连接的能力。
Analytic functions are useful in writing complex report queries aggregating data at different levels.
Consider a demographic market analysis report for a product, a favorite among advertising executives,
which requires sales data to be aggregated at myriad levels such as age, gender, store, district, region,
and country. Aggregation functions in the analytic mode can be effectively utilized to implement this
market analysis report with ease. Analytic functions will markedly improve the clarity and performance
of the SQL statements, compared to its non-analytic counterparts.
分析函数对写在不同层次上聚集数据的复杂报告查询是有用的。考虑一款产品的人口统计的市场分析报告,是在广告经理中的最爱,需要在很多层次,诸如年龄,性别,库存,片区,地区以及国家,进行销售数据的聚合。分析模式的聚合函数能轻松,有效的用于执行这种市场分析报告。分析函数显著的
Let’s review the example in the Listing 8-2. The SQL statement is calculating the running of sum of
Sale column from the beginning of the year for a product, country, region, and year combination. The
clause partition by product, country, region, year specifies the partition columns. Within the data
partition, rows are sorted by the Week column using the clause order by week .
我们考察一下列表8-2的例子。SQL语句计算Sale列的从年头起,按产品,国家,区域和年份组合(聚合)的累计和。子句partition by product, country, region, year 指定分区列。在数据分区中,用子句order by week使得行按照Week列排序。
In Listing 8-2, the SQL is calculating the running of sum of Sale column, so the analytic function
must operate on window of rows from the beginning of the year to the current week. That goal is
achieved by the windowing clause rows between unbounded preceding and current row. The sum(sale)
function calculates the sum of Sale column values over this window of rows. Since the rows are sorted by
the Week column, the sum function is operating over a set of rows from the beginning of the year until
the current week.
在列表8-2中,SQL计算在Sale列上的累加和,因此分析函数必须运作于从年头到当前周的行窗口上。通过开窗子句 rows between unbounded preceding and current row可以完成这个目标。 函数sum(sale) 计算在行窗口上Sale列值的和。由于行按Week列排序,sum函数运作在从年头到当前周的行集上。
Listing 8-2. Running Sum of Sale Column
1 select year, week,sale,
2 sum (sale) over(
3 partition by product, country, region, year
4 order by week
5 rows between unbounded preceding and current row
6 ) running_sum_ytd
7 from sales_fact
8 where country in ('Australia') and product ='Xtend Memory'
9* order by product, country,year, week
/
YEAR WEEK SALE RUNNING_SUM_YTD
----- ---- ---------- ----------------
...
2000 49 42.38 3450.85
2000 50 21.19 3472.04
2000 52 67.45 3539.49
2001 1 92.26 92.26
2001 2 118.38 210.64
2001 3 47.24 257.88
2001 4 256.70 514.58
...
Notice in the output of Listing 8-2, column Running_sum_ytd is the output of the sum function in
the analytic mode. The column value resets at the onset of the new year 2001. Since year is also a
partitioning column, so a new partition starts with each new year.
注意列表8-2的输出,列Running_sum_ytd是sum函数分析模式的输出。列值在新的2001年的开端重置了。因为年也是一分区列,所以对于每一新的年开创一个新的分区。
When a new year begins, the window slides to the next data partition, and the sum function begins
aggregating from Week 1. Implementing this functionality with a conventional SQL statement would
lead multiple self-joins and/or costly
昂贵的 列级子查询。
Aggregate Function Over An Entire Partition 在整个分区上的聚合函数
In some cases, analytic functions might need to be applied over all rows in a given data partition. For
example, computing the maximum value of the Sale column for the entire year would require a window
encompassing every row in the data partition. In the Listing 8-3, you use the SQL clause rows between
unbounded preceding and unbounded following to specify that the MAX function applies to all rows in a
data partition. The key difference between Listing 8-2 and Listing 8-3 is that the clause unbounded
following specifies the window size to include all rows in a data partition.
在某些情况下,分析函数可能需要应用于给定数据分区的所有行之上。例如,计算Sale列在整年的最大值将需要一包含在数据分区每一行的窗口。在列表8-3 中,你使用SQL子句rows between unbounded preceding and unbounded following 指定MAX函数应用于数据分区的所有行。 列表8-2和8-3的主要不同是子句unbounded following指定窗口的尺度包含在数据分区上的所有行。
Listing 8-3. Maximum of Sale Column
1 select year, week,sale,
2 max (sale) over(
3 partition by product, country, region ,year
4 order by week
5 rows between unbounded preceding and unbounded following
6 ) Max_sale
7 from sales_fact
8 where country in ('Australia') and product ='Xtend Memory'
9* order by product, country,year, week
/
YEAR WEEK SALE MAX_SALE
----- ---- ---------- ---------------
...
2000 44 135.24 246.74
2000 45 67.62 246.74
2000 46 246.74 246.74
...
2000 50 21.19 246.74
2000 52 67.45 246.74
2001 1 92.26 278.44
2001 2 118.38 278.44
...
Granular Window Specifications 窗口规格的粒度
Window specification can be more granular, too. Let’s say that you want to calculate the Maximum of
Sale column for a five week window period encompassing two weeks prior to the current week, the
current week, and the two weeks following the current week. You can do that using the clause rows
between 2 preceding and 2 following .
窗口的规格能够更加的(小)粒度。我们说你想要计算Sale列的每五周窗口周期,包含当前周的前两周、当前周、当前周的后两周,的最大值。你就能使用子句rows between 2 preceding and 2 following 。
In the Listing 8-4, for week 36, the maximum value for the Sale column in the 5 week window is
178.52. For week 37, the maximum value for the Sale column in the 5 week window is 118.41. You can see those values in the MAX_WEEKS_5 column of the output.
在列表8-4中,对于36周而言,Sale列在5周窗口中的最大值是178.52。对37周而言,Sale列在5周窗口中的最大值是118.41。你可以从MAX_WEEKS_5列中看到这些值的输出。
Listing 8-4. Maximum of Sale Column for a Span of Five Weeks Window
1 select year, week,sale,
2 max (sale) over(
3 partition by product, country, region ,year
4 order by week
5 rows between 2 preceding and 2 following
6 ) max_weeks_5
7 from sales_fact
8 where country in ('Australia') and product ='Xtend Memory'
9* order by product, country,year, week
/
YEAR WEEK SALE MAX_WEEKS_5
---- ---- ---------- -----------
...
2000 34 178.52 178.52
2000 35 78.82 178.52
2000 36 118.41 178.52
2000 37 117.96 118.41
2000 38 79.36 118.41
...
Default Window Specification 默认的窗口规格
The default windowing clause is rows between unbounded preceding and current row. If you do not
explicitly specify a window, you’ll get the default window. It is a good approach to specify this clause
explicitly to avoid ambiguities.
默认的开窗子句是rows between unbounded preceding and current row。