Ascending Key and CE Model Variation in SQL Server

In this note, I’m going to discuss one of the most useful cardinality estimator enhancements – the Ascending Key estimation.

We should start by defining the problem with ascending keys and then move on to the solution provided by the new CE.

Ascending Key is a common data pattern, and you can find it in almost every database. These might be identity columns, various increasing surrogate keys, or date columns where some point in time is fixed (order date or sale date, for instance) – the key point is that each new portion of such data has values that are greater than the previous values.

As we remember, the Optimizer uses base statistics to estimate the expected number of rows returned by a query; the distribution histogram helps to determine the value distribution and predict the number of rows. Different RDBMSs use different types of histograms for that purpose; SQL Server uses a Maxdiff histogram. The histogram-building algorithm builds the histogram’s steps iteratively, using the sorted attribute input (the exact description of that algorithm is beyond the scope of this note; however, it is curious, and you may read the patent US 6714938 B1 – “Query planning using a maxdiff histogram” – for the details, if interested). What is important is that at the end of this process the histogram steps are sorted in ascending order. Now imagine that some portion of new data is loaded, and this portion is not big enough to exceed the automatic statistics update threshold of 20% (this is especially the case when you have a rather big table with several million rows), i.e. the statistics are not updated.

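To make this concrete, you can look at the histogram yourself. The sketch below inspects the statistics of the ix_OrderDate index created in the setup script later in this note; the RANGE_HI_KEY of the last step is the maximum value the Optimizer knows about, and anything loaded after the last statistics update lies beyond it.

```sql
-- View the Maxdiff histogram for the OrderDate statistics.
-- The last RANGE_HI_KEY is the upper boundary of the histogram:
-- rows loaded later (with greater OrderDate values) fall outside it.
dbcc show_statistics ('dbo.SalesOrderHeader', 'ix_OrderDate') with histogram;
```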

In the case of non-ascending data, the newly added data may be considered more or less accurately by the Optimizer using the existing histogram steps, because each new row will belong to one of the histogram’s steps, and there is no problem.

[Figure: non-ascending data – new values fall within existing histogram steps]

If the data is ascending in nature, then it becomes a problem. The histogram steps are ascending, and the maximum step reflects the maximum value before the new data was loaded. The loaded values are all greater than the old maximum, because the data is ascending, so they are also greater than the maximum histogram step and thus fall beyond the histogram’s scope.

[Figure: ascending data – new values fall beyond the last histogram step]

How this situation is treated by the new CE and the old CE is the subject of this note. Now it is time to look at an example.

We will use the AdventureWorks2012 database, but, so as not to spoil the data with modifications, I’ll make copies of the tables of interest and their indexes.

use AdventureWorks2012;
 
------------------------------------------------
-- Prepare Data
if object_id('dbo.SalesOrderHeader') is not null drop table dbo.SalesOrderHeader;
if object_id('dbo.SalesOrderDetail') is not null drop table dbo.SalesOrderDetail;
select * into dbo.SalesOrderHeader from Sales.SalesOrderHeader;
select * into dbo.SalesOrderDetail from Sales.SalesOrderDetail;
go
alter table dbo.SalesOrderHeader add constraint PK_DBO_SalesOrderHeader_SalesOrderID primary key clustered (SalesOrderID);
create unique index AK_SalesOrderHeader_rowguid on dbo.SalesOrderHeader(rowguid);
create unique index AK_SalesOrderHeader_SalesOrderNumber on dbo.SalesOrderHeader(SalesOrderNumber);
create index IX_SalesOrderHeader_CustomerID on dbo.SalesOrderHeader(CustomerID);
create index IX_SalesOrderHeader_SalesPersonID on dbo.SalesOrderHeader(SalesPersonID);
alter table dbo.SalesOrderDetail add constraint PK_DBO_SalesOrderDetail_SalesOrderID_SalesOrderDetailID primary key clustered (SalesOrderID, SalesOrderDetailID);
create index IX_SalesOrderDetail_ProductID on dbo.SalesOrderDetail(ProductID);
create unique index AK_SalesOrderDetail_rowguid on dbo.SalesOrderDetail(rowguid);
create index ix_OrderDate on dbo.SalesOrderHeader(OrderDate); -- *
go

Now let’s write a query that asks for some order information for the last month, together with the customer and some other details. I’ll also turn on the statistics time metric, because we will see a performance difference even in such a small database. Note that TF 9481 is used to force the old cardinality estimation behavior.

-- Query
set statistics time, xml on
select
	soh.OrderDate,
	soh.TotalDue,
	soh.Status,
	OrderQty = sum(sod.OrderQty),
	c.AccountNumber,
	st.Name,
	so.DiscountPct
from
	dbo.SalesOrderHeader soh
	join dbo.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderID
	join Sales.Customer c on soh.CustomerID = c.CustomerID
	join Sales.SalesTerritory st on c.TerritoryID = st.TerritoryID
	left join Sales.SpecialOffer so on sod.SpecialOfferID = so.SpecialOfferID
where
	soh.OrderDate > '20080701'
group by
	soh.OrderDate,
	soh.TotalDue,
	soh.Status,
	c.AccountNumber,
	st.Name,
	so.DiscountPct
order by
	soh.OrderDate
option(querytraceon 9481)
set statistics time, xml off
go

The query took 250 ms on average on my machine, and produced the following plan with Hash Joins:

[Figure: execution plan with Hash Joins]

Now, let’s emulate a data load, as if some new orders for the next month were saved.

-- Load Orders And Details
declare @OrderCopyRelations table(SalesOrderID_old int, SalesOrderID_new int)
 
merge
    dbo.SalesOrderHeader dst
using (
	select SalesOrderID, OrderDate = dateadd(mm,1,OrderDate), RevisionNumber, DueDate, ShipDate, Status, OnlineOrderFlag, SalesOrderNumber = SalesOrderNumber+'new', PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, TotalDue, Comment, ModifiedDate
	from Sales.SalesOrderHeader
	where OrderDate > '20080701'
) src 
on 0=1 when not matched then
	insert (OrderDate, RevisionNumber, DueDate, ShipDate, Status, OnlineOrderFlag, SalesOrderNumber, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, TotalDue, Comment, ModifiedDate, rowguid)
	values (OrderDate, RevisionNumber, DueDate, ShipDate, Status, OnlineOrderFlag, SalesOrderNumber, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, TotalDue, Comment, ModifiedDate, newid())
output src.SalesOrderID, inserted.SalesOrderID 
into @OrderCopyRelations(SalesOrderID_old, SalesOrderID_new);
 
insert dbo.SalesOrderDetail(SalesOrderID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, ModifiedDate, rowguid)
select ocr.SalesOrderID_new, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, ModifiedDate, newid()
from
    @OrderCopyRelations ocr
    join Sales.SalesOrderDetail op on ocr.SalesOrderID_old = op.SalesOrderID
go

Not too much data was added: 939 rows for orders and 2130 rows for order details. That is not enough to exceed the 20% threshold for auto-update statistics.

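If you want to verify this on your own system, the sys.dm_db_stats_properties DMF (available since SQL Server 2008 R2 SP2 and SQL Server 2012 SP1) exposes the accumulated modification counter; a sketch:

```sql
-- Compare accumulated column modifications against the classic
-- auto-update threshold (roughly 20% of the rows plus 500).
select
    s.name                as stats_name,
    sp.rows,
    sp.modification_counter,
    0.20 * sp.rows + 500  as approx_auto_update_threshold
from sys.stats s
cross apply sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
where s.object_id = object_id('dbo.SalesOrderHeader');
```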

Now, let’s repeat the previous query and ask for the orders for the last month (that is, the newly added orders).

-- Old
set statistics time, xml on
select
	soh.OrderDate,
	soh.TotalDue,
	soh.Status,
	OrderQty = sum(sod.OrderQty),
	c.AccountNumber,
	st.Name,
	so.DiscountPct
from
	dbo.SalesOrderHeader soh
	join dbo.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderID
	join Sales.Customer c on soh.CustomerID = c.CustomerID
	join Sales.SalesTerritory st on c.TerritoryID = st.TerritoryID
	left join Sales.SpecialOffer so on sod.SpecialOfferID = so.SpecialOfferID
where
	soh.OrderDate > '20080801'
group by
	soh.OrderDate,
	soh.TotalDue,
	soh.Status,
	c.AccountNumber,
	st.Name,
	so.DiscountPct
order by
	soh.OrderDate
option(querytraceon 9481)
set statistics time, xml off
go

That took 17,500 ms on average on my machine, more than 50 times slower! If you look at the plan, you’ll see that the server is now using a Nested Loops Join:

[Figure: execution plan with a Nested Loops Join]

The reason for that plan shape and the slow execution is the 1-row estimate, whereas 939 rows were actually returned. That estimate skewed the estimates of the subsequent operators. The Nested Loops Join input estimate is one row, and the optimizer decided to put the SalesOrderDetail table on the inner side of the Nested Loops, which resulted in more than 100 million rows being read!

(CE 7.0 Solution (Pre SQL Server 2014))

To address this issue, Microsoft introduced two trace flags: TF 2389 and TF 2390. The first one enables statistics correction for columns branded ascending; the second one covers the other columns as well. A more comprehensive description of those flags is provided in the post Ascending Keys and Auto Quick Corrected Statistics by Ian Jose. To see the column’s branding, you may use the undocumented TF 2388 and the DBCC SHOW_STATISTICS command like this:

-- view column leading type
dbcc traceon(2388)
dbcc show_statistics ('dbo.SalesOrderHeader', 'ix_OrderDate')
dbcc traceoff(2388)

In this case, no surprise, the column leading type is Unknown; three more cycles of inserts plus statistics updates are needed to brand the column.

[Figure: DBCC SHOW_STATISTICS output with TF 2388 – column leading type is Unknown]
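To actually brand the column, each ascending data load must be followed by a statistics update, so the server can observe that the new maximum exceeds the old one. A sketch of one such cycle (repeat it three times after loads in ascending order):

```sql
-- One branding cycle: after loading rows with ascending OrderDate values,
-- refresh the statistics so the server records the boundary movement...
update statistics dbo.SalesOrderHeader ix_OrderDate;

-- ...and re-check the leading type (expected to become Ascending
-- after three consecutive ascending updates):
dbcc traceon(2388);
dbcc show_statistics ('dbo.SalesOrderHeader', 'ix_OrderDate');
dbcc traceoff(2388);
```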

You may find a good description of this mechanism in the blog post Statistics on Ascending Columns by Fabiano Amorim. As the column is branded Unknown, we should use both TFs with the old CE to solve the ascending key problem.

-- Old with TFs
set statistics time, xml on
select
	soh.OrderDate,
	soh.TotalDue,
	soh.Status,
	OrderQty = sum(sod.OrderQty),
	c.AccountNumber,
	st.Name,
	so.DiscountPct
from
	dbo.SalesOrderHeader soh
	join dbo.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderID
	join Sales.Customer c on soh.CustomerID = c.CustomerID
	join Sales.SalesTerritory st on c.TerritoryID = st.TerritoryID
	left join Sales.SpecialOffer so on sod.SpecialOfferID = so.SpecialOfferID
where
	soh.OrderDate > '20080801'
group by
	soh.OrderDate,
	soh.TotalDue,
	soh.Status,
	c.AccountNumber,
	st.Name,
	so.DiscountPct
order by
	soh.OrderDate
option(querytraceon 9481, querytraceon 2389, querytraceon 2390)
set statistics time, xml off
go

This query took the same 250 ms on average on my machine and resulted in a similar plan shape (I won’t show it here, to save space). Cool, isn’t it? Yes, it is, in this synthetic example.

If you are persistent enough, try to re-run the whole example from the very beginning, commenting out the creation of the index ix_OrderDate (the one marked with the * symbol in the creation script). You will be quite surprised that those TFs are not helpful when the index is missing! This is documented behavior (KB 922063):

[Figure: excerpt from KB 922063]

That means that automatically created statistics (and I think in most real-world scenarios statistics are created automatically) won’t benefit from these TFs.

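To check which statistics on a table were created automatically (and therefore cannot benefit from TF 2389/2390 branding), you may query sys.stats; a sketch:

```sql
-- Auto-created statistics have auto_created = 1 and names like _WA_Sys_...;
-- only statistics backed by an index are candidates for ascending branding.
select s.name, s.auto_created, s.user_created
from sys.stats s
where s.object_id = object_id('dbo.SalesOrderHeader');
```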

(CE 12.0 Solution (SQL Server 2014))

To address the issue of the Ascending Key in SQL Server 2014 you should do… nothing! This model enhancement is turned on by default, and I think it is great! If we simply run the previous query without any TF, i.e. using the new CE, it will run like a charm. Also, there is no requirement to have an index defined on that column.

-- New
set statistics time, xml on
select
	soh.OrderDate,
	soh.TotalDue,
	soh.Status,
	OrderQty = sum(sod.OrderQty),
	c.AccountNumber,
	st.Name,
	so.DiscountPct
from
	dbo.SalesOrderHeader soh
	join dbo.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderID
	join Sales.Customer c on soh.CustomerID = c.CustomerID
	join Sales.SalesTerritory st on c.TerritoryID = st.TerritoryID
	left join Sales.SpecialOffer so on sod.SpecialOfferID = so.SpecialOfferID
where
	soh.OrderDate > '20080801'
group by
	soh.OrderDate,
	soh.TotalDue,
	soh.Status,
	c.AccountNumber,
	st.Name,
	so.DiscountPct
order by
	soh.OrderDate
set statistics time, xml off
go

The plan would be the following (adjusted a little bit to fit the page):

[Figure: execution plan under the new CE – estimate of 281.7 rows]

You may see that the estimated number of rows is no longer 1 row; it is 281.7 rows. That estimate leads to the appropriate plan with Hash Joins that we saw earlier. If you wonder how this estimation was made: in the 2014 CE, the “out-of-boundaries” values are modeled as belonging to an average histogram step (a trivial histogram step with a uniform data distribution) in the case of equality, which is well described in the Joe Sack post listed in the references below. In the case of inequality, a 30% guess over the added rows is made (the common 30% guess was discussed earlier).

select rowmodctr*0.3 from sys.sysindexes i where i.name = 'PK_DBO_SalesOrderHeader_SalesOrderID'

The result is 939 * 0.3 = 281.7 rows. Of course, the server uses other, per-column counters, but in this case it doesn’t matter. What matters is that this really cool feature is present in the new 2014 CE!

Another interesting thing to note is some of the internals. If you run the query with TF 2363 (and TF 3604, of course) to view the diagnostic output, you’ll see that the specific calculator CSelCalcAscendingKeyFilter is used.

[Figure: TF 2363 diagnostic output showing the CSelCalcAscendingKeyFilter calculator]
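A minimal way to reproduce this output (the filter alone is enough; the full query from above works as well) is to combine TF 3604, which redirects diagnostic output to the client, with TF 2363. Note that TF 2363 produces output only under the new CE.

```sql
-- The Messages tab will contain the CE diagnostic trace,
-- including the calculator chosen for the OrderDate predicate.
select soh.SalesOrderID, soh.OrderDate
from dbo.SalesOrderHeader soh
where soh.OrderDate > '20080801'
option (querytraceon 3604, querytraceon 2363);
```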

According to this output, at first the regular calculator for an inequality (or for equality on a non-unique column) was used. When it estimated zero selectivity, the estimation process realized that some extra steps were needed and re-planned the calculation. I think this is a result of separating the two processes, the planning of the computation and the actual computation; however, I’m not sure and would need inside information about that architecture enhancement. The re-planned calculator is the CSelCalcAscendingKeyFilter calculator, which models the “out-of-histogram-boundaries” distribution. You may also notice the guess argument, which stands for the 30% guess.

(The Model Variation)

The model variation in this case would be to turn off the ascending key logic. As this is completely undocumented and should not be used in production, I strongly recommend against turning off this splendid mechanism; it’s like buying a ticket and staying at home.

However, maybe this opportunity will be helpful for some geeky people (like me =)) in their optimizer experiments. To enable the model variation and turn off the ascending key logic, you should run the query with TF 9489.

set statistics time, xml on
select
	soh.OrderDate,
	soh.TotalDue,
	soh.Status,
	OrderQty = sum(sod.OrderQty),
	c.AccountNumber,
	st.Name,
	so.DiscountPct
from
	dbo.SalesOrderHeader soh
	join dbo.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderID
	join Sales.Customer c on soh.CustomerID = c.CustomerID
	join Sales.SalesTerritory st on c.TerritoryID = st.TerritoryID
	left join Sales.SpecialOffer so on sod.SpecialOfferID = so.SpecialOfferID
where
	soh.OrderDate > '20080801'
group by
	soh.OrderDate,
	soh.TotalDue,
	soh.Status,
	c.AccountNumber,
	st.Name,
	so.DiscountPct
order by
	soh.OrderDate
option(querytraceon 9489)
set statistics time, xml off
go

And with TF 9489 we are back to the nasty Nested Loops plan. I’m sure that, due to the statistical nature of the estimation algorithms, you may invent a case where this TF is helpful, but in the real world, please don’t use it unless you are guided by Microsoft support!

That’s all for this post! Next time we will talk about multi-statement table-valued functions.

(Table of contents)

Cardinality Estimation Role in SQL Server

Cardinality Estimation Place in the Optimization Process in SQL Server

Cardinality Estimation Concepts in SQL Server

Cardinality Estimation Process in SQL Server

Cardinality Estimation Framework Version Control in SQL Server

Filtered Stats and CE Model Variation in SQL Server

Join Containment Assumption and CE Model Variation in SQL Server

Overpopulated Primary Key and CE Model Variation in SQL Server

Ascending Key and CE Model Variation in SQL Server

MTVF and CE Model Variation in SQL Server

(References)

  • Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator
  • Ascending Keys and Auto Quick Corrected Statistics
  • Regularly Update Statistics for Ascending Keys

Original article: https://www.sqlshack.com/ascending-key-and-ce-model-variation/
