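This note covers histograms and kernel density estimation. Drawing on the classic exposition in Wikipedia, it compares how the two estimators are constructed in order to illustrate these two classic non-parametric density estimation methods; the finer details are not analyzed in depth.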
In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample. In some fields such as signal processing and econometrics it is also termed the Parzen–Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating it in its current form.
Let (x1, x2, …, xn) be an independent and identically distributed sample drawn from some distribution with an unknown density ƒ. We are interested in estimating the shape of this function ƒ. Its kernel density estimator is
$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$$
where K(∙) is the kernel, a non-negative function that integrates to one and has mean zero, and h > 0 is a smoothing parameter called the bandwidth. A kernel with subscript h is called the scaled kernel and is defined as $K_h(x) = \frac{1}{h} K(x/h)$. Intuitively one wants to choose h as small as the data allow; however, there is always a trade-off between the bias of the estimator and its variance: a small h gives low bias but a noisy, high-variance estimate, while a large h oversmooths.
A range of kernel functions are commonly used: uniform, triangular, biweight, triweight, Epanechnikov, normal, and others. The Epanechnikov kernel is optimal in a mean square error sense,[3] though the loss of efficiency is small for the kernels listed previously.[4] Due to its convenient mathematical properties, the normal kernel is often used, which means K(x) = ϕ(x), where ϕ is the standard normal density function.
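As a quick sanity check on the kernel requirements, the sketch below defines two of the kernels named above in plain Python and numerically verifies that each integrates to one and has mean zero. The function names and the simple midpoint-rule integrator are illustrative choices for this note, not part of the quoted text.

```python
import math

def gaussian_kernel(u):
    """Standard normal density phi(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def epanechnikov_kernel(u):
    """Epanechnikov kernel: 3/4 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def integrate(func, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * width) for k in range(steps)) * width

for name, kernel in [("normal", gaussian_kernel),
                     ("Epanechnikov", epanechnikov_kernel)]:
    total = integrate(kernel, -10.0, 10.0)
    mean = integrate(lambda u: u * kernel(u), -10.0, 10.0)
    print(f"{name}: integral = {total:.4f}, mean = {mean:.4f}")
```

Both kernels should report an integral close to one and a mean close to zero, up to the accuracy of the numerical integration.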
The construction of a kernel density estimate finds interpretations in fields outside of density estimation. For example, in thermodynamics, this is equivalent to the amount of heat generated when heat kernels (the fundamental solution to the heat equation) are placed at each data point location x_i. Similar methods are used to construct discrete Laplace operators on point clouds for manifold learning.
Kernel density estimates are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel. To see this, we compare the construction of histogram and kernel density estimators, using these 6 data points: x_1 = −2.1, x_2 = −1.3, x_3 = −0.4, x_4 = 1.9, x_5 = 5.1, x_6 = 6.2. For the histogram, first the horizontal axis is divided into sub-intervals or bins which cover the range of the data. In this case, we have 6 bins each of width 2. Whenever a data point falls inside one of these intervals, we place a box of height 1/12 there (one over the number of points times the bin width, 1/(6·2), so that the total area is one). If more than one data point falls inside the same bin, we stack the boxes on top of each other.
For the kernel density estimate, we place a normal kernel with variance 2.25 (indicated by the red dashed lines) on each of the data points x_i. The kernels are summed to make the kernel density estimate (solid blue curve). The smoothness of the kernel density estimate is evident compared to the discreteness of the histogram, as kernel density estimates converge faster to the true underlying density for continuous random variables.
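We use this example to examine the similarities and differences between the histogram method and kernel density estimation, working with the following 6 data points:
x_1 = −2.1, x_2 = −1.3, x_3 = −0.4, x_4 = 1.9, x_5 = 5.1, x_6 = 6.2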
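The histogram method divides the x axis into a series of bins that cover the approximate range of the data. For the 6 data points above, the range can be set to (−4, 8) with a bin width of 2, which gives 6 bins: [−4, −2), [−2, 0), [0, 2), [2, 4), [4, 6), [6, 8). The height of every bin is initialized to 0. We then iterate over all sample points, determine which bin each one falls into, and add 1/(N·w) to the height of that bin, where N is the number of samples (6 here) and w is the bin width (2 here). Each point therefore contributes a box of height 1/12, which matches the construction described above and makes the total area equal to one. This yields the histogram shown in the figure above.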
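The binning procedure can be sketched in a few lines of plain Python. This is a minimal illustration under the settings of the example (range starting at −4, bin width 2, 6 bins); the function name `histogram_density` and its parameters are made up for this note. Each point adds 1/(N·w) = 1/12 to its bin, the box height used in the example, so that the bars enclose a total area of one.

```python
import math

def histogram_density(samples, start, bin_width, num_bins):
    """Return the bin heights of a density-normalized histogram.

    Bin k is the half-open interval [start + k*w, start + (k+1)*w).
    """
    heights = [0.0] * num_bins
    increment = 1.0 / (len(samples) * bin_width)  # 1/12 for this example
    for x in samples:
        k = math.floor((x - start) / bin_width)   # index of the bin containing x
        if 0 <= k < num_bins:                     # ignore points outside the range
            heights[k] += increment
    return heights

samples = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
heights = histogram_density(samples, start=-4.0, bin_width=2.0, num_bins=6)
for k, h in enumerate(heights):
    lo = -4.0 + 2.0 * k
    print(f"[{lo:+.0f}, {lo + 2.0:+.0f}): {h:.4f}")
```

Only the bin [−2, 0) receives two points (−1.3 and −0.4), so it ends up twice as tall as the other occupied bins, and the bin [2, 4) stays empty.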
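Kernel density estimation produces a density estimate that is a continuous, smooth curve. The approach is to place a kernel function on each data point and then add up the kernels from all data points (scaled by 1/n), which gives a smooth density function. For the sample above, the red dashed lines in the right-hand panel of the figure are the kernel functions placed on the 6 data points, and the blue line is the density function obtained by summing all of these kernels.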
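The same construction can be sketched in plain Python for the 6 data points. This is a minimal illustration, assuming a normal kernel with standard deviation 1.5 (variance 2.25, as in the example above) used as the bandwidth; the helper names `normal_pdf` and `kde` and the evaluation grid are choices made for this note.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def kde(x, samples, bandwidth):
    """Kernel density estimate at x: the average of normal kernels centered on the samples."""
    return sum(normal_pdf(x, xi, bandwidth) for xi in samples) / len(samples)

samples = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
bandwidth = 1.5  # standard deviation of each kernel; variance is 1.5**2 = 2.25

# Evaluate the estimate on a grid from -15 to 20 and approximate its area
# with a Riemann sum; it should come out close to one.
grid_step = 0.01
grid = [-15.0 + grid_step * k for k in range(3501)]
density = [kde(x, samples, bandwidth) for x in grid]
area = sum(density) * grid_step

print(f"approximate area under the estimate: {area:.4f}")
print(f"estimate near the cluster, x = -1.3: {kde(-1.3, samples, bandwidth):.4f}")
print(f"estimate in the gap, x = 3.5: {kde(3.5, samples, bandwidth):.4f}")
```

Unlike the histogram, the result is defined and smooth at every x, and it is higher near the cluster of three points on the left than in the gap between 1.9 and 5.1.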
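Comparing the two, the histogram produces a discrete, piecewise-constant density estimate, whereas kernel density estimation is effectively a histogram that has been smoothed with a kernel function.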
2015-8-28
艺少