Convex Sets and Convex Functions

Definitions and Basic Properties


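  • If \(E\subseteq R^m\) satisfies \((1-t)x+ty\in E\) for all \(x,y\in E\) and all \(t\in [0,1]\), then \(E\) is called a convex set in \(R^m\).
  • Let \(E\) be a convex set in \(R^m\). A function \(f:E\mapsto R\) is called convex (concave) if \(f((1-t)x+ty)\le(\ge)(1-t)f(x)+tf(y),\forall x,y\in E,\forall t\in [0,1]\).
  • Let \(E\) be a convex set in \(R^m\). Then \(f\) is convex \(\Leftrightarrow \{(x,y)|y\ge f(x),x\in E\}\) is a convex set in \(R^{m+1}\).
  • By induction on the definition of a convex function: for all \(x_1,\dots,x_n\in E\) and all \(\{t_k\}^{n}_{k=1}\) with \(t_k\ge 0,\sum_{k=1}^{n}t_k=1\), we have \(\sum_{k=1}^{n}t_kx_k\in E\) and \(f(\sum_{k=1}^{n}t_kx_k)\le \sum_{k=1}^{n}t_kf(x_k)\).
  • If \(E\) is a convex set, then \(\overline{E}\) is also a convex set.
  • Given \(\vec{a_1},\vec{a_2}\dots \vec{a_n}\in R^m\), a sum \(\sum_{i=1}^{n}t_i\vec{a_i}\) with \(t_i\ge 0\) for all \(1\le i \le n\) and \(\sum_{i=1}^{n}t_i=1\) is called a convex combination of \(\vec{a_1},\vec{a_2}\dots \vec{a_n}\). This leads to the notion of the convex hull.
  • For \(E\subseteq R^m\), the convex hull \(Conv(E)\) of \(E\) is defined by $$Conv(E):=\left\{\sum_{i=1}^{n}t_ix_i\,\middle|\,n\in N,\ x_i\in E,\ t_i\ge 0\ (1\le i\le n),\ \sum_{i=1}^{n}t_i=1\right\}.$$ It is not hard to check that \(Conv(E)\) is the smallest convex set containing \(E\).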
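
As a quick numerical illustration of the inequality \(f(\sum t_kx_k)\le \sum t_kf(x_k)\) from the list above, the following minimal sketch samples random convex combinations and measures the gap between the two sides. The example function \(f(x)=|x|^2\), the helper names, and the tolerance are assumptions chosen for illustration only; a check like this can suggest convexity but does not prove it.

```python
import random

def convex_combination(points, weights):
    """Return sum_k weights[k] * points[k] for points given as tuples."""
    dim = len(points[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim))

def random_weights(n, rng):
    """Draw nonnegative weights t_1..t_n that sum to 1."""
    raw = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

def f(x):
    """Example convex function (an assumption): squared Euclidean norm."""
    return sum(c * c for c in x)

def jensen_gap(points, weights):
    """sum_k t_k f(x_k) - f(sum_k t_k x_k); nonnegative for a convex f."""
    lhs = f(convex_combination(points, weights))
    rhs = sum(w * f(p) for w, p in zip(weights, points))
    return rhs - lhs

rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]
gaps = [jensen_gap(points, random_weights(5, rng)) for _ in range(1000)]
# Allow a tiny negative value caused by floating-point rounding.
print(min(gaps) >= -1e-12)
```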

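If \(\vec{a_1},\vec{a_2}\dots \vec{a_n}\in R^m\), then \(Conv(\{\vec{a_1},\vec{a_2}\dots \vec{a_n}\})\) is a compact convex set.

Indeed, \(T=\{\vec{t}=(t_1,t_2\dots t_n)|t_i\ge 0,\sum_{i=1}^{n}t_i=1\}\) is a compact convex subset of \(R^n\). Define \(\Phi(\vec{t})=\sum_{i=1}^{n}t_i\vec{a_i}\). Then \(\Phi:T\mapsto Conv(\{\vec{a_1},\vec{a_2}\dots \vec{a_n}\})\) is a continuous surjection, so the convex hull is compact as the continuous image of a compact set.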

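  • Let \(E\) be a convex set. A point \(x\in E\) is called an extreme point of \(E\) if and only if \(x\) cannot be written as a convex combination of other points of \(E\), i.e. \(\forall t\in ]0,1[,\forall a,b\in E\) with \(a\ne b\), \(x\ne ta+(1-t)b\).

It follows from the definition that if \(x\in \mathring{E}\), then \(x\) cannot be an extreme point of \(E\). Hence extreme points can only lie on \(\partial E\).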

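  • If \(K\subseteq R^m\) is a nonempty compact convex set, then \(K\) has at least one extreme point; if \(K\) is not a singleton, \(K\) has at least two extreme points.
  • If \(K\) is a compact convex set whose extreme points are exactly \(\vec{a_1},\vec{a_2}\dots \vec{a_n}\), then \(K=Conv(\{\vec{a_1},\vec{a_2}\dots \vec{a_n}\})\).
  • The extreme points of \(\prod_{i=1}^{m}[a_i,b_i]\) are exactly its vertices.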

Suprema of Convex Functions


Intersection Lemma

Let \(K\subseteq R^m\) be a compact set, \(x\in K,y\in R^m,x\ne y\). Then the ray from \(y\) through \(x\) meets \(\partial K\) at a point different from \(y\). That is, \(\rho:=\max\{t\ge 1|y+t(x-y)\in K\}\) is well defined and \(y+\rho(x-y)\in \partial K\). In particular, if \(x\in \mathring K\) and \(r>0\) satisfies \(B(x,r)\subseteq K\), then \(\rho\ge 1+\frac{r}{2|x-y|}>1\).

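\[\begin{array}{l} ◂\ \text{Define } I=\{t\ge 1|y+t(x-y)\in K\}\text{; then } 1\in I.\\ \text{For } t\in I,\ (t-1)|x-y|=|y+t(x-y)-x|\le diam(K)\Rightarrow t\le 1+\frac{diam(K)}{|x-y|}.\\ I\text{ is bounded above}\Rightarrow \sup I\text{ exists; since } K\text{ is closed, }\rho=\max I\text{ exists}.\\ \text{If } y+\rho(x-y)\in \mathring{K}\text{, then }\rho\text{ could be made larger, a contradiction; hence } y+\rho(x-y)\in \partial K.\\ \text{The same argument shows that }\rho>1\text{ when } x\in \mathring{K}. \ ▸ \end{array} \]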
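
To make \(\rho\) concrete, here is a minimal sketch for the special case where \(K\) is an axis-aligned box \(\prod_i[l_i,u_i]\). This case is an assumption made for illustration, since there \(\rho\) has a closed form: each coordinate moving along the ray gives an upper bound on \(t\), and \(\rho\) is the smallest of these bounds. The function names are hypothetical.

```python
def exit_parameter(x, y, lower, upper):
    """Largest t >= 1 with y + t*(x - y) in the box prod [lower[i], upper[i]].

    Assumes x lies in the box and x != y.
    """
    bounds = []
    for xi, yi, lo, hi in zip(x, y, lower, upper):
        d = xi - yi
        if d > 0:
            bounds.append((hi - yi) / d)   # coordinate increases until it hits hi
        elif d < 0:
            bounds.append((lo - yi) / d)   # coordinate decreases until it hits lo
        # d == 0: this coordinate stays at x_i, which is already inside
    return min(bounds)

def exit_point(x, y, lower, upper):
    """Return (rho, z) with z = y + rho*(x - y) on the boundary of the box."""
    rho = exit_parameter(x, y, lower, upper)
    return rho, tuple(yi + rho * (xi - yi) for xi, yi in zip(x, y))

# K = [0,1]^2, x is the center, y lies on the left edge.
rho, z = exit_point((0.5, 0.5), (0.0, 0.5), (0.0, 0.0), (1.0, 1.0))
print(rho, z)  # prints: 2.0 (1.0, 0.5)
```

Here \(B(x,r)\subseteq K\) holds with \(r=0.5\) and \(|x-y|=0.5\), so the example is consistent with \(\rho>1\) for interior \(x\).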


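Extremum Problems

Let \(K \subseteq R^m\) be a compact convex set and \(f:K\mapsto R\) a convex function. Then:

  • \(\sup_{x\in K} f(x)=\sup_{x\in \partial K}f(x)\).
  • In particular, if \(f\) is continuous on \(K\), then \(\max_{x\in K} f(x)=\max_{x\in \partial K}f(x)\), i.e. the maximum is attained on the boundary.
  • If \(\exists x_0\in \mathring{K}\) with \(f(x_0)=\max_{x\in K}f(x)\), then \(f\) is constant on \(K\): \(f(x)\equiv c\).

For concave functions, replace \(\sup\) by \(\inf\) and \(\max\) by \(\min\) in the statements above.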

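\[\begin{array}{l} ◂\ \text{Main idea: }\forall x\in \mathring{K},\forall y\in \partial K,\exists z\in \partial K,y\ne z\text{, such that } f(x)\text{ is controlled by } f(y)\text{ and } f(z).\\ \text{Precisely, let } z=\rho(x-y)+y,\rho>1\Rightarrow x=\frac{\rho-1}{\rho}y+\frac{1}{\rho}z\Rightarrow f(x)\le \frac{\rho-1}{\rho}f(y)+\frac{1}{\rho}f(z) \Rightarrow f(x)\le \sup_{w\in \partial K}f(w).\\ \text{The second and third items follow easily from the compactness of } K\text{ and the continuity of } f. \ ▸ \end{array} \]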

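The compactness assumption cannot be dropped. A counterexample is \(K=\{(x_1,x_2\dots x_m)\in R^m|x_i\ge 0,\forall 1\le i\le m,\sum_{i=1}^{m}x_i<1\}\), \(f(x)=\frac{1}{1-\sum_{i=1}^{m}x_i}\), where \(f\) is convex but unbounded above on \(K\).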


Extrema on Compact Convex Polytopes

If \(K=Conv(\{\vec{a_1},\vec{a_2}\dots \vec{a_n}\})\subseteq R^m\) and \(f:K\mapsto R\) is a convex function, then \(f\) is bounded on \(K\). More precisely, \(\max_{1\le i\le n}f(\vec{a_i})+n\min_{1\le i\le n}[f(\vec{c})-f(\vec{a_i})]\le f(x)\le \max_{1\le i\le n}f(\vec{a_i}),\forall x\in K\), where \(\vec{c}=\frac{1}{n}\sum_{i=1}^{n}\vec{a_i}\). In particular, \(\max_{x\in K}f(x)=\max_{1\le i\le n}f(\vec{a_i})\).
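
The two-sided bound above can be checked numerically. The sketch below is only an illustration under stated assumptions: the vertices, the example convex function, the sampling scheme, and the tolerance are all hypothetical choices, not part of the original argument.

```python
import random

def f(p):
    """Example convex function on R^2 (an assumption): max of affine pieces plus x^2."""
    x, y = p
    return max(x - y, 2 * y - 1, -x) + x * x

# Hypothetical generators a_1..a_n of the polytope K = Conv({a_1..a_n}).
vertices = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (-1.0, 1.0)]
n = len(vertices)
centroid = tuple(sum(v[i] for v in vertices) / n for i in range(2))

# Bounds from the statement: upper = max_i f(a_i),
# lower = max_i f(a_i) + n * min_i [f(c) - f(a_i)].
upper = max(f(v) for v in vertices)
lower = upper + n * min(f(centroid) - f(v) for v in vertices)

rng = random.Random(1)

def random_point_in_hull():
    """Random convex combination of the generators."""
    raw = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(raw)
    return tuple(sum(r / total * v[i] for r, v in zip(raw, vertices)) for i in range(2))

samples = [f(random_point_in_hull()) for _ in range(2000)]
print(lower - 1e-9 <= min(samples), max(samples) <= upper + 1e-9)
```

The lower bound is usually far from sharp; its role in the text is only to show that \(f\) is bounded below on \(K\).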


Lower Bounds of Convex Functions

Let \(f:E\mapsto R\) be a convex function, where \(E\) is a bounded convex set with \(\mathring{E}\ne \emptyset\). Then \(\inf_{x\in E}f(x)>-\infty\).

\[\begin{array}{l} ◂\ \text{Take }\vec{c}=(c_1,c_2\dots c_m)\in \mathring{E}\Rightarrow \exists \delta>0,K:=\prod_{i=1}^{m}[c_i-\delta,c_i+\delta]\subseteq E.\\ K\text{ is a convex polytope}\Rightarrow \exists M>0,\forall y\in K,-M\le f(y)\le M.\\ \text{Take } x\in E,x\notin K\text{. By the intersection lemma, }\exists \rho>1,\rho(\vec{c}-x)+x\in \partial K\text{; let } z=\rho(\vec{c}-x)+x.\\ \Rightarrow f(\vec{c})\le \frac{\rho-1}{\rho}f(x)+\frac{1}{\rho}f(z)\Rightarrow f(x)\ge -\frac{\rho+1}{\rho-1}M.\\ \rho-1\text{ has a positive lower bound, because }\vec{c}-z=(1-\rho)(\vec{c}-x)\Rightarrow \rho-1=\frac{|\vec{c}-z|}{|\vec{c}-x|}\ge \frac{\delta}{diam(E)}.\\ \Rightarrow \frac{\rho+1}{\rho-1}=1+\frac{2}{\rho-1}\le 1+\frac{2diam(E)}{\delta}\Rightarrow f\text{ is bounded below on } E. \ ▸ \end{array} \]

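The assumption \(\mathring{E}\ne \emptyset\) above can be dropped.

Consider the case \(\mathring{E}= \emptyset\):

  • If \(E\) is a singleton, the claim holds trivially.
  • If \(E\) is not a singleton, then \(\exists x\in E\) and a linear subspace \(W\) of dimension \(1\le k<m\) such that \(E\subseteq x+W\) and \(E-x\) has nonempty interior relative to \(W\).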

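\[\begin{array}{l} ◂\ \text{Let } k=\max\{p|\exists x_0,x_1\dots x_p\in E\text{ such that } x_1-x_0,x_2-x_0\dots x_p-x_0\text{ are linearly independent}\}.\\ \text{Clearly } 1\le k\le m\text{. Suppose } k=m\text{, i.e. }\exists x_0,x_1\dots x_m\in E\text{ such that } x_1-x_0,x_2-x_0\dots x_m-x_0\text{ are linearly independent}.\\ \text{Consider the set } S=x_0+Conv(\{0,x_1-x_0,x_2-x_0\dots x_m-x_0\})=Conv(\{x_0,x_1\dots x_m\})\subseteq E.\\ \text{Take } x_1-x_0,x_2-x_0\dots x_m-x_0\text{ as a basis of } R^m\text{, with coordinate map }\phi:R^m\mapsto R^m:\\ \forall x\in R^m,x=c_1(x_1-x_0)+c_2(x_2-x_0)\dots+c_m(x_m-x_0),\phi(x)=(c_1,c_2\dots c_m).\\ \phi\text{ is a linear homeomorphism and }\phi(S-x_0)=\{c|c_i\ge 0,\sum_{i=1}^{m}c_i\le 1\}\text{ has interior points}\Rightarrow \mathring{S}\ne \emptyset\text{, contradicting }\mathring{E}=\emptyset\Rightarrow k<m.\\ \text{So there are } x_0,x_1\dots x_k\in E\text{ such that } x_1-x_0,x_2-x_0\dots x_k-x_0\text{ are linearly independent; let } W=Span(x_1-x_0,x_2-x_0\dots x_k-x_0).\\ \text{By the maximality of } k,\forall x\in E,x-x_0\in W\text{, i.e. } E\subseteq x_0+W.\\ \text{The argument above, carried out inside } W\text{, shows that } E-x_0\text{ has nonempty interior relative to } W.\\ \text{Finally, with }\phi:W\mapsto R^k\text{ the coordinate map, define }\hat{E}=\{\phi(x-x_0)|x\in E\},g(\phi(x-x_0))=f(x)\Rightarrow g\text{ is a convex function on }\hat{E}\Rightarrow g\text{ is bounded below}\Rightarrow f\text{ is bounded below}. \ ▸ \end{array} \]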


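Convex Functions Are Locally Lipschitz

Let \(f:E\subseteq R^m\mapsto R\) be a convex function. For \(x_0=(x_1,x_2\dots x_m)\in E\), define \(I(x_0,\delta)=\prod_{i=1}^{m}[x_i-\delta,x_i+\delta]\).

  • If \(I(x_0,2\delta)\subseteq E\), then \(f\) is bounded on \(I(x_0,\delta)\) and satisfies a Lipschitz condition there.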

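Sketch: let \(M=\sup_{w\in I(x_0,2\delta)}|f(w)|\), which is finite by the bounds established above. For two distinct points \(x,y\in I(x_0,\delta)\), put \(z=x+\frac{\delta(x-y)}{|x-y|}\in I(x_0,2\delta)\). Then \(x\) lies on the segment between \(y\) and \(z\), so \(f(x)-f(y)\) is controlled by \(f(z)-f(y)\): \(f(x)-f(y)\le \frac{|x-y|}{|x-y|+\delta}(f(z)-f(y))\le \frac{2M}{\delta}|x-y|\). This bound depends only on \(|x-y|\), and by the symmetry between \(x\) and \(y\) we get \(|f(x)-f(y)|\le \frac{2M}{\delta}|x-y|\), which is the Lipschitz condition.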

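  • If \(E\) is open and \(K\subset E\) is compact, then \(f\) satisfies a Lipschitz condition on \(K\).

By the previous item, for each \(x\in K\) there is \(\delta_x>0\) with \(I(x,2\delta_x)\subseteq E\), so \(f\) is Lipschitz on \(I(x,\delta_x)\). The sets \(\mathring{I(x,\delta_x)}\) form an open cover of \(K\), from which a finite subcover can be extracted.

  • \(f\) is continuous on \(\mathring{E}\).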
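
A numerical illustration of the local Lipschitz property: if \(|f|\le M\) on \(I(x_0,2\delta)\), a standard estimate gives the Lipschitz constant \(\frac{2M}{\delta}\) on \(I(x_0,\delta)\). The sketch below compares this constant with sampled difference quotients. The example function, the box, and the sample size are assumptions for illustration; \(M\) is read off the corners of the box, using the fact that a convex function on a box attains its maximum at a vertex.

```python
import itertools
import math
import random

def f(p):
    """Example convex function on R^2 (an assumption); nonnegative, so |f| = f."""
    return math.exp(p[0]) + abs(p[1])

delta = 0.5
center = (0.0, 0.0)

# M = sup |f| over the larger box I(center, 2*delta), attained at a corner.
corners = itertools.product(*[(c - 2 * delta, c + 2 * delta) for c in center])
M = max(f(v) for v in corners)
lipschitz_bound = 2 * M / delta

rng = random.Random(2)

def sample_inner():
    """Random point of the smaller box I(center, delta)."""
    return tuple(rng.uniform(c - delta, c + delta) for c in center)

worst_ratio = 0.0
for _ in range(5000):
    x, y = sample_inner(), sample_inner()
    dist = math.dist(x, y)
    if dist > 0:
        worst_ratio = max(worst_ratio, abs(f(x) - f(y)) / dist)

print(worst_ratio <= lipschitz_bound)
```

The constant \(\frac{2M}{\delta}\) is typically much larger than the observed ratios; it is a guaranteed bound, not a tight one.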

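The Convex Projection Theorem and Supporting Planes of Convex Functions


Convex Projection Theorem

Let \(K\subseteq R^m\) be a nonempty closed convex set. Then:

  • \(\forall x\in R^m,\exists x^*\in K,dist(x,K)=dist(x,x^*)\), and \(\forall y\in K,(x-x^*,y-x^*)\le 0\); that is, the angle between \(x-x^*\) and \(y-x^*\) is at least \(\frac{\pi}{2}\).
  • Introduce the projection map \(P:R^m\mapsto K,P(x)=x^*,\forall x\in R^m\). Then \(\forall x,y\in R^m,|P(x)-P(y)|\le |x-y|.\)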

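Without loss of generality assume \(x\notin K\), so that \(dist(x,x^*)\ne 0\).

First we show that \(P\) is well defined. Suppose \(\exists x\in R^m,x_1^*,x_2^*\in K\) with \(dist(x,K)=dist(x,x^*_1)=dist(x,x^*_2)\).

Take \(z=\frac{1}{2}(x^*_1+x^*_2)\in K\). Then \(|x-z|\le \frac{1}{2}(|x-x^*_1|+|x-x^*_2|)=|x-x^*_1|\), while \(|x-z|\ge dist(x,K)=|x-x^*_1|\), so \(|x-z|=|x-x^*_1|\).

By the parallelogram law, \(2(|x-x^*_1|^2+|x-x^*_2|^2)=|x^*_1-x^*_2|^2+4|x-z|^2\).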

\(\Rightarrow|x^*_1-x^*_2|^2=0 \Rightarrow x^*_1=x^*_2\).

Fix \(x\in R^m,y\in K\) and define \(f:[0,1]\mapsto R,f(t)=|x-x^*-t(y-x^*)|^2,\forall t\in[0,1]\).

\(\forall t\in ]0,1]\), \(x^*+t(y-x^*)\in K\) by convexity. Since \(x^*\) minimizes the distance from \(x\) to \(K\), we get \(f(t)\ge f(0)\).

\(\Rightarrow f'_+(0)\ge 0 \Rightarrow -2(x-x^*,y-x^*)\ge 0 \Rightarrow (x-x^*,y-x^*)\le 0\).

\(\forall x,y\in R^m,|P(x)-P(y)|^2=(P(x)-P(y),P(x)-P(y))=(P(x)-x,P(x)-P(y))+(x-y,P(x)-P(y))+(y-P(y),P(x)-P(y))\), where \((P(x)-x,P(x)-P(y))\le 0,(y-P(y),P(x)-P(y))\le 0\) by the inequality just proved.

\(\Rightarrow |P(x)-P(y)|^2\le (x-y,P(x)-P(y))\).

\(\Rightarrow |P(x)-P(y)|\le |x-y|\).
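
Both conclusions can be sanity-checked numerically. The sketch below takes \(K\) to be an axis-aligned box, for which the Euclidean projection is coordinate-wise clipping; the choice of box, the helper names, and the tolerance are assumptions for illustration. It samples points and tracks the largest observed value of \((x-P(x),y-P(x))\) for \(y\in K\) and of \(|P(x)-P(y)|-|x-y|\).

```python
import math
import random

def project_box(x, lower, upper):
    """Euclidean projection onto prod [lower[i], upper[i]] by coordinate-wise clipping."""
    return tuple(min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

lower, upper = (0.0, 0.0, 0.0), (1.0, 2.0, 3.0)
rng = random.Random(3)

def random_point(scale):
    """Random point of R^3, usually outside the box."""
    return tuple(rng.uniform(-scale, scale) for _ in range(3))

def random_point_in_box():
    return tuple(rng.uniform(lo, hi) for lo, hi in zip(lower, upper))

max_inner = -math.inf   # largest (x - P(x), k - P(x)) over sampled k in K
max_excess = -math.inf  # largest |P(x) - P(y)| - |x - y| over sampled pairs
for _ in range(2000):
    x, y = random_point(5.0), random_point(5.0)
    px, py = project_box(x, lower, upper), project_box(y, lower, upper)
    k = random_point_in_box()
    max_inner = max(max_inner, dot(sub(x, px), sub(k, px)))
    max_excess = max(max_excess, math.dist(px, py) - math.dist(x, y))

print(max_inner <= 1e-12, max_excess <= 1e-12)
```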

  • Let \(E\subseteq R^m\) be a closed convex set, \(P\) the projection onto \(E\), and \(f\in C(E;R^n)\). Put \(F=f\circ P\). Then \(F|_E=f,F\in C(R^m;R^n)\).

Separation

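Let \(K\subseteq R^m\) be a nonempty closed convex set and \(x\notin K\). Then \(\exists s\in R^m,s\ne 0,(s,x)>\sup_{y\in K}(s,y)\). That is, in the direction \(s\), the point \(x\) lies strictly above \(K\).

Let \(s=x-P(x)\ne 0\). Then \(\forall y\in K,(s,y-P(x))\le 0 \Rightarrow (s,y-x+s)\le 0 \Rightarrow (s,x)\ge |s|^2+(s,y)\), hence \((s,x)\ge |s|^2+\sup_{y\in K}(s,y)>\sup_{y\in K}(s,y)\).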
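
A small worked instance of the separating direction \(s=x-P(x)\), again assuming that \(K\) is an axis-aligned box (an illustrative choice, as are the helper names). For a box, the supremum of the linear functional \(y\mapsto (s,y)\) is attained at a vertex, so it can be computed by enumerating the vertices.

```python
import itertools

def project_box(x, lower, upper):
    """Euclidean projection onto prod [lower[i], upper[i]] by coordinate-wise clipping."""
    return tuple(min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

lower, upper = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
x = (2.0, -1.0, 0.5)                      # a point outside K = [0,1]^3
p = project_box(x, lower, upper)          # nearest point of K to x
s = tuple(a - b for a, b in zip(x, p))    # separating direction s = x - P(x)

# A linear functional on a box attains its supremum at one of the vertices.
sup_over_K = max(dot(s, v) for v in itertools.product(*zip(lower, upper)))
margin = dot(s, x) - sup_over_K
print(margin, dot(s, s))  # prints: 2.0 2.0
```

In this example the margin \((s,x)-\sup_{y\in K}(s,y)\) equals \(|s|^2\), because the supremum is attained at \(y=P(x)\).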

A Corollary of Separation

If \(K\) is a convex set and \(x\in \partial K\), then \(\exists s\in R^m,s \ne 0\) such that \((s,x)\ge (s,y),\forall y\in K\).

\(\partial K=\partial \overline{K} \Rightarrow \exist \{x_n\}\subseteq \overline{K}^c,x_n\to x\).

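By separation applied to \(\overline{K}\), \(\exists s_n\ne 0,|s_n|=1\) such that \((s_n,x_n)\ge (s_n,y),\forall y\in K\).

\(\{s_n\}\) has a subsequence converging to some \(s\) with \(|s|=1\); passing to the limit gives \((s,x)\ge (s,y),\forall y\in K\).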


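Supporting Planes of Convex Functions

If \(f:E\subseteq R^m\mapsto R\) is a convex function, then \(\forall x\in \mathring{E},\exists v(x)\in R^m\) such that \(\forall y\in E,f(y)\ge f(x)+v(x)\cdot (y-x)\). The vector \(v(x)\) is said to define a supporting plane of \(f\) at \(x\).

In particular, when \(m=1\), \(v(x)\) lies between \(f'_-(x)\) and \(f'_+(x)\), and \(v(x)\) is unique \(\Leftrightarrow f\) is differentiable at \(x\).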

Let \(K=\{(y,z)|y\in E,z\ge f(y)\}\). It is a convex set in \(R^{m+1}\) and \((x,f(x))\in \partial K\). By the corollary of separation, \(\exists (s,\alpha)\in R^{m+1},(s,\alpha)\ne 0\) such that \((s,x)+\alpha f(x)\ge (s,y)+\alpha r,\forall (y,r)\in K\).

If \(\alpha >0\), letting \(r\to +\infty\) gives a contradiction.

If \(\alpha =0\), then \(s\ne 0\). Since \(x\in \mathring{E}\), \(\exists \delta >0\) with \(x+\delta s\in E\).

\(\Rightarrow (s,x)+\alpha f(x)\ge (s,x+\delta s)+\alpha f(x+\delta s) \Rightarrow 0<\delta|s|^2\le \alpha (f(x)-f(x+\delta s)) \Rightarrow \alpha\ne 0\), a contradiction.

Hence \(\alpha<0\). Taking \(r=f(y)\) then gives \(f(y)\ge \frac{1}{|\alpha|}(s,y-x)+f(x)=f(x)+(\frac{s}{|\alpha|},y-x)\).

Since \((s,\alpha)\) depends on \(x\), we may set \(v(x)=\frac{s}{|\alpha|}\).
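
For a concrete \(v(x)\), consider a maximum of affine functions \(f(x)=\max_j(a_j\cdot x+b_j)\), which is convex. The slope \(a_j\) of any piece that attains the maximum at \(x\) satisfies the supporting inequality, since \(f(y)\ge a_j\cdot y+b_j=f(x)+a_j\cdot (y-x)\). The sketch below checks this numerically; the data `A`, `b`, the point `x`, and the function names are hypothetical choices for illustration.

```python
import random

# Hypothetical data: f(x) = max_j (a_j . x + b_j) on R^2.
A = [(1.0, 0.0), (-1.0, 2.0), (0.5, -1.5)]
b = [0.0, -1.0, 0.5]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def f(x):
    return max(dot(a, x) + bj for a, bj in zip(A, b))

def supporting_slope(x):
    """Slope a_j of an affine piece that is active at x; it plays the role of v(x)."""
    j = max(range(len(A)), key=lambda i: dot(A[i], x) + b[i])
    return A[j]

rng = random.Random(4)
x = (0.3, -0.7)
v = supporting_slope(x)

# Smallest observed value of f(y) - [f(x) + v . (y - x)]; should be >= 0.
worst_gap = min(
    f(y) - (f(x) + dot(v, tuple(yi - xi for yi, xi in zip(y, x))))
    for y in [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(2000)]
)
print(v, worst_gap >= -1e-9)
```

When several pieces are active at \(x\), any of their slopes (and any convex combination of them) works, which matches the non-uniqueness of \(v(x)\) at points where \(f\) is not differentiable.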


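Geometric Interpretation of Separation and Supporting Planes

  • Hyperplanes in \(R^m\): if \(\xi\in R^m,\xi\ne 0,t\in R\), then the set \(H:=\{x\in R^m|x\cdot \xi =t\}\) is called a hyperplane in \(R^m\). Relative to this hyperplane, define \(H^+=\{x\in R^m|x\cdot \xi\ge t\},H^-=\{x\in R^m|x\cdot \xi\le t\}\), the two half-spaces associated with \(H\). Here \(\xi\) is a normal vector of \(H\).
    The two results above can be restated in terms of hyperplanes:
  • Let \(K\) be a closed convex set and \(x\notin K\), and put \(\xi=x-P(x)\). By the convex projection theorem and the separation property, \(\exists t\in R,(x,\xi)>t>\sup_{y\in K}(y,\xi)\Rightarrow x\in H^+,K\subset H^-\).
  • Let \(K\) be a convex set and \(x_0\in \partial K\). Then \(\exists \xi\in R^m,\xi\ne 0\) such that \((\xi,x_0)\ge (\xi,y),\forall y\in K\).
    Setting \(H=\{x\in R^m|(\xi,x)=(\xi,x_0)\}\), we have \(x_0\in \partial K,x_0\in H\) and \(K\subseteq H^-\). A hyperplane \(H\) with these two properties is called a supporting plane of \(K\).
  • We now look at the supporting plane of a convex function geometrically.
  • When \(m=1\), we have \(f(y)\ge f(x)+v(x)(y-x),\forall y\in E\). The supporting plane is then a line \(L\), determined by the normal vector \((v(x),-1)\) and the point \((x,f(x))\). Viewed along \((v(x),-1)\), we have \(\{(y,f(y))|y\in E\}\subseteq L^-\).
  • When \(m\ge 2\), write \(R^{m+1}=\{(x,z)|x\in R^m,z\in R\}\).
    By the supporting plane of a convex set, there exist \(\vec{\xi}\) and a hyperplane \(H\) such that \(\{(y,f(y))|y\in E\}\subseteq H^-\).
    Since the epigraph \(\{(y,z)|y\in E,z\ge f(y)\}\) is unbounded in the direction \(\vec{e_{m+1}}\) and is contained in \(H^-\), we have \(\vec{\xi}\cdot \vec{e_{m+1}}\le 0\), where \(\vec{e_{m+1}}\) is the unit vector of the last coordinate.
    Since the last component of \(\vec{\xi}\) is nonzero when \(x\in \mathring{E}\) (the case \(\alpha=0\) was excluded in the proof above), we may normalize it, so that \(\vec{\xi}=(v,-1)\).
    Hence \(H\) is determined by \((v,-1)\) and the point \((x,f(x))\), i.e. \(\forall (y,z)\in R^{m+1},(y,z)\in H\Rightarrow (y-x,z-f(x))\cdot (v,-1)=0\Rightarrow z=f(x)+v\cdot (y-x)\).
    Therefore \(\{(y,f(y))|y\in E\}\subseteq H^-\Rightarrow f(y)\ge f(x)+v\cdot (y-x)\).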