From Sparse Representation to Low-Rank Representation (Part 4)

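After settling on a research direction I have been catching up on theory. I recently read a number of papers, formed some ideas, and took the chance to summarize the literature on representation. I am new to the area, so there may be shortcomings; corrections are welcome.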

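This post belongs to the series "From Sparse Representation to Low-Rank Representation":

Part 4: Group sparsity

This part continues the previous post and introduces improvements to sparse representation.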

 

Group sparsity 


 

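Group sparsity, or plain sparsity, is easy to interpret for quantities that have a physical meaning. We usually expect the basis (or, say, the features) of one kind of object to differ from the basis of another, so groups can be formed over samples or over features. Group-structured methods are by now well developed, from the original group Lasso to group PCA and group NMF (Nonnegative Matrix Factorization). See Jingu Kim et al., “Group Sparsity in Nonnegative Matrix Factorization”:

http://www.cc.gatech.edu/~hpark/papers/GSNMF_SDM12_Final.pdf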

 

 

1. Why use group sparsity?

The observation is that features or data items within a group are expected to share the same sparsity pattern in their latent factor representation.

In other words, group sparsity assumes that data items or features belonging to the same group have similar sparsity patterns in the low-rank representation.

 

2. Difference from NMF.

As a variation of traditional NMF, group NMF adds group sparsity regularization to NMF.

 

3. Application.

Dimension reduction and noise removal in text mining, bioinformatics, blind source separation, and computer vision. Group NMF enables a natural interpretation of the discovered latent factors.

 

4. What is group?

Different types of features; in computer vision, for example, pixel values, gradient features, 3D pose features, etc. Features of the same type form one group.

 

5. Related work on group sparsity

5.1. Lasso (the Least Absolute Shrinkage and Selection Operator)

l1-norm penalized linear regression:

$$\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1$$

 

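5.2. Group Lasso

Group sparsity using l1,2-norm regularization:

$$\min_{\beta} \; \Big\|y - \sum_{l=1}^{L} X_l \beta_l\Big\|_2^2 + \lambda \sum_{l=1}^{L} \sqrt{p_l}\, \|\beta_l\|_2$$

where the $\sqrt{p_l}$ terms account for the varying group sizes ($p_l$ is the number of coefficients in group $l$).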
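To illustrate how the $\sqrt{p_l}$ weights enter the group Lasso penalty $\sum_l \sqrt{p_l}\,\|\beta_l\|_2$, here is a minimal sketch in plain Python. The function name `group_lasso_penalty` and the representation of groups as lists of indices are assumptions made for this example.

```python
import math

def group_lasso_penalty(beta, groups):
    """Return sum over groups of sqrt(p_l) * ||beta_l||_2.

    beta   : list of regression coefficients
    groups : list of index lists, one per group (hypothetical layout)
    """
    total = 0.0
    for idx in groups:
        p_l = len(idx)                                   # group size
        norm_l = math.sqrt(sum(beta[i] ** 2 for i in idx))
        total += math.sqrt(p_l) * norm_l                 # size-weighted l2 norm
    return total

# Toy coefficients: the first group is active, the second is entirely zero.
beta = [3.0, 4.0, 0.0, 0.0, 0.0]
groups = [[0, 1], [2, 3, 4]]
penalty = group_lasso_penalty(beta, groups)
```

A group whose coefficients are all zero contributes nothing to the penalty, which is why minimizing it tends to switch off whole groups at once.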

 

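5.3. Sparse group Lasso

$$\min_{\beta} \; \Big\|y - \sum_{l=1}^{L} X_l \beta_l\Big\|_2^2 + \lambda_1 \sum_{l=1}^{L} \|\beta_l\|_2 + \lambda_2 \|\beta\|_1$$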

 

5.4. Hierarchical regularization with tree structure, 2010

R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. “Proximal methods for sparse hierarchical dictionary learning”. ICML 2010

 

5.5. There are some other works focusing on group sparsity for PCA

 

6. NMF

Group NMF incorporates mixed-norm regularization into NMF, based on $l_{1,q}$-norm regularization. Regularization by the l1-norm is well known to promote a sparse representation [31]. When this approach is extended to groups of parameters, the $l_{1,q}$-norm has been shown to induce a sparse representation at the level of groups.

Affine NMF: extends NMF with an offset vector. Affine NMF is used to simultaneously factorize.

 

Problem to Solve

1) Consider a matrix $X \in \mathbb{R}^{m \times n}$. Assume that the rows of $X$ represent features and the columns of $X$ represent data items.

 

2) In standard NMF, we are interested in discovering two low-rank factor matrices $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$ by minimizing an objective function:

$$\min_{W, H} \; \|X - WH\|_F^2 \qquad (4)$$

subject to $W \ge 0$ and $H \ge 0$.

 

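3) Group structure and group sparsity

The paper illustrates two cases in a figure. In (a) the groups are formed over samples (columns): for a shared basis $W$, the coefficients $H$ within one group share the same sparsity pattern. In (b) the groups are formed over features (rows), and the group sparsity shows up in the structure of the latent component matrices.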

 

As group structure can be found in many other data mining problems, we proceed to discuss how group sparsity can be promoted by employing mixed-norm regularization as follows.

 

4) Formulation with mixed-norm regularization

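Suppose the columns of $X \in \mathbb{R}^{m \times n}$ are divided into $B$ groups as $X = (X^{(1)}, \cdots, X^{(B)})$, where $X^{(b)} \in \mathbb{R}^{m \times n_b}$ and $\sum_b n_b = n$. Accordingly, the coefficient matrix is divided into $B$ groups as $H = (H^{(1)}, \cdots, H^{(B)})$, where $H^{(b)} \in \mathbb{R}^{k \times n_b}$. In group NMF, formula (4) can be written as:

$$\min_{W \ge 0,\, H \ge 0} \; \sum_{b=1}^{B} \|X^{(b)} - W H^{(b)}\|_F^2$$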

 

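To obtain group sparsity, a mixed-norm regularization term based on the $l_{1,q}$-norm is added on the coefficient matrices $H^{(b)}$, which gives:

$$\min_{W \ge 0,\, H \ge 0} \; \sum_{b=1}^{B} \|X^{(b)} - W H^{(b)}\|_F^2 + \alpha \|W\|_F^2 + \beta \sum_{b=1}^{B} \|H^{(b)}\|_{1,q} \qquad (5)$$

Here the Frobenius norm of $W$ keeps $W$ from growing during optimization, and $\alpha$ and $\beta$ are trade-off coefficients that control the strength of each regularization term.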

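The $l_{1,q}$-norm of $Y \in \mathbb{R}^{a \times c}$ is defined by:

$$\|Y\|_{1,q} = \sum_{j=1}^{a} \|y_{j\cdot}\|_q$$

where $y_{j\cdot}$ is the $j$-th row of $Y$. The cases $q = 2$ and $q = \infty$ are the main focus.

That is, the $l_{1,q}$-norm of a matrix is the sum of the vector $l_q$-norms of its rows.

The penalty $\|Y\|_{1,q}$ therefore favors as many zero rows in $Y$ as possible. Here there are $B$ groups, each with its own $X^{(b)}$ and $H^{(b)}$, so the objective function pushes each $H^{(b)}$ toward having as many zero rows as possible, which is exactly the group sparsity we want.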
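As a minimal sketch of the $l_{1,q}$-norm and of the regularized objective (5), that is, the fitting term plus $\alpha\|W\|_F^2$ plus $\beta$ times the sum of the $l_{1,q}$-norms of the $H^{(b)}$, the following plain-Python code evaluates both for $q = 2$ and $q = \infty$. Matrices are stored as lists of rows to avoid external libraries. The function names and the toy matrices are assumptions made for illustration and are not taken from the paper.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def frob_sq(A):
    """Squared Frobenius norm."""
    return sum(x * x for row in A for x in row)

def l1q_norm(Y, q):
    """Sum over the rows of Y of the vector l_q norm (q = 2 or math.inf)."""
    if q == math.inf:
        return sum(max(abs(x) for x in row) for row in Y)
    return sum(sum(abs(x) ** q for x in row) ** (1.0 / q) for row in Y)

def group_nmf_objective(X_groups, W, H_groups, alpha, beta, q=2):
    """Fit term + alpha * ||W||_F^2 + beta * sum_b ||H^(b)||_{1,q}."""
    fit = 0.0
    reg = 0.0
    for Xb, Hb in zip(X_groups, H_groups):
        WH = matmul(W, Hb)
        fit += sum((x - y) ** 2 for rx, ry in zip(Xb, WH) for x, y in zip(rx, ry))
        reg += l1q_norm(Hb, q)
    return fit + alpha * frob_sq(W) + beta * reg

# Toy example: two groups, each using a different basis vector,
# so each H^(b) has exactly one zero row.
W = [[1.0, 0.0], [0.0, 1.0]]
H1 = [[1.0, 2.0], [0.0, 0.0]]
H2 = [[0.0, 0.0], [3.0, 4.0]]
X1, X2 = matmul(W, H1), matmul(W, H2)

obj_q2 = group_nmf_objective([X1, X2], W, [H1, H2], alpha=0.5, beta=0.1, q=2)
obj_qinf = group_nmf_objective([X1, X2], W, [H1, H2], alpha=0.5, beta=0.1, q=math.inf)
```

In this toy case the fit term is zero, and the zero rows of `H1` and `H2` add nothing to the penalty, so only the rows that are actually used by a group are charged.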

 

 

5) Block coordinate descent (BCD) method

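Because of the mixed-norm regularization, the optimization problem is harder than standard NMF. It is solved with the block coordinate descent (BCD) method from non-linear optimization, in two variants: the BCD method with matrix blocks and the BCD method with vector blocks.

In the matrix-block variant, $W$ and each $H^{(b)}$ are updated in turn with the other blocks fixed:

$$W \leftarrow \arg\min_{W \ge 0} \; \|X - WH\|_F^2 + \alpha \|W\|_F^2 \qquad (4.3)$$

$$H^{(b)} \leftarrow \arg\min_{H \ge 0} \; \|X^{(b)} - WH\|_F^2 + \beta \|H\|_{1,q} \qquad (4.4)$$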

 

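(4.3) is solved by non-negativity-constrained least squares (NNLS). Now consider the problem in (4.4); it can be rewritten as

$$\min_{H \ge 0} \; f(H) + \beta \|H\|_{1,q}, \qquad f(H) = \|X^{(b)} - WH\|_F^2$$

The first term is differentiable with a continuous gradient and the second term is convex, so the problem can be handled with a convex optimization method.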

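Algorithm 2 in the paper is one way to solve (4.7) (a variant of Nesterov's first-order method). Its main step is the update (4.6), which separates over rows, so each row can be treated as solving:

$$\min_{h \ge 0} \; \frac{1}{2} \|h - v\|_2^2 + \eta \|h\|_q \qquad (4.8)$$

The non-negativity constraint can be eliminated by solving (4.9) instead, whose solution is the global optimum of (4.8):

$$\min_{h} \; \frac{1}{2} \|h - [v]_+\|_2^2 + \eta \|h\|_q \qquad (4.9)$$

where $[v]_+$ is the element-wise positive part of $v$. The proof is given in Reference [1]. Problem (4.9) can in turn be solved through (4.11):

$$\min_{s} \; \frac{1}{2} \|\eta s - [v]_+\|_2^2 \quad \text{s.t.} \quad \|s\|_{q*} \le 1 \qquad (4.11)$$

with the solution of (4.9) recovered as $h = [v]_+ - \eta s$,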

 

where $\|\cdot\|_{q*}$ is the dual norm of $\|\cdot\|_q$.

When $q = 2$, $\|\cdot\|_{q*} = \|\cdot\|_2$.

When $q = \infty$, $\|\cdot\|_{q*} = \|\cdot\|_1$.
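To make the role of the dual norm concrete, here is a hedged sketch of the row-wise problem $\min_{h \ge 0} \frac{1}{2}\|h - v\|_2^2 + \eta\|h\|_q$ for the two cases above. It relies on the Moreau decomposition: the minimizer of $\frac{1}{2}\|h - u\|_2^2 + \eta\|h\|_q$ equals $u$ minus the Euclidean projection of $u$ onto the dual-norm ball of radius $\eta$, applied here to $u = [v]_+$. The function names are assumptions for this example, and the $l_1$-ball projection uses the standard sort-based method rather than anything specified in the paper.

```python
import math

def project_l1_ball(u, radius):
    """Euclidean projection of u onto {s : ||s||_1 <= radius} (sort-based)."""
    if radius <= 0:
        return [0.0] * len(u)
    if sum(abs(x) for x in u) <= radius:
        return list(u)                      # already inside the ball
    mags = sorted((abs(x) for x in u), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, m in enumerate(mags, start=1):
        cumsum += m
        t = (cumsum - radius) / j
        if m - t > 0:                       # holds for a prefix; keep the last hit
            theta = t
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in u]

def prox_nonneg_lq(v, eta, q):
    """Solve min_{h >= 0} 0.5*||h - v||^2 + eta*||h||_q for q = 2 or math.inf."""
    u = [max(x, 0.0) for x in v]            # [v]_+ absorbs the constraint h >= 0
    if q == 2:
        # Dual ball is an l2 ball: the result is block soft-thresholding.
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u <= eta:
            return [0.0] * len(u)
        scale = 1.0 - eta / norm_u
        return [scale * x for x in u]
    if q == math.inf:
        # Dual ball is an l1 ball of radius eta.
        s = project_l1_ball(u, eta)
        return [x - y for x, y in zip(u, s)]
    raise ValueError("this sketch only handles q = 2 and q = inf")

h_q2 = prox_nonneg_lq([3.0, -1.0, 4.0], 1.0, 2)
h_qinf = prox_nonneg_lq([3.0, -1.0, 4.0], 1.0, math.inf)
h_small = prox_nonneg_lq([0.3, -2.0, 0.4], 1.0, 2)
```

When the positive part of `v` is small relative to `eta`, the whole vector is set to zero (as in `h_small`), which is the mechanism that produces zero rows in $H^{(b)}$.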


 

5) BCD method with vector blocks

That is, one vector variable is minimized at each step while all other entries are fixed.

Recent observations indicate that the vector-block BCD method is also very efficient, often outperforming the matrix-block BCD method. Accordingly, we develop the vector-block BCD method for (5) as follows.

In the vector-block BCD method, optimal solutions to sub-problems with respect to each column of W and each row of H(1), ···, H(B) are sought.

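$$w_i \leftarrow \arg\min_{w \ge 0} \; \|R_i - w\, h_{i\cdot}\|_F^2 + \alpha \|w\|_2^2 \qquad (4.14)$$

$$h_{i\cdot}^{(b)} \leftarrow \arg\min_{h \ge 0} \; \|R_i^{(b)} - w_i\, h\|_F^2 + \beta \|h\|_q \qquad (4.15)$$

where $w_i$ is the $i$-th column of $W$, $h_{i\cdot}^{(b)}$ is the $i$-th row of $H^{(b)}$ (a row vector), $R_i^{(b)} = X^{(b)} - \sum_{j \ne i} w_j h_{j\cdot}^{(b)}$, and $R_i = (R_i^{(1)}, \cdots, R_i^{(B)})$.

The solution of (4.14) is given as a closed form:

$$w_i \leftarrow \frac{[R_i\, h_{i\cdot}^T]_+}{\|h_{i\cdot}\|_2^2 + \alpha}$$

Sub-problem (4.15) is easily seen to be equivalent to

$$\min_{h \ge 0} \; \frac{1}{2} \Big\|h - \frac{w_i^T R_i^{(b)}}{\|w_i\|_2^2}\Big\|_2^2 + \frac{\beta}{2 \|w_i\|_2^2} \|h\|_q$$

which is a special case of (4.8).

Remark on the two optimization methods: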

Optimization variables in (5): {W, H(1), …, H(B)}

Matrix-block BCD method: divides the variables into (B+1) blocks: W, H(1), …, H(B)

Vector-block BCD method: divides the variables into k(B+1) blocks, represented by the columns of W and the rows of H(1), …, H(B)

Both methods eventually rely on the dual problem (4.11).
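The vector-block update of one column of $W$ can be sketched as follows. The code assumes the sub-problem $\min_{w \ge 0} \|R_i - w\,h_{i\cdot}\|_F^2 + \alpha\|w\|_2^2$, where $R_i$ is the residual of $X$ with component $i$ left out and $H = (H^{(1)}, \cdots, H^{(B)})$ is the concatenation of all groups. Since this objective separates over the entries of $w$, each entry is a clipped one-dimensional least-squares solution. The function name `update_w_column` and the list-of-rows layout are assumptions for this example.

```python
def update_w_column(X, W, H, i, alpha):
    """Nonnegative closed-form update of column i of W with everything else fixed.

    X : m x n data matrix, W : m x k basis, H : k x n coefficients (lists of rows).
    Assumes alpha > 0 or a nonzero row H[i], so the denominator is positive.
    """
    m, k = len(W), len(H)
    h_i = H[i]
    n = len(h_i)
    denom = sum(x * x for x in h_i) + alpha
    new_col = []
    for r in range(m):
        # Row r of the residual R_i = X - sum_{j != i} w_j h_j.
        resid = [X[r][c] - sum(W[r][j] * H[j][c] for j in range(k) if j != i)
                 for c in range(n)]
        num = sum(a * b for a, b in zip(resid, h_i))
        new_col.append(max(num, 0.0) / denom)   # clip at zero, then scale
    return new_col

# Toy example with k = 2 components; update the second column of W.
X = [[3.0, 4.0], [-1.0, -1.0]]
W = [[1.0, 0.0], [0.0, 0.0]]
H = [[1.0, 0.0], [1.0, 2.0]]
col = update_w_column(X, W, H, i=1, alpha=1.0)
```

The rows of each $H^{(b)}$ would be updated in the same sweep by reducing (4.15) to the row-wise problem (4.8), which this sketch does not cover.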

 

 

6) Results

[Figures: experimental results reported in the paper]

 

6. References

1. "Multi-label Learning via Structured Decomposition and Group Sparsity", 2011. http://arxiv.org/pdf/1103.0102.pdf

2. NMF code: http://www.csie.ntu.edu.tw/~cjlin/nmf/

3. “Algorithms for NMF”: http://hebb.mit.edu/people/seung/papers/nmfconverge.pdf

4. NMF toolbox: http://cogsys.imm.dtu.dk/toolbox/nmf/

5. "Group Nonnegative Matrix Factorization for EEG Classification"