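From Sparse Representation to Low-Rank Representation (Part 4)
After settling on a research direction I have been catching up on theory. I recently read a number of papers, formed a few ideas, and took the opportunity to summarize the literature on representation. I am new to this area, so there may be shortcomings; corrections are welcome.
The series "From Sparse Representation to Low-Rank Representation" covers the following:
4. Group sparsity
This part continues the previous post and introduces improvements to sparse representation.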
Group sparsity
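Group sparsity, or plain sparsity, is easy to interpret for things that have a physical meaning. We usually expect the basis (or, say, the features) of one kind of object to differ from the basis of another, so groups can be formed either over samples or over features. Group methods are by now well developed, from the original group Lasso to group PCA and group NMF (Nonnegative Matrix Factorization). Jingu Kim et al., "Group Sparsity in Nonnegative Matrix Factorization", link: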
http://www.cc.gatech.edu/~hpark/papers/GSNMF_SDM12_Final.pdf
1. Why use Group Sparsity?
The motivation is the observation that features or data items within a group are expected to share the same sparsity pattern in their latent factor representation.
In other words, group sparsity assumes that data items or features belonging to the same group have similar sparsity patterns in the low-rank representation.
2. Difference from NMF.
As a variation of traditional NMF, Group NMF adds group sparsity regularization to NMF.
3. Application.
Dimension reduction and noise removal in text mining, bioinformatics, blind source separation, and computer vision. Group NMF enables natural interpretation of the discovered latent factors.
4. What is group?
Different types of features, for example in CV: pixel values, gradient features, 3D pose features, etc. Features of the same type form one group.
5. Related work on Group Sparsity
5.1. Lasso (The Least Absolute Shrinkage and Selection Operator)
l1-norm penalized linear regression
5.2. Group Lasso
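Group sparsity using l1,2-norm regularization. In the standard formulation (the notation here is assumed: response $y$, design sub-matrix $X_l$ and coefficient vector $\beta_l$ for group $l$, with $p_l$ coefficients in that group), up to a constant factor on the loss:
$$\min_{\beta} \; \Big\| y - \sum_{l=1}^{L} X_l \beta_l \Big\|_2^2 + \lambda \sum_{l=1}^{L} \sqrt{p_l}\, \|\beta_l\|_2$$
where the $\sqrt{p_l}$ terms account for the varying group sizes.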
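As a small illustration of how the Lasso and group Lasso penalties differ, the sketch below computes both for the same coefficient vector. The function names, the toy vector, and the grouping are hypothetical and chosen only for this example; the loss term is left out.

```python
import numpy as np

def lasso_penalty(beta):
    # l1-norm: sum of absolute values of all coefficients
    return float(np.sum(np.abs(beta)))

def group_lasso_penalty(beta, groups):
    # Sum over groups of sqrt(p_l) * ||beta_l||_2,
    # where p_l is the number of coefficients in group l
    return float(sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups))

# Toy coefficients: the first group is entirely zero, the second is active
beta = np.array([0.0, 0.0, 0.0, 3.0, 4.0])
groups = [np.array([0, 1, 2]), np.array([3, 4])]

print(lasso_penalty(beta))
print(group_lasso_penalty(beta, groups))
```

A group that is entirely zero contributes nothing to the group penalty, which is why the penalty tends to switch whole groups on or off rather than individual coefficients.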
5.3. Sparse group lasso
5.4. Hierarchical regularization with tree structure, 2010
R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. "Proximal methods for sparse hierarchical dictionary learning". ICML 2010
5.5. There are also other works focusing on group sparsity for PCA
6. NMF
Group NMF incorporates mixed-norm regularization into NMF, based on l1,q-norm regularization. Regularization by the l1-norm is well known to promote a sparse representation [31]. When this approach is extended to groups of parameters, the l1,q-norm has been shown to induce a sparse representation at the level of groups.
Affine NMF: extends NMF with an offset vector. Affine NMF is used to simultaneously factorize.
Problem to Solve
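1) Consider a matrix $X \in \mathbb{R}^{m \times n}$. Assume that the rows of $X$ represent features and the columns of $X$ represent data items.
2) In standard NMF, we are interested in discovering two low-rank factor matrices $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$ by minimizing an objective function:
$$\min_{W,H} \; \|X - WH\|_F^2 \qquad (4)$$
subject to the constraints $W \ge 0$ and $H \ge 0$.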
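For readers who want to see the standard NMF problem running, here is a minimal sketch using the multiplicative update rules of Lee and Seung (the "Algorithms for NMF" paper listed in the references). This is not the method of the group NMF paper; the function name, iteration count, and the small constant `eps` that guards against division by zero are assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate X (m x n, non-negative) by W (m x k) times H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random start, so the updates keep W and H non-negative
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the squared Frobenius loss
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data with an exact non-negative rank-3 factorization
rng = np.random.default_rng(1)
X = rng.random((6, 3)) @ rng.random((3, 8))
W, H = nmf_multiplicative(X, k=3)
err = np.linalg.norm(X - W @ H, "fro") ** 2
print(err)
```

Because every update multiplies by a non-negative ratio, the constraints W >= 0 and H >= 0 hold at every iteration without an explicit projection.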
3) Group structure and Group sparsity
In the figure, (a) shows groups formed over samples: for a shared basis W, the coefficients H within one group have the same sparsity pattern; (b) shows groups formed over features, where group sparsity appears in the construction of the latent component matrices.
As group structure can be found in many other data mining problems, we proceed to discuss how group sparsity can be promoted by employing mixed-norm regularization as follows.
4) Formulation with mixed-norm regularization
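Suppose the columns of $X \in \mathbb{R}^{m \times n}$ are divided into B groups as $X = (X^{(1)}, \cdots, X^{(B)})$, where $X^{(b)} \in \mathbb{R}^{m \times n_b}$ and $n_1 + \cdots + n_B = n$. Accordingly, the coefficient matrix is divided into B groups as $H = (H^{(1)}, \cdots, H^{(B)})$, where $H^{(b)} \in \mathbb{R}^{k \times n_b}$. In group NMF, formula (4) can be written as:
$$\min_{W \ge 0,\, H \ge 0} \; \sum_{b=1}^{B} \|X^{(b)} - W H^{(b)}\|_F^2$$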
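A quick numerical check may help here: without any regularization, summing the squared Frobenius loss over the column groups gives exactly the loss of standard NMF on the concatenated matrices, because all groups share the same W. The sizes and variable names below are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 5, 2
group_sizes = [3, 4]  # hypothetical n_1 and n_2, so n = 7

W = rng.random((m, k))
X_groups = [rng.random((m, nb)) for nb in group_sizes]
H_groups = [rng.random((k, nb)) for nb in group_sizes]

# Concatenate the groups column-wise to recover X and H
X = np.hstack(X_groups)
H = np.hstack(H_groups)

# Loss written as a sum over groups versus loss on the full matrices
grouped = sum(np.linalg.norm(Xb - W @ Hb, "fro") ** 2
              for Xb, Hb in zip(X_groups, H_groups))
full = np.linalg.norm(X - W @ H, "fro") ** 2
print(grouped, full)
```

The grouped form only starts to behave differently from standard NMF once a per-group penalty on each H(b) is added.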
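To obtain group sparsity, a mixed-norm regularization term based on the l1,q-norm is added on the coefficient matrices H, giving:
$$\min_{W \ge 0,\, H \ge 0} \; \sum_{b=1}^{B} \|X^{(b)} - W H^{(b)}\|_F^2 + \alpha \|W\|_F^2 + \beta \sum_{b=1}^{B} \|H^{(b)}\|_{1,q} \qquad (5)$$
The Frobenius norm of W keeps W from growing large during optimization; $\alpha$ and $\beta$ are trade-off coefficients that control the strength of each regularization term.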
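The l1,q-norm of $Y \in \mathbb{R}^{a \times c}$ is defined by:
$$\|Y\|_{1,q} = \sum_{j=1}^{a} \|y_{j\cdot}\|_q$$
where $y_{j\cdot}$ is the j-th row of $Y$; the cases $q = 2$ and $q = \infty$ are the focus of the discussion.
That is, the l1,q-norm of a matrix is the sum of the vector lq-norms of its rows.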
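The penalty ||Y||1,q therefore favors a Y with as many zero rows as possible. Here there are B groups, each with its own X(b) and H(b), so the objective function pushes every H(b) to contain as many zero rows as possible, which is exactly the group sparsity we want.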
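The row-wise definition of the l1,q-norm is short enough to write out directly. The sketch below is only an illustration (the function name and toy matrices are not from the paper); it also compares two matrices with the same nonzero values to show that the norm is smaller when the nonzeros are packed into fewer rows.

```python
import numpy as np

def l1q_norm(Y, q=2):
    # Sum over rows of the vector lq-norm of each row
    return float(np.sum(np.linalg.norm(Y, ord=q, axis=1)))

Y = np.array([[3.0, -4.0],
              [0.0, 0.0],
              [1.0, -1.0]])
print(l1q_norm(Y, q=2))       # rows contribute 5, 0 and sqrt(2)
print(l1q_norm(Y, q=np.inf))  # rows contribute 4, 0 and 1

# Same two nonzero entries, arranged in one row versus spread over two rows
packed = np.array([[1.0, 1.0], [0.0, 0.0]])
spread = np.array([[1.0, 0.0], [0.0, 1.0]])
print(l1q_norm(packed), l1q_norm(spread))
```

With q = 2 the packed matrix has norm sqrt(2) while the spread one has norm 2, so minimizing the penalty prefers whole rows of zeros.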
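5) Block coordinate descent (BCD) method
Because of the mixed-norm regularization, the optimization problem is harder than the standard NMF problem. It is solved with the block coordinate descent (BCD) method from non-linear optimization, in two variants: the BCD method with matrix blocks and the BCD method with vector blocks.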
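(4.3), the sub-problem in W, is solved by non-negativity-constrained least squares (NNLS). Now consider the sub-problem in (4.4), which carries the mixed-norm term; it can be rewritten as:
Here the first term is differentiable with a continuous gradient and the second term is convex, so the problem can be solved with a convex optimization method.
Algorithm 2 is one way to solve (4.7) (a variant of Nesterov's first-order method). Its main step is (4.6), which is updated row by row; each row update can be viewed as solving:
The non-negativity constraint can be eliminated via (4.9), whose solution is the global optimum of (4.8).
For the proof, see Reference [1]. (4.9) can then be solved via (4.11):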
where ||·||q* is the dual norm of ||·||q.
When q = 2, ||·||q* = ||·||2.
When q = ∞, ||·||q* = ||·||1.
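To make the role of the dual norm concrete, here is a sketch for the case q = 2. It assumes the row problem has the form min over h >= 0 of 0.5*||h - v||_2^2 + eta*||h||_2, which is my reading of the setup rather than a formula copied from the paper. Under that assumption, negative entries of v are clipped to zero first, the clipped vector is projected onto the dual-norm ball of radius eta (an l2 ball, since the l2-norm is its own dual), and the solution is the clipped vector minus that projection. All function names are hypothetical.

```python
import numpy as np

def project_l2_ball(u, radius):
    # Euclidean projection of u onto {s : ||s||_2 <= radius}
    nrm = np.linalg.norm(u)
    return u if nrm <= radius else (radius / nrm) * u

def solve_row_problem_q2(v, eta):
    """Minimize 0.5*||h - v||_2^2 + eta*||h||_2 over h >= 0 (assumed form)."""
    v_plus = np.maximum(v, 0.0)       # drop the non-negativity constraint
    s = project_l2_ball(v_plus, eta)  # dual solution on the dual-norm ball
    return v_plus - s                 # primal solution

v = np.array([3.0, -1.0, 4.0])
print(solve_row_problem_q2(v, eta=1.0))  # shrinks the clipped vector
print(solve_row_problem_q2(v, eta=6.0))  # eta exceeds the norm: whole row is zero
```

The second call shows where row sparsity comes from: once eta is at least the norm of the clipped vector, the entire row is set to zero at once. For q = ∞ the same scheme would need a projection onto an l1 ball, which is not shown here.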
5) BCD method with vector blocks
That is, the objective is minimized over one vector variable at each step, with all other entries fixed.
Recent observations indicate that the vector-block BCD method is also very efficient, often outperforming the matrix-block BCD method. Accordingly, we develop the vector-block BCD method for (5) as follows.
In the vector-block BCD method, optimal solutions to sub-problems with respect to each column of W and each row of H(1), ···, H(B) are sought.
The solution of (4.14) is given in closed form:
Sub-problem (4.15) is easily seen to be equivalent to
which is a special case of (4.8).
Remark on the two optimization methods:
Optimization Variables in (5): {W, H(1),…,H(B)}
Matrix-block BCD method: divides the variables into (B+1) blocks: W, H(1),…, H(B)
Vector-block BCD method: divides the variables into k(B+1) blocks, represented by the columns of W and the rows of H(1),…,H(B)
Both methods eventually rely on dual problem (4.11).
6) Results
7. References
1. "Multi-label Learning via Structured Decomposition and Group Sparsity", 2011. Download link: http://arxiv.org/pdf/1103.0102.pdf
2. NMF code: http://www.csie.ntu.edu.tw/~cjlin/nmf/
3. "Algorithms for NMF": http://hebb.mit.edu/people/seung/papers/nmfconverge.pdf
4. NMF toolbox: http://cogsys.imm.dtu.dk/toolbox/nmf/
5. "Group Nonnegative Matrix Factorization for EEG Classification"