Semi-Supervised Few-Shot Learning Via Dependency Maximization and Instance Discriminant Analysis

Hou, Zejiang; Kung, Sun-Yuan

doi:10.1007/s11265-022-01796-x

Semi-Supervised Few-Shot Learning Via Dependency Maximization and Instance Discriminant Analysis

Published: 02 August 2022

Volume 95, pages 13–24, (2023)
Cite this article

Journal of Signal Processing Systems Aims and scope Submit manuscript

366 Accesses
Explore all metrics

Abstract

We study the few-shot learning (FSL) problem, where a model learns to recognize new objects with extremely few labeled training data per category. Most of previous FSL approaches resort to the meta-learning paradigm, where the model accumulates inductive bias through learning many training tasks so as to solve a new unseen few-shot task. In contrast, we propose a simple semi-supervised FSL approach to exploit unlabeled data accompanying the few-shot task for improving few-shot performance. (i) Firstly, we propose a Dependency Maximization method based on the Hilbert-Schmidt norm of the cross-covariance operator, which maximizes the statistical dependency between the embedded features of those unlabeled data and their label predictions, together with the supervised loss over the support set. We then use the obtained model to infer the pseudo-labels of the unlabeled data. (ii) Furthermore, we propose an Instance Discriminant Analysis to evaluate the credibility of each pseudo-labeled example and select the most faithful ones into an augmented support set to retrain the model as in the first step. We iterate the above process until the pseudo-labels of the unlabeled set become stable. Our experiments demonstrate that the proposed method outperforms previous state-of-the-art methods on four widely used few-shot classification benchmarks, including mini-ImageNet, tiered-ImageNet, CUB, CIFARFS, as well as the standard few-shot semantic segmentation benchmark PASCAL-5$^{i}$.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

¥17,985 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Institutional subscriptions

Self-supervised pairwise-sample resistance model for few-shot classification

Article 19 April 2023

When Does Self-supervision Improve Few-Shot Learning?

Self-Supervision Can Be a Good Few-Shot Learner

References

Chen, W.-Y., Liu, Y.-C., Kira, Z., Wang, Y.-C. F., & Huang, J.-B. (2019). A closer look at few-shot classification. ICLR.
Raina, R., Battle, A., Lee, H., Packer, B., & Ng, A. Y. (2007). Self-taught learning: transfer learning from unlabeled data. ICML.
Antoniou, A., Edwards, H., & Storkey, A. (2018). How to train your maml. arXiv preprint arXiv:1810.09502
Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. ICML.
Rusu, A. A., Rao, D., Sygnowski, J., Vinyals, O., Pascanu, R., Osindero, S., & Hadsell, R. (2019). Meta-learning with latent embedding optimization. arXiv preprint arXiv:1807.05960
Sun, Q., Liu, Y., Chua, T.-S., & Schiele, B. (2019). Meta-transfer learning for few-shot learning. CVPR.
Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al. (2016). Matching networks for one shot learning. NeurIPS.
Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. NeurIPS.
Ye, H.-J., Hu, H., Zhan, D.-C., & Sha, F. (2020). Few-shot learning via embedding adaptation with set-to-set functions. CVPR.
Hou, R., Chang, H., Bingpeng, M., Shan, S., & Chen, X. (2019). Cross attention network for few-shot classification. NeurIPS.
Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P. H., & Hospedales, T. M. (2018). Learning to compare: Relation network for few-shot learning. CVPR.
Bateni, P., Goyal, R., Masrani, V., Wood, F., & Sigal, L. (2020). Improved few-shot visual classification. CVPR.
Zhang, C., Cai, Y., Lin, G., & Shen, C. (2020). Deepemd: Few-shot image classification with differentiable earth mover’s distance and structured classifiers. CVPR.
Simon, C., Koniusz, P., Nock, R., & Harandi, M. (2020). Adaptive subspaces for few-shot learning. CVPR.
Gao, H., Shou, Z., Zareian, A., Zhang, H., & Chang, S.-F. (2018). Low-shot learning via covariance-preserving adversarial augmentation networks. NeurIPS.
Li, K., Zhang, Y., Li, K., & Fu, Y. (2020). Adversarial feature hallucination networks for few-shot learning. CVPR.
Zhang, R., Che, T., Ghahramani, Z., Bengio, Y., & Song, Y. (2018). Metagan: An adversarial approach to few-shot learning. NeurIPS.
Liu, Y., Lee, J., Park, M., Kim, S., Yang, E., Hwang, S. J., & Yang, Y. (2018). Learning to propagate labels: Transductive propagation network for few-shot learning. arXiv preprint arXiv:1805.10002
Rodríguez, P., Laradji, I., Drouin, A., & Lacoste, A. (2020). Embedding propagation: Smoother manifold for few-shot classification. ECCV.
Hu, S. X., Moreno, P. G., Xiao, Y., Shen, X., Obozinski, G., Lawrence, N. D., & Damianou, A. (2020). Empirical bayes transductive meta-learning with synthetic gradients. ICLR.
Dhillon, G. S., Chaudhari, P., Ravichandran, A., & Soatto, S. (2020). A baseline for few-shot image classification. ICLR.
Boudiaf, M., Masud, Z. I., Rony, J., Dolz, J., Piantanida, P., & Ayed, I. B. (2020). Transductive information maximization for few-shot learning. NeurIPS.
Li, X., Sun, Q., Liu, Y., Zhou, Q., Zheng, S., Chua, T.-S., & Schiele, B. (2019). Learning to self-train for semi-supervised few-shot classification. NeurIPS.
Ren, M., Triantafillou, E., Ravi, S., Snell, J., Swersky, K., Tenenbaum, J. B., Larochelle, H., & Zemel, R. S. (2018). Meta-learning for semi-supervised few-shot classification. ICLR.
Wang, Y., Xu, C., Liu, C., Zhang, L., & Fu, Y. (2020). Instance credibility inference for few-shot learning. CVPR.
Baker, C. R. (1973). Joint measures and cross-covariance operators. Transactions of the American Mathematical Society.
Gretton, A., Bousquet, O., Smola, A., & Schölkopf, B. (2005). Measuring statistical dependence with hilbert-schmidt norms. ALT.
Ziko, I., Dolz, J., Granger, E., & Ayed, I. B. (2020). Laplacian regularized few-shot learning. ICML.
Liu, J., Song, L., & Qin, Y. (2020a). Prototype rectification for few-shot learning. ECCV.
Lichtenstein, M., Sattigeri, P., Feris, R., Giryes, R., & Karlinsky, L. (2020). Tafssl: Task-adaptive feature sub-space learning for few-shot classification. ECCV.
Lee, K., Maji, S., Ravichandran, A., & Soatto, S. (2019). Meta-learning with differentiable convex optimization. CVPR.
Mangla, P., Kumari, N., Sinha, A., Singh, M., & Krishnamurthy, B. (2020). Balasubramanian (Vol. N). Manifold mixup for few-shot learning. WACV: Charting the right manifold.
Google Scholar
Ravi, S., & Larochelle, H. (2017). Optimization as a model for few-shot learning. ICLR.
Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The caltech-ucsd birds-200-2011 dataset. Computation and Neural Systems Technical Report.
Zagoruyko, S., & Komodakis, N. (2016). Wide residual networks. BMVC.
Zhang, C., Lin, G., Liu, F., Yao, R., & Shen, C. (2019b). Canet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. CVPR.
Zhang, C., Lin, G., Liu, F., Guo, J., Wu, Q., & Yao, R. (2019a). Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation. CVPR.
Liu, W., Zhang, C., Lin, G., & Liu, F. (2020b). Crnet: Cross-reference networks for few-shot segmentation. CVPR.
Gairola, S., Hemani, M., Chopra, A., & Krishnamurthy, B. (2020). Simpropnet: Improved similarity propagation for few-shot image segmentation. arXiv preprint arXiv:2004.15014
Yang, Y., Meng, F., Li, H., Wu, Q., Xu, X., & Chen, S. (2020b). A new local transformation module for few-shot segmentation. ICMM.
Yang, B., Liu, C., Li, B., Jiao, J., & Ye, Q. (2020a). Prototype mixture models for few-shot semantic segmentation. ECCV.
Liu, Y., Zhang, X., Zhang, S., & He, X. (2020c). Part-aware prototype network for few-shot semantic segmentation. ECCV.
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (voc) challenge. IJCV.
Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). Pyramid scene parsing network. CVPR.
Merikoski, J. K., Sarria, H., & Tarazaga, P. (1994). Bounds for singular values using traces. Linear Algebra and its Applications, 210, 227–254.
Article MathSciNet MATH Google Scholar
Von Neumann, J. (1937). “Some Matrix-Inequalities and Metrization of Matrix-Space. Rev: Tomsk. Univ.
Horn, R. A., & Johnson, C. R. (2012). Matrix analysis. Cambridge University Press.
Book Google Scholar

Download references

Author information

Authors and Affiliations

Princeton University, Princeton, USA
Zejiang Hou & Sun-Yuan Kung

Authors

Zejiang Hou
View author publications
You can also search for this author in PubMed Google Scholar
Sun-Yuan Kung
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zejiang Hou.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Before we provide the proof for Theorem 3, we list two useful lemmas that are used repeatedly in the following.

Lemma 1

([45]) The non-increasingly ordered singular values of a matrix $\mathbf {M}$ obey $0\le \sigma _i\le \dfrac{\Vert M\Vert _F}{\sqrt{i}}$, where $\Vert \cdot \Vert _F$ denotes the matrix Frobenius norm.

Lemma 2

([46]) Let $\sigma _i(M)$ and $\sigma _i(N)$ be the non-increasingly ordered singular values of matrices $\mathbf {M},\mathbf {N}\in {\mathbb {R}}^{a\times b}$. Then, ${{\,\mathrm{tr}\,}}\{\mathbf {M}\mathbf {N}^T\}\le \sum _i^r\sigma _i(\mathbf {M})\sigma _i(\mathbf {N})$, where $r={min}(a,b)$.

1.1 Proof of Theorem 3

Proof

The Fishers Criterion can be rewritten as $\psi ={{\,\mathrm{tr}\,}}\{\bar{\mathbf {S}}^{-1}\mathbf {S}_B\}$, where $\bar{\mathbf {S}}=\mathbf {F}\mathbf {F}^T$ ($\mathbf {F}$ is the matrix containing all features of the unlabeled set, arranged in columns) and $\mathbf {S}_B=\sum _{c=1}^NM_c\varvec{\mu }_c\varvec{\mu }_c^T=\sum _{c=1}^N\mathbf {S}_{c}$ ($\varvec{\mu }_c$ is the mean feature vector of class c). For notation clarity and simplicity, we assume that all data are centered and that data mean does not change after only one sample is removed. This is justifiable when the number of unlabeled data is sufficiently large, which is the case we consider here.

Suppose the removed instance has pseudo-label belonging to class u. After removing the instance $f(\mathbf {x}_u)$, the two scatter matrices becomes: $\bar{\mathbf {S}}'=\bar{\mathbf {S}}-f(\mathbf {x}_u)f(\mathbf {x}_u)^T$ and $\mathbf {S}_B'=\mathbf {S}_B+\mathbf {S}_u'-\mathbf {S}_u=\mathbf {S}_B+\mathbf {E}_B$, where $\mathbf {S}_u'=(M_u-1)\varvec{\mu }_u'\varvec{\mu }_u'^T$ and $\varvec{\mu }_u'=(\varvec{\mu }_uM_u-f(\varvec{\mu }_u))/(M_u-1)$. Then, we can rewrite:

$$\begin{aligned} \mathbf {E}_B=\dfrac{M_u\varvec{\mu }_u\varvec{\mu }_u^T-M_u\varvec{\mu }_uf(\mathbf {x}_u)^T-M_uf(\mathbf {x}_u)\varvec{\mu }_u^T+f(\mathbf {x}_u)f(\mathbf {x}_u)^T}{M_u-1} \end{aligned}$$

(11)

We can then define the IDA as:

$$\begin{aligned} \begin{aligned}d\psi _u&={{\,\mathrm{tr}\,}}\{\bar{\mathbf {S}}^{-1}\mathbf {S}_B-\bar{\mathbf {S}}'^{-1}\mathbf {S}_B'\}\\&={{\,\mathrm{tr}\,}}\{\bar{\mathbf {S}}^{-1}\mathbf {S}_B-(\bar{\mathbf {S}}-f(\mathbf {x}_u)f(\mathbf {x}_u)^T)^{-1}(\mathbf {S}_B+\mathbf {E}_B)\} \end{aligned} \end{aligned}$$

(12)

The latter term can be reformulated by the Woodbury identity [47]:

$$\begin{aligned} \begin{aligned}&(\bar{\mathbf {S}}-f(\mathbf {x}_u)f(\mathbf {x}_u)^T)^{-1}(\mathbf {S}_B+\mathbf {E}_B)\\&=(\bar{\mathbf {S}}^{-1}+\dfrac{\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}}{1-f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)})(\mathbf {S}_B+\mathbf {E}_B) \end{aligned} \end{aligned}$$

(13)

Substitute this term into the above IDA equation, we have:

$$\begin{aligned} \begin{aligned} d\psi _u={{\,\mathrm{tr}\,}}\{\dfrac{\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}\mathbf {S}_B}{f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)-1}+ \bar{\mathbf {S}}^{-1}\tilde{\mathbf {E}}_B+ \dfrac{\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}\mathbf {E}_B}{f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)-1}\} \end{aligned} \end{aligned}$$

(14)

where $\tilde{\mathbf {E}}_B=-{\mathbf {E}}_B$. To upper-bound $d\psi _u$, we derive an upper-bound for the three terms respectively, given that trace operation is additive.

Upper-bound for ${{\,\mathrm{tr}\,}}\{\dfrac{\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}\mathbf {S}_B}{f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)-1}\}$: From Lemma 2, we have:

$$\begin{aligned} \begin{aligned}&{{\,\mathrm{tr}\,}}\{\dfrac{\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}\mathbf {S}_B}{f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)-1}\} \\&\le \dfrac{\sum _i\sigma _i(\bar{\mathbf {S}}^{-1}\mathbf {S}_B\bar{\mathbf {S}}^{-1})\sigma _i(f(\mathbf {x}_u)f(\mathbf {x}_u)^T)}{f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)-1}\\&\le \dfrac{f(\mathbf {x}_u)^Tf(\mathbf {x}_u)\sigma _1(\bar{\mathbf {S}}^{-1}\mathbf {S}_B\bar{\mathbf {S}}^{-1})}{f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)-1} \end{aligned} \end{aligned}$$

(15)

where $\sigma _1(\cdot )$ denotes the largest singular value. Given that the largest singular value is actually the spectral norm, based on the norm submultiplicative, we have:

$$\begin{aligned} \sigma _1(\bar{\mathbf {S}}^{-1}\mathbf {S}_B\bar{\mathbf {S}}^{-1})\le \Vert \bar{\mathbf {S}}^{-1}\Vert _2^2\Vert \mathbf {S}_B\Vert _2 \end{aligned}$$

(16)

For the first norm, $\Vert \bar{\mathbf {S}}^{-1}\Vert _2=1/\sigma _{min}(\bar{\mathbf {S}})$. Typically, $\bar{\mathbf {S}}$ is regularized by a ridge parameter $\rho >0$, i.e. $\bar{\mathbf {S}}+\rho \mathbf {I}$, it can be said that $\sigma _{min}(\bar{\mathbf {S}})>\rho$, so that $\Vert \bar{\mathbf {S}}^{-1}\Vert _2<1/\rho$. For the second norm, $\Vert \mathbf {S}_B\Vert _2=\Vert \sum _{c=1}^NM_c\varvec{\mu }_c\varvec{\mu }_c^T\Vert _2\le \sum _{c=1}^NM_c\Vert \varvec{\mu }_c\varvec{\mu }_c^T\Vert _2=\sum _{c=1}^NM_c\varvec{\mu }_c^T\varvec{\mu }_c=\delta$. It follows that $\sigma _1(\bar{\mathbf {S}}^{-1}\mathbf {S}_B\bar{\mathbf {S}}^{-1})\le \delta /\rho ^2$. Finally, based on the von Neumann [46] property, $f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)-1={{\,\mathrm{tr}\,}}\{f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)\}-1=C{\sigma _1(\bar{\mathbf {S}}^{-1})f(\mathbf {x}_u)^Tf(\mathbf {x}_u)}-1$, where $C\in [-1,1]$. Hence, for simplicity, we use the following approximation: $f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)-1\approx f(\mathbf {x}_u)^Tf(\mathbf {x}_u)/\rho -1$. Then, we can derive the upper-bound for ${{\,\mathrm{tr}\,}}\{\dfrac{\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}\mathbf {S}_B}{f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)-1}\}$ as:

$$\begin{aligned} {{\,\mathrm{tr}\,}}\{\dfrac{\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}\mathbf {S}_B}{f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)-1}\}\le \dfrac{\delta f(\mathbf {x}_u)^Tf(\mathbf {x}_u)}{\rho (f(\mathbf {x}_u)^Tf(\mathbf {x}_u)-\rho )} \end{aligned}$$

(17)

Upper-bound for ${{\,\mathrm{tr}\,}}\{\bar{\mathbf {S}}^{-1}\tilde{\mathbf {E}}_B\}$: From Lemma 2, we have:

$$\begin{aligned} {{\,\mathrm{tr}\,}}\{\bar{\mathbf {S}}^{-1}\tilde{\mathbf {E}}_B\}\le \sum _{i=1}^4\sigma _i(\bar{\mathbf {S}}^{-1})\sigma _i(\tilde{\mathbf {E}}_B) \end{aligned}$$

(18)

since $\text {rank}(\tilde{\mathbf {E}}_B)\le 4$ [47]. Then, with Lemma 1, we have $\sigma _i(\tilde{\mathbf {E}}_B)\le \dfrac{\Vert \tilde{\mathbf {E}}_B\Vert _F}{\sqrt{i}}=\dfrac{\Vert {\mathbf {E}}_B\Vert _F}{\sqrt{i}}$. By substituting the definition of ${\mathbf {E}}_B$ and using the triangular inequality, we have:

$$\begin{aligned} \sigma _i(\tilde{\mathbf {E}}_B)\le \dfrac{\Vert M_u\varvec{\mu }_u\varvec{\mu }_u^T-M_u\varvec{\mu }_uf(\mathbf {x}_u)^T-M_uf(\mathbf {x}_u)\varvec{\mu }_u^T\Vert _F+\Vert f(\mathbf {x}_u)f(\mathbf {x}_u)^T\Vert _F}{(M_u-1)\sqrt{i}} \end{aligned}$$

(19)

Based on the property that $\Vert M\Vert _F^2={{\,\mathrm{tr}\,}}(M^TM)$:

$$\begin{aligned} \sigma _i(\tilde{\mathbf {E}}_B)\le \dfrac{\nu _u+f(\mathbf {x}_u)^Tf(\mathbf {x}_u)}{(M_u-1)\sqrt{i}} \end{aligned}$$

(20)

where the definition of $\nu _u$ is listed in Theorem 3 of our paper. With the bound on $\sigma _1(\bar{\mathbf {S}}^{-1})<1/\rho$, we can derive the upper-bound for ${{\,\mathrm{tr}\,}}\{\bar{\mathbf {S}}^{-1}\tilde{\mathbf {E}}_B\}$ as:

$$\begin{aligned} {{\,\mathrm{tr}\,}}\{\bar{\mathbf {S}}^{-1}\tilde{\mathbf {E}}_B\}\le \sum _{i=1}^4\dfrac{\nu _u+f(\mathbf {x}_u)^Tf(\mathbf {x}_u)}{\rho (M_u-1)\sqrt{i}}\le \dfrac{H_{4,1/2}(\nu _u+f(\mathbf {x}_u)^Tf(\mathbf {x}_u))}{\rho (M_u-1)} \end{aligned}$$

(21)

Upper-bound for ${{\,\mathrm{tr}\,}}\{\dfrac{\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}\mathbf {E}_B}{f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)-1}\}$: With similar derivation as in the first term, we have:

$$\begin{aligned} \begin{aligned}&{{\,\mathrm{tr}\,}}\{\dfrac{\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}\mathbf {E}_B}{f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)-1}\} \\&\le \dfrac{f(\mathbf {x}_u)^Tf(\mathbf {x}_u)\sigma _1(\bar{\mathbf {S}}^{-1}\mathbf {E}_B\bar{\mathbf {S}}^{-1})}{f(\mathbf {x}_u)^Tf(\mathbf {x}_u)/\rho -1} \end{aligned} \end{aligned}$$

(22)

Again, based on the norm submultiplicative, $\sigma _1(\bar{\mathbf {S}}^{-1}\mathbf {E}_B\bar{\mathbf {S}}^{-1})\le \Vert \bar{\mathbf {S}}^{-1}\Vert _2^2\Vert \mathbf {E}_B\Vert _2$. From the derivation in the second term, we readily get $\Vert \mathbf {E}_B\Vert _2=\sigma _1(\Vert \mathbf {E}_B\Vert _2)\le \Vert \mathbf {E}_B\Vert _F\le \dfrac{\nu _u+f(\mathbf {x}_u)^Tf(\mathbf {x}_u)}{(M_u-1)}$. Using the upper-bound for $\Vert \bar{\mathbf {S}}^{-1}\Vert _2$, we can obtain the bound $\sigma _1(\bar{\mathbf {S}}^{-1}\mathbf {E}_B\bar{\mathbf {S}}^{-1})\le \Vert \mathbf {E}_B\Vert _F\le \dfrac{\nu _u+f(\mathbf {x}_u)^Tf(\mathbf {x}_u)}{(M_u-1)\rho ^2}$. Finally, we can derive the upper-bound for the third term:

$$\begin{aligned} {{\,\mathrm{tr}\,}}\{\dfrac{\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}\mathbf {E}_B}{f(\mathbf {x}_u)^T\bar{\mathbf {S}}^{-1}f(\mathbf {x}_u)-1}\}\le \dfrac{f(\mathbf {x}_u)^Tf(\mathbf {x}_u)(\nu _u+f(\mathbf {x}_u)^Tf(\mathbf {x}_u))}{\rho (f(\mathbf {x}_u)^Tf(\mathbf {x}_u)-\rho )(M_u-1)} \end{aligned}$$

(23)

Finally, we can conclude the upper-bound for $d\psi _u$ by combining the upper-bounds for three additive terms together.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hou, Z., Kung, SY. Semi-Supervised Few-Shot Learning Via Dependency Maximization and Instance Discriminant Analysis. J Sign Process Syst 95, 13–24 (2023). https://doi.org/10.1007/s11265-022-01796-x

Download citation

Received: 01 March 2022
Revised: 01 June 2022
Accepted: 15 July 2022
Published: 02 August 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s11265-022-01796-x

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

¥17,985 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Institutional subscriptions

Semi-Supervised Few-Shot Learning Via Dependency Maximization and Instance Discriminant Analysis

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Self-supervised pairwise-sample resistance model for few-shot classification

When Does Self-supervision Improve Few-Shot Learning?

Self-Supervision Can Be a Good Few-Shot Learner

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Appendix

Lemma 1

Lemma 2

1.1 Proof of Theorem 3

Proof

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Semi-Supervised Few-Shot Learning Via Dependency Maximization and Instance Discriminant Analysis

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Self-supervised pairwise-sample resistance model for few-shot classification

When Does Self-supervision Improve Few-Shot Learning?

Self-Supervision Can Be a Good Few-Shot Learner

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Appendix

Appendix

Lemma 1

Lemma 2

1.1 Proof of Theorem 3

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation