Abstract
According to extreme value theory, the unseen novel classes in open-set recognition can be viewed as extreme values of the training classes. Following this idea, we introduce margin and coverage distributions to model the training classes. We propose a novel visual-semantic embedding framework, extreme vocabulary learning (EVoL), which embeds visual features into the semantic space in a probabilistic way. Notably, we exploit the vast open vocabulary in the semantic space to further constrain the margin and coverage of the training classes. The learned embedding can be used directly to solve supervised learning, zero-shot learning, and open-set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed framework over conventional approaches.
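To make the extreme-value intuition concrete, the following is a minimal sketch, not the paper's EVoL model: each known class is summarized by its center in an embedding space, a Weibull distribution is fitted to that class's largest training distances (its "margin" tail), and a query whose distance falls beyond every class's modeled tail is rejected as unseen. The toy classes, the nearest-center distance, the 20-sample tail size, and the rejection threshold are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)

# Toy "training" features for two known classes in a 2-D embedding space.
train = {
    "cat": rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2)),
    "dog": rng.normal(loc=[3.0, 3.0], scale=0.3, size=(100, 2)),
}
centers = {c: x.mean(axis=0) for c, x in train.items()}

# EVT step: per class, fit a Weibull to the largest training distances,
# i.e., model the extreme values of the within-class distance distribution.
tails = {}
for c, x in train.items():
    d = np.linalg.norm(x - centers[c], axis=1)
    tails[c] = weibull_min.fit(np.sort(d)[-20:], floc=0)

def inclusion_prob(query, c):
    """Probability that `query`'s distance is NOT beyond class c's tail."""
    d = np.linalg.norm(query - centers[c])
    shape, loc, scale = tails[c]
    return 1.0 - weibull_min.cdf(d, shape, loc=loc, scale=scale)

def predict(query, threshold=0.01):
    """Return the best-matching known class, or 'unseen' if every
    class's extreme-value model rejects the query."""
    probs = {c: inclusion_prob(query, c) for c in centers}
    best = max(probs, key=probs.get)
    return best if probs[best] > threshold else "unseen"
```

A point near a training class is accepted (e.g., `predict(np.array([0.1, -0.1]))` returns `"cat"`), while a point far from all class tails is rejected as `"unseen"`.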
Author information
Hanze Dong is an undergraduate student majoring in mathematics and applied mathematics (data science track) at the School of Data Science, Fudan University, China. He works in the Shanghai Key Lab of Intelligent Information Processing under the supervision of Professor Yanwei Fu. His current research interests include machine learning theory and its applications.
Zhenfeng Sun is a DEng student in the School of Computer Science, Fudan University, China. He received his master's degree from the Computer Science Department of Tongji University, China, in 2003. He has many years of experience in the video industry. In 2005, he was responsible for deploying Alcatel-Lucent's leading triple-play solution in China. In 2013, he joined the world's largest IPTV operator, BESTV Company, where he led the terminal department and participated in a national key research project. He is now the co-founder of OTVClOUD, a company invested in by Yunfeng Fund (set up by Jack Ma, founder of Alibaba). His research interests are video retrieval, recognition, and innovative video applications. He holds several patents.
Yanwei Fu received the PhD degree from Queen Mary University of London, UK in 2014, and the MEng degree from the Department of Computer Science and Technology, Nanjing University, China in 2011. He held a post-doctoral position at Disney Research, Pittsburgh, PA, USA, from 2015 to 2016. He is currently a tenure-track professor at Fudan University, China. His research interests are image and video understanding, and life-long learning.
Shi Zhong received his PhD degree in 2018 from the School of Computer Science, Fudan University, China. His research interests mainly include computer vision and machine learning.
Zhengjun Zhang is a full professor of Statistics in the Department of Statistics at the University of Wisconsin-Madison, USA. He received his PhD degrees in Management Engineering and Statistics from Beihang University and the University of North Carolina at Chapel Hill, respectively. Dr. Zhang's main research areas include big data structure and inference, particularly extreme value analysis for interdependent critical risk variables in finance, climate, and medical sciences, and stochastic optimization in large and complex systems. His selected journal publications include the Annals of Statistics, Journal of the Royal Statistical Society, Series B, Journal of the American Statistical Association, Journal of Econometrics, Journal of Banking and Finance, Extremes, and Automatica.
Yu-Gang Jiang is a professor of Computer Science at Fudan University and Director of the Fudan-Jilian Joint Research Center on Intelligent Video Technology, China. He is interested in all aspects of extracting high-level information from big video data, such as video event recognition, object/scene recognition, and large-scale visual search. His work has led to many awards, including the inaugural ACM China Rising Star Award, the 2015 ACM SIGMM Rising Star Award, and the research award for outstanding young researchers from NSF China. He is currently an associate editor of ACM TOMM, Machine Vision and Applications (MVA), and Neurocomputing. He holds a PhD in Computer Science from City University of Hong Kong and spent three years working at Columbia University before joining Fudan in 2011.
Cite this article
Dong, H., Sun, Z., Fu, Y. et al. Extreme vocabulary learning. Front. Comput. Sci. 14, 146315 (2020). https://doi.org/10.1007/s11704-019-8249-3