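Automatic image annotation is the process by which a computer system automatically assigns metadata, in the form of captions or keywords, to a digital image. This application of computer vision techniques is used in image retrieval systems to organize a database and locate images of interest in it.
The task can be viewed as a multi-class image classification problem with a very large number of classes (as many as the vocabulary has words). Typically, machine learning techniques use image analysis, in the form of extracted feature vectors, together with the training annotation words to try to label new images automatically. The first methods learned the correlations between image features and training annotations. Later techniques used machine translation to try to translate between the textual vocabulary and a “visual vocabulary”, that is, clustered image regions known as blobs. Work following these efforts has included classification approaches, relevance models, and others.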
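The early idea of learning correlations between image features and training words can be illustrated with a small sketch. It is only a toy illustration and not the algorithm of any paper listed below: the function names (`nearest`, `train`, `annotate`), the three hand-picked centroids standing in for a learned set of blobs, and the RGB-like region vectors are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def nearest(vec, centroids):
    """Index of the centroid closest to vec (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centroids[i])))

def train(images, centroids):
    """Count how often each word co-occurs with each quantized region (blob)."""
    counts = defaultdict(Counter)
    for regions, words in images:
        for vec in regions:
            counts[nearest(vec, centroids)].update(words)
    return counts

def annotate(regions, centroids, counts, top_k=2):
    """Score each word by summing the estimate of P(word | blob) over regions."""
    scores = Counter()
    for vec in regions:
        blob_counts = counts[nearest(vec, centroids)]
        total = sum(blob_counts.values())
        for word, n in blob_counts.items():
            scores[word] += n / total
    return [word for word, _ in scores.most_common(top_k)]

# Hypothetical blobs: roughly "blue", "green" and "yellow" region features.
centroids = [(0.1, 0.1, 0.9), (0.1, 0.8, 0.1), (0.9, 0.8, 0.1)]

# Toy training set: (region feature vectors, annotation words) per image.
training = [
    ([(0.0, 0.1, 1.0), (0.2, 0.9, 0.1)], ["sky", "grass"]),
    ([(0.1, 0.2, 0.8), (0.9, 0.9, 0.0)], ["sky", "sun"]),
    ([(0.0, 0.7, 0.2), (1.0, 0.7, 0.1)], ["grass", "sun"]),
]

counts = train(training, centroids)
new_image = [(0.1, 0.0, 0.9), (0.2, 0.8, 0.0)]  # one blue and one green region
print(sorted(annotate(new_image, centroids, counts)))  # ['grass', 'sky']
```

In a real system the centroids would come from clustering region features over the whole training set, and the vocabulary would contain hundreds or thousands of words, which is what makes the classification view of the problem hard.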
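Compared with content-based image retrieval, the advantage of automatic image annotation is that the user can specify queries more naturally http://i.yz.yamagata-u.ac.jp/paper/inoue04irix.pdf . Content-based image retrieval generally (at present) requires users to search by image concepts such as color and texture, or to supply an example image as the query. Certain image features in an example image may override the concept the user is really interested in. Traditional methods of image retrieval, such as those used by libraries, rely on manually annotated images, which is expensive and time-consuming, especially given large and constantly growing image databases.
Some annotation engines are available online, including the ALIPR.com real-time tagging engine, developed by researchers at Pennsylvania State University, and Behold Image Search.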
Some major work
- Word co-occurrence model
Y Mori, H Takahashi, and R Oka (1999). “Image-to-word transformation based on dividing and vector quantizing images with words”. Proceedings of the International Workshop on Multimedia Intelligent Storage and Retrieval Management.
- Annotation as machine translation
P Duygulu, K Barnard, N de Freitas, and D Forsyth (2002). “Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary”. Proceedings of the European Conference on Computer Vision. pp. 97–112.
- Statistical models
J Li and J Z Wang (2006). “Real-time Computerized Annotation of Pictures”. Proc. ACM Multimedia. pp. 911–920.
J Z Wang and J Li (2002). “Learning-Based Linguistic Indexing of Pictures with 2-D MHMMs”. Proc. ACM Multimedia. pp. 436–445.
- Automatic linguistic indexing of pictures
J Li and J Z Wang (2008). “Real-time Computerized Annotation of Pictures”. IEEE Trans. on Pattern Analysis and Machine Intelligence.
J Li and J Z Wang (2003). “Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach”. IEEE Trans. on Pattern Analysis and Machine Intelligence. pp. 1075–1088.
- Hierarchical Aspect Cluster Model
K Barnard, D A Forsyth (2001). “Learning the Semantics of Words and Pictures”. Proceedings of International Conference on Computer Vision. pp. 408–415.
- Latent Dirichlet Allocation model
D Blei, A Ng, and M Jordan (2003). “Latent Dirichlet allocation” (PDF). Journal of Machine Learning Research. pp. 3:993–1022.
- Supervised multiclass labeling
G Carneiro, A B Chan, P Moreno, and N Vasconcelos (2006). “Supervised Learning of Semantic Classes for Image Annotation and Retrieval” (PDF). IEEE Trans. on Pattern Analysis and Machine Intelligence. pp. 394–410.
- Texture similarity
R W Picard and T P Minka (1995). “Vision Texture for Annotation”. Multimedia Systems.
- Support Vector Machines
C Cusano, G Ciocca, and R Schettini (2004). “Image Annotation Using SVM”. Proceedings of Internet Imaging IV.
- Ensemble of Decision Trees and Random Subwindows
R Maree, P Geurts, J Piater, and L Wehenkel (2005). “Random Subwindows for Robust Image Classification”. Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. pp. 1:34–40.
- Maximum Entropy
J Jeon, R Manmatha (2004). “Using Maximum Entropy for Automatic Image Annotation” (PDF). Int’l Conf on Image and Video Retrieval (CIVR 2004). pp. 24–32.
- Relevance models
J Jeon, V Lavrenko, and R Manmatha (2003). “Automatic image annotation and retrieval using cross-media relevance models” (PDF). Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 119–126.
- Relevance models using continuous probability density functions
V Lavrenko, R Manmatha, and J Jeon (2003). “A model for learning the semantics of pictures” (PDF). Proceedings of the 16th Conference on Advances in Neural Information Processing Systems NIPS.
- Coherent Language Model
R Jin, J Y Chai, L Si (2004). “Effective Automatic Image Annotation via A Coherent Language Model and Active Learning” (PDF). Proceedings of MM’04.
- Inference networks
D Metzler and R Manmatha (2004). “An inference network approach to image retrieval” (PDF). Proceedings of the International Conference on Image and Video Retrieval. pp. 42–50.
- Multiple Bernoulli distribution
S Feng, R Manmatha, and V Lavrenko (2004). “Multiple Bernoulli relevance models for image and video annotation” (PDF). IEEE Conference on Computer Vision and Pattern Recognition. pp. 1002–1009.
- Multiple design alternatives
J Y Pan, H-J Yang, P Duygulu and C Faloutsos (2004). “Automatic Image Captioning” (PDF). Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (ICME’04).
- Natural scene annotation
J Fan, Y Gao, H Luo and G Xu (2004). “Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation”. Proceedings of the 27th annual international conference on Research and development in information retrieval. pp. 361–368.
- Relevant low-level global filters
A Oliva and A Torralba (2001). “Modeling the shape of the scene: a holistic representation of the spatial envelope” (PDF). International Journal of Computer Vision. pp. 42:145–175.
- Global image features and nonparametric density estimation
A Yavlinsky, E Schofield and S Rüger (2005). “Automated Image Annotation Using Global Features and Robust Nonparametric Density Estimation” (PDF). Int’l Conf on Image and Video Retrieval (CIVR, Singapore, Jul 2005).
- Video semantics
N Vasconcelos and A Lippman (2001). “Statistical Models of Video Structure for Content Analysis and Characterization” (PDF). IEEE Transactions on Image Processing. pp. 1–17.
Ilaria Bartolini, Marco Patella, and Corrado Romani (2010). “Shiatsu: Semantic-based Hierarchical Automatic Tagging of Videos by Segmentation Using Cuts”. 3rd ACM International Multimedia Workshop on Automated Information Extraction in Media Production (AIEMPro10).
- Image Annotation Refinement
Yohan Jin, Latifur Khan, Lei Wang, and Mamoun Awad (2005). “Image annotations by combining multiple evidence & WordNet”. 13th Annual ACM International Conference on Multimedia (MM 05). pp. 706–715.
Changhu Wang, Feng Jing, Lei Zhang, and Hong-Jiang Zhang (2006). “Image annotation refinement using random walk with restarts”. 14th Annual ACM International Conference on Multimedia (MM 06).
Changhu Wang, Feng Jing, Lei Zhang, and Hong-Jiang Zhang (2007). “Content-based image annotation refinement”. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 07).
Ilaria Bartolini and Paolo Ciaccia (2007). “Imagination: Exploiting Link Analysis for Accurate Image Annotation”. Springer Adaptive Multimedia Retrieval.
Ilaria Bartolini and Paolo Ciaccia (2010). “Multi-dimensional Keyword-based Image Annotation and Search”. 2nd ACM International Workshop on Keyword Search on Structured Data (KEYS 2010).
- Automatic Image Annotation by Ensemble of Visual Descriptors
Emre Akbas and Fatos Y. Vural (2007). “Automatic Image Annotation by Ensemble of Visual Descriptors”. Conf. on Computer Vision and Pattern Recognition (CVPR) 2007, Workshop on Semantic Learning Applications in Multimedia.
- A New Baseline for Image Annotation
Ameesh Makadia and Vladimir Pavlovic and Sanjiv Kumar (2008). “A New Baseline for Image Annotation” (PDF). European Conference on Computer Vision (ECCV).
- Simultaneous Image Classification and Annotation
Chong Wang and David Blei and Li Fei-Fei (2009). “Simultaneous Image Classification and Annotation” (PDF). Conf. on Computer Vision and Pattern Recognition (CVPR).
- TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation
Matthieu Guillaumin and Thomas Mensink and Jakob Verbeek and Cordelia Schmid (2009). “TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation” (PDF). Intl. Conf. on Computer Vision (ICCV).
- Image Annotation Using Metric Learning in Semantic Neighbourhoods
Yashaswi Verma and C. V. Jawahar (2012). “Image Annotation Using Metric Learning in Semantic Neighbourhoods” (PDF). European Conference on Computer Vision (ECCV).
See also
- Pattern recognition
- Image retrieval
- Content-based image retrieval
References
- Datta, Ritendra; Dhiraj Joshi; Jia Li; James Z. Wang (2008). “Image Retrieval: Ideas, Influences, and Trends of the New Age”. ACM Computing Surveys 40 (2): 1–60. doi:10.1145/1348246.1348248.
- Nicolas Hervé; Nozha Boujemaa (2007). “Image annotation : which approach for realistic databases ?” (PDF). ACM International Conference on Image and Video Retrieval.
- M Inoue (2004). “On the need for annotation-based image retrieval” (PDF). Workshop on Information Retrieval in Context. pp. 44–46.
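External links
- ALIPR.com - A real-time automatic tagging engine developed by researchers at Pennsylvania State University.
- Behold Image Search - An image search engine that indexes over 1 million Flickr images using automatically generated tags.
- SpiritTagger Global Photograph Annotation - An annotation system from the University of California, Santa Barbara, built on 1.4 million images, that predicts where a photo was taken and suggests tags.
- Akiwi - Semi automatic image tagging - Image annotation with user interaction.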