Abstract
The scale of shot, i.e. the apparent distance of the camera from the main subject of a scene, is one of the main stylistic and narrative features of audiovisual products: it conveys meaning and induces emotional states in the viewer. The statistical distribution of shot scales in a film may be an important identifier of an individual film or an individual author, and of various narrative and affective functions of a film. To understand to what extent the shot scale distribution (SSD) of a movie can serve as its fingerprint, shot scale must be recognized automatically on a large movie corpus. In this work we propose an automatic framework for estimating the SSD of a movie that uses intrinsic characteristics of shots carrying information about camera distance, without the need to recover the 3D structure of the scene. In the experimental investigation, a comparison of the obtained results with manual SSD annotations demonstrates the validity of the framework. Experiments conducted on movies by Michelangelo Antonioni taken from different stylistic periods (1950–57, 1960–64, 1966–75, 1980–82) show a strong similarity in shot scale distributions within each period, thus opening interesting research lines regarding the possible aesthetic and cognitive sources of such a regularity.
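To make the notion of an SSD concrete, the sketch below computes a shot scale distribution as the normalized histogram of per-shot scale labels and compares an automatically estimated SSD with a manually annotated one. It is only an illustration under stated assumptions: the three class names, the toy label lists, and the use of total variation distance as the comparison measure are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical label set: the abstract does not list the scale classes used.
SCALES = ("close-up", "medium", "long")


def shot_scale_distribution(labels, scales=SCALES):
    """Return the fraction of labels that fall in each scale class."""
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    unknown = set(counts) - set(scales)
    if unknown:
        raise ValueError(f"unknown scale labels: {sorted(unknown)}")
    total = len(labels)
    return {scale: counts[scale] / total for scale in scales}


def total_variation(p, q):
    """Half the L1 distance between two distributions over the same classes.

    Assumed comparison measure, chosen for illustration only.
    """
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


# Toy data (invented): one label per shot.
predicted = ["close-up", "medium", "medium", "long"]
manual = ["close-up", "medium", "long", "long"]

ssd_pred = shot_scale_distribution(predicted)
ssd_manual = shot_scale_distribution(manual)
distance = total_variation(ssd_pred, ssd_manual)
```

A distance of 0 would mean the two distributions coincide and 1 would mean they share no mass; weighting each shot by its duration instead of counting shots equally would be an equally plausible variant.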
Cite this article
Benini, S., Svanera, M., Adami, N. et al. Shot scale distribution in art films. Multimed Tools Appl 75, 16499–16527 (2016). https://doi.org/10.1007/s11042-016-3339-9
DOI: https://doi.org/10.1007/s11042-016-3339-9