Authors:
Sravani Yenduri; Nazil Perveen; Vishnu Chalavadi and C. Krishna Mohan
Affiliation:
Indian Institute of Technology Hyderabad, Kandi, Sangareddy, Telangana, 502285, India
Keyword(s):
Spatio-temporal Features, Gaussian Mixture Model (GMM), Maximum A Posteriori (MAP) Adaptation, Factor Analysis, Fine-grained Action Recognition.
Abstract:
Modelling the subtle interactions between humans and objects is crucial in fine-grained action recognition. However, existing methods that employ deep networks to model these interactions are highly supervised, computationally expensive, and need a vast amount of annotated data for training. In this paper, a framework for an efficient representation of fine-grained actions is proposed. First, spatio-temporal features, namely the histogram of optical flow (HOF) and the motion boundary histogram (MBH), are extracted from each input video, as these features are robust to irregular motions and capture the motion information in videos efficiently. Then, a large Gaussian mixture model (GMM) is trained using maximum a posteriori (MAP) adaptation to capture the attributes of fine-grained actions. The adapted means of all mixtures are concatenated to form an attribute vector for each fine-grained action video. This attribute vector is high-dimensional and contains redundant attributes that may not contribute to the particular fine-grained action. Therefore, factor analysis is used to decompose the high-dimensional attribute vector into a low-dimensional representation that retains only the attributes responsible for that fine-grained action. The efficacy of the proposed approach is demonstrated on three fine-grained action datasets, namely JIGSAWS, KSCGR, and MPII Cooking 2.
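The pipeline in the abstract (GMM, MAP-adapted means concatenated into an attribute vector, then factor analysis) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation; several details are assumptions: a diagonal-covariance GMM is first fit by EM on frames pooled from all videos and then MAP-adapted to each video, only the means are adapted using the common relevance-factor update with a hypothetical `relevance=16.0`, factor analysis is fit by plain EM, and random vectors stand in for real HOF/MBH descriptors. All function names are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """log N(x_t | mu_k, diag(var_k)) for every frame t and mixture k -> (T, K)."""
    diff = X[:, None, :] - means[None, :, :]                  # (T, K, D)
    mahal = np.sum(diff ** 2 / variances[None], axis=2)       # (T, K)
    log_det = np.sum(np.log(variances), axis=1)               # (K,)
    return -0.5 * (mahal + log_det[None] + X.shape[1] * np.log(2 * np.pi))

def posteriors(X, weights, means, variances):
    """Mixture occupation probabilities gamma_{t,k}, computed stably in log space."""
    log_p = log_gauss_diag(X, means, variances) + np.log(weights)[None]
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def train_ubm(X, n_mix, n_iter=20, seed=0, floor=1e-3):
    """Plain EM for a diagonal-covariance GMM on pooled frames (the prior model)."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), n_mix, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + floor, (n_mix, 1))
    weights = np.full(n_mix, 1.0 / n_mix)
    for _ in range(n_iter):
        gamma = posteriors(X, weights, means, variances)
        nk = gamma.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (gamma.T @ X) / nk[:, None]
        variances = np.maximum((gamma.T @ (X ** 2)) / nk[:, None] - means ** 2, floor)
    return weights, means, variances

def map_mean_supervector(X_video, weights, means, variances, relevance=16.0):
    """MAP-adapt only the means to one video's frames, then concatenate them."""
    gamma = posteriors(X_video, weights, means, variances)
    nk = gamma.sum(axis=0)                                     # soft frame counts
    ex = (gamma.T @ X_video) / np.maximum(nk, 1e-10)[:, None]  # per-mixture data means
    alpha = (nk / (nk + relevance))[:, None]                   # data-vs-prior weight
    adapted = alpha * ex + (1.0 - alpha) * means
    return adapted.reshape(-1)                                 # (n_mix * D,)

def factor_analysis(Y, n_factors, n_iter=100, seed=0, floor=1e-6):
    """EM for y = mu + W z + eps, eps ~ N(0, diag(psi)); returns E[z | y] per row."""
    n, p = Y.shape
    Yc = Y - Y.mean(axis=0)
    W = np.random.default_rng(seed).standard_normal((p, n_factors)) * 0.1
    psi = Yc.var(axis=0) + floor
    for _ in range(n_iter):
        Wp = W / psi[:, None]                                  # Psi^-1 W
        G = np.linalg.inv(np.eye(n_factors) + W.T @ Wp)        # posterior cov of z
        Ez = Yc @ Wp @ G                                       # E[z_i], (n, q)
        Ezz = n * G + Ez.T @ Ez                                # sum_i E[z_i z_i^T]
        YEz = Yc.T @ Ez                                        # (p, q)
        W = YEz @ np.linalg.inv(Ezz)
        psi = np.maximum((Yc ** 2).mean(axis=0) - np.sum(W * YEz, axis=1) / n, floor)
    Wp = W / psi[:, None]
    return Yc @ Wp @ np.linalg.inv(np.eye(n_factors) + W.T @ Wp)

# Toy run: random 4-D vectors stand in for per-frame HOF/MBH descriptors.
rng = np.random.default_rng(1)
videos = [rng.normal(loc=i % 3, size=(50, 4)) for i in range(12)]
ubm = train_ubm(np.vstack(videos), n_mix=8)
supervectors = np.stack([map_mean_supervector(v, *ubm) for v in videos])
factors = factor_analysis(supervectors, n_factors=3)
print(supervectors.shape, factors.shape)   # (12, 32) (12, 3)
```

In this sketch, each video's attribute vector has `n_mix * D` entries, while factor analysis only ever inverts `n_factors x n_factors` matrices. Mixtures that see few frames of a video keep means close to the prior GMM because `alpha` stays near zero for them.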