Abstract
Existing projection-based person re-identification methods usually suffer from long time training, high dimension of projection matrix, and low matching rate. In addition, the intra-class instances may be much less than the inter-class instances when a training dataset is built. To solve these problems, a novel relative distance metric leaning based on clustering centralization and projection vectors learning (RDML-CCPVL) is proposed. When constructing training dataset, the images of a same target person are clustering centralized with FCM. The training datasets are built by these clusters in order to alleviate the imbalanced data problem of the training datasets. In addition, during learning projection matrix, the resulted projection vectors can be approximately orthogonal by using an iteration strategy and a conjugate gradient projection vector learning method to update training datasets. The advantage of this method is its quadratic convergence, which can promote the convergence. Experimental results show that the proposed algorithm has higher efficiency. The matching rate can be significantly improved, and the time of training is much shorter than most of existing algorithms of person re-identification.
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
Recently, surveillance systems have become ubiquitous in public places like airports, railway stations, college campuses, and office buildings [1]. There are a large number of cameras in the surveillance systems and they provide huge amounts of video data. The analysis of the computer vision obtained in a surveillance system often requires the ability to track people across multiple cameras. Therefore, person re-identification (Re-ID) has attracted more and more interests [2,3,4]. Re-ID is defined as a process of establishing correspondence between images of a person taken from different cameras. In the past five years, a large number of models have been proposed for Re-ID systems [5, 6], which can be categorized generally into two types: (1) designing discriminative, descriptive and robust visual descriptors to characterize a person’s appearance [7, 8]; (2) learning suitable distance metrics that maximize the chance of a correct correspondence [9,10,11]. In this paper, we focus on the second type of person re-identification, that is, given a set of features extracted from each person image, we seek to quantify and differentiate these features by learning the optimal distance measure that is most likely to give correct matches.
Many metric learning algorithms have been proposed to act with the distances or similarity functions of person re-identification features. For example, Pedagadi et al. applied the Local Fisher Discriminate Analysis (LFDA) [12] to solve person re-identification problems. The authors in [13] introduced the KISSME method from equivalence constraints based on a statistical inference perspective. Dikmen et al. proposed a metric learning framework to obtain a robust Mahalanobis metric for Large Margin Nearest Neighbour classification with Rejection (LMNN) [14]. [15] presented an information-theoretic approach to learn a Mahalanobis distance function. Zheng et al. proposed the Relative Distance Comparison (RDC) [16] approach to maximize the likelihood of a pair of true matches which having a relatively smaller distance than that of a wrongly matched pair in a soft discriminate manner. Chen [17] formulated an asymmetric distance model for learning camera-specific projections to transform the unmatched features of each view to a common space. Chen [18] presented a relevance metric learning method with listwise constraints (RMLLCs) by adopting listwise similarities using predefined similarity lists.
However, the existing Re-ID methods discussed above all have two shortcomings.
-
(1)
As we all known, for the person re-identification datasets, there are usually much more inter-class person image pairs than intra-class ones. Imbalanced datasets would drastically affect the modelling of machine learning methods.
-
(2)
The training set for learning the model consists of images of matched people across different camera views. In order to capture the large intra and inter variations, this represents a large scale learning problem that challenges existing machine learning algorithms.
To address the problems of the existing Re-ID methods, a novel relative distance metric leaning based on clustering centralization and projection vectors learning for person re-identification (RDML-CCPVL) is proposed. First, there are usually much more inter-class instances than intra-class instances when traditional metric learning methods collect training dataset. Constructing counterexamples of each instance needs to compute the distances of all the other instances, so the training time is greatly increased. Using FCM [19], the number of counterexamples of each instance is decreased by divided into clusters and the important structural information is retained, so the overfitting problem caused by class imbalance is relieved.
Second, traditional matrix projection learning methods usually have greater storage and computing complexity. In this work, we decomposed the projection matrices into low rank ones by eigenvalue decomposition for projection matrices. According to our iterative optimization method, updating the distance vectors of instance features only need to learn a new projection vector using the updated training dataset each time and stop when achieving a good enough accuracy. The conjugate gradient method is used to learn the projection vector, which only needs to compute the initial gradient one time. For the quadratic function, the conjugate gradient method can converge to the target precision soon due to quadratic termination. Our method can effectively reduce the computational complexity and storage. In addition, our algorithm can approximately ensure to keep the orthogonal characteristics of the vectors after eigenvalue decomposition.
2 Relative Distance Metric Leaning Based on Clustering Centralization and Projection Vectors Learning
2.1 Learning Function Based on Clustering Centralization
The person re-identification problem can be casted into a distance comparison problem [16]. Suppose we have a set of m training pedestrian images \(\mathbf{D }^k = [\mathbf{{X} },\mathbf{Y }]_1^m\) with a feature dataset \(\mathbf {X} = \{{x}_i,i = 1,...,{m}\} ,{x}_{i} \in \mathbf R ^d\), where d is the dimension of the features and \(\mathbf Y = \{ {y_i},i = 1,...,m\}\) is the input label dataset. For an instance \(x _a\) of person A, we want to find another instance \(x _b\) of pedestrian A captured elsewhere in space and time by learning the distance \(dis (x _a,x _b)\) of these two instances. As we all known, the distances of intra-class instances should be smaller in general compared with the distances of inter-class instances, so that \(dis (x _a,x _b) < dis (x _a,x _c)\), where \(x _c\) is an instance of any other pedestrian except A. Here we construct a pairwise dataset \(\mathbb {S} = \{ \mathbb {S}_t = ({\varvec{{d}}}_t^{pos},{\varvec{d}}_t^{neg})\} _{t = 1}^m\) to describe the distances of the instances, where \({\varvec{d}}_t^{pos}\) is the distance of \(x _t\) with intra-class instances and \({\varvec{d}}_t^{neg}\) is the distance of \(x _t\) with inter-class instances.
However, as we all known, the number of inter-class instances is much larger than the number of intra-class instances. So the pairwise dataset \(\mathbb {S} = \{ \mathbb {S}_t = ({\varvec{d}}_t^{pos},{\varvec{d}}_t^{neg})\} _{t = 1}^m\) will be a class imbalanced dataset. Training with such a dataset will lead to over fit-ting of the inter-class instances and under fitting of the intra-class instances, which will decrease the performance of the learning algorithm. Here we choose FCM [19] to alleviate the class imbalanced data problem. The principle of our clustering centralization is shown in Fig. 1. From Fig. 1, the traditional training datasets constructing methods [16] (shown with green lines) include much more inter-class instances, while our method (shown with red lines) with clustering centralization built training datasets by these clusters, which effectively alleviate the class imbalanced problem of the existing Re-ID algorithms. The other advantage of our training clustering centralization method is the number of clusters can adjust for different Re-ID datasets to get the optimal performance, which will be discussed in Sect. 4.3 lately. After clustering centralization, we have , is the distance of \(x _t\) with the centers of clusters of its counterexamples. In order to maximize the inter-class distance and minimize the intra-class distance at the same time, we can formulate it into a minimization problem as following:
where g() is a distance function. The function in Eq. (1) is unbounded, so it cannot guarantee convergence during iteration. Here, we transform it to a continuous sigmoid function as following:
Considering the convenience of computing, Eq. (1) is then transformed to a logistic form as following:
We can see minimizing Eq. (1) is equivalent to maximizing Eq. (2), maximizing Eq. (2) is equivalent to minimizing Eq. (3).
Because the Mahalanobis matrix of the Mahalanobis distance function has good projective property and learning property, here we choose the Mahalanobis distance function as the distance function \(g \):
where M is a semi-definite matrix. Our goal becomes to learn M in Eq. (4) by minimizing the functional defined in Eq. (3). By performing eigenvalue decomposition on M, we can find \(\mathbf{M } = \mathbf{P }\mathbf{P }^T\), P is a matrix of column orthogonal vectors. The number of orthogonal bases may be smaller than the rank of matrix M, therefore, \(\mathbf{P } \in \mathbf{R ^{n*d'}}\) can be regard as a dimension reduction matrix, where \(d'\) is the number of orthogonal basis after dimension reduction. Equation (4) can be transformed as following:
In addition, for a small dataset, the function shown in Eq. (3) may be overfitting. In order to alleviate the risk of over fitting and ensure the sparsity of the projection matrix, we introduce \(r{\left\| \mathbf P \right\| ^2}\) as a regularization term, where r is the regularization factor. The distance function can be formulated as:
2.2 An Iterative Optimization Algorithm for Projection Vector Learning
In this paper, we choose a similar iterative optimization method of [16] to learn an optimal P.
Step 1. Assume that after \(\textit{l}-1\) iterations a set of orthogonal vectors \(\varvec{p}_1,\varvec{p}_2,...,\varvec{p}_{\textit{l}-1}\) have been learned, to the next vector \(\varvec{p}\), let
where \(s \in \{ pos,neg\} \), \(t \in 1,...,\left| \mathbb {S} \right| \).
Step 2. After obtain \({\varvec{d}}_t^{s,l}\) from Eq. (7), let . Then, we use conjugate gradient method to learn projection vectors of the Eq. (8).
The optimal projection vector after kth iteration is defined as following:
where \({\alpha _k}\) is computed by \(f({\varvec{p}}_l^k + \alpha {{\varvec{q}}_k})\) using one-dimensional accurate search, \({{\varvec{q}}_k}\) is the search direction of projection vector after kth iteration, and the conjugate direction is computed by PRP equation as following.
where \({g_l}\) is:
If \(|f({\varvec{p}}_l^k) - f({\varvec{p}}_l^{k - 1})| < {\varepsilon _g}\), the iteration is terminated.
The initial value of \({{\varvec{p}}_l}\) is formulated as following:
Different from the iterative optimization algorithm of [16], a smaller perturbation term u is added into Eq. (7), which make each projection spaces preserving a relation with each other and are more suitable to the real-world learning problem.
According to Eqs. (9) and (11), we can see \({{\varvec{p}}_l} \in span\{ {\varvec{d}}_i^{s,l}\}\), where \(span\{ {\varvec{d}}_i^{s,l}\} \) is a range space of \(\{ {\varvec{d}}_i^{pos,l}\} \cup \{ {\varvec{d}}_i^{neg,l}\} \), \(s \in \{ pos,neg\}\), \(i \in 1,...,\left| \mathbb {S} \right| \). According to Eq. (7), we know that \({\varvec{p}}_j^T{\varvec{d}}_i^{s,j + 1} \approx 0\) where \(j = 1,...,l - 1\) and \({{\varvec{p}}_l} \in span\{ {\varvec{d}}_i^{s,l}\} \), where \(span\{ {\varvec{d}}_i^{s,l}\} \subseteq span\{ {\varvec{d}}_i^{s,l - 1}\} \subseteq ... \subseteq span\{ {\varvec{d}}_i^{s,0}\} \), so \({{\varvec{p}}_l}\) and \({{\varvec{p}}_j},j = 1,...,l - 1\) is approximately orthogonal.
3 Learning Algorithm for RDML-CCPVL
Based on the above iteration, the learning algorithm of the proposed RDML-CCPVL is presented in Algorithm 1.
4 Experimental and Analysis
4.1 Datasets and Experimental Setting
Four popular datasets are selected for our experiments: VIPeR [20], CUHK01 [21], 3DPeS [22], CAVIAR4REID [23], TownCenter [24] and Market-1501 [25]. Datasets have different problems of angle, lighting, occlusion and low resolution, etc. For each image, we use six type of features descriptor, such as RGB, YCbCr, HSV, Lab, YIQ and Gabor [26]. Then we use PCA to compress them into 2688-dimensional feature vectors.
We compare the proposed RDML-CCPVL with seven existing person re-identification works: ITML [15], LMNN [14], KISSME [13], PRDC [16], LFDA [12], CVDCA [17], RMLLC [18]. The performance of all the methods is evaluated in terms of cumulative matching characteristic (CMC), which is a standard measurement for Re-ID. The CMC curve represents the probability of finding the correct match over the top r in the gallery image ranking, with r varying from 1 to 30. In order to evaluate the efficiency of our algorithm, in this paper, the comparison of the training time between our algorithm with other existing algorithms is also shown. Since the number c of the clustering centralization is a important parameter for our algorithm, Normalized Discounted Cumulative Gain (NDCG) [27] is choose to evaluate the performance of RDML-CCPVL with varying number of clusters c. We run all the benchmarking algorithms with MATLAB 7 on a 1.90 GHz machine with 8G RAM.
4.2 Comparisons of Performance on Different Re-ID Datasets
Comparisons of the CMC and training time on different Re-ID datasets are shown in Tables 1, 2, 3, 4, 5 and 6 with optimal value of c, respectively. Since clustering centralization is used to solve the imbalanced data problem of the training datasets, it can be seen that our algorithm is considerably superior to other algorithms. The CMC of RDML-CCPVL is at least 3.2%–4.5% higher than other algorithms at rank 30. The advantage of clustering centralization is the reduction of the training datasets, we can see the training time of RDML-CCPVL is much less than other algorithms except method of KISSME, because this algorithm is a one-pass metric learning method.
4.3 Experiments with Vary Number of Clusters
In this section, we report the change tendency of the performance of RDML-CCPVL by running them on the different datasets with varying number of clusters c. There are two pictures of each person in dataset VIPeR and four pictures in dataset CUHK01, so we choose two clusters of each person of these two datasets. From the Fig. 3 we can see that the proposed RDML-CCPVL is considerably sensitive to c for all the datasets except Town Centre. The performance on 3DPeS, CAVIAR4REID and Market-1501 of will become better as c increases because more clusters mean more information of the datasets. But when c becomes too large, the curves in Fig. 2 start to go down because we find that c has an optimal range of value for different datasets and too many clusters also can not generate useful distribution information. For Town Centre, which include only images of Video sequences, so how many clusters of intra-class instances of the same person has little difference. When \(c =2\), the proposed algorithm achieves the optimal performance on Town Centre.
5 Conclusion
In this work, we proposed a relative distance metric leaning algorithm based on clustering centralization and projection vectors learning for person re-identification problem. We have shown that cluster centralization can improve the performance and efficiency in person re-identification and reduce both the computational complexity and the storage space. In addition, the conjugate gradient method is used in the projection vector learning. The proposed approach shows a significant improvement over other existing algorithms.
References
Agrawal, P., Narayanan, P.: Person re-identification in videos. IEEE Trans. Circuits Syst. Video Technol. 21(3), 299–310 (2011)
Cai, X., Wang, C., Xiao, B., Chen, X., Zhou, J.: Deep nonlinear metric learning with independent subspace analysis for face verification. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 749–752. ACM Press, New York (2012)
Cui, Z., Li, W., Xu, D., Shan, S., Chen, X.: Fusing robust face region descriptors via multiple metric learning for face recognition in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3554–3561. IEEE Press, New York (2013)
Ji, S., Xu, W., Yang, M., Yu, K.: 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013)
Cao, Z., Yin, Q., Tang, X., Sun, J.: Face recognition with learning-based descriptor. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2707–2714. IEEE Press, New York (2010)
Hu, J., Lu, J., Tan, Y.P.: Discriminative deep metric learning for face verification in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1875–1882. IEEE Press, New York (2014)
Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occurrence representation and metric learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2197–2206. IEEE Press, New York (2015)
An, L., Kafai, M., Yang, S., Bhanu, B.: Person re-identification with reference descriptor. IEEE Trans. Circuits Syst. Video Technol. 26(4), 776–787 (2016)
An, L., Yang, S., Bhanu, B.: Person re-identification by robust canonical correlation analysis. IEEE Signal Process. Lett. 22(8), 1103–1107 (2015)
Loy, C.C., Liu, C., Gong, S.: Person re-identification by manifold ranking. In: Proceedings of the 20th IEEE International Conference on Image Processing, pp. 3567–3571. IEEE Press, New York (2013)
Roth, P.M., Hirzer, M., Köstinger, M., Beleznai, C., Bischof, H.: Mahalanobis distance learning for person re-identification. In: Gong, S., Cristani, M., Yan, S., Loy, C.C. (eds.) Person Re-Identification. ACVPR, pp. 247–267. Springer, London (2014). https://doi.org/10.1007/978-1-4471-6296-4_12
Pedagadi, S., Orwell, J., Velastin, S., Boghossian, B.: Local Fisher discriminant analysis for pedestrian re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3318–3325. IEEE Press, New York (2013)
Koestinger, M., Hirzer, M., Wohlhart, P., Roth, P.M., Bischof, H.: Large scale metric learning from equivalence constraints. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2288–2295. IEEE Press, New York (2012)
Dikmen, M., Akbas, E., Huang, T.S., Ahuja, N.: Pedestrian recognition with a learned metric. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010. LNCS, vol. 6495, pp. 501–512. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19282-1_40
Davis, J.V., Kulis, B., Jain, P., Sra, S., Dhillon, I.S.: Information theoretic metric learning. In: Proceedings of the 24th International Conference on Machine Learning, pp. 209–216. ACM Press, New York (2007)
Zheng, W.S., Gong, S., Xiang, T.: Reidentification by relative distance comparison. IEEE Trans. Pattern Anal. Mach. Intell. 35(3), 653–668 (2013)
Chen, Y.C., Zheng, W.S., Lai, J.H., Yuen, P.: An asymmetric distance model for cross-view feature mapping in person re-identification. IEEE Trans. Circuits Syst. Video Technol. 1 (2016)
Chen, J.X., Zhang, Z.X., Wang, Y.L.: Relevance metric learning for person reidentification by exploiting listwise similarities. IEEE Trans. Image Process. 24(12), 4741–4755 (2015)
Bezdek, J.C.: Pattern recognition with fuzzy objective function algorithms. Adv. Appl. Pattern Recognit. 22(1171), 203–239 (1981)
Gray, D., Tao, H.: Viewpoint invariant pedestrian recognition with an ensemble of localized features. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5302, pp. 262–275. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88682-2_21
Li, W., Zhao, R., Wang, X.: Human reidentification with transferred metric learning. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012. LNCS, vol. 7724, pp. 31–44. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37331-2_3
Baltieri, D., Vezzani, R., Cucchiara, R.: 3DPeS: 3D people dataset for surveillance and forensics. In: Joint ACM Workshop on Human Gesture and Behavior Understanding, pp. 59–64. ACM Press, New York (2011)
Cheng, D.S., Cristani, M., Stoppa, M., Bazzani, L., Murino, V.: Custom pictorial structures for re-identification. In: Proceedings of the British Machine Vision Conference, pp. 1–11. Piscataway Press, NJ (2011)
Benfold, B., Reid, I.: Stable multi-target tracking in real-time surveillance video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3457–3464. IEEE Press, New York (2011)
Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: Proceedings of the 14th IEEE International Conference on Computer Vision, pp. 1116–1124. IEEE Press, New York (2015)
Chen, Y., Zheng, W., Lai, J.: Mirror representation for modeling view-specific transform in person re-identification. In: Proceeings of the 24th International Joint Conference on Artificial Intelligence, pp. 3402–3408. AAAI Press, Palo Alto (2015)
Wang, Y., Wang, L., Li, Y., He, D., Chen, W., Liu, T.Y.: A theoretical analysis of NDCG type ranking measures. J. Mach. Learn. Res. 30, 25–54 (2013)
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants No. 61502058 and 61572085.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ni, T., Ding, Z., Chen, F., Wang, H. (2017). Relative Distance Metric Leaning Based on Clustering Centralization and Projection Vectors Learning for Person Re-identification. In: Yang, J., et al. Computer Vision. CCCV 2017. Communications in Computer and Information Science, vol 771. Springer, Singapore. https://doi.org/10.1007/978-981-10-7299-4_40
Download citation
DOI: https://doi.org/10.1007/978-981-10-7299-4_40
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7298-7
Online ISBN: 978-981-10-7299-4
eBook Packages: Computer ScienceComputer Science (R0)