Abstract
Transfer learning is a widely investigated learning paradigm, originally proposed to reuse informative knowledge from related domains when supervised information is scarce in the target domain but sufficiently available in multiple source domains. One of the challenging issues in transfer learning is how to handle the distribution differences between the source domains and the target domain. Most studies in this research field implicitly assume that the data distributions of the source domains and the target domain are similar in a well-designed feature space. However, it is often the case that the label assignments of data in the source domains and the target domain differ significantly. Consequently, even if the distribution difference between a source domain and the target domain is reduced, knowledge from multiple source domains does not transfer well to the target domain unless the label information is carefully considered. In addition, noisy data often arise in real-world applications and inevitably cause side effects during knowledge transfer, so handling noise in the transfer learning setting is itself a challenging problem. Motivated by these observations, in this paper we propose a framework for transfer learning that is robust against noise and that explicitly accounts for the differences in both data distributions and label assignments among multiple source domains and the target domain. Experimental results on one synthetic data set, three UCI data sets, and one real-world text data set at different noise levels demonstrate the effectiveness of our method.
References
Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P. (1998). Automatic subspace clustering of high dimensional data for data mining applications. In SIGMOD conference (pp. 94–105).
Ankerst, M., Breunig, M.M., Kriegel, H., Sander, J. (1999). OPTICS: ordering points to identify the clustering structure. SIGMOD Record, 28(2), 49–60.
Argyriou, A., Evgeniou, T., Pontil, M. (2006). Multi-task feature learning. In NIPS (pp. 41–48).
Blitzer, J., McDonald, R., Pereira, F. (2006). Domain adaptation with structural correspondence learning. In EMNLP (pp. 120–128).
Brodley, C.E., & Friedl, M.A. (1999). Identifying mislabeled training data. Journal of Artificial Intelligence Research (JAIR), 11, 131–167.
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C. (2001). Introduction to algorithms, section 26.2, "The Floyd–Warshall algorithm" (2nd ed., pp. 558–565). McGraw-Hill Higher Education.
Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27.
Dai, W., Yang, Q., Xue, G.R., Yu, Y. (2007). Boosting for transfer learning. In ICML (pp. 193–200).
Ester, M., Kriegel, H., Sander, J., Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD (pp. 226–231).
Fellegi, I.P., & Holt, D. (1976). A systematic approach to automatic edit and imputation. Journal of the American Statistical Association, 71(353), 17–35.
Ferri, F.J., Albert, J.V., Vidal, E. (1999). Considerations about sample-size sensitivity of a family of edited nearest-neighbor rules. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 29(5), 667–672.
Frommberger, L. (2007). Generalization and transfer learning in noise-affected robot navigation tasks. In EPIA workshops (pp. 508–519).
Gutstein, S., Fuentes, O., Freudenthal, E. (2008). The utility of knowledge transfer for noisy data. In FLAIRS conference (pp. 59–64).
Han, J., & Kamber, M. (2000). Data mining: Concepts and techniques (2nd ed.). Morgan Kaufmann.
Hickey, R.J. (1996). Noise modelling and evaluating learning from examples. Artificial Intelligence, 82(1–2), 157–179.
Hinneburg, A., & Keim, D.A. (1998). An efficient approach to clustering in large multimedia databases with noise. In KDD (pp. 58–65).
Huang, J., Smola, A.J., Gretton, A., Borgwardt, K.M., Schölkopf, B. (2006). Correcting sample selection bias by unlabeled data. In NIPS (pp. 601–608).
Bhattacharya, I., Godbole, S., Joshi, S., Verma, A. (2009). Cross-guided clustering: transfer of relevant supervision across domains for improved clustering. In ICDM (pp. 41–50).
Joachims, T. (1999). Transductive inference for text classification using support vector machines. In ICML (pp. 200–209).
Lee, J.A., & Verleysen, M. (2007). Nonlinear dimensionality reduction. Springer Science.
Ling, X., Xue, G.R., Dai, W., Jiang, Y., Yang, Q., Yu, Y. (2008). Can Chinese web pages be classified with english data source? In WWW (pp. 969–978).
Liu, Q., Liao, X., Carin, H.L., Stack, J.R., Carin, L. (2009). Semisupervised multitask learning. IEEE Transactions on PAMI, 31, 1074–1086.
Liu, Q., Xu, Q., Zheng, V.W., Xue, H., Cao, Z., Yang, Q. (2010). Multi-task learning for cross-platform siRNA efficacy prediction: an in-silico study. BMC Bioinformatics, 11, 181.
Manning, C.D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.
Pan, S.J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22, 1345–1359.
Parrish, N., & Gupta, M.R. (2011). Bayesian transfer learning for noisy channels. In IEEE statistical signal processing workshop (pp. 269–272).
Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Rückert, U., & Kramer, S. (2008). Kernel-based inductive transfer. In ECML/PKDD (pp. 220–233).
Schaffer, C. (1992). Sparse data and the effect of overfitting avoidance in decision tree induction. In AAAI (pp. 147–152).
Schaffer, C. (1993). Overfitting avoidance as bias. Machine Learning, 10, 153–178.
Schiffman, S.S., Reynolds, M.L., Young, F.W. (1981). Introduction to multidimensional scaling: Theory, methods, and applications. New York: Erlbaum Associates.
Schlimmer, J.C., & Granger, R.H. (1986). Incremental learning from noisy data. Machine Learning, 1(3), 317–354.
Schwaighofer, A., Tresp, V., Yu, K. (2004). Learning Gaussian process kernels via hierarchical Bayes. In NIPS (pp. 1209–1216).
Shao, H., Tong, B., Suzuki, E. (2011). Compact coding for hyperplane classifiers in heterogeneous environment. In ECML/PKDD (3) (pp. 207–222).
Shi, X., Fan, W., Ren, J. (2008). Actively transfer domain knowledge. In ECML/PKDD (pp. 342–357).
Shi, X., Fan, W., Yang, Q., Ren, J. (2009a). Relaxed transfer of different classes via spectral partition. In ECML/PKDD (pp. 366–381).
Shi, Y., Lan, Z., Liu, W., Bi, W. (2009b). Extending semi-supervised learning methods for inductive transfer learning. In ICDM (pp. 483–492).
Tenenbaum, J.B., de Silva, V., Langford, J.C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319–2323.
Teng, C.M. (1999). Correcting noisy data. In ICML (pp. 239–248).
Vapnik, V.N. (1995). The nature of statistical learning theory. New York, NY: Springer.
Yamazaki, K., Kawanabe, M., Watanabe, S., Sugiyama, M., Müller, K. (2007). Asymptotic Bayesian generalization error when training and test distributions are different. In ICML (pp. 1079–1086).
Zheng, V.W., Pan, S.J., Yang, Q., Pan, J.J. (2008). Transferring multi-device localization models using latent multi-task learning. In AAAI (pp. 1427–1432).
Zhu, X. (2005). Semi-supervised learning literature survey. Technical report, Computer Sciences, University of Wisconsin-Madison.
Zhu, X., & Wu, X. (2005). Cost-constrained data acquisition for intelligent data preparation. IEEE Transactions on Knowledge and Data Engineering, 17(11), 1542–1556.
Zhu, X., Wu, X., Chen, Q. (2003). Eliminating class noise in large datasets. In ICML (pp. 920–927).
Zhu, X., Wu, X., Yang, Y. (2004). Error detection and impact-sensitive instance ranking in noisy datasets. In AAAI (pp. 378–384).
Appendix
A.1 DBSCAN algorithm
The DBSCAN algorithm was proposed by Ester et al. (1996). A summary of the DBSCAN algorithm is shown in Algorithm 4; a runnable sketch of the same idea is given below.
A.2 Multidimensional scaling (MDS) algorithm
Multidimensional scaling is a statistical technique for visualizing the dissimilarities in data (Schiffman et al. 1981). We summarize the MDS algorithm in Algorithm 6. Given a matrix of pairwise distances, MDS computes coordinates for the data: the algorithm performs an eigendecomposition of the doubly centered distance matrix, and the top d eigenvectors are selected to represent the coordinates in a new d-dimensional Euclidean space.
A.3 Results in table format
This section presents the experimental results in table format, which gives readers more detailed information than the figures alone.
Cite this article
Huy, T.N., Tong, B., Shao, H. et al. Transfer learning by centroid pivoted mapping in noisy environment. J Intell Inf Syst 41, 39–60 (2013). https://doi.org/10.1007/s10844-012-0226-3