Transfer learning by centroid pivoted mapping in noisy environment

Journal of Intelligent Information Systems

Abstract

Transfer learning is a widely investigated learning paradigm, originally proposed to reuse informative knowledge from related domains when supervised information is scarce in the target domain but sufficiently available in multiple source domains. One of the challenging issues in transfer learning is how to handle the differences in data distribution between the source domains and the target domain. Most studies in this field implicitly assume that, in a well-designed feature space, the data distributions of the source domains and the target domain are similar. However, label assignments in the source domains often differ significantly from those in the target domain. Hence, even when the distribution difference between a source domain and the target domain is reduced, knowledge from multiple source domains is not transferred well unless the label information is carefully considered. In addition, noisy data often emerge in real-world applications, and handling them in the transfer learning setting is challenging because noisy data inevitably cause side effects during knowledge transfer. Motivated by these observations, we propose a framework for transfer learning that is robust against noise and that explicitly considers the differences in data distributions and label assignments among multiple source domains and the target domain. Experimental results on one synthetic data set, three UCI data sets, and one real-world text data set at different noise levels demonstrate the effectiveness of our method.

Notes

  1. http://archive.ics.uci.edu/ml/

  2. http://people.csail.mit.edu/jrennie/20Newsgroups/

References

  • Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P. (1998). Automatic subspace clustering of high dimensional data for data mining applications. In SIGMOD conference (pp. 94–105).

  • Ankerst, M., Breunig, M.M., Kriegel, H., Sander, J. (1999). OPTICS: ordering points to identify the clustering structure. SIGMOD Record, 28(2), 49–60.

  • Argyriou, A., Evgeniou, T., Pontil, M. (2006). Multi-task feature learning. In NIPS (pp. 41–48).

  • Blitzer, J., McDonald, R., Pereira, F. (2006). Domain adaptation with structural correspondence learning. In EMNLP (pp. 120–128).

  • Brodley, C.E., & Friedl, M.A. (1999). Identifying mislabeled training data. Journal of Artificial Intelligence Research (JAIR), 11, 131–167.

  • Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E. (2001). Introduction to algorithms, section 26.2, “The Floyd–Warshall algorithm” (2nd ed., pp. 558–565). McGraw-Hill Higher Education.

  • Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27.

  • Dai, W., Yang, Q., Xue, G.R., Yu, Y. (2007). Boosting for transfer learning. In ICML (pp. 193–200).

  • Ester, M., Kriegel, H., Sander, J., Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD (pp. 226–231).

  • Fellegi, I.P., & Holt, D. (1976). A systematic approach to automatic edit and imputation. Journal of the American Statistical Association, 71(353), 17–35.

  • Ferri, F.J., Albert, J.V., Vidal, E. (1999). Considerations about sample-size sensitivity of a family of edited nearest-neighbor rules. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 29(5), 667–672.

  • Frommberger, L. (2007). Generalization and transfer learning in noise-affected robot navigation tasks. In EPIA workshops (pp. 508–519).

  • Gutstein, S., Fuentes, O., Freudenthal, E. (2008). The utility of knowledge transfer for noisy data. In FLAIRS conference (pp. 59–64).

  • Han, J., & Kamber, M. (2000). Data mining: Concepts and techniques (2nd ed.). Morgan Kaufmann.

  • Hickey, R.J. (1996). Noise modelling and evaluating learning from examples. Artificial Intelligence, 82(1–2), 157–179.

  • Hinneburg, A., & Keim, D.A. (1998). An efficient approach to clustering in large multimedia databases with noise. In KDD (pp. 58–65).

  • Huang, J., Smola, A.J., Gretton, A., Borgwardt, K.M., Schölkopf, B. (2006). Correcting sample selection bias by unlabeled data. In NIPS (pp. 601–608).

  • Bhattacharya, I., Godbole, S., Joshi, S., Verma, A. (2009). Cross-guided clustering: transfer of relevant supervision across domains for improved clustering. In ICDM (pp. 41–50).

  • Joachims, T. (1999). Transductive inference for text classification using support vector machines. In ICML (pp. 200–209).

  • Lee, J.A., & Verleysen, M. (2007). Nonlinear dimensionality reduction. Springer Science.

  • Ling, X., Xue, G.R., Dai, W., Jiang, Y., Yang, Q., Yu, Y. (2008). Can Chinese web pages be classified with English data source? In WWW (pp. 969–978).

  • Liu, Q., Liao, X., Carin, H.L., Stack, J.R., Carin, L. (2009). Semisupervised multitask learning. IEEE Transactions on PAMI, 31, 1074–1086.

  • Liu, Q., Xu, Q., Zheng, V.W., Xue, H., Cao, Z., Yang, Q. (2010). Multi-task learning for cross-platform siRNA efficacy prediction: an in-silico study. BMC Bioinformatics, 11, 181.

  • Manning, C.D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.

  • Pan, S.J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22, 1345–1359.

  • Parrish, N., & Gupta, M.R. (2011). Bayesian transfer learning for noisy channels. In IEEE statistical signal processing workshop (pp. 269–272).

  • Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.

  • Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

  • Rückert, U., & Kramer, S. (2008). Kernel-based inductive transfer. In ECML/PKDD (pp. 220–233).

  • Schaffer, C. (1992). Sparse data and the effect of overfitting avoidance in decision tree induction. In AAAI (pp. 147–152).

  • Schaffer, C. (1993). Overfitting avoidance as bias. Machine Learning, 10, 153–178.

  • Schiffman, S.S., Reynolds, M.L., Young, F.W. (1981). Introduction to multidimensional scaling: Theory, methods, and applications. New York: Erlbaum Associates.

  • Schlimmer, J.C., & Granger, R.H. (1986). Incremental learning from noisy data. Machine Learning, 1(3), 317–354.

  • Schwaighofer, A., Tresp, V., Yu, K. (2004). Learning Gaussian process kernels via hierarchical Bayes. In NIPS (pp. 1209–1216).

  • Shao, H., Tong, B., Suzuki, E. (2011). Compact coding for hyperplane classifiers in heterogeneous environment. In ECML/PKDD (3) (pp. 207–222).

  • Shi, X., Fan, W., Ren, J. (2008). Actively transfer domain knowledge. In ECML/PKDD (pp. 342–357).

  • Shi, X., Fan, W., Yang, Q., Ren, J. (2009a). Relaxed transfer of different classes via spectral partition. In ECML/PKDD (pp. 366–381).

  • Shi, Y., Lan, Z., Liu, W., Bi, W. (2009b). Extending semi-supervised learning methods for inductive transfer learning. In ICDM (pp. 483–492).

  • Tenenbaum, J.B., Silva, V., Langford, J.C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319–2323.

  • Teng, C.M. (1999). Correcting noisy data. In ICML (pp. 239–248).

  • Vapnik, V.N. (1995). The nature of statistical learning theory. New York, NY: Springer.

  • Yamazaki, K., Kawanabe, M., Watanabe, S., Sugiyama, M., Müller, K. (2007). Asymptotic Bayesian generalization error when training and test distributions are different. In ICML (pp. 1079–1086).

  • Zheng, V.W., Pan, S.J., Yang, Q., Pan, J.J. (2008). Transferring multi-device localization models using latent multi-task learning. In AAAI (pp. 1427–1432).

  • Zhu, X. (2005). Semi-supervised learning literature survey. Technical report, Computer Sciences, University of Wisconsin-Madison.

  • Zhu, X., & Wu, X. (2005). Cost-constrained data acquisition for intelligent data preparation. IEEE Transactions on Knowledge and Data Engineering, 17(11), 1542–1556.

  • Zhu, X., Wu, X., Chen, Q. (2003). Eliminating class noise in large datasets. In ICML (pp. 920–927).

  • Zhu, X., Wu, X., Yang, Y. (2004). Error detection and impact-sensitive instance ranking in noisy datasets. In AAAI (pp. 378–384).

Author information

Corresponding author

Correspondence to Thach Nguyen Huy.

Appendix

A.1 DBSCAN algorithm

The DBSCAN algorithm was proposed by Ester et al. (1996). It groups together points that are closely packed and marks points lying alone in low-density regions as noise. A summary of the DBSCAN algorithm is shown in Algorithm 4.
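Since Algorithm 4 is not reproduced in this preview, the sketch below shows a minimal Python implementation of the standard DBSCAN procedure for reference; the names dbscan, region_query, eps, and min_pts are our illustrative choices, not identifiers from the paper.

```python
import numpy as np

def region_query(X, i, eps):
    """Return indices of all points within distance eps of point i."""
    dists = np.linalg.norm(X - X[i], axis=1)
    return np.flatnonzero(dists <= eps)

def dbscan(X, eps, min_pts):
    """Label each row of X with a cluster id; -1 marks noise."""
    n = len(X)
    labels = np.full(n, -1)           # -1 = noise / not yet assigned
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        neighbors = region_query(X, i, eps)
        if len(neighbors) < min_pts:
            continue                  # not a core point; stays noise unless claimed later
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:                  # expand the cluster outward from the core point
            j = seeds.pop()
            if not visited[j]:
                visited[j] = True
                j_neighbors = region_query(X, j, eps)
                if len(j_neighbors) >= min_pts:
                    seeds.extend(j_neighbors)
            if labels[j] == -1:       # claim border points and former noise
                labels[j] = cluster
        cluster += 1
    return labels
```

Points still labeled -1 after the scan are the ones DBSCAN treats as noise, which is the property that makes the algorithm attractive in a noisy setting.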

A.2 Multidimensional scaling (MDS) algorithm

Multidimensional scaling is a statistical technique for visualizing dissimilarities in data (Schiffman et al. 1981). We summarize the MDS algorithm in Algorithm 6. Given a matrix of pairwise distances, MDS recovers coordinates for the data: the algorithm performs an eigen-decomposition of the double-centered distance matrix and selects the top d eigenvectors to represent the coordinates in a new d-dimensional Euclidean space.
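As with Algorithm 4, Algorithm 6 is not reproduced here; the following is a minimal sketch of classical MDS matching the description above, assuming a symmetric n x n matrix D of pairwise distances. The function name classical_mds is ours for illustration.

```python
import numpy as np

def classical_mds(D, d):
    """Embed n points in d dimensions from an n x n pairwise distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered (Gram) matrix
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:d]           # indices of the top-d eigenpairs
    scale = np.sqrt(np.maximum(eigvals[idx], 0))  # clip tiny negative eigenvalues
    return eigvecs[:, idx] * scale                # n x d coordinate matrix
```

The rows of the returned matrix are the recovered coordinates; pairwise Euclidean distances among them approximate the entries of D, which is what makes the embedding useful for visualization.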

A.3 Results in table format

This section presents the experimental results in table format; the tables provide more detailed information than the figures.

Table 1 Experimental results of the synthetic data set
Table 2 Experimental results of mushroom data set
Table 3 Experimental results of kr vs kp data set
Table 4 Experimental results of splice data set
Table 5 Experimental results of rec vs talk data set
Table 6 Experimental results of rec vs sci data set
Table 7 Experimental results of sci vs talk data set

About this article

Cite this article

Huy, T.N., Tong, B., Shao, H. et al. Transfer learning by centroid pivoted mapping in noisy environment. J Intell Inf Syst 41, 39–60 (2013). https://doi.org/10.1007/s10844-012-0226-3
