Application-Independent Feature Construction from Noisy Samples

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5476)

Abstract

When training classifiers, the presence of noise can severely harm their performance. In this paper, we focus on “non-class” attribute noise and consider how a frequent fault-tolerant (FFT) pattern mining task can be used to support noise-tolerant classification. Our method relies on an application-independent strategy for feature construction based on the so-called δ-free patterns. Our experiments on noisy training data show accuracy improvements when using the computed features instead of the original ones.
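
To make the notion of a δ-free pattern more concrete, here is a minimal sketch in Python. It uses the standard definition: an itemset X is δ-free when no rule X \ {a} ⇒ a, for a in X, holds with at most δ exceptions. The toy data, the function names (`support`, `is_delta_free`, `frequent_delta_free`, `encode`), the brute-force enumeration, and the binary containment encoding are all assumptions made for illustration. They are not taken from the paper, and its actual mining algorithm and feature encoding may differ.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_delta_free(itemset, transactions, delta):
    """True if no rule (itemset - {a}) => a holds with at most `delta` exceptions."""
    for a in itemset:
        body = itemset - {a}
        # Exceptions: transactions that contain the body but not the item a.
        exceptions = support(body, transactions) - support(itemset, transactions)
        if exceptions <= delta:
            return False
    return True


def frequent_delta_free(transactions, min_support, delta, max_size=3):
    """Brute-force enumeration (illustrative only, not an efficient miner)."""
    items = sorted(set().union(*transactions))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            x = frozenset(combo)
            if support(x, transactions) >= min_support and is_delta_free(
                x, transactions, delta
            ):
                found.append(x)
    return found


def encode(transactions, patterns):
    """Hypothetical encoding: one binary feature per pattern (1 if contained)."""
    return [[int(p <= t) for p in patterns] for t in transactions]


# Toy Boolean data set, invented for this example.
toy = [frozenset(t) for t in ("ab", "ab", "ab", "a", "c", "bc")]

strict = frequent_delta_free(toy, min_support=2, delta=0)
tolerant = frequent_delta_free(toy, min_support=2, delta=1)
features = encode(toy, tolerant)
```

On this toy data, {a, b} is 0-free but not 1-free, because the rule b ⇒ a has exactly one exception. Raising δ therefore treats {a, b} as redundant with respect to its subsets, which is the kind of tolerance to a few deviating rows that makes such patterns interesting under attribute noise.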

This work is partly funded by French ANR contract MDCO-2007 Bingo2.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gay, D., Selmaoui, N., Boulicaut, JF. (2009). Application-Independent Feature Construction from Noisy Samples. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_102

  • DOI: https://doi.org/10.1007/978-3-642-01307-2_102

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01306-5

  • Online ISBN: 978-3-642-01307-2

  • eBook Packages: Computer Science, Computer Science (R0)
