Classifying Token Frequencies Using Angular Minkowski p-Distance

  • Conference paper
Rough Sets (IJCRS 2023)

Abstract

Angular Minkowski p-distance is a dissimilarity measure that is obtained by replacing Euclidean distance in the definition of cosine dissimilarity with other Minkowski p-distances. Cosine dissimilarity is frequently used with datasets containing token frequencies, and angular Minkowski p-distance may be an even better choice for certain tasks. In a case study based on the 20-newsgroups dataset, we evaluate classification performance for classical weighted nearest neighbours, as well as fuzzy rough nearest neighbours. In addition, we analyse the relationship between the hyperparameter p, the dimensionality m of the dataset, the number of neighbours k, the choice of weights and the choice of classifier. We conclude that, with suitable values of p, angular Minkowski p-distance can yield substantially higher classification performance than classical cosine dissimilarity.
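
The abstract does not spell out the formula, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that both vectors are first rescaled to unit p-norm and that the Minkowski p-distance is then taken between the rescaled vectors. The function name `angular_minkowski` and the handling of zero vectors are illustrative assumptions. For p = 2 this reading recovers cosine dissimilarity up to a monotone transformation, since for unit vectors u and v we have ||u - v||² = 2(1 - cos(u, v)).

```python
import numpy as np


def angular_minkowski(x, y, p=2.0):
    """Hypothetical sketch: Minkowski p-distance between p-normalised vectors.

    Assumption (not confirmed by the abstract): "angular" means that each
    vector is divided by its own p-norm before the p-distance is computed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    norm_x = np.linalg.norm(x, ord=p)
    norm_y = np.linalg.norm(y, ord=p)
    if norm_x == 0.0 or norm_y == 0.0:
        # A zero vector has no direction; how to treat it is a design choice.
        raise ValueError("angular distance is undefined for zero vectors")
    return np.linalg.norm(x / norm_x - y / norm_y, ord=p)


# Two toy token-frequency vectors (made-up counts, for illustration only).
doc_a = np.array([3.0, 0.0, 1.0, 2.0])
doc_b = np.array([1.0, 1.0, 0.0, 4.0])

# For p = 2, half the squared distance equals the cosine dissimilarity.
cosine_dissimilarity = 1.0 - doc_a @ doc_b / (
    np.linalg.norm(doc_a) * np.linalg.norm(doc_b)
)
half_squared = angular_minkowski(doc_a, doc_b, p=2.0) ** 2 / 2.0

# Other values of p give different dissimilarities on the same pair.
d_p1 = angular_minkowski(doc_a, doc_b, p=1.0)
```

Under this reading the measure depends only on the direction of each vector, so two documents with proportional token counts are at distance zero for every p. That scale invariance is the property that makes cosine dissimilarity attractive for token frequencies in the first place.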

Acknowledgements

The research reported in this paper was conducted with the financial support of the Odysseus programme of the Research Foundation – Flanders (FWO).

Author information

Corresponding author

Correspondence to Oliver Urs Lenz.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lenz, O.U., Cornelis, C. (2023). Classifying Token Frequencies Using Angular Minkowski p-Distance. In: Campagner, A., Urs Lenz, O., Xia, S., Ślęzak, D., Wąs, J., Yao, J. (eds) Rough Sets. IJCRS 2023. Lecture Notes in Computer Science, vol 14481. Springer, Cham. https://doi.org/10.1007/978-3-031-50959-9_28

  • DOI: https://doi.org/10.1007/978-3-031-50959-9_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-50958-2

  • Online ISBN: 978-3-031-50959-9

  • eBook Packages: Computer Science, Computer Science (R0)
