Abstract
We present two novel theorems for estimating the weight parameter in (weighted) kNN when dealing with imbalanced data: one for G-mean and one for F1-score. The theorems assume a ‘totally random’ distribution, i.e. no dependency between the features and the class value.
These results can be used to set the default class weights for kNN-type classifiers, e.g. in imbalanced learning problems. Moreover, taken together, the theorems illustrate that without a precise specification of the performance measure we are interested in, the term ‘best classifier’ can be ambiguous or even misleading.
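As a point of reference for how a class-weight parameter p can enter kNN voting, the following is a minimal sketch, not the authors' construction: the labels (0 = majority, 1 = minority), the Euclidean metric and the weighting scheme (p for the majority class, 1 − p for the minority class) are illustrative assumptions only.

```python
# Minimal sketch of class-weighted kNN voting for a binary imbalanced problem.
# The convention that p weights the majority class (label 0) and 1 - p weights
# the minority class (label 1) is an assumption made for illustration.
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=5, p=0.3):
    """Predict the class of x: each of the k nearest neighbours votes with a
    weight that depends on its class label."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nearest = np.argsort(dists)[:k]               # indices of k nearest neighbours
    class_weights = {0: p, 1: 1.0 - p}            # hypothetical weighting scheme
    votes = {0: 0.0, 1: 0.0}
    for i in nearest:
        label = int(y_train[i])
        votes[label] += class_weights[label]
    return max(votes, key=votes.get)
```

Under this convention, choosing p below 0.5 gives each minority neighbour a larger vote than a majority neighbour, which is the regime discussed in the notes below.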
Notes
1. It corresponds to the case of the \(F_\beta\)-measure with \(\beta = 1\).
2. \(I(C(x)=d) = 1 - L(C(x), d)\), where \(L\) is the 0–1 loss function (equal to 0 if \(C\) correctly classifies the given example \((x, d)\), and 1 otherwise).
3. Formally, \(\Pr_{(x,d)\sim \mathcal{D}}(Alg(trnSet)(x)=d)\) is a random variable, where \(trnSet\) is fixed. It can also be seen as the conditional probability on \(\mathcal{D}^{n} \times \mathcal{D}\) given a training set \(trnSet\), i.e. \(\Pr_{trnSet\sim \mathcal{D}^{n},\, (x,d)\sim \mathcal{D}}(Alg(trnSet)(x)=d \mid trnSet)\).
4. Here and later for the F1-score, we determine the average performance using the averaged values of the submeasures. One could also consider averaging these performance measures directly, which seems more appropriate; in that case, however, the analogous computations in the proofs appear to be a much harder mathematical task. The sketch after these notes contrasts the two aggregation orders.
5. In practice, such an assumption is satisfied for a wide range of values of the parameter p.
6. This assumption is important only from a technical point of view. If it is not satisfied, we should consider the percentage of the minority class coming from the borderline region (the value q from the theorem).
7. Only the mammography data set is not publicly available; it was provided by Nitesh Chawla [1].
8. It is reasonable that for imbalanced data p should not be greater than 0.5, since the minority class should have a greater weight than the majority class.
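To make the distinction in note 4 concrete, the sketch below contrasts the two aggregation orders for G-mean over cross-validation folds; the fold-wise sensitivity and specificity values are invented purely for illustration.

```python
# Two ways to aggregate G-mean across folds (note 4). The numbers are made up.
import numpy as np

sens = np.array([0.60, 0.80, 0.70])   # per-fold sensitivity (minority-class recall)
spec = np.array([0.90, 0.85, 0.95])   # per-fold specificity

# Variant used in the notes: average the submeasures first, then combine.
g_mean_from_averages = np.sqrt(sens.mean() * spec.mean())

# Alternative: compute G-mean per fold, then average the measure directly.
g_mean_averaged_directly = np.sqrt(sens * spec).mean()

print(g_mean_from_averages, g_mean_averaged_directly)  # generally not equal
```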
References
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002). https://doi.org/10.1613/jair.953
Domingos, P.: Unifying instance-based and rule-based induction. Mach. Learn. 24(2), 141–168 (1996). https://doi.org/10.1007/BF00058656
Dubey, H., Pudi, V.: Class based weighted k-nearest neighbor over imbalance dataset. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds.) PAKDD 2013. LNCS (LNAI), vol. 7819, pp. 305–316. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37456-2_26
Góra, G., Skowron, A., Wojna, A.: Explainability in RIONA algorithm combining rule induction and instance-based learning. In: Ganzha, M., Maciaszek, L.A., Paprzycki, M., Ślȩzak, D. (eds.) Proceedings of the 18th Conference on Computer Science and Intelligence Systems, FedCSIS 2023, Warsaw, Poland, September 17–20, 2023. Annals of Computer Science and Information Systems, vol. 31, pp. 485–496. IEEE (2023). https://annals-csis.org/proceedings/2023/
Góra, G.: Combining instance-based learning and rule-based methods for imbalanced data. Ph.D. thesis, University of Warsaw, Warsaw (2022). https://www.mimuw.edu.pl/sites/default/files/gora_grzegorz_rozprawa_doktorska.pdf
Hamza, K.: The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson distributions. Stat. Probab. Lett. 23(1), 21–25 (1995). https://doi.org/10.1016/0167-7152(94)00090-U
He, H., Ma, Y.: Imbalanced Learning: Foundations, Algorithms, and Applications. Wiley-IEEE Press, Piscataway, NJ, 1st edn. (2013). https://doi.org/10.1002/9781118646106
Lichman, M.: UCI Machine Learning Repository (2013). http://archive.ics.uci.edu/ml
Maennel, H., et al.: What do neural networks learn when trained with random labels? (2020). https://doi.org/10.48550/ARXIV.2006.10455
Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)
Napierała, K.: Improving Rule Classifiers For Imbalanced Data. Ph.D. thesis, Poznań University of Technology, Poznań (2012)
Napierała, K., Stefanowski, J.: Types of minority class examples and their influence on learning classifiers from imbalanced data. J. Intell. Inf. Syst. 46(3), 563–597 (2016). https://doi.org/10.1007/s10844-015-0368-1
Raeder, T., Forman, G., Chawla, N.V.: Learning from imbalanced data: evaluation matters. In: Holmes, D.E., Jain, L.C. (eds.) Data Mining: Foundations and Intelligent Paradigms. Intelligent Systems Reference Library, vol. 23, pp. 315–331. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-23166-7_12
Wang, L., Khan, L., Thuraisingham, B.: An effective evidence theory based k-nearest neighbor (KNN) classification. In: 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, vol. 1, pp. 797–801. IEEE (2008)
Wolpert, D.H.: The supervised learning no-free-lunch theorems. In: Roy, R., Köppen, M., Ovaska, S., Furuhashi, T., Hoffmann, F. (eds.) Soft Computing and Industry, pp. 25–42. Springer, London (2002). https://doi.org/10.1007/978-1-4471-0123-9_3
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Góra, G., Skowron, A. (2023). On kNN Class Weights for Optimising G-Mean and F1-Score. In: Campagner, A., Urs Lenz, O., Xia, S., Ślęzak, D., Wąs, J., Yao, J. (eds) Rough Sets. IJCRS 2023. Lecture Notes in Computer Science, vol 14481. Springer, Cham. https://doi.org/10.1007/978-3-031-50959-9_29
DOI: https://doi.org/10.1007/978-3-031-50959-9_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50958-2
Online ISBN: 978-3-031-50959-9