A Fuzzy Similarity Based Classification with Archimedean-Dombi Aggregation Operator
Abstract:
The term "classification" refers to a supervised learning technique in which samples are given class labels based on predetermined classes. Fuzzy classifiers are renowned for their ability to address the issue of outliers and deliver the performance resilience that is much needed. The major goal of this study is to provide a classification algorithm that is effective and accurate. In this work, we address Archimedean-Dombi aggregation operator by extending the similarity classifier. Earlier, Dombi operators were used to study the similarity classifier. We focus on the application of Archimedean-Dombi operators during the classifier's aggregate similarity calculation. Since Archimedean and Dombi operators are well-known for offering appropriate generalization and flexibility respectively in aggregating data, so a different version of the similarity classifier is created. One real-world medical dataset, namely Parkinson disease data set is used to test the proposed approaches. When compared to older existing operators, the new classifiers have better classification accuracy.
1. Introduction
In this study, the term "classification" refers to a supervised technique where samples are given class labels based on predetermined classes [1]. The problem domain provides well-established class labels to which new samples can be assigned. The traditional assumption is that every sample is a member of a single class, while fuzzy techniques connect samples to classes through degrees of belongingness [2]. There is a substantial body of classification theory that encompasses decades of productive research trends [3], [4]. Work on pattern classification and other domains has increased as a result of the necessity to create automated systems in most industries [5], [6], [7], [8], [9]. Fuzzy classifiers are renowned for their ability to address the issue of outliers and to deliver much-needed robustness in performance. The similarity-based classifier examined in this work uses a fuzzy set-theoretic technique.
The major goal of this study is to provide classification algorithms that are effective and accurate. We suggest a generalized variation of a similarity-based classifier (SBC) [10] that combines similarities using Dombi operators. This extends the research in [11], where the ordered weighted averaging (OWA) operator was used to study the similarity classifier. Weight generation for use with quantifiers was a challenge faced by the SBC employing the OWA operator [11]. We investigate the application of Dombi aggregation operators [12] in the similarity classifier because they do not require any weight-generating criteria. Dombi aggregation operators were created by parameterizing the triangular norms and conorms first introduced in 1982 [13]. Because these operators have a configurable parameter, they can be used for modelling and other applications that call for parameter settings. The modelling of multiple attribute decision making (MADM) problems has benefited from the use of Dombi operators [14], [15], [16], [17]. Dombi operators have also been extended to neutrosophic sets and associated fields, and they have been used to solve real-world problems [18], [19], [20], [21].
Real-world situations frequently involve data with ambiguous boundaries and associated uncertainty. Managing these uncertainties has always been a top priority for researchers. Numerous contributions have been made toward that end, but Zadeh's introduction of the concept of fuzzy sets (FSs) [22] marked the beginning of a systematic effort. Since then, FS theory has been used in various fields, including pattern detection [23], medical diagnosis [24], and decision-making [25]. Keeping in mind the significance of fuzzy sets, numerous extensions of them [26], [27], [28], [29], [30], [31], [32] have been developed. These are all active fields of study that have been used to solve real-world problems [33].
In this study, we incorporate the Archimedean-Dombi operator system into the similarity classifier. Building ideal (mean) vectors to represent every class in the training set is the primary task for an SBC. From there, classification decisions for each sample in the testing set are made using these ideal (mean) vectors. The algebraic, Einstein, and Hamacher cases are all contained in the Archimedean-Dombi operator system. Earlier, the generalized mean [34], the OWA operator [11], the Dombi operator [35], and other aggregation operators were examined with the similarity classifier. We will demonstrate that, when evaluated on actual data sets, the application of Archimedean-Dombi operators produces better results than the previously studied (conventional) similarity classifier. MS-EXCEL software is used for implementations and visualizations.
The significant contributions of this work are summarized as follows:
(i) An algorithm has been developed around a novel classification model in which an Archimedean-Dombi similarity classifier is used.
(ii) On real-world data sets, the proposed similarity classifier has been applied and tested.
The remainder of the paper is organized as follows. In Section 2, we introduce some significant and essential concepts related to our study. In Section 3, we design a methodology for similarity-based classification with the fuzzy Archimedean-Dombi operator. To clarify the created method, we use a case study related to Parkinson's disease in Section 4. We draw some conclusions from the entire study and provide a summary of future prospects in Section 5.
2. Basic Concepts
Here, we recall all relevant concepts.
In an effort to explore statistical metric spaces, Menger [36] originally introduced t-norms and t-conorms to generalize the triangle inequality from classical metric spaces to statistical metric spaces [37]. The axioms of t-norms and t-conorms cited here were originally developed by Schweizer and Sklar [38]. Later, Zimmermann and Zysno [39] examined these operations as general aggregation operators. Since then, other t-norm and t-conorm varieties have been created [37]. Throughout the paper we shall use I to denote [0, 1].
Definition 1 [37]: A fuzzy t-norm $g: I \times I \rightarrow I$ is a mapping that holds the postulates in this manner:
(i) $g(q, 1)=q$ for $q \in I$,
(ii) $g(q, r) \leq g\left(q^{\prime}, r^{\prime}\right)$ provided $q \leq q^{\prime}, r \leq r^{\prime}$ for $q, q^{\prime}, r, r^{\prime} \in I$,
(iii) $g(q, r)=g(r, q)$ for $q, r \in I$,
(iv) $g(q, g(r, s))=g(g(q, r), s)$ for $q, r, s \in I$.
Definition 2 [37]: A fuzzy t-conorm $h: I \times I \rightarrow I$ is a mapping that holds the postulates as follows:
(i) $h(q, 0)=q$ for $q \in I$,
(ii) $h(q, r) \leq h\left(q^{\prime}, r^{\prime}\right)$ provided $q \leq q^{\prime}, r \leq r^{\prime}$ for $q, q^{\prime}, r, r^{\prime} \in I$,
(iii) $h(q, r)=h(r, q)$ for $q, r \in I$,
(iv) $h(q, h(r, s))=h(h(q, r), s)$ for $q, r, s \in I$.
Definition 3 [37]: A t-norm mapping $g(q, r)$ is said to be a strictly Archimedean t-norm if it is continuous, $g(q, q)<q$ for $q \in(0,1)$, and strictly increasing for $q, r \in(0,1)$.
Definition 4 [37]: A t-conorm mapping $h(q, r)$ is said to be a strictly Archimedean t-conorm if it is continuous, $h(q, q)>q$ for $q \in(0,1)$, and strictly increasing for $q, r \in(0,1)$.
Definition 5 [37]: Suppose $\theta:(0,1] \rightarrow R$ is a continuous, strictly decreasing mapping. Then a strictly Archimedean t-norm is given by $\delta\left(x, x^{\prime}\right)=\theta^{-1}\left(\theta(x)+\theta\left(x^{\prime}\right)\right)$ for $x, x^{\prime} \in(0,1]$.
Definition 6 [37]: Suppose $\psi:[0,1) \rightarrow R$ is a continuous, strictly increasing mapping such that $\psi(l)=\theta(1-l)$ for $l \in[0,1)$. Then a strictly Archimedean t-conorm is defined by $\rho\left(x, x^{\prime}\right)=\psi^{-1}\left(\psi(x)+\psi\left(x^{\prime}\right)\right)$ for $x, x^{\prime} \in[0,1)$.
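To make the generator-based constructions in Definitions 5 and 6 concrete, the short Python sketch below (our own illustration, not part of the cited works) builds a strictly Archimedean t-norm and t-conorm from an additive generator. The choice θ(x) = -ln x is an assumption for the example; it recovers the algebraic product t-norm and the probabilistic sum t-conorm.

```python
import math

def archimedean_t_norm(x, y, theta, theta_inv):
    """delta(x, x') = theta^{-1}(theta(x) + theta(x')), as in Definition 5."""
    return theta_inv(theta(x) + theta(y))

def archimedean_t_conorm(x, y, theta, theta_inv):
    """rho(x, x') = psi^{-1}(psi(x) + psi(x')) with psi(l) = theta(1 - l), as in Definition 6."""
    psi = lambda l: theta(1.0 - l)
    psi_inv = lambda v: 1.0 - theta_inv(v)
    return psi_inv(psi(x) + psi(y))

# Assumed generator theta(x) = -ln(x): yields the algebraic product and probabilistic sum.
theta = lambda x: -math.log(x)
theta_inv = lambda v: math.exp(-v)

print(archimedean_t_norm(0.6, 0.7, theta, theta_inv))    # 0.42  (= 0.6 * 0.7)
print(archimedean_t_conorm(0.6, 0.7, theta, theta_inv))  # 0.88  (= 0.6 + 0.7 - 0.42)
```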
Definition 7 [12]: For any two real numbers x and y in [0, 1], the Dombi conjunctive operator with parameter $\alpha>0$ is defined as:
$D(x, y)=\frac{1}{1+\left\{\left(\frac{1-x}{x}\right)^\alpha+\left(\frac{1-y}{y}\right)^\alpha\right\}^{\frac{1}{\alpha}}}$.
Definition 8 [12]: For any two real numbers x and y in [0, 1], the Dombi disjunctive operator with parameter $\alpha>0$ is defined as:
$D^{c}(x, y)=1-\frac{1}{1+\left\{\left(\frac{x}{1-x}\right)^\alpha+\left(\frac{y}{1-y}\right)^\alpha\right\}^{\frac{1}{\alpha}}}$.
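The following sketch (written by us, following the standard form of the Dombi operators in Definitions 7 and 8) evaluates the conjunctive and disjunctive operators for a parameter α > 0; the explicit handling of arguments equal to 0 or 1 is our own choice to avoid division by zero.

```python
def dombi_conjunction(x, y, alpha=1.0):
    """Dombi conjunctive operator (t-norm) of Definition 7, alpha > 0."""
    if x == 0.0 or y == 0.0:
        return 0.0
    if x == 1.0:
        return y
    if y == 1.0:
        return x
    total = ((1.0 - x) / x) ** alpha + ((1.0 - y) / y) ** alpha
    return 1.0 / (1.0 + total ** (1.0 / alpha))

def dombi_disjunction(x, y, alpha=1.0):
    """Dombi disjunctive operator (t-conorm) of Definition 8, alpha > 0."""
    if x == 1.0 or y == 1.0:
        return 1.0
    if x == 0.0:
        return y
    if y == 0.0:
        return x
    total = (x / (1.0 - x)) ** alpha + (y / (1.0 - y)) ** alpha
    return 1.0 - 1.0 / (1.0 + total ** (1.0 / alpha))

print(dombi_conjunction(0.6, 0.7, alpha=2.0))   # ~0.558
print(dombi_disjunction(0.6, 0.7, alpha=2.0))   # ~0.735
```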
A generalization of the idea of equivalence is the concept of similarity. In the area of fuzzy logic, Zadeh [40] provided a concept of similarity relations that is connected to the classical notion of equivalence relations. Zadeh's concept offers a framework for comparing objects and eventually computing similarities between them. However, there are a number of additional metrics used in data analysis to compare objects, the majority of which are based on distance [2]. The concepts of distance (or metric) and similarity are closely related to one another [41]. Binary similarity relations can be generalized successfully to more arguments. Lukasiewicz [42] created a system that allows for the analysis of similarities between numerous objects. The fact that the mean of several similarities in a Lukasiewicz structure is still a similarity [43] encourages the usage of such classifiers in fuzzy set theory. We employ similarity measures (SMs) in the generalized Lukasiewicz structure in the classifier design with Archimedean-Dombi operators. According to popular opinion, SMs offer methods for comparing objects such that the degree of similarity can be expressed numerically. If two items are exactly the same, they have a similarity score of 1, while unrelated objects get a similarity value of 0. Other similarity scores vary between 0 and 1. Similarities are hence values in the range [0, 1], which is naturally suited to fuzzy set-theoretic approaches. For any two numbers $x, y \in[0,1]$, the similarity between them is defined as $s(x, y)=1-|x-y|$, and the generalized similarity between them is defined as $\tilde{s}(x, y)=\sqrt[p]{1-\left|x^p-y^p\right|}$, where $p \geq 1$ is a parameter. The suggested approach is presented in the next section.
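As a quick illustration of these similarity measures, the snippet below (our own sketch) evaluates s(x, y) and the generalized similarity for scalar arguments in [0, 1].

```python
def similarity(x, y):
    """s(x, y) = 1 - |x - y| for x, y in [0, 1]."""
    return 1.0 - abs(x - y)

def generalized_similarity(x, y, p=1.0):
    """Generalized Lukasiewicz-structure similarity with parameter p >= 1."""
    return (1.0 - abs(x ** p - y ** p)) ** (1.0 / p)

print(similarity(0.3, 0.8))                    # 0.5
print(generalized_similarity(0.3, 0.8, p=2))   # sqrt(1 - |0.09 - 0.64|) ~ 0.671
```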
3. Methodology
To define a new similarity-based classifier, we first propose fuzzy Archimedean-Dombi operations.
Definition 9: Consider the FSs $\zeta_j=\left\langle\mu_j\right\rangle(j=1,2)$. Assume that
$\Phi_\alpha^\theta(p)=\left(\frac{1-\theta^{-1}(p)}{\theta^{-1}(p)}\right)^\alpha$
where $p \in[0,1]$ and $\alpha \geq 1$. Then, the Archimedean-Dombi (AD) operations on FSs are given below:
(i) $\zeta_1 \otimes_{A D} \zeta_2=\left\langle\theta\left(\left(1+\left\{\Phi_\alpha^\theta\left(\mu_1\right)+\Phi_\alpha^\theta\left(\mu_2\right)\right\}^{\frac{1}{\alpha}}\right)^{-1}\right)\right\rangle$,
(ii) $\xi \circ_{A D} \zeta_1=\left\langle\theta\left(\left(1+\left\{\xi \Phi_\alpha^\theta\left(\mu_1\right)\right\}^{\frac{1}{\alpha}}\right)^{-1}\right)\right\rangle(\xi>0)$.
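A direct transcription of Definition 9 into Python is sketched below as our own illustration. It assumes θ is a strictly monotone bijection of [0, 1] with a known inverse and that memberships lie strictly between 0 and 1; with θ taken as the identity, the operation ⊗_AD reduces to the Dombi conjunction of Definition 7.

```python
def phi(p, alpha, theta_inv):
    """Phi_alpha^theta(p) = ((1 - theta^{-1}(p)) / theta^{-1}(p))^alpha, p strictly in (0, 1)."""
    t = theta_inv(p)
    return ((1.0 - t) / t) ** alpha

def ad_product(mu1, mu2, alpha, theta, theta_inv):
    """zeta_1 (x)_AD zeta_2 from Definition 9(i)."""
    total = phi(mu1, alpha, theta_inv) + phi(mu2, alpha, theta_inv)
    return theta(1.0 / (1.0 + total ** (1.0 / alpha)))

def ad_scalar(xi, mu1, alpha, theta, theta_inv):
    """xi o_AD zeta_1 from Definition 9(ii), with xi > 0."""
    total = xi * phi(mu1, alpha, theta_inv)
    return theta(1.0 / (1.0 + total ** (1.0 / alpha)))

identity = lambda v: v
# With theta the identity, (x)_AD coincides with the Dombi conjunction of Definition 7.
print(ad_product(0.6, 0.7, alpha=2.0, theta=identity, theta_inv=identity))
print(ad_scalar(3.0, 0.6, alpha=2.0, theta=identity, theta_inv=identity))
```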
Theorem 1: Consider the FSs $\zeta_j=\left\langle\mu_j\right\rangle(j=1,2)$ and $\lambda, \lambda_1, \lambda_2>0$. Then we have:
(i) $\zeta_1 \otimes_{A D} \zeta_2=\zeta_2 \otimes_{A D} \zeta_1$,
(ii) $\lambda \circ_{A D}\left(\zeta_1 \otimes_{A D} \zeta_2\right)=\left(\lambda \circ_{A D} \zeta_1\right) \otimes_{A D}\left(\lambda \circ_{A D} \zeta_2\right)$,
(iii) $\left(\lambda_1+\lambda_2\right) \circ_{A D} \zeta_1=\left(\lambda_1 \circ_{A D} \zeta_1\right) \otimes_{A D}\left(\lambda_2 \circ_{A D} \zeta_1\right)$.
Proof: Follows from Definition 9.
Definition 10: Suppose $\zeta_j=\left\langle\mu_j\right\rangle(j=1,2, \ldots, n)$ is a collection of FSs. Then we define the fuzzy Archimedean-Dombi geometric (FADG) operator as: $FADG\left(\zeta_1, \zeta_2, \ldots, \zeta_n\right)=\zeta_1 \otimes_{A D} \zeta_2 \otimes_{A D} \cdots \otimes_{A D} \zeta_n$.
Theorem 2: The aggregated value $FADG\left(\zeta_1, \zeta_2, \ldots, \zeta_n\right)$ is also an FS. In addition, we get:
$FADG\left(\zeta_1, \zeta_2, \ldots, \zeta_n\right)=\left\langle\theta\left(\left(1+\left\{\sum_{j=1}^n \Phi_\alpha^\theta\left(\mu_j\right)\right\}^{\frac{1}{\alpha}}\right)^{-1}\right)\right\rangle$.
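Under the same assumptions (θ a bijection of [0, 1], memberships strictly inside (0, 1)), the n-fold aggregation of Definition 10 and Theorem 2 can be sketched as follows; this is illustrative code of ours, not the authors' implementation.

```python
def fadg(mus, alpha, theta, theta_inv):
    """FADG(zeta_1, ..., zeta_n): n-fold (x)_AD product of memberships mus in (0, 1)."""
    phi = lambda p: ((1.0 - theta_inv(p)) / theta_inv(p)) ** alpha
    total = sum(phi(mu) for mu in mus)
    return theta(1.0 / (1.0 + total ** (1.0 / alpha)))

identity = lambda v: v
print(fadg([0.6, 0.7, 0.8], alpha=2.0, theta=identity, theta_inv=identity))
```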
Some important properties of the FADG operator are given below.
Theorem 3 (Shift invariance): If $\zeta_0\left(\neq \zeta_j\right)$ is an FS, then $FADG\left(\zeta_0 \otimes_{A D} \zeta_1, \zeta_0 \otimes_{A D} \zeta_2, \ldots, \zeta_0 \otimes_{A D} \zeta_n\right)=\zeta_0 \otimes_{A D} FADG\left(\zeta_1, \zeta_2, \ldots, \zeta_n\right)$.
Theorem 4 (Idempotency): If $\zeta_j=\zeta_0$ for all $j$, then $FADG\left(\zeta_1, \zeta_2, \ldots, \zeta_n\right)=\zeta_0$.
Theorem 5 (Boundedness): For a collection of FSs $\zeta_j=\left\langle\mu_j\right\rangle$, we have $\zeta^{-} \leq FADG\left(\zeta_1, \zeta_2, \ldots, \zeta_n\right) \leq \zeta^{+}$, where $\zeta^{-}=\left\langle\min _j \mu_j\right\rangle$ and $\zeta^{+}=\left\langle\max _j \mu_j\right\rangle$.
Theorem 6 (Monotonicity): If $\zeta_j=\left\langle\mu_j\right\rangle$ and $\zeta_j^{\prime}=\left\langle\mu_j^{\prime}\right\rangle$ are two collections of FSs satisfying $\mu_j \leq \mu_j^{\prime}$ for all $j$, then $FADG\left(\zeta_1, \zeta_2, \ldots, \zeta_n\right) \leq FADG\left(\zeta_1^{\prime}, \zeta_2^{\prime}, \ldots, \zeta_n^{\prime}\right)$.
The challenge in classification tasks is to determine which class a test sample belongs to. Class labels may be known in some data sets, which means that the only remaining task is to assign fresh samples to the existing classes. Otherwise, the dataset is separated into training and testing components. The classifier is trained on the training portion, which contains class labels, and the parameter settings are recorded. The testing portion is treated as new data and used for classification. We now outline the classification process and describe how the Archimedean-Dombi operator is used.
Suppose X is a sample and we assign numerical values to express its features. Since we are interested in fuzzy values, all the given data are transformed to values lying in [0, 1].
Step 1: Divide the training set into R classes (T1, T2, T3, ..., TR) according to the given class labels.
Step 2: Determine a mean (ideal) vector that accurately represents each class, e.g., using the generalized mean.
Step 3: If $u_i=\left(u_i\left(g_1\right), u_i\left(g_2\right), \ldots, u_i\left(g_n\right)\right)$ is the ideal vector for class $\mathrm{T}_i$, where $u_i\left(g_j\right)$ is the value under feature $g_j$ in $\mathrm{T}_i$, then the similarity between a new sample $x=\left(x\left(g_1\right), x\left(g_2\right), \ldots, x\left(g_n\right)\right)$ and each of the ideal vectors can be calculated feature-wise as:
$s_j=\sqrt[p]{1-\left|x\left(g_j\right)^p-u_i\left(g_j\right)^p\right|}, \quad j=1,2, \ldots, n,$
here, p is the SM's parameter. It is possible to use other techniques (operators) to determine how similar the new item and the ideal vectors are. The procedure then moves on to the aggregation, across all features, of the similarity values $s_1, s_2, \ldots, s_n$ between the ideal vector and the new object to be categorized.
Step 4: Aggregation of the similarities $s_1, s_2, \ldots, s_n$ is carried out using various variants of the Dombi operators described above. Using the Archimedean-Dombi operator, for instance (denoted by AD), we have:
$S_{\text {total }}=\theta\left(\left(1+\left\{\sum_{j=1}^n \Phi_\alpha^\theta\left(s_j\right)\right\}^{\frac{1}{\alpha}}\right)^{-1}\right),$
here, $\alpha>0$ is the Archimedean-Dombi operator's parameter. $S_{\text {total }}$ is calculated for each class, and x is assigned to the class for which $S_{\text {total }}$ is highest.
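The Python sketch below (our own illustration; the paper's experiments were carried out in MS-EXCEL) puts Steps 1-4 together for an already fuzzified data matrix. Here θ is taken as the identity, so the aggregation reduces to the plain Dombi form of the AD operator; the function names, the toy data, and the parameter values are our assumptions.

```python
import numpy as np

def ideal_vectors(X_train, y_train, m=1.0):
    """Step 2: feature-wise generalized (power) mean of each class's samples."""
    return {c: np.mean(X_train[y_train == c] ** m, axis=0) ** (1.0 / m)
            for c in np.unique(y_train)}

def feature_similarities(x, u, p=1.0):
    """Step 3: s_j = (1 - |x_j^p - u_j^p|)^(1/p), computed for every feature j."""
    return (1.0 - np.abs(x ** p - u ** p)) ** (1.0 / p)

def ad_aggregate(s, alpha=1.0):
    """Step 4: Archimedean-Dombi aggregation with theta = identity (plain Dombi form)."""
    s = np.clip(s, 1e-12, 1.0 - 1e-12)            # keep similarities strictly inside (0, 1)
    total = np.sum(((1.0 - s) / s) ** alpha)
    return 1.0 / (1.0 + total ** (1.0 / alpha))

def classify(x, ideals, p=1.0, alpha=1.0):
    """Assign x to the class whose aggregated similarity S_total is highest."""
    scores = {c: ad_aggregate(feature_similarities(x, u, p), alpha)
              for c, u in ideals.items()}
    return max(scores, key=scores.get), scores

# Toy usage with random fuzzified data: 20 samples, 22 features, 2 classes.
rng = np.random.default_rng(0)
X = rng.uniform(0.05, 0.95, size=(20, 22))
y = rng.integers(0, 2, size=20)
ideals = ideal_vectors(X, y, m=2.0)
label, scores = classify(X[0], ideals, p=2.0, alpha=2.0)
print(label, scores)
```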
4. Case Study and Its Solution
Here, we selected the Parkinson's disease dataset. (Source: Max Little of the University of Oxford produced the dataset in conjunction with the National Centre for Voice and Speech, Denver, Colorado, which captured the speech signals. In the initial investigation, feature extraction techniques for common voice disorders were reported.)
This dataset includes various biomedical voice measurements from 31 individuals, 23 of whom have Parkinson's disease (PD). Each row in the table corresponds to one of the 195 voice recordings from these individuals, and each column represents a particular voice measure. According to the "status" column, which is set to 0 for healthy and 1 for PD, the main goal of the data is to distinguish healthy individuals from those with PD. The data are in ASCII CSV format, with one instance per voice recording in each row of the CSV file. Each patient has about six recordings, and the first column ("name") identifies the patient.
Using this dataset (Table 1 and Table 2), our aim is to investigate whether a patient has Parkinson's disease or not. The attributes are:
C1: MDVP Fo(Hz) – Average vocal fundamental frequency,
C2: MDVP Fhi(Hz) – Maximum vocal fundamental frequency,
C3: MDVP Flo(Hz) – Minimum vocal fundamental frequency,
C4: MDVP Jitter(%),
C5: MDVP Jitter(Abs),
C6: MDVP RAP,
C7: MDVP PPQ,
C8: Jitter DDP,
C9: MDVP Shimmer,
C10: MDVP Shimmer(dB),
C11: Shimmer APQ3,
C12: Shimmer APQ5,
C13: MDVP APQ,
C14: Shimmer DDA,
C15: NHR,
C16: HNR,
C17: RPDE,
C18: D2,
C19: DFA,
C20: spread1,
C21: spread2,
C22: PPE.
There are two classes, given by the "status" attribute (health status of the subject): 1 – Parkinson's disease, 0 – healthy.
For experimentation, the dataset considered here is divided into two equal portions (training and testing). We denote the samples by Ai and the criteria by Cj.
For fuzzification we use the following formula:
$\mu_{i j}=\frac{a_{i j}}{\max _i a_{i j}}\left(\right.$ if all $\left.a_{i j} \geq 0\right)$ and $\mu_{i j}=\frac{a_{i j}}{\min _i a_{i j}}$ (if all $\left.a_{i j} \leq 0\right)$
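A minimal sketch of this fuzzification step is given below (our own code). It normalizes each column of a raw data matrix by its column maximum when all entries are non-negative and by its column minimum when all entries are non-positive, exactly as in the formula above; columns that mix signs are rejected, since the formula does not cover them.

```python
import numpy as np

def fuzzify(A):
    """Column-wise fuzzification: divide by the column maximum when all entries are
    non-negative, and by the column minimum when all entries are non-positive."""
    A = np.asarray(A, dtype=float)
    M = np.empty_like(A)
    for j in range(A.shape[1]):
        col = A[:, j]
        if np.all(col >= 0):
            M[:, j] = col / col.max()
        elif np.all(col <= 0):
            M[:, j] = col / col.min()
        else:
            raise ValueError(f"column {j} mixes signs; the formula does not cover it")
    return M

# Toy example: one non-negative column and one non-positive column of raw values.
print(fuzzify([[120.0, -4.8], [180.0, -6.2], [150.0, -5.5]]))
```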
Table 1:
C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | Class |
0.7252 | 0.4518 | 0.6207 | 0.1619 | 0.1464 | 0.1351 | 0.1301 | 0.1351 | 0.1496 | 0.1285 | 0.1727 | 0 |
C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | C20 | C21 | C22 | Class |
0.1318 | 0.0977 | 0.1727 | 0.1113 | 0.7465 | 0.6611 | 0.8521 | 0.8347 | 0.4021 | 0.6491 | 0.2644 | 0 |
Table 2:
C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | Class |
0.61 | 0.3673 | 0.4518 | 0.2857 | 0.2771 | 0.2703 | 0.2591 | 0.2703 | 0.3453 | 0.3094 | 0.3826 | 1 |
C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | C20 | C21 | C22 | Class |
0.3188 | 0.2604 | 0.3825 | 0.2025 | 0.6324 | 0.7743 | 0.8834 | 0.6829 | 0.5843 | 0.7302 | 0.4706 | 1 |
Sample | Similarity with “0” class | Similarity with “1” class |
A1 | 0.144145113 | 0.159773997 |
A2 | 0.575021359 | 0.286578035 |
A3 | 0.552410273 | 0.334837863 |
A4 | 0.509532218 | 0.361486049 |
A5 | 0.544625585 | 0.379381121 |
A6 | 0.351243228 | 0.251780189 |
A7 | 0.520272876 | 0.336881897 |
A8 | 0.276654887 | 0.234348072 |
A9 | 0.152640153 | 0.46719318 |
A10 | 0.161441767 | 0.351336005 |
A11 | 0.337241876 | 0.374562171 |
A12 | 0.290716689 | 0.457036039 |
A13 | 0.032975053 | 0.133548383 |
A14 | 0.552253085 | 0.173997534 |
A15 | 0.517859993 | 0.292758668 |
A16 | 0.185212864 | 0.408563532 |
A17 | 0.370632688 | 0.34024666 |
A18 | 0.508360325 | 0.372319016 |
A19 | 0.597596017 | 0.309236332 |
A20 | 0.593659654 | 0.205367677 |
A21 | 0.160755353 | 0.303007832 |
A22 | 0.212114739 | 0.486641385 |
A23 | 0.563133747 | 0.3139681 |
A24 | 0.207011957 | 0.225787188 |
A25 | 0.375734986 | 0.379791467 |
A26 | 0.231467852 | 0.374775554 |
A27 | 3.14698E-07 | 6.32418E-05 |
A28 | 0.328161506 | 0.313001084 |
A29 | 0.019098729 | 0.068894464 |
A30 | 0.053334993 | 0.165577266 |
A31 | 0.171163793 | 0.373513924 |
A32 | 0.112200466 | 0.342386877 |
A33 | 0.061922531 | 0.376313985 |
A34 | 0.152151945 | 0.364173224 |
A35 | 0.30395204 | 0.470546678 |
A36 | 0.183708592 | 0.492125457 |
A37 | 0.223482397 | 0.398898011 |
A38 | 0.587741358 | 0.249142602 |
A39 | 0.215868202 | 0.369559093 |
A40 | 0.034110559 | 0.117079969 |
A41 | 0.145205121 | 0.389523549 |
A42 | 0.008394177 | 0.041071912 |
A43 | 0.480545492 | 0.375993986 |
A44 | 0.189886774 | 0.268324204 |
A45 | 0.061371942 | 0.210385898 |
A46 | 0.111202655 | 0.295579394 |
A47 | 0.324865904 | 0.340659101 |
A48 | 0.53093962 | 0.225796433 |
A49 | 0.467740157 | 0.473721977 |
A50 | 2.0715E-09 | 2.32273E-06 |
A51 | 0.005558563 | 0.024458049 |
A52 | 0.027095978 | 0.042961774 |
A53 | 0.577666166 | 0.278465404 |
A54 | 0.153294935 | 0.367524584 |
A55 | 0.406543404 | 0.277518885 |
A56 | 0.020123409 | 0.09136754 |
A57 | 0.38071519 | 0.388569166 |
A58 | 0.364022814 | 0.411819054 |
A59 | 0.541349604 | 0.286716465 |
A60 | 0.554301532 | 0.250449533 |
A61 | 0.586407246 | 0.290567186 |
A62 | 0.069131985 | 0.194848487 |
A63 | 0.207103802 | 0.398472474 |
A64 | 0.572431905 | 0.380261651 |
A65 | 0.639265443 | 0.299479348 |
A66 | 0.320370902 | 0.540050009 |
A67 | 0.288204236 | 0.420589354 |
A68 | 0.470748896 | 0.322116455 |
A69 | 0.454133344 | 0.390411042 |
A70 | 0.033432395 | 0.128777762 |
A71 | 0.234026289 | 0.363599557 |
A72 | 0.092666407 | 0.358441774 |
A73 | 0.142770834 | 0.175032274 |
The mean accuracy obtained by our method is 0.71. On the other hand, the mean accuracies obtained using the Dombi disjunctive operator [35] and the Dombi product operator [35] are 0.48 and 0.25, respectively. Hence, our model is more accurate.
5. Conclusions
We have introduced a new classification technique that aggregates data using the Archimedean-Dombi operator. In the past, Dombi operators were used in conjunction with other operators to model decision-making problems. To the best of our knowledge, this is the first instance in which the Archimedean-Dombi operator has been used to classify medical datasets. Although the Archimedean-Dombi operator has a number of particular cases (algebraic, Einstein, and Hamacher operators), we have only used the simplest one in this work. A different version of the similarity classifier is offered by each operator. On a real-world medical dataset, the performance of the proposed classifier is compared with some existing classifiers. On the Parkinson's disease dataset, the overall mean classification accuracy of the new classifier is 71%, while the Dombi classifiers only achieved 48% (disjunctive form) and 25% (product case). Keep in mind that any advancement in medicine, no matter how small, should be lauded.
Conceptualization, A. Saha; methodology, A. Saha; software, J. Reddy; validation, J. Reddy and R. Kumar; formal analysis, A. Saha; investigation, A. Saha; data curation, J. Reddy and R. Kumar; writing—original draft preparation, J. Reddy and R. Kumar; writing—review and editing, A. Saha; supervision, A. Saha; project administration, A. Saha. All authors have read and agreed to the published version of the manuscript.
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare no conflict of interest.