Abstract
This paper presents a method of multidimensional data mining termed correlation–comparison analysis (CCA). It was applied to neural data to demonstrate its utility in a neuron-classification problem. CCA is a semi-quantitative method of inter-sample comparison. The methodology comprises the generation of inter-parametric correlation matrices and the corresponding alpha-error (p value) matrices. The main step is the p-comparison of the same parametric pair between the two samples. This comparison has a semi-quantitative, binary character and therefore does not raise issues such as the false discovery rate (FDR) in multiple comparisons. The possible outcomes are: (1) a correlation match; (2) a correlation mismatch of the first kind, the main type of correlation mismatch; and (3) a correlation mismatch of the second kind, the strongest one, but very rarely observed in biological systems and obtained for a very small number of parameters. The correlation mismatch of the first kind is the target mismatch, i.e., the mismatch that the analysis sets out to trace and the very reason why the study is performed. Applying CCA led to an effective neuromorphofunctional classification of caudate interneurons into appropriate clusters and to their feature-based description. CCA is a multidimensional, two-sample classification tool that can be very useful for explaining the differences between similar samples.
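The matrix-generation step mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pearson_r`, `fisher_p`, `correlation_matrices`), the choice of the Pearson coefficient, and the use of the Fisher z approximation for the p value are assumptions made here for illustration, and the numbers are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_p(r, n):
    """Two-sided p value for H0: rho = 0 via the Fisher z approximation.

    Assumption: the source does not state how its p values are computed;
    this normal approximation needs n > 3 and is only illustrative.
    """
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite when |r| is 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def correlation_matrices(sample):
    """Return (r, p) dictionaries keyed by parametric pair.

    `sample` maps a parameter name to its list of measured values.
    """
    names = list(sample)
    n = len(sample[names[0]])
    r_mat, p_mat = {}, {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(sample[a], sample[b])
            r_mat[(a, b)] = r
            p_mat[(a, b)] = fisher_p(r, n)
    return r_mat, p_mat


# Invented values, for illustration only (not data from the study).
sample_a = {
    "AS":   [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "NMAX": [5.0, 6.0, 8.0, 9.0, 11.0, 12.0],
    "L":    [300.0, 280.0, 310.0, 290.0, 305.0, 295.0],
}
r_a, p_a = correlation_matrices(sample_a)
for pair in r_a:
    print(pair, round(r_a[pair], 3), round(p_a[pair], 4))
```

Storing only the upper triangle, keyed by parametric pair, keeps each pair unique. Running the same function on a second sample yields the second pair of matrices needed for the comparison.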
Graphical Abstract
Correlation–comparison analysis is based on the following transcorrelative relations between correlations belonging to two similar samples: (1) a particular inter-parameter correlation is matched between the two samples when it is statistically insignificant in both samples, or when it is statistically significant in both samples and has the same direction in both; (2) the correlations are mismatched when statistical significance is reached in one sample but not in the other (correlation mismatch of the first kind, the main type of correlation mismatch); (3) the correlations are mismatched when statistical significance is reached in both samples but the correlations have opposite directions (correlation mismatch of the second kind, the strongest one, but very rarely observed in biological systems and obtained for a very small number of parameters).
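The three relations above translate into a simple decision rule. The following Python sketch is one possible reading of it, not the authors' code: the function names `classify_pair` and `cca_table`, the significance threshold `alpha = 0.05`, and the example numbers are assumptions introduced here for illustration.

```python
def classify_pair(r1, p1, r2, p2, alpha=0.05):
    """Classify one parametric pair compared between two samples.

    r1, p1: correlation coefficient and p value in the first sample.
    r2, p2: the same quantities in the second sample.
    alpha:  significance threshold (assumed value, not from the source).
    """
    sig1 = p1 < alpha
    sig2 = p2 < alpha
    if sig1 != sig2:
        return "CM1"    # significant in one sample only
    if not sig1:
        return "match"  # insignificant in both samples
    # Significant in both samples: compare the directions.
    return "match" if (r1 > 0) == (r2 > 0) else "CM2"


def cca_table(r_a, p_a, r_b, p_b, alpha=0.05):
    """Apply classify_pair to every pair shared by the two samples."""
    return {
        pair: classify_pair(r_a[pair], p_a[pair], r_b[pair], p_b[pair], alpha)
        for pair in r_a
        if pair in r_b
    }


# Invented correlation and p values, for illustration only.
r_a = {("AS", "NMAX"): 0.82, ("AS", "L"): 0.10, ("NMAX", "L"): 0.75}
p_a = {("AS", "NMAX"): 0.001, ("AS", "L"): 0.60, ("NMAX", "L"): 0.004}
r_b = {("AS", "NMAX"): 0.15, ("AS", "L"): -0.05, ("NMAX", "L"): -0.70}
p_b = {("AS", "NMAX"): 0.45, ("AS", "L"): 0.80, ("NMAX", "L"): 0.010}

for pair, outcome in cca_table(r_a, p_a, r_b, p_b).items():
    print(pair, outcome)
```

Each pair receives exactly one categorical label, which reflects the binary character of the p-comparison: only whether each p value crosses the threshold, and the sign of the correlation, enter the decision.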














Abbreviations
- SICC: Self-implicative correlation cluster
- SCCA: Single-correlation comparison analysis
- PP: Parametric pair for which the existence of a correlation is observed
- NGC: Network grid-connection plot, NGC2D or NGC3D
- MEP: Matrix elementwise product, i.e., the elementwise product between correlation-coefficient matrices (C-matrices)
- ISR: Inter-sample relation
- FDR: False discovery rate
- ISD: Inter-sample difference, i.e., the difference between the observed samples
- ISC: Analytical inter-sample comparison in search of correlative differences between the samples
- IPR: Inter-parametric relation
- IPC: Inter-parametric correlation
- ICR: Mutual (inter-)correlative relation
- ICN: (Inter-)correlative network
- ICD: Mutual inter-correlative determination
- ICC: (Inter-)correlative circle
- GM plot: Grid matrix plot/GM2D
- DCA: Differential analysis of a correlative sample structure
- CM1: Correlation mismatch of the first kind
- CM2: Correlation mismatch of the second kind
- CMI: Correlative mismatch index, a general quantitative SCCA parameter
- CP: Correlation pair, i.e., a PP with a realized correlation
- CFA: Correlative factor analysis
- CCA: Correlation–comparison analysis
- CCC: Chain of correlative circles
- CC: Correlative chain, or in general the correlation between the two end-point parameters of the chain
- RC: Radius of the circle with the maximal number of intersections with dendrites (NMAX circle); measures the distance of the maximal dendritic complexity from the neurosoma, thereby representing the position of the maximal dendritic arborization density
- RCst: Standardized RC parameter with respect to the AS parameter
- RT: Radius of the circle with the maximal number of terminal dendrites
- NTD: Number of third-order dendrites, i.e., dendrites originating directly from second-order dendrites
- NTHOD: Total number of dendrites above the second order
- DAX: Neuron axialization index, a PCA parameter; quantifies how much a neuron is elongated in a 2D space, mostly on account of the elongation of the dendritic tree
- NSD: Number of second-order dendrites, i.e., dendrites originating directly from first-order dendrites
- NR: Total number of dendrites above the first order
- NPD: Number of first-order (primary) dendrites, i.e., dendrites originating directly from the neurosoma
- NMIN: Minimal number of intersections with dendrites made by a theoretical circle centered in the geometrical center of the neurosoma; one of the neuron complexity parameters, it can be thought of as the minimal complexity of a particular neuron
- NMAXst: Standardized NMAX parameter with respect to the NMIN parameter
- NMAX: Maximal number of intersections with dendrites made by a theoretical circle centered in the geometrical center of the neurosoma; one of the neuron complexity parameters, it can be thought of as the maximal complexity of a particular neuron
- NHOD: Number of higher-order dendrites, i.e., dendrites originating directly from third- and higher-order dendrites
- NDALL: Total number of all dendrites of a neuron
- MS: Index of neurosoma asymmetry, the reciprocal of neurosoma circularity; represents the degree of shape non-roundness and geometrical irregularity, quantifying how much the soma differs from a circle and from its symmetry
- MDCBO: Dendritic branching polarization index; measures how asymmetric a dendritic tree is on account of its average branching degree
- Lst: Standardized L parameter with respect to the AS parameter
- L: Total dendritic length, i.e., the sum of all individual dendritic lengths
- DWDTHst: Standardized DWDTH parameter with respect to the AS parameter
- DWDTH: Average dendritic width of a neuron
- DSPst: Standardized DSP parameter with respect to the ADF parameter
- DSP: Partial dendritic surface; a measure of dendritic non-complex density
- DSI: Dendritic surface of influence, or the dendritic dispersion index of a neuron, i.e., the degree (index) of multidirectional dendritic radiation; a measure of dendritic dispersion in a 2D space obtained as the covariance of dendritic space coordinates
- DS: Fractal dimension of a skeletonized neuron image; one of the neuron complexity parameters
- DPOL: Dendritic orientation degree, expressed in angle degrees; measures the dendritic orientation in a 2D space, i.e., the orientation of its dominant elongation axis
- DO: Fractal dimension of a neuron outline; one of the neuron shape parameters, quantifies the shape of the neuron from one aspect of its observation (the outline)
- DN: Fractal dimension of a neuron; one of the neuron shape parameters, quantifies the shape of the neuron from one aspect of its observation (the entire image)
- DCLD: Dendritic clustering degree (index); measures the tendency of neuron dendrites to cluster
- DCBOst: Standardized DCBO parameter with respect to the NMIN parameter
- DCBO: Dendritic centrifugal branching order; represents the average dendritic branching order
- CDF: Complexity of a dendritic field; as a complexity parameter, it measures the total curvature and detailed image complexity of the dendritic arborization and is the most integrative of all complexity parameters
- CDF(NTHOD)st: Standardized CDF parameter with respect to the NTHOD parameter
- CDF(NTD)st: Standardized CDF parameter with respect to the NTD parameter
- CDF(NSD)st: Standardized CDF parameter with respect to the NSD parameter
- CDF(NR)st: Standardized CDF parameter with respect to the NR parameter
- CDF(NPD)st: Standardized CDF parameter with respect to the NPD parameter
- CDF(NHOD)st: Standardized CDF parameter with respect to the NHOD parameter
- CDF(NDALL)st: Standardized CDF parameter with respect to the NDALL parameter
- CDF(DCBO)st: Standardized CDF parameter with respect to the DCBO parameter
- CDF(ADF)st: Standardized CDF parameter with respect to the ADF parameter
- ADF: Surface area of a neuron dendritic arborization field; measures the size of the minimal region occupied by the neuron dendritic tree in a 2D space after removing the neurosoma
- BV: Bursting voltage of an action potential
- BF: Bursting frequency of an action potential
- AS: Surface area of a neurosoma; measures the soma size of the neuron
- APNS: Surface area of a neuron perineuronal space; measures the size of the total minimal region between dendrites
- ANF: Surface area of a neuron field; measures the size of the minimal region occupied by the neuron in a 2D space
- AN: Surface area of a neuron, i.e., the image area occupied by the entire neuron 2D binary image; measures the size of the neuron
- ADT: Surface area of a dendritic tree; measures the dendritic arborization size of the neuron
- NSIN: Neostriate interneurons
- PIN: Putaminal interneurons
- CCL: Caudate clusters of interneurons, namely CCL1 and CCL2, obtained by the applied clustering–classifying method
- CIN: Caudate interneurons
- PCL: Putaminal cluster of interneurons, obtained by the applied clustering–classifying method
Acknowledgements
This study was supported by the Ministry of Education, Science and Technological Development, Republic of Serbia, project number III41031.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Ethical approval
The authors declare that this manuscript has not been submitted anywhere other than SN Computer Science.
About this article
Cite this article
Grbatinić, I., Krstonošić, B., Srebro, D. et al. Correlation–Comparison Analysis as a New Way of Data-Mining: Application to Neural Data. SN COMPUT. SCI. 4, 636 (2023). https://doi.org/10.1007/s42979-023-02086-4
DOI: https://doi.org/10.1007/s42979-023-02086-4