Abstract
Dimensionality reduction helps data analysts and machine learning designers visualize, in low dimension, structures that lie in high dimension. This is a basic but crucial operation for discovering relationships between variables, especially given how difficult machine learning algorithms are to tune. The data need not be treated as a black box: they can be visualized, leading to better decision making. Inspired by previous work, this article proposes a dimensionality reduction method based on space-filling curves (SFCs). The Hilbert curve (guided by the reflected binary Gray code pattern) is naturally considered, along with recently identified alternative high-locality SFCs. Mapping algorithms working with these alternative curves are provided and illustrated through a numerical example. Mapping a D-dimensional point to a 1D index is common, but developing an algorithm for the reverse mapping, i.e. from a 1D index to a 2D or 3D point, is more original and enables the visualization of data. The positioning of the work is specified and justified. A discussion on the choice of parameters (the curve orders n and \(n'\)) guides the user toward values that define a bijection between the original data space and the projected space. Experiments compare our proposal with state-of-the-art approaches (PCA, MDS, t-SNE, UMAP) over seven datasets ranging from 3D to 16D and covering diverse topologies. The results show an interesting ability for data visualization. Compared with standard techniques, the computing time is low, an interesting property given the amount of data created today.
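To make the reverse mapping concrete, the following is a minimal sketch (illustrative only, not the implementation described in the article) of the classic 2D Hilbert mapping from a 1D index back to an (x, y) grid point. The function name `hilbert_index_to_xy` and the bit-manipulation formulation are assumptions introduced for the example.

```python
def hilbert_index_to_xy(index, order):
    """Map a 1D Hilbert index to a 2D point on a 2**order x 2**order grid.

    Standard bit-manipulation formulation: at each level the two low-order
    bits of the index select a quadrant, and the partial coordinates are
    reflected/swapped to follow the Hilbert base pattern.
    """
    x = y = 0
    t = index
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)      # bit driving the x offset at this level
        ry = 1 & (t ^ rx)      # bit driving the y offset at this level
        if ry == 0:
            if rx == 1:
                # Reflect the partial coordinates before swapping.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x        # rotate the sub-square
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


if __name__ == "__main__":
    order = 2  # 4 x 4 grid, 16 cells
    # Iterating over all indices traces the familiar Hilbert pattern.
    print([hilbert_index_to_xy(i, order) for i in range(1 << (2 * order))])
```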
Notes
Inevitably, due to the dimensionality reduction, SFCs induce topological breaks, which are taken into account in the estimation of locality preservation (via the Faloutsos criterion).
On an Intel® i7-7820HQ CPU, with GCC 6.3.0 and Python 3.5.3.
References
Artac M, Jogan M, Leonardis A (2002) Incremental PCA for on-line visual learning and recognition. In: Proceedings of the 16th International Conference on Pattern Recognition (ICPR’02), vol 3. IEEE Computer Society, Washington, DC, p 30781
Buja A, Swayne DF, Littman ML, Dean N, Hofmann H, Chen L (2008) Data visualization with multidimensional scaling. J Comput Graph Stat 17(2):444–472
Castro J, Burns S (2007) Online data visualization of multidimensional databases using the Hilbert Space–Filling Curve. In: Pixelization paradigm, Springer, Berlin Heidelberg, pp 92–109
Cox MAA, Cox TF (2008) Multidimensional scaling. Springer, Heidelberg, pp 315–347
Djouzi K, Beghdad-Bey K (2019) A review of clustering algorithms for big data. In: 2019 International Conference on Networking and Advanced Systems (ICNAS), pp 1–6
Faloutsos C, Roseman S (1989) Fractals for secondary key retrieval. In: Proceedings of the Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS ’89, ACM, pp 247–252
Franco P, Nguyen G, Mullot R, Ogier J-M (2018) Alternative patterns of the multidimensional Hilbert curve. Multimed Tools Appl 77(7):8419–8440
Genender-Feltheimer A (2018) Visualizing high dimensional and big data. Procedia Comput Sci 140:112–121
Haverkort H, van Walderveen F (2010) Locality and bounding-box quality of two-dimensional space-filling curves. Comput Geom 43(2):131–147
Hilbert D (1891) Ueber die stetige Abbildung einer Linie auf ein Flächenstück. Math Ann 38(3):459–460
Konig A (2000) Interactive visualization and analysis of hierarchical neural projections for data mining. IEEE Trans Neural Netw 11(3):615–624
Kotsiantis SB, Zaharakis I, Pintelas P (2007) Supervised machine learning: a review of classification techniques. Emerg Artif Intell Appl Comput Eng 160:3–24
Lawder J, King P (2001) Using state diagrams for Hilbert curve mappings. Int J Comput Math 78(3):327–342
Lee JA, Verleysen M (2007) Nonlinear dimensionality reduction. Springer, New York
McInnes L, Healy J, Melville J (2020) Umap: Uniform manifold approximation and projection for dimension reduction
McInnes L, Healy J, Saul N, Grossberger L (2018) Umap: uniform manifold approximation and projection. J Open Source Softw 3(29):861
Moon B, Jagadish HV, Faloutsos C, Saltz JH (2001) Analysis of the clustering properties of the Hilbert space-filling curve. IEEE Trans Knowl Data Eng 13(1):124–141
Nguyen G (2013) Courbes remplissant l’espace et leur application en traitement d’images. PhD thesis, Université de La Rochelle
Owczarek V, Franco P, Mullot R (2019) A genetic algorithm to solve a space-filling curve problem. In: SLS2019: International Workshop on Stochastic Local Search Algorithms, pp 5–6
Owczarek V, Franco P, Mullot R (2020) Space-filling curve: a robust data mining tool. In: Arai K, Bhatia R, Kapoor S (eds) Proceedings of the Future Technologies Conference (FTC) 2019, Springer, Cham, pp 663–675
Peano G (1890) Sur une courbe, qui remplit toute une aire plane. In: Mathematische Annalen, vol 36. Springer, Vienna, pp 157–160
Pearson K (1901) On lines and planes of closest fit to systems of points in space. Philos Mag Ser 2(11):559–572
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Sammon JW (1969) A nonlinear mapping for data structure analysis. IEEE Trans Comput 18(5):401–409
Singh K, Upadhyaya S (2012) Outlier detection: applications and techniques. Int J Comput Sci Issues 9(1):307–323
van der Maaten L (2014) Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 15:3221–3245
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
Van Der Maaten L, Postma E, Van den Herik J (2009) Dimensionality reduction: a comparative review. J Mach Learn Res 10:66–71
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014) Openml: networked science in machine learning. SIGKDD Explor Newsl 15(2):49–60
Wang Y, Feng K, Chu X, Zhang J, Fu C, Sedlmair M, Yu X, Chen B (2018) A perception-driven approach to supervised dimensionality reduction for visualization. IEEE Trans Vis Comput Graph 24(5):1828–1840
Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2(1–3):37–52
Appendix
For presentation purposes, Sect. 5 focuses on two datasets (Blob and Tic-Tac-Toe). Additional results are presented in this appendix. Tables 7 and 8 regroup all results for Sammon stress, topology preservation and execution time. Moreover, Fig. 5 compares PCA and SFC on the Sphere dataset, and Fig. 6 compares t-SNE and SFC on the Pen digits dataset.
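For reference, here is a minimal sketch (assuming NumPy/SciPy and Euclidean pairwise distances; not the evaluation code used for Tables 7 and 8) of how the Sammon stress between a dataset and its low-dimensional projection can be computed.

```python
import numpy as np
from scipy.spatial.distance import pdist


def sammon_stress(X_high, X_low, eps=1e-12):
    """Sammon stress between original data X_high and its projection X_low.

    E = (1 / sum d*_ij) * sum ((d*_ij - d_ij)**2 / d*_ij),
    where d*_ij are pairwise distances in the original space and d_ij
    in the projected space. Lower values indicate better preservation.
    """
    d_high = pdist(X_high)             # pairwise distances, original space
    d_low = pdist(X_low)               # pairwise distances, projected space
    d_high = np.maximum(d_high, eps)   # guard against duplicate points
    return np.sum((d_high - d_low) ** 2 / d_high) / np.sum(d_high)
```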