Abstract
Dimensionality reduction helps data analysts and machine learning designers visualize, in low dimension, structures that lie in high dimension. This is a basic but crucial operation for discovering relationships between variables, especially given how difficult machine learning algorithms are to tune. The data need not be treated as a black box: they can be visualized, leading to better decision making. Inspired by previous work, this article proposes a dimensionality reduction method based on space-filling curves (SFCs). The Hilbert curve (guided by the reflected binary Gray code pattern) is naturally considered, along with recently identified alternative high-locality SFCs. Mapping algorithms working with these alternative curves are provided and illustrated through a numerical example. Mapping a D-dimensional point to a 1D index is common, but developing an algorithm for the reverse mapping, i.e. from a 1D index to a 2D or 3D point, is more original and enables the visualization of data. The positioning of the work is specified and justified. A discussion on the choice of parameters (the curve orders n and \(n'\)) guides the user toward values that define a bijection between the original data space and the projected space. Experiments compare our proposal with state-of-the-art approaches (PCA, MDS, t-SNE, UMAP) over seven datasets ranging from 3D to 16D and covering diverse topologies. The results show an interesting ability for data visualization. Compared with standard techniques, the computing time is low, an interesting property given the amount of data created today.
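To make the reverse mapping concrete, the following is a minimal sketch (illustrative only, not the implementation described in the article) of the classic 2D Hilbert mapping from a 1D index back to an (x, y) grid point. The function name `hilbert_index_to_xy` and the bit-manipulation formulation are assumptions introduced for the example.

```python
def hilbert_index_to_xy(index, order):
    """Map a 1D Hilbert index to a 2D point on a 2**order x 2**order grid.

    Standard bit-manipulation formulation: at each level the two low-order
    bits of the index select a quadrant, and the partial coordinates are
    reflected/swapped to follow the Hilbert base pattern.
    """
    x = y = 0
    t = index
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)      # bit driving the x offset at this level
        ry = 1 & (t ^ rx)      # bit driving the y offset at this level
        if ry == 0:
            if rx == 1:
                # Reflect the partial coordinates before swapping.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x        # rotate the sub-square
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


if __name__ == "__main__":
    order = 2  # 4 x 4 grid, 16 cells
    # Iterating over all indices traces the familiar Hilbert pattern.
    print([hilbert_index_to_xy(i, order) for i in range(1 << (2 * order))])
```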
Notes
Inevitably, due to the dimensionality reduction, SFCs induce topological breaks, which are taken into account in the estimation of locality preservation (via the Faloutsos criterion).
On an Intel® i7-7820HQ CPU, with GCC 6.3.0 and Python 3.5.3.
References
Artac M, Jogan M, Leonardis A (2002) Incremental PCA for on-line visual learning and recognition. In: Proceedings of the 16th International Conference on Pattern Recognition (ICPR’02), vol 3. IEEE Computer Society, Washington, DC, p 30781
Buja A, Swayne DF, Littman ML, Dean N, Hofmann H, Chen L (2008) Data visualization with multidimensional scaling. J Comput Graph Stat 17(2):444–472
Castro J, Burns S (2007) Online data visualization of multidimensional databases using the Hilbert Space–Filling Curve. In: Pixelization paradigm, Springer, Berlin Heidelberg, pp 92–109
Cox MAA, Cox TF (2008) Multidimensional scaling. Springer, Heidelberg, pp 315–347
Djouzi K, Beghdad-Bey K (2019) A review of clustering algorithms for big data. In: 2019 International Conference on Networking and Advanced Systems (ICNAS), pp 1–6
Faloutsos C, Roseman S (1989) Fractals for secondary key retrieval. In: Proceedings of the Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS ’89, ACM, pp 247–252
Franco P, Nguyen G, Mullot R, Ogier J-M (2018) Alternative patterns of the multidimensional Hilbert curve. Multimed Tools Appl 77(7):8419–8440
Genender-Feltheimer A (2018) Visualizing high dimensional and big data. Procedia Comput Sci 140:112–121
Haverkort H, van Walderveen F (2010) Locality and bounding-box quality of two-dimensional space-filling curves. Comput Geom 43(2):131–147
Hilbert D (1891) Ueber die stetige Abbildung einer Linie auf ein Flächenstück. Math Ann 38(3):459–460
Konig A (2000) Interactive visualization and analysis of hierarchical neural projections for data mining. IEEE Trans Neural Netw 11(3):615–624
Kotsiantis SB, Zaharakis I, Pintelas P (2007) Supervised machine learning: a review of classification techniques. Emerg Artif Intell Appl Comput Eng 160:3–24
Lawder J, King P (2001) Using state diagrams for Hilbert curve mappings. Int J Comput Math 78(3):327–342
Lee JA, Verleysen M (2007) Nonlinear dimensionality reduction. Springer, New York
McInnes L, Healy J, Melville J (2020) Umap: Uniform manifold approximation and projection for dimension reduction
McInnes L, Healy J, Saul N, Grossberger L (2018) Umap: uniform manifold approximation and projection. J Open Source Softw 3(29):861
Moon B, Jagadish HV, Faloutsos C, Saltz JH (2001) Analysis of the clustering properties of the Hilbert space-filling curve. IEEE Trans Knowl Data Eng 13(1):124–141
Nguyen G (2013) Courbes remplissant l’espace et leur application en traitement d’images. PhD thesis, Université de La Rochelle
Owczarek V, Franco P, Mullot R (2019) A genetic algorithm to solve a space-filling curve problem. In: SLS2019: International Workshop on Stochastic Local Search Algorithms, pp 5–6
Owczarek V, Franco P, Mullot R (2020) Space-filling curve: a robust data mining tool. In: Arai K, Bhatia R, Kapoor S (eds) Proceedings of the Future Technologies Conference (FTC) 2019, Springer, Cham, pp 663–675
Peano G (1890) Sur une courbe, qui remplit toute une aire plane. In: Mathematische Annalen, vol 36. Springer, Vienna, pp 157–160
Pearson K (1901) On lines and planes of closest fit to systems of points in space. Philos Mag Ser 2(11):559–572
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Sammon JW (1969) A nonlinear mapping for data structure analysis. IEEE Trans Comput 18(5):401–409
Singh K, Upadhyaya S (2012) Outlier detection: applications and techniques. Int J Comput Sci Issues 9(1):307–323
van der Maaten L (2014) Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 15:3221–3245
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
Van Der Maaten L, Postma E, Van den Herik J (2009) Dimensionality reduction: a comparative review. J Mach Learn Res 10:66–71
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014) Openml: networked science in machine learning. SIGKDD Explor Newsl 15(2):49–60
Wang Y, Feng K, Chu X, Zhang J, Fu C, Sedlmair M, Yu X, Chen B (2018) A perception-driven approach to supervised dimensionality reduction for visualization. IEEE Trans Vis Comput Graph 24(5):1828–1840
Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2(1–3):37–52
Appendix
For presentation purposes, Sect. 5 focuses on two datasets (Blob and Tic-Tac-Toe). Additional results are presented in this appendix. Tables 7 and 8 regroup all results for Sammon stress, topology preservation and execution time. Moreover, Fig. 5 compares PCA and SFC on the Sphere dataset, and Fig. 6 compares t-SNE and SFC on the Pen digits dataset.
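For reference, here is a minimal sketch (assuming NumPy/SciPy and Euclidean pairwise distances; not the evaluation code used for Tables 7 and 8) of how the Sammon stress between a dataset and its low-dimensional projection can be computed.

```python
import numpy as np
from scipy.spatial.distance import pdist


def sammon_stress(X_high, X_low, eps=1e-12):
    """Sammon stress between original data X_high and its projection X_low.

    E = (1 / sum d*_ij) * sum ((d*_ij - d_ij)**2 / d*_ij),
    where d*_ij are pairwise distances in the original space and d_ij
    in the projected space. Lower values indicate better preservation.
    """
    d_high = pdist(X_high)             # pairwise distances, original space
    d_low = pdist(X_low)               # pairwise distances, projected space
    d_high = np.maximum(d_high, eps)   # guard against duplicate points
    return np.sum((d_high - d_low) ** 2 / d_high) / np.sum(d_high)
```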