An alternative for data visualization using space-filling curve | Data Mining and Knowledge Discovery Skip to main content
Log in

An alternative for data visualization using space-filling curve

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

Dimensionality reduction helps data analysts and machine learning designers to visualize in low dimension structures lying in high dimension. This is a basic but crucial operation, to discover relationship between variables, considering the difficulties to tweek machine learning algorithm. The data have not to be consider as a black-box but can be visualized, leading to better decision making. Inspired from previous works, this article proposes to create a dimensionality reduction method based on space-filling curves (SFCs). Of course, the Hilbert curve was considered (guided by reflected binary gray code pattern) but also alternative high locality SFCs, recently identified. Mapping algorithms working with alternative curves are provided, and illustrated through a numerical example. Mapping a D-dimensional point to a 1D index is usual but developing an algorithm for reverse mapping, i.e. from 1D index to 2D or 3D point is more original and can allow the visualization of data. The work position is specified and justifications are given. A discussion on the choice of parameters (order of curves n and \(n'\) ) is led in order to guide the user to select good parameters (to define a bijection between original data space and projected space). Experiments are conducted to compare our proposition to state of the art approaches (PCA, MDS, t-SNE, UMAP) over seven dataset involving from 3D to 16D and covering diverse topologies. The results show interesting ability on data visualization. Compare to standard techniques, the time computing is low, which is an interesting property in regards to the amount of data today created.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

Notes

  1. Inevitably, due to the reduction of dimensionality SFC induce topological breaks, which are taken into account in the estimation of locality preservation (via the Faloutsos criteria).

  2. On a I7-7820HQ Intel® CPU with GCC 6.3.0 and Python 3.5.3.

References

  • Artac M, Jogan M, Leonardis A (2002) Incremental PCA or on-line visual learning and recognition. In: Proceedings of the 16th International Conference on Pattern Recognition (ICPR’02), vol 3. IEEE Computer Society, Washington, DC, p 30781

  • Buja A, Swayne DF, Littman ML, Dean N, Hofmann H, Chen L (2008) Data visualization with multidimensional scaling. J Comput Graph Stat 17(2):444–472

    Article  MathSciNet  Google Scholar 

  • Castro J, Burns S (2007) Online data visualization of multidimensional databases using the Hilbert Space–Filling Curve. In: Pixelization paradigm, Springer, Berlin Heidelberg, pp 92–109

  • Cox MAA, Cox TF (2008) Multidimensional scaling. Springer, Heidelberg, pp 315–347

  • Djouzi K, Beghdad-Bey K (2019) A review of clustering algorithms for big data. In: 2019 International Conference on Networking and Advanced Systems (ICNAS), pp 1–6

  • Faloutsos C, Roseman S (1989) Fractals for secondary key retrieval. In: Proceedings of the Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS ’89, ACM, pp 247–252

  • Franco P, Nguyen G, Mullot R, Ogier J-M (2018) Alternative patterns of the multidimensional Hilbert curve. Multimed Tools Appl 77(7):8419–8440

    Article  Google Scholar 

  • Genender-Feltheimer A (2018) Visualizing high dimensional and big data. Procedia Comput Sci 140:112–121

    Article  Google Scholar 

  • Haverkort H, van Walderveen F (2010) Locality and bounding-box quality of two-dimensional space-filling curves. Comput Geom 43(2):131–147

    Article  MathSciNet  MATH  Google Scholar 

  • Hilbert D (1891) Ueber die stetige Abbildung einer Line auf ein Flächenstück. Math Ann 38(3):459–460

    Article  MathSciNet  MATH  Google Scholar 

  • Konig A (2000) Interactive visualization and analysis of hierarchical neural projections for data mining. IEEE Trans Neural Netw 11(3):615–624

    Article  Google Scholar 

  • Kotsiantis SB, Zaharakis I, Pintelas P (2007) Supervised machine learning: a review of classification techniques. Emerg Artif Intell Appl Comput Eng 160:3–24

    Google Scholar 

  • Lawder J, King P (2001) Using state diagrams for Hilbert curve mappings. Int J Comput Math 78(3):327–342

    Article  MathSciNet  MATH  Google Scholar 

  • Lee JA, Verleysen M (2007) Nonlinear dimensionality reduction. Springer, New York

  • McInnes L, Healy J, Melville J (2020) Umap: Uniform manifold approximation and projection for dimension reduction

  • McInnes L, Healy J, Saul N, Grossberger L (2018) Umap: uniform manifold approximation and projection. J Open Source Softw 3(29):861

    Article  Google Scholar 

  • Moon B, Jagadish HV, Faloutsos C, Saltz JH (2001) Analysis of the clustering properties of the Hilbert space-filling curve. IEEE Trans Knowl Data Eng 13(1):124–141

    Article  Google Scholar 

  • Nguyen G (2013) Courbes remplissant l’espace et leur application en traitement d’images. PhD thesis, Université de La Rochelle

  • Owczarek V, Franco P, Mullot R (2019) A genetic algorithm to solve a space-filling curve problem. In: SLS2019: International Workshop on Stochastic Local Search Algorithms, pp 5–6

  • Owczarek V, Franco P, Mullot R (2020) Space-filling curve: a robust data mining tool. In: Arai K, Bhatia R, Kapoor S (eds) Proceedings of the Future Technologies Conference (FTC) 2019, Springer, Cham, pp 663–675

  • Peano G (1890) Sur une courbe, qui remplit toute une aire plane. In: Mathematische Annalen, vol 36. Springer, Vienna, pp 157–160

  • Pearson K (1901) On lines and planes of closest fit to systems of points in space. Philos Mag Ser 2(11):559–572

    Article  MATH  Google Scholar 

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

    MathSciNet  MATH  Google Scholar 

  • Sammon JW (1969) A nonlinear mapping for data structure analysis. IEEE Trans Comput 18(5):401–409

    Article  Google Scholar 

  • Singh K, Upadhyaya S (2012) Outlier detection: applications and techniques. Int J Comput Sci Issues 9(1):307–323

    Google Scholar 

  • van der Maaten L (2014) Accelerating t-sne using tree-based algorithms. J Mach Learn Res 15:3221–3245

    MathSciNet  MATH  Google Scholar 

  • van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605

    MATH  Google Scholar 

  • Van Der Maaten L, Postma E, Van den Herik J (2009) Dimensionality reduction: a comparative review. J Mach Learn Res 10:66–71

    Google Scholar 

  • Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014) Openml: networked science in machine learning. SIGKDD Explor Newsl 15(2):49–60

    Article  Google Scholar 

  • Wang Y, Feng K, Chu X, Zhang J, Fu C, Sedlmair M, Yu X, Chen B (2018) A perception-driven approach to supervised dimensionality reduction for visualization. IEEE Trans Vis Comput Graph 24(5):1828–1840

    Article  Google Scholar 

  • Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2(1–3):37–52

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Valentin Owczarek.

Additional information

Responsible editor: Johannes Fürnkranz and Fei Wang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

For presentation purpose, a focus is made on two dataset (Blob and Tic-Tac-Toe) in Sect. 5. Additional results are presented in this appendix. Tables 7 and 8 regroups all results for Sammon Stress, Topology Preservation and Execution time. Moreover Fig. 5 is a comparative plot between PCA and SFC on the Sphere Dataset and Fig. 6 is between t-SNE and SFC on the Pen digits dataset.

Table 7 Results for the six datasets on two different measures
Table 8 Results for the six datasets on two different measures
Fig. 5
figure 5

Example of data projection on the 3-D Sphere Dataset. Where PCA fail to separate the two classes, SFC get a strict separation and yet PCA reached higher theoretical score (cf. Table 7)

Fig. 6
figure 6

Example of data projection on the 16-D Pen digits Dataset. As for the Tic-Tac-Toe dataset (cf. Figure 4), the clusters formed by t-SNE are more compact

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Owczarek, V., Franco, P. & Mullot, R. An alternative for data visualization using space-filling curve. Data Min Knowl Disc 37, 2281–2305 (2023). https://doi.org/10.1007/s10618-023-00943-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-023-00943-7

Keywords

Navigation