HiperView: real-time monitoring of dynamic behaviors of high-performance computing centers | The Journal of Supercomputing

Abstract

This paper presents HiperView, a visual analytics framework for monitoring and characterizing the health status of high-performance computing systems in real time through a RESTful interface. The primary objectives of this visual analytics system are: (1) to provide a graphical interface for tracking the health status of a large number of data center hosts with real-time statistics, (2) to help users visually analyze unusual behavior in a series of events that may be temporally and spatially correlated, and (3) to assist with preliminary troubleshooting and maintenance through a visual layout that reflects the hosts' actual physical locations. Two use cases were analyzed in detail to assess the effectiveness of HiperView on a medium-scale, Redfish-enabled production high-performance computing system with 10 racks and 467 hosts in total. These use cases show that the visualization offers the support needed for system automation and control. The framework's visual components and interfaces are designed so that they can potentially handle a larger data center with thousands of hosts and hundreds of health services per host.
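As a rough illustration of the kind of per-host data a Redfish-based monitor consumes, the sketch below reduces one host's Redfish `Thermal` resource (the JSON document a BMC serves under `/redfish/v1/Chassis/{ChassisId}/Thermal`) to a few health indicators and then groups hosts by rack, in the spirit of a layout that follows physical locations. This is not HiperView's implementation: the 80 °C warning threshold, the `compute-<rack>-<slot>` host naming, and the names `summarize_thermal`, `group_by_rack`, and `HostHealth` are hypothetical, and fetching the JSON over HTTPS is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Hypothetical warning threshold (degrees Celsius); not taken from the paper.
TEMP_WARN_C = 80.0

# Redfish Status.Health values, ranked from best to worst.
HEALTH_RANK = {"OK": 0, "Warning": 1, "Critical": 2}


@dataclass
class HostHealth:
    host: str
    max_temp_c: Optional[float]
    min_fan_rpm: Optional[float]
    status: str  # "OK", "Warning" or "Critical"


def summarize_thermal(host: str, thermal: Dict[str, Any]) -> HostHealth:
    """Reduce one host's Redfish Thermal resource to a few health indicators."""
    sensors = thermal.get("Temperatures", [])
    fans = thermal.get("Fans", [])
    temps = [t["ReadingCelsius"] for t in sensors if t.get("ReadingCelsius") is not None]
    rpms = [f["Reading"] for f in fans if f.get("Reading") is not None]
    max_temp = max(temps) if temps else None
    min_rpm = min(rpms) if rpms else None

    # Start from the worst Status.Health reported by any temperature or fan sensor.
    worst = "OK"
    for item in sensors + fans:
        health = item.get("Status", {}).get("Health")
        if health in HEALTH_RANK and HEALTH_RANK[health] > HEALTH_RANK[worst]:
            worst = health

    # Escalate a nominally healthy host whose hottest sensor crosses the threshold.
    if worst == "OK" and max_temp is not None and max_temp >= TEMP_WARN_C:
        worst = "Warning"
    return HostHealth(host, max_temp, min_rpm, worst)


def group_by_rack(summaries: List[HostHealth]) -> Dict[int, List[HostHealth]]:
    """Group hosts by rack, assuming hypothetical names like 'compute-<rack>-<slot>'."""
    racks: Dict[int, List[HostHealth]] = defaultdict(list)
    for s in summaries:
        _, rack, _slot = s.host.split("-")
        racks[int(rack)].append(s)
    return dict(racks)
```

A rack-oriented view could then color each host by its `status` field; keeping the per-host summary small is one way to make polling hundreds of hosts at short intervals tractable.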

Notes

  1. A computer room air conditioning (CRAC) unit is a device that monitors and maintains the temperature, air distribution, and humidity in a network room or data center.

References

  1. Allcock W, Felix E, Lowe M, Rheinheimer R, Fullop J (2011) Challenges of HPC monitoring. In: SC ’11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, pp 1–6. https://doi.org/10.1145/2063348.2063378

  2. Amar R, Eagan J, Stasko J (2005) Low-level components of analytic activity in information visualization. In: Proc. of the IEEE Symposium on Information Visualization, pp 15–24

  3. Andrienko N, Andrienko G, Gatalsky P (2003) Exploratory spatio-temporal visualization: an analytical review. J Vis Lang Comput 14(6):503–541

  4. Andrienko N, Lammarsch T, Andrienko G, Fuchs G, Keim D, Miksch S, Rind A (2018) Viewing visual analytics as model building. In: Computer graphics forum, vol 37. Wiley Online Library, pp 275–299

  5. Barth W (2008) Nagios: system and network monitoring. No Starch Press, San Francisco

  6. Betke E, Kunkel J (2017) Real-time I/O-monitoring of HPC applications with SIOX, Elasticsearch, Grafana and FUSE. In: International Conference on High Performance Computing, pp 174–186. Springer

  7. Bostock M, Ogievetsky V, Heer J (2011) D3: data-driven documents. IEEE Trans Vis Comput Graph 17(12):2301–2309

  8. Buyya R (2000) PARMON: a portable and scalable monitoring system for clusters. Softw Pract Exp 30(7):723–739

  9. Carasso D (2012) Exploring Splunk. CITO Research, New York

  10. Ceneda D, Gschwandtner T, May T, Miksch S, Schulz H, Streit M, Tominski C (2017) Characterizing guidance in visual analytics. IEEE Trans Vis Comput Graph 23(1):111–120. https://doi.org/10.1109/TVCG.2016.2598468

  11. Dang T, Wilkinson L (2013) TimeExplorer: similarity search time series by their signatures. In: Proc. International Symp. on Visual Computing, pp 280–289

  12. Dang TN, Anand A, Wilkinson L (2013) TimeSeer: scagnostics for high-dimensional time series. IEEE Trans Vis Comput Graph 19(3):470–483. https://doi.org/10.1109/TVCG.2012.128

  13. Dang TN, Wilkinson L (2014) ScagExplorer: exploring scatterplots by their scagnostics. In: 2014 IEEE Pacific Visualization Symposium, pp 73–80. https://doi.org/10.1109/PacificVis.2014.42

  14. Dang TN, Wilkinson L (2014) Transforming scagnostics to reveal hidden features. IEEE Trans Vis Comput Graph 20(12):1624–1632

  15. Eurotech: Industry solutions (2019) https://www.eurotech.com/en/hpc/industry+solutions. Accessed 10 Oct 2020

  16. Grafana: The open platform for beautiful analytics and monitoring (2019). https://grafana.com/. Accessed 10 Oct 2020

  17. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182. http://dl.acm.org/citation.cfm?id=944919.944968. Accessed 10 Oct 2020

  18. HPCC: High Performance Computing Center (2021) http://www.depts.ttu.edu/hpcc/. Accessed 10 Oct 2020

  19. Hugh Greenberg ND (2018) Tivan: a scalable data collection and analytics cluster. In: The 2nd Industry/University Joint International Workshop on Data Center Automation, Analytics, and Control (DAAC)

  20. Amazon Inc (2012) Amazon CloudWatch. http://aws.amazon.com/cloudwatch/. Accessed 10 Oct 2020

  21. Jia C, Cai Y, Yu YT, Tse T (2016) 5W+1H pattern: a perspective of systematic mapping studies and a case study on cloud software testing. J Syst Softw 116:206–219

  22. Keim DA, Panse C, Sips M (2004) Information visualization: scope, techniques and opportunities for geovisualization. In: Dykes J (ed) Exploring geovisualization. Elsevier, Oxford, pp 1–17

  23. Li J, Ali G, Nguyen N, Hass J, Sill A, Dang T, Chen Y (2020) Monster: an out-of-the-box monitoring tool for high performance computing systems. In: 2020 IEEE International Conference on Cluster Computing (CLUSTER), pp 119–129. IEEE

  24. Massie ML, Chun BN, Culler DE (2004) The Ganglia distributed monitoring system: design, implementation, and experience. Parallel Comput 30(7):817–840

  25. Meyer M, Munzner T, Pfister H (2009) MizBee: a multiscale synteny browser. IEEE Trans Vis Comput Graph 15(6):897–904. https://doi.org/10.1109/TVCG.2009.167

  26. Misra G, Agrawal S, Kurkure N, Pawar S, Mathur K (2011) Chreme: a web based application execution tool for using hpc resources. In: International Conference on High Performance Computing, pp 12–14

  27. Nguyen N, Dang T (2019) HiperViz: interactive visualization of CPU temperatures in high performance computing centers. In: Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning), PEARC ’19. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3332186.3337959

  28. Nguyen N, Hass J, Chen Y, Li J, Sill A, Dang T (2020) RadarViewer: visualizing the dynamics of multivariate data. In: Practice and Experience in Advanced Research Computing, PEARC ’20, pp 555–556. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3311790.3404538

  29. Organization SD (2013) Distributed management task force. https://www.dmtf.org/standards/redfish. Accessed 10 Oct 2020

  30. Palmas G, Bachynskyi M, Oulasvirta A, Seidel HP, Weinkauf T (2014) An edge-bundling layout for interactive parallel coordinates. In: 2014 IEEE Pacific Visualization Symposium, pp 57–64. https://doi.org/10.1109/PacificVis.2014.40

  31. Pike WA, Stasko J, Chang R, O’Connell TA (2009) The science of interaction. Inf Vis 8(4):263–274. https://doi.org/10.1057/ivs.2009.22

  32. Roberts PF (2013) IPMI: the most dangerous protocol you’ve never heard of. https://www.computerworld.com/article/2708437/ipmi--the-most-dangerous-protocol-you-ve-never-heard-of.html. Accessed 27 Mar 2021

  33. Saary MJ (2008) Radar plots: a useful way for presenting multivariate health care data. J Clin Epidemiol 61(4):311–317

  34. Seo J, Shneiderman B (2004) A rank-by-feature framework for unsupervised multidimensional data exploration using low dimensional projections. In: IEEE Symposium on Information Visualization (INFOVIS 2004), pp 65–72. IEEE

  35. Shneiderman B (1996) The eyes have it: a task by data type taxonomy for information visualizations. In: Proceedings of the 1996 IEEE Symposium on Visual Languages, VL ’96, p 336. IEEE Computer Society, Washington, DC, USA. http://dl.acm.org/citation.cfm?id=832277.834354

  36. Stearley J, Corwell S, Lord K (2010) Bridging the gaps: joining information sources with Splunk. In: SLAML

  37. Wilkinson L (2017) Visualizing big data outliers through distributed aggregation. IEEE Trans Vis Comput Graph 24(1):256–266

  38. Wilkinson L, Anand A, Grossman R (2005) Graph-theoretic scagnostics. In: Proceedings of the IEEE Information Visualization 2005, pp 157–164. IEEE Computer Society Press

  39. Wilkinson L, Anand A, Grossman R (2006) High-dimensional visual analytics: Interactive exploration guided by pairwise views of point distributions. IEEE Trans Vis Comput Graph 12(6):1363–1372

  40. Zadrozny P, Kodali R (2013) Big data analytics using Splunk: deriving operational intelligence from social media, machine data, existing data warehouses, and other real-time streaming sources. Apress, New York

Acknowledgements

The authors acknowledge the High-Performance Computing Center (HPCC) at Texas Tech University [18] in Lubbock for providing HPC resources and data that have contributed to the research results reported within this paper. The authors are thankful to the anonymous reviewers for their valuable feedback and suggestions, which improved this paper significantly. This research is supported in part by the National Science Foundation under grants CNS-1362134 and OAC-1835892, and through the IUCRC-CAC (Cloud and Autonomic Computing) Dell Inc. membership contribution.

Author information

Corresponding author

Correspondence to Tommy Dang.

About this article

Cite this article

Dang, T., Nguyen, N. & Chen, Y. HiperView: real-time monitoring of dynamic behaviors of high-performance computing centers. J Supercomput 77, 11807–11826 (2021). https://doi.org/10.1007/s11227-021-03724-5

  • DOI: https://doi.org/10.1007/s11227-021-03724-5
