Abstract
This paper presents HiperView, a visual analytics framework for monitoring and characterizing the health status of high-performance computing systems in real time through a RESTful interface. The primary objectives of this visual analytics system are: (1) to provide a graphical interface for tracking real-time health statistics of a large number of data center hosts, (2) to help users visually analyze unusual behavior in a series of events that may be temporally and spatially correlated, and (3) to assist in preliminary troubleshooting and maintenance with a visual layout that reflects the actual physical locations of the hosts. Two use cases were analyzed in detail to assess the effectiveness of HiperView on a medium-scale, Redfish-enabled production high-performance computing system with a total of 10 racks and 467 hosts. In these use cases, the visualization offered the necessary support for system automation and control. The framework’s visual components and interfaces are designed with larger-scale data centers in mind, potentially thousands of hosts with hundreds of health services per host.
Notes
A computer room air conditioning (CRAC) unit is a device that monitors and maintains the temperature, air distribution, and humidity in a network room or data center.
References
Allcock W, Felix E, Lowe M, Rheinheimer R, Fullop J (2011) Challenges of hpc monitoring. In: SC ’11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, pp 1–6. https://doi.org/10.1145/2063348.2063378
Amar R, Eagan J, Stasko J (2005) Low-level components of analytic activity in information visualization. In: Proc. of the IEEE Symposium on Information Visualization, pp 15–24
Andrienko N, Andrienko G, Gatalsky P (2003) Exploratory spatio-temporal visualization: an analytical review. J Vis Lang Comput 14(6):503–541
Andrienko N, Lammarsch T, Andrienko G, Fuchs G, Keim D, Miksch S, Rind A (2018) Viewing visual analytics as model building. In: Computer graphics forum, vol 37. Wiley Online Library, pp 275–299
Barth W (2008) Nagios: system and network monitoring. No Starch Press, San Francisco
Betke E, Kunkel J (2017) Real-time i/o-monitoring of hpc applications with siox, elasticsearch, grafana and fuse. In: International Conference on High Performance Computing, pp 174–186. Springer
Bostock M, Ogievetsky V, Heer J (2011) D3 data-driven documents. IEEE Trans Vis Comput Graph 17(12):2301–2309
Buyya R (2000) Parmon: a portable and scalable monitoring system for clusters. Softw Pract Exp 30(7):723–739
Carasso D (2012) Exploring splunk. CITO Research, New York
Ceneda D, Gschwandtner T, May T, Miksch S, Schulz H, Streit M, Tominski C (2017) Characterizing guidance in visual analytics. IEEE Trans Vis Comput Graph 23(1):111–120. https://doi.org/10.1109/TVCG.2016.2598468
Dang T, Wilkinson L (2013) TimeExplorer: similarity search time series by their signatures. In: Proc. International Symp. on Visual Computing, pp 280–289
Dang TN, Anand A, Wilkinson L (2013) TimeSeer: scagnostics for high-dimensional time series. IEEE Trans Vis Comput Graph 19(3):470–483. https://doi.org/10.1109/TVCG.2012.128
Dang TN, Wilkinson L (2014) Scagexplorer: exploring scatterplots by their scagnostics. In: 2014 IEEE Pacific Visualization Symposium, pp 73–80. https://doi.org/10.1109/PacificVis.2014.42
Dang TN, Wilkinson L (2014) Transforming scagnostics to reveal hidden features. IEEE Trans Vis Comput Graph 20(12):1624–1632
Eurotech: Industry solutions (2019) https://www.eurotech.com/en/hpc/industry+solutions. Accessed 10 Oct 2020
Grafana: The open platform for beautiful analytics and monitoring (2019). https://grafana.com/. Accessed 10 Oct 2020
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182. URL http://dl.acm.org/citation.cfm?id=944919.944968. Accessed 10 Oct 2020
HPCC: High Performance Computing Center (2021) http://www.depts.ttu.edu/hpcc/. Accessed 10 Oct 2020
Hugh Greenberg ND (2018) Tivan: a scalable data collection and analytics cluster. In: The 2nd Industry/University Joint International Workshop on Data Center Automation, Analytics, and Control (DAAC)
Amazon Inc: Amazon CloudWatch (2012) http://aws.amazon.com/cloudwatch/. Accessed 10 Oct 2020
Jia C, Cai Y, Yu YT, Tse T (2016) 5w+1h pattern: a perspective of systematic mapping studies and a case study on cloud software testing. J Syst Softw 116:206–219
Keim DA, Panse C, Sips M (2004) Information visualization: scope, techniques and opportunities for geovisualization. In: Dykes J (ed) Exploring geovisualization. Elsevier, Oxford, pp 1–17
Li J, Ali G, Nguyen N, Hass J, Sill A, Dang T, Chen Y (2020) Monster: an out-of-the-box monitoring tool for high performance computing systems. In: 2020 IEEE International Conference on Cluster Computing (CLUSTER), pp 119–129. IEEE
Massie ML, Chun BN, Culler DE (2004) The ganglia distributed monitoring system: design, implementation, and experience. Parallel Comput 30(7):817–840
Meyer M, Munzner T, Pfister H (2009) Mizbee: a multiscale synteny browser. IEEE Trans Vis Comput Graph 15(6):897–904. https://doi.org/10.1109/TVCG.2009.167
Misra G, Agrawal S, Kurkure N, Pawar S, Mathur K (2011) Chreme: a web based application execution tool for using hpc resources. In: International Conference on High Performance Computing, pp 12–14
Nguyen N, Dang T (2019) Hiperviz: Interactive visualization of CPU temperatures in high performance computing centers. In: Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning), PEARC ’19. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3332186.3337959
Nguyen N, Hass J, Chen Y, Li J, Sill A, Dang T (2020) Radarviewer: visualizing the dynamics of multivariate data. In: Practice and Experience in Advanced Research Computing, PEARC ’20, pp 555–556. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3311790.3404538
Organization SD (2013) Distributed management task force. https://www.dmtf.org/standards/redfish. Accessed 10 Oct 2020
Palmas G, Bachynskyi M, Oulasvirta A, Seidel HP, Weinkauf T (2014) An edge-bundling layout for interactive parallel coordinates. In: 2014 IEEE Pacific Visualization Symposium, pp 57–64. https://doi.org/10.1109/PacificVis.2014.40
Pike WA, Stasko J, Chang R, O’Connell TA (2009) The science of interaction. Inf Vis 8(4):263–274. https://doi.org/10.1057/ivs.2009.22
Roberts PF (2013) IPMI: the most dangerous protocol you’ve never heard of. https://www.computerworld.com/article/2708437/ipmi--the-most-dangerous-protocol-you-ve-never-heard-of.html. Accessed 27 Mar 2021
Saary MJ (2008) Radar plots: a useful way for presenting multivariate health care data. J Clin Epidemiol 61(4):311–317
Seo J, Shneiderman B (2004) A rank-by-feature framework for unsupervised multidimensional data exploration using low dimensional projections. In: Information Visualization, 2004. INFOVIS 2004. IEEE Symposium on, pp 65–72. IEEE
Shneiderman B (1996) The eyes have it: a task by data type taxonomy for information visualizations. In: Proceedings of the 1996 IEEE Symposium on Visual Languages, VL ’96, p 336. IEEE Computer Society, Washington, DC, USA. URL http://dl.acm.org/citation.cfm?id=832277.834354
Stearley J, Corwell S, Lord K (2010) Bridging the gaps: Joining information sources with splunk. In: SLAML
Wilkinson L (2017) Visualizing big data outliers through distributed aggregation. IEEE Trans Vis Comput Graph 24(1):256–266
Wilkinson L, Anand A, Grossman R (2005) Graph-theoretic scagnostics. In: Proceedings of the IEEE Information Visualization 2005, pp 157–164. IEEE Computer Society Press
Wilkinson L, Anand A, Grossman R (2006) High-dimensional visual analytics: Interactive exploration guided by pairwise views of point distributions. IEEE Trans Vis Comput Graph 12(6):1363–1372
Zadrozny P, Kodali R (2013) Big data analytics using Splunk: deriving operational intelligence from social media, machine data, existing data warehouses, and other real-time streaming sources. Apress, New York
Acknowledgements
The authors acknowledge the High-Performance Computing Center (HPCC) at Texas Tech University [18] in Lubbock for providing the HPC resources and data that contributed to the research results reported in this paper. The authors thank the anonymous reviewers for their valuable feedback and suggestions, which improved this paper significantly. This research is supported in part by the National Science Foundation under grants CNS-1362134 and OAC-1835892, and through the IUCRC-CAC (Cloud and Autonomic Computing) Dell Inc. membership contribution.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Dang, T., Nguyen, N. & Chen, Y. HiperView: real-time monitoring of dynamic behaviors of high-performance computing centers. J Supercomput 77, 11807–11826 (2021). https://doi.org/10.1007/s11227-021-03724-5
DOI: https://doi.org/10.1007/s11227-021-03724-5