{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,20]],"date-time":"2024-09-20T04:18:49Z","timestamp":1726805929872},"reference-count":52,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T00:00:00Z","timestamp":1722470400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T00:00:00Z","timestamp":1722470400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Grid Computing"],"published-print":{"date-parts":[[2024,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The increasing use of multiple Workflow Management Systems (WMS) employing various workflow languages and shared workflow repositories enhances the open-source bioinformatics ecosystem. Efficient resource utilization in these systems is crucial for keeping costs low and improving processing times, especially for large-scale bioinformatics workflows running in cloud environments. Recognizing this, our study introduces a novel reference architecture, Cloud Monitoring Kit (CMK), for a multi-platform monitoring system. Our solution is designed to generate uniform, aggregated metrics from containerized workflow tasks scheduled by different WMS. Central to the proposed solution is the use of task labeling methods, which enable convenient grouping and aggregating of metrics independent of the WMS employed. This approach builds upon existing technology, providing additional benefits of modularity and capacity to seamlessly integrate with other data processing or collection systems. 
We have developed and released an open-source implementation of our system, which we evaluated on Amazon Web Services (AWS) using a transcriptomics data analysis workflow executed on two scientific WMS. The findings of this study indicate that CMK provides valuable insights into resource utilization. In doing so, it paves the way for more efficient management of resources in containerized scientific workflows running in public cloud environments, and it provides a foundation for optimizing task configurations, reducing costs, and enhancing scheduling decisions. Overall, our solution addresses the immediate needs of bioinformatics workflows and offers a scalable and adaptable framework for future advancements in cloud-based scientific computing.<\/jats:p>","DOI":"10.1007\/s10723-024-09777-z","type":"journal-article","created":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T10:04:00Z","timestamp":1722506640000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["CMK: Enhancing Resource Usage Monitoring across Diverse Bioinformatics Workflow Management Systems"],"prefix":"10.1007","volume":"22","author":[{"given":"Robert","family":"Nica","sequence":"first","affiliation":[]},{"given":"Stefan","family":"G\u00f6tz","sequence":"additional","affiliation":[]},{"given":"Germ\u00e1n","family":"Molt\u00f3","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,8,1]]},"reference":[{"key":"9777_CR1","unstructured":"Amazon Web Services (AWS). https:\/\/aws.amazon.com\/ (2023)"},{"key":"9777_CR2","unstructured":"Google Cloud. https:\/\/cloud.google.com\/ (2023)"},{"key":"9777_CR3","unstructured":"Microsoft Azure. https:\/\/azure.microsoft.com\/ (2023)"},{"key":"9777_CR4","doi-asserted-by":"publisher","unstructured":"Siddiqui, T., Siddiqui, S.A., Khan, N.A.: Comprehensive Analysis of Container Technology. 
2019 4th International Conference on Information Systems and Computer Networks, ISCON 2019, 218\u2013223 (2019). https:\/\/doi.org\/10.1109\/ISCON47742.2019.9036238","DOI":"10.1109\/ISCON47742.2019.9036238"},{"issue":"6","key":"9777_CR5","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1109\/MCSE.2017.2421459","volume":"19","author":"JS Hale","year":"2017","unstructured":"Hale, J.S., Li, L., Richardson, C.N., Wells, G.N.: Containers for portable, productive, and performant scientific computing. Comput. Sci. Eng. 19(6), 40\u201350 (2017). https:\/\/doi.org\/10.1109\/MCSE.2017.2421459","journal-title":"Comput. Sci. Eng."},{"key":"9777_CR6","doi-asserted-by":"publisher","unstructured":"Felter, W., Ferreira, A., Rajamony, R., Rubio, J.: An updated performance comparison of virtual machines and Linux containers. ISPASS 2015 - IEEE International Symposium on Performance Analysis of Systems and Software, 171\u2013172 (2015). https:\/\/doi.org\/10.1109\/ISPASS.2015.7095802","DOI":"10.1109\/ISPASS.2015.7095802"},{"key":"9777_CR7","doi-asserted-by":"publisher","unstructured":"Giorgi, F.M., Ceraolo, C., Mercatelli, D.: The R Language: An Engine for Bioinformatics and Data Science. Life (Basel, Switzerland) 12(5) (2022). https:\/\/doi.org\/10.3390\/LIFE12050648","DOI":"10.3390\/LIFE12050648"},{"issue":"1","key":"9777_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-9-82\/TABLES\/1","volume":"9","author":"M Fourment","year":"2008","unstructured":"Fourment, M., Gillings, M.R.: A comparison of common programming languages used in bioinformatics. BMC Bioinform. 9(1), 1\u20139 (2008). https:\/\/doi.org\/10.1186\/1471-2105-9-82\/TABLES\/1","journal-title":"BMC Bioinform."},{"issue":"7604","key":"9777_CR9","doi-asserted-by":"publisher","first-page":"452","DOI":"10.1038\/533452A","volume":"533","author":"M Baker","year":"2016","unstructured":"Baker, M., Penny, D.: Is there a reproducibility crisis? Nature 533(7604), 452\u2013454 (2016). 
https:\/\/doi.org\/10.1038\/533452A","journal-title":"Nature"},{"key":"9777_CR10","doi-asserted-by":"publisher","unstructured":"Amstutz, P., Crusoe, M.R., Tijani\u0107, N., Chapman, B., Chilton, J., Heuer, M., Kartashov, A., Leehr, D., M\u00e9nager, H., Nedeljkovich, M., Scales, M., Soiland-Reyes, S., Stojanovic, L.: Common Workflow Language, v1.0. Figshare (2016). https:\/\/doi.org\/10.6084\/M9.FIGSHARE.3115156","DOI":"10.6084\/M9.FIGSHARE.3115156"},{"key":"9777_CR11","doi-asserted-by":"publisher","unstructured":"Voss, K., Auwera, G.V.d., Gentry, J., Voss, K., Auwera, G., Gentry, J.: Full-stack genomics pipelining with GATK4 + WDL + Cromwell. ISCB Comm. J. 6 (2017). https:\/\/doi.org\/10.7490\/F1000RESEARCH.1114634.1","DOI":"10.7490\/F1000RESEARCH.1114634.1"},{"issue":"1\u20132","key":"9777_CR12","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1162\/DINT_A_00033","volume":"2","author":"C Goble","year":"2020","unstructured":"Goble, C., Cohen-Boulakia, S., Soiland-Reyes, S., Garijo, D., Gil, Y., Crusoe, M.R., Peters, K., Schober, D.: FAIR Computational workflows. Data Intell. 2(1\u20132), 108\u2013121 (2020). https:\/\/doi.org\/10.1162\/DINT_A_00033","journal-title":"Data Intell."},{"issue":"6","key":"9777_CR13","doi-asserted-by":"publisher","first-page":"881","DOI":"10.1007\/S00778-017-0486-1","volume":"26","author":"M Herschel","year":"2017","unstructured":"Herschel, M., Diestelk\u00e4mper, R., Ben Lahmar, H.: A survey on provenance: What for? What form? What from? VLDB J. 26(6), 881\u2013906 (2017). https:\/\/doi.org\/10.1007\/S00778-017-0486-1","journal-title":"VLDB J."},{"issue":"11","key":"9777_CR14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1093\/GIGASCIENCE\/GIZ095","volume":"8","author":"FZ Khan","year":"2019","unstructured":"Khan, F.Z., Soiland-Reyes, S., Sinnott, R.O., Lonie, A., Goble, C., Crusoe, M.R.: Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv. 
GigaSci 8(11), 1\u201327 (2019). https:\/\/doi.org\/10.1093\/GIGASCIENCE\/GIZ095","journal-title":"GigaSci"},{"key":"9777_CR15","doi-asserted-by":"publisher","unstructured":"Missier, P., Belhajjame, K., Cheney, J.: The W3C PROV family of specifications for modelling provenance metadata. ACM Int. Conf. Proc. Ser. 773\u2013776 (2013). https:\/\/doi.org\/10.1145\/2452376.2452478","DOI":"10.1145\/2452376.2452478"},{"key":"9777_CR16","doi-asserted-by":"publisher","unstructured":"O\u2019Connor, B.D., Yuen, D., Chung, V., Duncan, A.G., Liu, X.K., Patricia, J., Paten, B., Stein, L., Ferretti, V.: The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows. F1000Research 6:52 6, 52 (2017). https:\/\/doi.org\/10.12688\/f1000research.10137.1","DOI":"10.12688\/f1000research.10137.1"},{"key":"9777_CR17","doi-asserted-by":"publisher","unstructured":"Goble, C., Soiland-Reyes, S., Bacall, F., Owen, S., Williams, A., Eguinoa, I., Droesbeke, B., Leo, S., Pireddu, L., Rodr\u00edguez-Navas, L., Fern\u00e1ndez, J.M., Capella-Gutierrez, S., M\u00e9nager, H., Gr\u00fcning, B., Serrano-Solano, B., Ewels, P., Coppens, F.: Implementing FAIR digital objects in the EOSC-Life workflow collaboratory (2021). https:\/\/doi.org\/10.5281\/ZENODO.4605654 . https:\/\/zenodo.org\/record\/4605654","DOI":"10.5281\/ZENODO.4605654"},{"key":"9777_CR18","doi-asserted-by":"publisher","unstructured":"Vivian, J., Rao, A.A., Nothaft, F.A., Ketchum, C., Armstrong, J., Novak, A., Pfeil, J., Narkizian, J., Deran, A.D., Musselman-Brown, A., Schmidt, H., Amstutz, P., Craft, B., Goldman, M., Rosenbloom, K., Cline, M., O\u2019Connor, B., Hanna, M., Birger, C., Kent, W.J., Patterson, D.A., Joseph, A.D., Zhu, J., Zaranek, S., Getz, G., Haussler, D., Paten, B.: Toil enables reproducible, open source, big biomedical data analyses. Nature Publishing Group (2017). 
https:\/\/doi.org\/10.1038\/nbt.3772","DOI":"10.1038\/nbt.3772"},{"key":"9777_CR19","unstructured":"chanzuckerberg\/miniwdl: Workflow Description Language developer tools & local runner. https:\/\/github.com\/chanzuckerberg\/miniwdl (2023)"},{"issue":"4","key":"9777_CR20","doi-asserted-by":"publisher","first-page":"316","DOI":"10.1038\/NBT.3820","volume":"35","author":"P Di Tommaso","year":"2017","unstructured":"Di Tommaso, P., Chatzou, M., Floden, E.W., Barja, P.P., Palumbo, E., Notredame, C.: Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35(4), 316\u2013319 (2017). https:\/\/doi.org\/10.1038\/NBT.3820","journal-title":"Nat. Biotechnol."},{"key":"9777_CR21","unstructured":"AWS Batch. https:\/\/aws.amazon.com\/batch\/ (2023)"},{"key":"9777_CR22","unstructured":"Azure Batch. https:\/\/azure.microsoft.com\/en-us\/products\/batch\/ (2023)"},{"key":"9777_CR23","unstructured":"Google Batch. https:\/\/cloud.google.com\/batch\/ (2023)"},{"issue":"10","key":"9777_CR24","doi-asserted-by":"publisher","first-page":"1451","DOI":"10.1101\/gr.4086505","volume":"15","author":"B Giardine","year":"2005","unstructured":"Giardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., Miller, W., Kent, W.J., Nekrutenko, A.: Galaxy: A platform for interactive large-scale genome analysis. Genome Res. 15(10), 1451\u20131455 (2005). https:\/\/doi.org\/10.1101\/gr.4086505","journal-title":"Genome Res."},{"key":"9777_CR25","unstructured":"TES specification. https:\/\/github.com\/ga4gh\/task-execution-schemas (2023)"},{"key":"9777_CR26","unstructured":"Funnel. https:\/\/ohsu-comp-bio.github.io\/funnel\/ (2023)"},{"key":"9777_CR27","unstructured":"WES Specification. 
https:\/\/github.com\/ga4gh\/workflow-execution-service-schemas (2023)"},{"key":"9777_CR28","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1007\/10968987_3","volume":"2862","author":"AB Yoo","year":"2003","unstructured":"Yoo, A.B., Jette, M.A., Grondona, M.: SLURM: Simple linux utility for resource management. Lect. Notes Comput. Sci. (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2862, 44\u201360 (2003). https:\/\/doi.org\/10.1007\/10968987_3","journal-title":"Lect. Notes Comput. Sci. (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)"},{"key":"9777_CR29","unstructured":"HashiCorp State of Cloud Strategy Survey. https:\/\/www.hashicorp.com\/state-of-the-cloud (2022)"},{"issue":"18","key":"9777_CR30","doi-asserted-by":"publisher","first-page":"3453","DOI":"10.1093\/BIOINFORMATICS\/BTZ054","volume":"35","author":"A Tyryshkina","year":"2019","unstructured":"Tyryshkina, A., Coraor, N., Nekrutenko, A.: Predicting runtimes of bioinformatics tools based on historical data: Five years of Galaxy usage. Bioinformatics 35(18), 3453\u20133460 (2019). https:\/\/doi.org\/10.1093\/BIOINFORMATICS\/BTZ054","journal-title":"Bioinformatics"},{"key":"9777_CR31","doi-asserted-by":"publisher","unstructured":"Fahad, A.M., Ahmed, A.A., Kahar, M.N.M.: The importance of monitoring cloud computing: An intensive review. IEEE Region 10 Annual International Conference, Proceedings\/TENCON 2017-December, 2858\u20132863 (2017). https:\/\/doi.org\/10.1109\/TENCON.2017.8228349","DOI":"10.1109\/TENCON.2017.8228349"},{"key":"9777_CR32","doi-asserted-by":"publisher","first-page":"480","DOI":"10.1007\/978-3-030-24322-7_59\/FIGURE","volume":"3","author":"MN Birje","year":"2020","unstructured":"Birje, M.N., Bulla, C.: Commercial and open source cloud monitoring tools: A review. Learn. Anal. Intell. Syst. 3, 480\u2013490 (2020). 
https:\/\/doi.org\/10.1007\/978-3-030-24322-7_59\/FIGURE","journal-title":"Learn. Anal. Intell. Syst."},{"issue":"3","key":"9777_CR33","doi-asserted-by":"publisher","first-page":"473","DOI":"10.1007\/S10723-018-09471-X\/METRICS","volume":"17","author":"R da Rosa Righi","year":"2019","unstructured":"da Rosa Righi, R., Lehmann, M., Gomes, M.M., Nobre, J.C., Costa, C.A., Rigo, S.J., Lena, M., Mohr, R.F., Oliveira, L.R.B.: A survey on global management view: toward combining system monitoring, resource management, and load prediction. J. Grid Comput. 17(3), 473\u2013502 (2019). https:\/\/doi.org\/10.1007\/S10723-018-09471-X\/METRICS","journal-title":"J. Grid Comput."},{"issue":"4","key":"9777_CR34","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1093\/GIGASCIENCE\/GIZ052","volume":"8","author":"T Ohta","year":"2019","unstructured":"Ohta, T., Tanjo, T., Ogasawara, O.: Accumulating computational resource usage of genomic data analysis workflow to optimize cloud computing instance selection. GigaScience 8(4), 1\u201311 (2019). https:\/\/doi.org\/10.1093\/GIGASCIENCE\/GIZ052","journal-title":"GigaScience"},{"key":"9777_CR35","doi-asserted-by":"publisher","unstructured":"Bader, J., Witzke, J., Becker, S., Loser, A., Lehmann, F., Doehler, L., Vu, A.D., Kao, O.: Towards advanced monitoring for scientific workflows. Proceedings - 2022 IEEE International Conference on Big Data. Big Data 2709\u20132715 (2022). https:\/\/doi.org\/10.1109\/BIGDATA55660.2022.10020864","DOI":"10.1109\/BIGDATA55660.2022.10020864"},{"key":"9777_CR36","unstructured":"Telegraf | InfluxData. https:\/\/influxdata.com\/telegraf (2024)"},{"key":"9777_CR37","unstructured":"Elasticsearch: The Official Distributed Search & Analytics Engine | Elastic. https:\/\/www.elastic.co\/elasticsearch (2024)"},{"key":"9777_CR38","unstructured":"Cloud monitoring | Dynatrace. 
https:\/\/www.dynatrace.com\/platform\/cloud-monitoring\/ (2023)"},{"key":"9777_CR39","unstructured":"Cloud Monitoring as a Service | Datadog. https:\/\/www.datadoghq.com\/ (2023)"},{"key":"9777_CR40","unstructured":"InfluxDB Cloud | InfluxData. https:\/\/www.influxdata.com\/products\/influxdb-cloud\/ (2023)"},{"key":"9777_CR41","unstructured":"Grafana: The open observability platform | Grafana Labs. https:\/\/grafana.com\/ (2024)"},{"key":"9777_CR42","unstructured":"Nomad by HashiCorp. https:\/\/www.nomadproject.io\/ (2024)"},{"key":"9777_CR43","unstructured":"Fully Managed Container Solution - Amazon Elastic Container Service (Amazon ECS) - Amazon Web Services. https:\/\/aws.amazon.com\/ecs\/ (2024)"},{"key":"9777_CR44","unstructured":"Infrastructure As Code Provisioning Tool - AWS CloudFormation - AWS. https:\/\/aws.amazon.com\/cloudformation\/ (2024)"},{"key":"9777_CR45","unstructured":"What is Amazon SNS? - Amazon Simple Notification Service. https:\/\/docs.aws.amazon.com\/sns\/latest\/dg\/welcome.html (2024)"},{"issue":"10","key":"9777_CR46","doi-asserted-by":"publisher","first-page":"1161","DOI":"10.1038\/s41592-021-01254-9","volume":"18","author":"L Wratten","year":"2021","unstructured":"Wratten, L., Wilm, A., G\u00f6ke, J.: Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers. Nat. Methods 18(10), 1161\u20131168 (2021). https:\/\/doi.org\/10.1038\/s41592-021-01254-9","journal-title":"Nat. Methods"},{"key":"9777_CR47","unstructured":"Genomics Workflows on AWS. https:\/\/docs.opendata.aws\/genomics-workflows\/quick-start.html (2023)"},{"key":"9777_CR48","unstructured":"IEEE SA - IEEE 1003.1-2001 (POSIX). https:\/\/standards.ieee.org\/ieee\/1003.1\/1389\/ (2021)"},{"key":"9777_CR49","doi-asserted-by":"publisher","unstructured":"Bage, A.P., Saxena, S., Singh, Y.: A brief review on lightweight practice of docker vulnerabilities. 
Software Engineering Approaches to Enable Digital Transformation Technologies 18\u201324 (2023). https:\/\/doi.org\/10.1201\/9781003441601-2","DOI":"10.1201\/9781003441601-2"},{"key":"9777_CR50","unstructured":"OmicsBox - Bioinformatics Made Easy, BioBam Bioinformatics. https:\/\/www.biobam.com\/omicsbox\/ (2023)"},{"issue":"10","key":"9777_CR51","doi-asserted-by":"publisher","first-page":"3420","DOI":"10.1093\/NAR\/GKN176","volume":"36","author":"S G\u00f6tz","year":"2008","unstructured":"G\u00f6tz, S., Garc\u00eda-G\u00f3mez, J.M., Terol, J., Williams, T.D., Nagaraj, S.H., Nueda, M.J., Robles, M., Tal\u00f3n, M., Dopazo, J., Conesa, A.: High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res. 36(10), 3420\u20133435 (2008). https:\/\/doi.org\/10.1093\/NAR\/GKN176","journal-title":"Nucleic Acids Res."},{"key":"9777_CR52","unstructured":"OpenTofu. https:\/\/opentofu.org\/ (2024)"}],"container-title":["Journal of Grid Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10723-024-09777-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10723-024-09777-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10723-024-09777-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,19]],"date-time":"2024-09-19T10:14:06Z","timestamp":1726740846000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10723-024-09777-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,1]]},"references-count":52,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,9]]}},"alternative-id":["9777"],"URL":"https:\/\/doi.org\/10.100
7\/s10723-024-09777-z","relation":{},"ISSN":["1570-7873","1572-9184"],"issn-type":[{"type":"print","value":"1570-7873"},{"type":"electronic","value":"1572-9184"}],"subject":[],"published":{"date-parts":[[2024,8,1]]},"assertion":[{"value":"26 February 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 July 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 August 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"62"}}