Modern experiments in High Energy and Nuclear Physics heavily rely on distributed computations using multiple computational facilities across the world. One of the essential types of such computations is distributed data production, where petabytes of raw files from a single source have to be processed once (per production campaign) using thousands of CPUs at distant locations, and the output has to be transferred back to that source. The data distribution over a large system does not necessarily match the distribution of storage, network and CPU capacity, so bottlenecks may appear and lead to increased latency and degraded performance. In this paper we propose a new scheduling approach for distributed data production which is based on the network flow maximization model. In our approach, a central planner defines how much input and output data should be transferred over each network link in order to maximize the computational throughput. Such plans are created periodically for a fixed planning time interval using up-to-date information on network, storage and CPU resources. The centrally created plans are executed in a distributed manner by dedicated services running at the participating sites. In conclusion, our simulations based on the log records from the data production framework of the experiment STAR (Solenoid Tracker at RHIC) have shown that the proposed model systematically provides better performance than the simulated traditional techniques.
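The flow-maximization idea above can be illustrated with a small, self-contained sketch: cast one planning interval as a maximum-flow problem in which each network link's capacity is its bandwidth times the interval length and each site's CPU farm bounds how much data that site can process. This is only an illustration, not the authors' planner; the site names, bandwidths, CPU rates and the use of networkx's maximum_flow solver are all assumptions, and the output transfers back to the source (which the paper's model also plans) are omitted for brevity.

import networkx as nx

INTERVAL = 3600  # hypothetical planning interval, in seconds

G = nx.DiGraph()

# Raw data (GB) available at the central storage for this interval.
G.add_edge("source", "central", capacity=5000)

# Network links: capacity = bandwidth (GB/s) * planning interval.
for (u, v), bandwidth in {("central", "site_A"): 0.5,
                          ("central", "site_B"): 0.2}.items():
    G.add_edge(u, v, capacity=bandwidth * INTERVAL)

# CPU farms: capacity = processing rate (GB/s) * planning interval.
for site, rate in {"central": 0.1, "site_A": 0.3, "site_B": 0.4}.items():
    G.add_edge(site, "sink", capacity=rate * INTERVAL)

# The maximum flow is the amount of input data that can be processed during
# the interval; flow_dict gives the per-link transfer quotas of the plan.
flow_value, flow_dict = nx.maximum_flow(G, "source", "sink")
print(f"plannable throughput: {flow_value:.0f} GB per interval")
for u, targets in flow_dict.items():
    for v, amount in targets.items():
        if amount > 0:
            print(f"  {u} -> {v}: {amount:.0f} GB")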
Makatun, Dzmitry, et al. "Planning of distributed data production for High Energy and Nuclear Physics." Cluster Computing, vol. 21, no. 4, Aug. 2018. https://doi.org/10.1007/s10586-018-2834-3
Makatun, Dzmitry, Lauret, Jérôme, & Rudová, Hana (2018). Planning of distributed data production for High Energy and Nuclear Physics. Cluster Computing, 21(4). https://doi.org/10.1007/s10586-018-2834-3
Makatun, Dzmitry, Lauret, Jérôme, and Rudová, Hana, "Planning of distributed data production for High Energy and Nuclear Physics," Cluster Computing 21, no. 4 (2018), https://doi.org/10.1007/s10586-018-2834-3
@article{osti_1480983,
author = {Makatun, Dzmitry and Lauret, Jérôme and Rudová, Hana},
title = {Planning of distributed data production for High Energy and Nuclear Physics},
annote = {Modern experiments in High Energy and Nuclear Physics heavily rely on distributed computations using multiple computational facilities across the world. One of the essential types of such computations is distributed data production, where petabytes of raw files from a single source have to be processed once (per production campaign) using thousands of CPUs at distant locations, and the output has to be transferred back to that source. The data distribution over a large system does not necessarily match the distribution of storage, network and CPU capacity, so bottlenecks may appear and lead to increased latency and degraded performance. In this paper we propose a new scheduling approach for distributed data production which is based on the network flow maximization model. In our approach, a central planner defines how much input and output data should be transferred over each network link in order to maximize the computational throughput. Such plans are created periodically for a fixed planning time interval using up-to-date information on network, storage and CPU resources. The centrally created plans are executed in a distributed manner by dedicated services running at the participating sites. In conclusion, our simulations based on the log records from the data production framework of the experiment STAR (Solenoid Tracker at RHIC) have shown that the proposed model systematically provides better performance than the simulated traditional techniques.},
doi = {10.1007/s10586-018-2834-3},
url = {https://www.osti.gov/biblio/1480983},
journal = {Cluster Computing},
issn = {1386-7857},
number = {4},
volume = {21},
place = {United States},
publisher = {Springer},
year = {2018},
month = {08}}