Planning of distributed data production for High Energy and Nuclear Physics (Journal Article) | OSTI.GOV

Planning of distributed data production for High Energy and Nuclear Physics

Journal Article · Cluster Computing
 [1];  [2];  [3]
  1. Czech Technical Univ. in Prague, Prague (Czech Republic); Nuclear Physics Institute of the Czech Academy of Sciences, Prague (Czech Republic)
  2. Brookhaven National Lab. (BNL), Upton, NY (United States)
  3. Masaryk Univ., Brno (Czech Republic)

Modern experiments in High Energy and Nuclear Physics rely heavily on distributed computations using multiple computational facilities across the world. One essential type of computation is distributed data production, where petabytes of raw files from a single source have to be processed once (per production campaign) using thousands of CPUs at distant locations, and the output has to be transferred back to that source. The distribution of data over a large system does not necessarily match the distribution of storage, network and CPU capacity. Therefore, bottlenecks may appear and lead to increased latency and degraded performance. In this paper we propose a new scheduling approach for distributed data production which is based on the network flow maximization model. In our approach, a central planner defines how much input and output data should be transferred over each network link in order to maximize the computational throughput. Such plans are created periodically for a fixed planning time interval using up-to-date information on network, storage and CPU resources. The centrally created plans are executed in a distributed manner by dedicated services running at participating sites. Our simulations, based on the log records from the data production framework of the STAR (Solenoid Tracker at RHIC) experiment, have shown that the proposed model systematically provides better performance than the simulated traditional techniques.
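The planning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it assumes a simplified network in which only input transfers are planned (output transfer back to the source is ignored), every capacity is expressed in files per planning interval, and the flow is maximized with the textbook Edmonds-Karp algorithm. The site names (`T0`, `siteA`, `siteB`), the capacities, and the function `max_flow` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow.

    capacity maps directed edges (u, v) to a non-negative integer capacity.
    Returns the total flow and the flow assigned to each original edge.
    Assumes no pair of antiparallel edges (u, v) and (v, u) in capacity.
    """
    # residual[u][v] is the capacity still available on u -> v
    residual = defaultdict(lambda: defaultdict(int))
    for (u, v), c in capacity.items():
        residual[u][v] += c
        residual[v][u] += 0  # ensure the reverse edge exists

    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal

        # Find the bottleneck capacity along the path
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck amount along the path
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

    flow = {
        (u, v): c - residual[u][v]
        for (u, v), c in capacity.items()
        if c - residual[u][v] > 0
    }
    return total, flow


# Hypothetical capacities, in files per planning interval.
links = {
    ("raw", "T0"): 1000,    # raw files available at the single source
    ("T0", "cpu"): 100,     # CPU capacity local to the source
    ("T0", "siteA"): 300,   # network link source -> site A
    ("T0", "siteB"): 200,   # network link source -> site B
    ("siteA", "siteB"): 50, # network link site A -> site B
    ("siteA", "cpu"): 250,  # CPU capacity at site A
    ("siteB", "cpu"): 400,  # CPU capacity at site B
}

total, plan = max_flow(links, "raw", "cpu")
print("files processed per interval:", total)
for (u, v), f in sorted(plan.items()):
    print(f"{u} -> {v}: {f}")
```

In this toy network, site A receives more input than its CPUs can process, so the plan forwards the surplus to site B, whose direct link from the source is the limiting resource. A planner along the lines described in the abstract would recompute such a plan each interval from fresh capacity measurements.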

Research Organization:
Brookhaven National Laboratory (BNL), Upton, NY (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Nuclear Physics (NP)
Grant/Contract Number:
SC0012704
OSTI ID:
1480983
Report Number(s):
BNL-209348-2018-JAAM
Journal Information:
Cluster Computing, Vol. 21, Issue 4; ISSN 1386-7857
Publisher:
Springer
Country of Publication:
United States
Language:
English
