12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)

OASIcs, Volume 88

12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)



Thumbnail PDF

Event

PARMA-DITAM 2021, January 19, 2021, Budapest, Hungary

Editors

João Bispo
  • University of Porto, Portugal
Stefano Cherubin
  • Codeplay Software Ltd, London, United Kingdom
José Flich
  • Universitat Politècnica de València, Spain

Publication Details

  • published at: 2021-03-02
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-181-8
  • DBLP: db/conf/hipeac/parma2021

Access Numbers

Documents

No documents found matching your filter selection.
Document
Complete Volume
OASIcs, Volume 88, PARMA-DITAM 2021, Complete Volume

Authors: João Bispo, Stefano Cherubin, and José Flich


Abstract
OASIcs, Volume 88, PARMA-DITAM 2021, Complete Volume

Cite as

12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021). Open Access Series in Informatics (OASIcs), Volume 88, pp. 1-78, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@Proceedings{bispo_et_al:OASIcs.PARMA-DITAM.2021,
  title =	{{OASIcs, Volume 88, PARMA-DITAM 2021, Complete Volume}},
  booktitle =	{12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)},
  pages =	{1--78},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-181-8},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{88},
  editor =	{Bispo, Jo\~{a}o and Cherubin, Stefano and Flich, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2021},
  URN =		{urn:nbn:de:0030-drops-136352},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2021},
  annote =	{Keywords: OASIcs, Volume 88, PARMA-DITAM 2021, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: João Bispo, Stefano Cherubin, and José Flich


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021). Open Access Series in Informatics (OASIcs), Volume 88, pp. 0:i-0:x, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{bispo_et_al:OASIcs.PARMA-DITAM.2021.0,
  author =	{Bispo, Jo\~{a}o and Cherubin, Stefano and Flich, Jos\'{e}},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)},
  pages =	{0:i--0:x},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-181-8},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{88},
  editor =	{Bispo, Jo\~{a}o and Cherubin, Stefano and Flich, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2021.0},
  URN =		{urn:nbn:de:0030-drops-136364},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2021.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Towards Adaptive Multi-Alternative Process Network

Authors: Hasna Bouraoui, Chadlia Jerad, and Jeronimo Castrillon


Abstract
With the increase of voice-controlled systems, speech based recognition applications are gaining more attention. Such applications need to adapt to hardware platforms to offer the required performance. Given the streaming nature of these applications, dataflow models are a common choice for model-based design and execution on parallel embedded platforms. However, most of today’s models are built on top of classical static dataflow with adaptivity extensions to express data parallelism. In this paper, we define and describe an approach for algorithmic adaptivity to express richer sets of variants and trade-offs. For this, we introduce multi-Alternative Process Network (mAPN), a high-level abstract representation where several process networks of the same application coexist. We describe an algorithm for automatic generation of all possible alternatives. The mAPN is enriched with meta-data serving to endow the alternatives with annotations in terms of a specific metric, helping to extract the most suitable alternative depending on the available computational resources and application/user constraints. We motivate the approach by the automatic subtitling application (ASA) as use case and run the experiments on an mAPN sample consisting of 12 randomly selected possible variants.

Cite as

Hasna Bouraoui, Chadlia Jerad, and Jeronimo Castrillon. Towards Adaptive Multi-Alternative Process Network. In 12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021). Open Access Series in Informatics (OASIcs), Volume 88, pp. 1:1-1:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{bouraoui_et_al:OASIcs.PARMA-DITAM.2021.1,
  author =	{Bouraoui, Hasna and Jerad, Chadlia and Castrillon, Jeronimo},
  title =	{{Towards Adaptive Multi-Alternative Process Network}},
  booktitle =	{12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)},
  pages =	{1:1--1:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-181-8},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{88},
  editor =	{Bispo, Jo\~{a}o and Cherubin, Stefano and Flich, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2021.1},
  URN =		{urn:nbn:de:0030-drops-136378},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2021.1},
  annote =	{Keywords: High level process network, algorithmic adaptivity, automatic subtitling application}
}
Document
BifurKTM: Approximately Consistent Distributed Transactional Memory for GPUs

Authors: Samuel Irving, Lu Peng, Costas Busch, and Jih-Kwon Peir


Abstract
We present BifurKTM, the first read-optimized Distributed Transactional Memory system for GPU clusters. The BifurKTM design includes: GPU KoSTM, a new software transactional memory conflict detection scheme that exploits relaxed consistency to increase throughput; and KoDTM, a Distributed Transactional Memory model that combines the Data- and Control- flow models to greatly reduce communication overheads. Despite the allure of huge speedups, GPUs are limited in use due to their programmability and extreme sensitivity to workload characteristics. These become daunting concerns when considering a distributed GPU cluster, wherein a programmer must design algorithms to hide communication latency by exploiting data regularity, high compute intensity, etc. The BifurKTM design allows GPU programmers to exploit a new workload characteristic: the percentage of the workload that is Read-Only (e.g. reads but does not modify shared memory), even when this percentage is not known in advance. Programmers designate transactions that are suitable for Approximate Consistency, in which transactions "appear" to execute at the most convenient time for preventing conflicts. By leveraging Approximate Consistency for Read-Only transactions, the BifurKTM runtime system offers improved performance, application flexibility, and programmability without introducing any errors into shared memory. Our experiments show that Approximate Consistency can improve BkTM performance by up to 34x in applications with moderate network communication utilization and a read-intensive workload. Using Approximate Consistency, BkTM can reduce GPU-to-GPU network communication by 99%, reduce the number of aborts by up to 100%, and achieve an average speedup of 18x over a similarly sized CPU cluster while requiring minimal effort from the programmer.

Cite as

Samuel Irving, Lu Peng, Costas Busch, and Jih-Kwon Peir. BifurKTM: Approximately Consistent Distributed Transactional Memory for GPUs. In 12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021). Open Access Series in Informatics (OASIcs), Volume 88, pp. 2:1-2:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{irving_et_al:OASIcs.PARMA-DITAM.2021.2,
  author =	{Irving, Samuel and Peng, Lu and Busch, Costas and Peir, Jih-Kwon},
  title =	{{BifurKTM: Approximately Consistent Distributed Transactional Memory for GPUs}},
  booktitle =	{12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)},
  pages =	{2:1--2:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-181-8},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{88},
  editor =	{Bispo, Jo\~{a}o and Cherubin, Stefano and Flich, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2021.2},
  URN =		{urn:nbn:de:0030-drops-136386},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2021.2},
  annote =	{Keywords: GPU, Distributed Transactional Memory, Approximate Consistency}
}
Document
The Impact of Precision Tuning on Embedded Systems Performance: A Case Study on Field-Oriented Control

Authors: Gabriele Magnani, Daniele Cattaneo, Michele Chiari, and Giovanni Agosta


Abstract
Field Oriented Control (FOC) is an industry-standard strategy for controlling induction motors and other kinds of AC-based motors. This control scheme has a very high arithmetic intensity when implemented digitally - in particular it requires the use of trigonometric functions. This requirement contrasts with the necessity of increasing the control step frequency when required, and the minimization of power consumption in applications where conserving battery life is paramount such as drones. However, it also makes FOC well suited for optimization using precision tuning techniques. Therefore, we exploit the state-of-the-art FixM methodology to optimize a miniapp simulating a typical FOC application by applying precision tuning of trigonometric functions. The FixM approach itself was extended in order to implement additional algorithm choices to enable a trade-off between execution time and code size. With the application of FixM on the miniapp, we achieved a speedup up to 278%, at a cost of an error in the output less than 0.1%.

Cite as

Gabriele Magnani, Daniele Cattaneo, Michele Chiari, and Giovanni Agosta. The Impact of Precision Tuning on Embedded Systems Performance: A Case Study on Field-Oriented Control. In 12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021). Open Access Series in Informatics (OASIcs), Volume 88, pp. 3:1-3:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{magnani_et_al:OASIcs.PARMA-DITAM.2021.3,
  author =	{Magnani, Gabriele and Cattaneo, Daniele and Chiari, Michele and Agosta, Giovanni},
  title =	{{The Impact of Precision Tuning on Embedded Systems Performance: A Case Study on Field-Oriented Control}},
  booktitle =	{12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)},
  pages =	{3:1--3:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-181-8},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{88},
  editor =	{Bispo, Jo\~{a}o and Cherubin, Stefano and Flich, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2021.3},
  URN =		{urn:nbn:de:0030-drops-136390},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2021.3},
  annote =	{Keywords: Approximate Computing, Field-oriented control, Precision Tuning}
}
Document
Resource Aware GPU Scheduling in Kubernetes Infrastructure

Authors: Aggelos Ferikoglou, Dimosthenis Masouros, Achilleas Tzenetopoulos, Sotirios Xydis, and Dimitrios Soudris


Abstract
Nowadays, there is an ever-increasing number of artificial intelligence inference workloads pushed and executed on the cloud. To effectively serve and manage the computational demands, data center operators have provisioned their infrastructures with accelerators. Specifically for GPUs, support for efficient management lacks, as state-of-the-art schedulers and orchestrators, threat GPUs only as typical compute resources ignoring their unique characteristics and application properties. This phenomenon combined with the GPU over-provisioning problem leads to severe resource under-utilization. Even though prior work has addressed this problem by colocating applications into a single accelerator device, its resource agnostic nature does not manage to face the resource under-utilization and quality of service violations especially for latency critical applications. In this paper, we design a resource aware GPU scheduling framework, able to efficiently colocate applications on the same GPU accelerator card. We integrate our solution with Kubernetes, one of the most widely used cloud orchestration frameworks. We show that our scheduler can achieve 58.8% lower end-to-end job execution time 99%-ile, while delivering 52.5% higher GPU memory usage, 105.9% higher GPU utilization percentage on average and 44.4% lower energy consumption on average, compared to the state-of-the-art schedulers, for a variety of ML representative workloads.

Cite as

Aggelos Ferikoglou, Dimosthenis Masouros, Achilleas Tzenetopoulos, Sotirios Xydis, and Dimitrios Soudris. Resource Aware GPU Scheduling in Kubernetes Infrastructure. In 12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021). Open Access Series in Informatics (OASIcs), Volume 88, pp. 4:1-4:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{ferikoglou_et_al:OASIcs.PARMA-DITAM.2021.4,
  author =	{Ferikoglou, Aggelos and Masouros, Dimosthenis and Tzenetopoulos, Achilleas and Xydis, Sotirios and Soudris, Dimitrios},
  title =	{{Resource Aware GPU Scheduling in Kubernetes Infrastructure}},
  booktitle =	{12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)},
  pages =	{4:1--4:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-181-8},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{88},
  editor =	{Bispo, Jo\~{a}o and Cherubin, Stefano and Flich, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2021.4},
  URN =		{urn:nbn:de:0030-drops-136403},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2021.4},
  annote =	{Keywords: cloud computing, GPU scheduling, kubernetes, heterogeneity}
}
Document
Invited Paper
HPC Application Cloudification: The StreamFlow Toolkit (Invited Paper)

Authors: Iacopo Colonnelli, Barbara Cantalupo, Roberto Esposito, Matteo Pennisi, Concetto Spampinato, and Marco Aldinucci


Abstract
Finding an effective way to improve accessibility to High-Performance Computing facilities, still anchored to SSH-based remote shells and queue-based job submission mechanisms, is an open problem in computer science. This work advocates a cloudification of HPC applications through a cluster-as-accelerator pattern, where computationally demanding portions of the main execution flow hosted on a Cloud Finding an effective way to improve accessibility to High-Performance Computing facilities, still anchored to SSH-based remote shells and queue-based job submission mechanisms, is an open problem in computer science. This work advocates a cloudification of HPC applications through a cluster-as-accelerator pattern, where computationally demanding portions of the main execution flow hosted on a Cloud infrastructure can be offloaded to HPC environments to speed them up. We introduce StreamFlow, a novel Workflow Management System that supports such a design pattern and makes it possible to run the steps of a standard workflow model on independent processing elements with no shared storage. We validated the proposed approach’s effectiveness on the CLAIRE COVID-19 universal pipeline, i.e. a reproducible workflow capable of automating the comparison of (possibly all) state-of-the-art pipelines for the diagnosis of COVID-19 interstitial pneumonia from CT scans images based on Deep Neural Networks (DNNs).

Cite as

Iacopo Colonnelli, Barbara Cantalupo, Roberto Esposito, Matteo Pennisi, Concetto Spampinato, and Marco Aldinucci. HPC Application Cloudification: The StreamFlow Toolkit (Invited Paper). In 12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021). Open Access Series in Informatics (OASIcs), Volume 88, pp. 5:1-5:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{colonnelli_et_al:OASIcs.PARMA-DITAM.2021.5,
  author =	{Colonnelli, Iacopo and Cantalupo, Barbara and Esposito, Roberto and Pennisi, Matteo and Spampinato, Concetto and Aldinucci, Marco},
  title =	{{HPC Application Cloudification: The StreamFlow Toolkit}},
  booktitle =	{12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)},
  pages =	{5:1--5:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-181-8},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{88},
  editor =	{Bispo, Jo\~{a}o and Cherubin, Stefano and Flich, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2021.5},
  URN =		{urn:nbn:de:0030-drops-136419},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2021.5},
  annote =	{Keywords: cloud computing, distributed computing, high-performance computing, streamflow, workflow management systems}
}

Filters


Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail