


33rd ICS 2019: Phoenix, AZ, USA
- Rudolf Eigenmann, Chen Ding, Sally A. McKee: Proceedings of the ACM International Conference on Supercomputing, ICS 2019, Phoenix, AZ, USA, June 26-28, 2019. ACM 2019, ISBN 978-1-4503-6079-1
HPC applications
- Milinda Fernando, David Neilsen, Eric W. Hirschmann, Hari Sundar: A scalable framework for adaptive computational general relativity on heterogeneous clusters. 1-12
- Kunpeng Wang, Shizhen Xu, Haohuan Fu, Hongkun Yu, Wenlai Zhao, Guangwen Yang: Parallelizing cryo-EM 3D reconstruction on GPU cluster with a partitioned and streamed model. 13-23
- Jianqiao Liu, Michael P. Robson, Thomas Quinn, Milind Kulkarni: Efficient GPU tree walks for effective distributed n-body simulations. 24-34
- Michael Gowanlock: Hybrid CPU/GPU clustering in shared memory on the billion point scale. 35-45
Accelerator programming
- Abdul Dakkak, Cheng Li, Jinjun Xiong, Isaac Gelado, Wen-Mei W. Hwu: Accelerating reduction and scan using tensor core units. 46-57
- Wei Zhang, Weihao Cui, Kaihua Fu, Quan Chen, Daniel Edward Mawhirter, Bo Wu, Chao Li, Minyi Guo: Laius: Towards latency awareness and improved utilization of spatial multitasking accelerators in datacenters. 58-68
- Simon Zhang, Mengbai Xiao, Chengxin Guo, Liang Geng, Hao Wang, Xiaodong Zhang: HYPHA: a framework based on separation of parallelisms to accelerate persistent homology matrix reduction. 69-81
- Fan Ni, Song Jiang, Hong Jiang, Jian Huang, Xingbo Wu: SDC: a software defined cache for efficient data indexing. 82-93
HPC algorithms: linear algebra and solvers
- Zhen Xie, Guangming Tan, Weifeng Liu, Ninghui Sun: IA-SpGEMM: an input-aware auto-tuning framework for parallel sparse matrix-matrix multiplication. 94-105
- Jieyang Chen, Nan Xiong, Xin Liang, Dingwen Tao, Sihuan Li, Kaiming Ouyang, Kai Zhao, Nathan DeBardeleben, Qiang Guan, Zizhong Chen: TSM2: optimizing tall-and-skinny matrix-matrix multiplication on GPUs. 106-116
- Jakub Kurzak, Mark Gates, Ali Charara, Asim YarKhan, Jack J. Dongarra: Least squares solvers for distributed-memory machines with GPU accelerators. 117-126
- Piyush Sao, Ramakrishnan Kannan, Xiaoye Sherry Li, Richard W. Vuduc: A communication-avoiding 3D sparse triangular solver. 127-137
- Paul R. Eller, Torsten Hoefler, William Gropp: Using performance models to understand scalable Krylov solver performance at scale for structured grid problems. 138-149
- Kurt A. O'Hearn, Abdullah Alperen, Hasan Metin Aktulga: Performance optimization of reactive molecular dynamics simulations with dynamic charge distribution models on distributed memory platforms. 150-159
HPC computer architectures / accelerators
- Pradeep V. Kotipalli, Ranvijay Singh, Paul Wood, Ignacio Laguna, Saurabh Bagchi: AMPT-GA: automatic mixed precision floating point tuning for GPU applications. 160-170
- Kyushick Lee, Michael B. Sullivan, Siva Kumar Sastry Hari, Timothy Tsai, Stephen W. Keckler, Mattan Erez: GPU snapshot: checkpoint offloading for GPU-dense systems. 171-183
- Haonan Wang, Mohamed Assem Ibrahim, Sparsh Mittal, Adwait Jog: Address-stride assisted approximate load value prediction in GPUs. 184-194
- Hussein Elnawawy, Rangeen Basu Roy Chowdhury, Amro Awad, Gregory T. Byrd: Diligent TLBs: a mechanism for exploiting heterogeneity in TLB miss behavior. 195-205
- Xin Jin, Yaoyang Zhou, Bowen Huang, Zihao Yu, Xusheng Zhan, Huizhe Wang, Sa Wang, Ningmei Yu, Ninghui Sun, Yungang Bao: QoSMT: supporting precise performance control for simultaneous multithreading architecture. 206-216
- Yuechen Chen, Ahmed Louri: An online quality management framework for approximate communication in network-on-chips. 217-226
HPC algorithms: graphs and tensors
- Jiajia Li, Bora Uçar, Ümit V. Çatalyürek, Jimeng Sun, Kevin J. Barker, Richard W. Vuduc: Efficient and effective sparse tensor reordering. 227-237
- Venkatesan T. Chakaravarthy, Shivmaran S. Pandian, Saurabh Raje, Yogish Sabharwal: On optimizing distributed non-negative Tucker decomposition. 238-249
- Roozbeh Karimi, David M. Koppelman, Chris J. Michael: GPU road network graph contraction and SSSP query. 250-260
- Hengjie Wang, Aparna Chandramowlishwaran: Multi-criteria partitioning of multi-block structured grids. 261-271
Modeling / resource management
- Quan Chen, Zhenning Wang, Jingwen Leng, Chao Li, Wenli Zheng, Minyi Guo: Avalon: towards QoS awareness and improved utilization through multi-resource management in datacenters. 272-283
- Hao Xu, Qingsen Wang, Shuang Song, Lizy Kurian John, Xu Liu: Can we trust profiling results?: understanding and fixing the inaccuracy in modern profilers. 284-295
- Dimitrios Chasapis, Miquel Moretó, Martin Schulz, Barry Rountree, Mateo Valero, Marc Casas: Power efficient job scheduling by predicting the impact of processor manufacturing variability. 296-307
- Hadi Zamani, Yuanlai Liu, Devashree Tripathy, Laxmi N. Bhuyan, Zizhong Chen: GreenMM: energy efficient GPU matrix multiplication through undervolting. 308-318
Parallel programming
- Huihui Sun, Florian Fey, Jie Zhao, Sergei Gorlatch: WCCV: improving the vectorization of IF-statements with warp-coherent conditions. 319-329
- Mohammad Norouzi Arab, Felix Wolf, Ali Jannesari: Automatic construct selection and variable classification in OpenMP. 330-341
- Mihail Popov, Alexandra Jimborean, David Black-Schaffer: Efficient thread/page/parallelism autotuning for NUMA systems. 342-353
- Philip Pfaffe, Tobias Grosser, Martin Peter Tillmann: Efficient hierarchical online-autotuning: a case study on polyhedral accelerator mapping. 354-366
Distributed systems
- Abdelhalim Amer, Charles Archer, Michael Blocksome, Chongxiao Cao, Michael Chuvelev, Hajime Fujita, Maria Garzaran, Yanfei Guo, Jeff R. Hammond, Shintaro Iwasaki, Kenneth J. Raffenetti, Mikhail Shiryaev, Min Si, Kenjiro Taura, Sagar Thapaliya, Pavan Balaji: Software combining to mitigate multithreaded MPI contention. 367-379
- Emilio Castillo, Nikhil Jain, Marc Casas, Miquel Moretó, Martin Schulz, Ramón Beivide, Mateo Valero, Abhinav Bhatele: Optimizing computation-communication overlap in asynchronous task-based programs. 380-391
- Donghe Kang, Vedang Patel, Ashwati Nair, Spyros Blanas, Yang Wang, Srinivasan Parthasarathy: Henosis: workload-driven small array consolidation and placement for HDF5 applications on heterogeneous data stores. 392-402
- Cunlu Li, Dezun Dong, Xiangke Liao, John Kim, Changhyun Kim: DeepHiR: improving high-radix router throughput with deep hybrid memory buffer microarchitecture. 403-413
Machine learning acceleration
- Aleksandar Zlateski, Zhen Jia, Kai Li, Frédo Durand: The anatomy of efficient FFT and winograd convolutions on modern CPUs. 414-424
- Karan Aggarwal, Uday Bondhugula: Optimizing the linear fascicle evaluation algorithm for many-core systems. 425-437
- Lin Ning, Xipeng Shen: Deep reuse: streamline CNN inference on the fly via coarse-grained computation reuse. 438-448
- Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong: Full-stack optimization for accelerating CNNs using powers-of-two weights with FPGA validation. 449-460
- Tong Geng, Tianqi Wang, Chunshu Wu, Chen Yang, Wei Wu, Ang Li, Martin C. Herbordt: O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning. 461-472
- Lei Zhao, Quan Deng, Youtao Zhang, Jun Yang: RFAcc: a 3D ReRAM associative array based random forest accelerator. 473-483
Correctness, efficiency and security
- Bo Fang, Hassan Halawa, Karthik Pattabiraman, Matei Ripeanu, Sriram Krishnamoorthy: BonVoision: leveraging spatial data smoothness for recovery from memory soft errors. 484-496
- Qiumin Xu, Hoda Naghibijouybari, Shibo Wang, Nael B. Abu-Ghazaleh, Murali Annavaram: GPUGuard: mitigating contention based side and covert channel attacks on GPUs. 497-509
- Yongbin Gu, Lizhong Chen: Dynamically linked MSHRs for adaptive miss handling in GPUs. 510-521