HiPC 2021 Technical Program

28th IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, & ANALYTICS

TECHNICAL PROGRAM

See the Program-at-a-Glance for session schedule

These papers are part of the HiPC 2021 proceedings that will be distributed to all registrants. The proceedings will also contain a file of presentation slides for each paper. The papers will be presented virtually by the authors in live sessions to be held on Friday, December 17th, and Saturday, December 18th.

Keynote 1

Towards an Integral System for Processing Big Graphs at Scale

Jingren Zhou (Alibaba Group)

Session Chair: Yogesh Simmhan

Keynote 2

AI4IO: A Suite of AI-based Tools for IO-aware HPC Resource Management

Michela Taufer (University of Tennessee Knoxville)

Session Chair: Ana Lucia Varbanescu

Keynote 3

Improving Efficiency and Performance Through Faster Scheduling Mechanisms

Adam Belay (Massachusetts Institute of Technology)

Session Chair: Viktor Prasanna

Technical Session 1: Scalable Algorithms and Systems for Data Science – part 1

Parallel Actors and Learners: A Framework for Generating Scalable RL Implementations

Chi Zhang (University of Southern California, USA), Sanmukh Kuppannagari (University of Southern California, USA), Viktor Prasanna (University of Southern California, USA)

DEISA: Dask-Enabled In Situ Analytics

Amal Gueroudji (French Alternative Energies and Atomic Energy Commission, CEA, France), Julien Bigot (CEA, Maison de la Simulation, France), Bruno Raffin (Inria, France)

A Model of Graph Transactional Coverage Patterns with Applications to Drug Discovery

A Srinivas Reddy (International Institute of Information Technology Hyderabad, India), P Krishna Reddy (International Institute of Information Technology Hyderabad, India), Anirban Mondal (Ashoka University, India), Deva Priyakumar (International Institute of Information Technology Hyderabad, India)

Faster Parallel Training of Word Embeddings

Eliza Wszola (ETH Zurich, Switzerland), Martin Jaggi (EPFL, Switzerland), Markus Pueschel (ETH Zurich, Switzerland)

CMAP-LAP: Configurable Massively Parallel Solver for Lattice Problems

Nariaki Tateiwa (Kyushu University, Japan), Yuji Shinano (Zuse Institute Berlin, Germany), Keiichiro Yamamura (Kyushu University, Japan), Akihiro Yoshida (Kyushu University, Japan), Shizuo Kaji (Kyushu University, Japan), Masaya Yasuda (Rikkyo University, Japan), Katsuki Fujisawa (Kyushu University, Japan)

MulConn: User-Transparent I/O Subsystem for High-Performance Parallel File Systems

Hwajung Kim (Seoul National University, South Korea), Jiwoo Bang (Seoul National University, South Korea), Dong Kyu Sung (Seoul National University, South Korea), Hyeonsang Eom (Seoul National University, South Korea), Heonyoung Yeom (Seoul National University, South Korea), Hanul Sung (Sangmyung University, South Korea)

Technical Session 2: HPC Algorithms

Monte Carlo Tree Search for Task Mapping onto Heterogeneous Platforms

Ta-Yang Wang (University of Southern California, USA), William Chang (University of Southern California, USA), Ajitesh Srivastava (University of Southern California, USA), Rajgopal Kannan (US Army Research Lab, USA), Viktor Prasanna (University of Southern California, USA)

Shared-memory implementation of the Karp-Sipser kernelization process

Johannes Langguth (Simula Research Laboratory, Norway), Ioannis Panagiotas (Sorbonne University, France), Bora Ucar (CNRS and LIP ENS Lyon, France)

How to Avoid Zero-Spacing in Fractionally-Strided Convolution? A Hardware-Algorithm Co-Design Methodology

Yuan Meng (University of Southern California, USA), Sanmukh Kuppannagari (University of Southern California, USA), Rajgopal Kannan (US Army Research Lab, USA), Viktor Prasanna (University of Southern California, USA)

PPBT: A High Performance Parallel Search Tree

Jiawen Guan (ShanghaiTech University, China), Rui Fan (ShanghaiTech University, China)

Deciding Non-Compressible Blocks in Sparse Direct Solvers using Incomplete Factorization

Esragul Korkmaz (Inria, France), Mathieu Faverge (Bordeaux-INP, France), Gregoire Pichon (University Claude Bernard Lyon 1, France), Pierre Ramet (University of Bordeaux, France)

Technical Session 3: HPC Applications

Efficient Parallel Algorithms for Computing Percolation Centrality

Kishore Kothapalli (International Institute of Information Technology, Hyderabad, India), Athreya Chandramouli (IIIT Hyderabad, India), Sayantan Jana (IIIT Hyderabad, India)

Accelerating JPEG Decompression on GPUs

André Weißenberger (Johannes Gutenberg University, Germany), Bertil Schmidt (Johannes Gutenberg University, Germany)

Towards Zero-Waste Recovery and Zero-Overhead Checkpointing in Ensemble Data Assimilation

Kai Keller (Barcelona Supercomputing Center, Spain), Adrian Cristal Kestelman (Barcelona Supercomputing Center, Spain), Leonardo Bautista Gomez (Barcelona Supercomputing Center, Spain)

Predictive Analysis of Large-Scale Coupled CFD Simulations with the CPX Mini-App

Archie Powell (University of Warwick, United Kingdom), Kabir Choudry (University of Warwick, United Kingdom), Arun Prabhakar (University of Warwick, United Kingdom), Istvan Reguly (Pazmany Peter Catholic University, Hungary), Dario Amirante (University of Surrey, United Kingdom), Stephen Jarvis (University of Birmingham, United Kingdom), Gihan Mudalige (University of Warwick, United Kingdom)

The 16,384-node Parallelism of 3D-CNN Training on An Arm CPU based Supercomputer

Akihiro Tabuchi (Fujitsu Limited, Japan), Koichi Shirahata (Fujitsu Limited, Japan), Masafumi Yamazaki (Fujitsu Limited, Japan), Akihiko Kasagi (Fujitsu Limited, Japan), Takumi Honda (Fujitsu Limited, Japan), Kouji Kurihara (Fujitsu Limited, Japan), Kentaro Kawakami (Fujitsu Limited, Japan), Tsuguchika Tabaru (Fujitsu Limited, Japan), Naoto Fukumoto (Fujitsu Limited, Japan), Akiyoshi Kuroda (RIKEN Center for Computational Science, Japan), Takaaki Fukai (RIKEN Center for Computational Science, Japan), Kento Sato (RIKEN Center for Computational Science, Japan)

Technical Session 4: HPC Architecture and System Software

iPUG for multiple Graphcore IPUs: Optimizing performance and scalability of parallel breadth-first search

Luk Burchard (Simula Research Laboratory, Norway), Xing Cai (Simula Research Laboratory, Norway), Johannes Langguth (Simula Research Laboratory, Norway)

Empirical Analysis of Architectural Primitives for NVRAM Consistency

Arun Kp (Indian Institute of Technology Kanpur, India), Debadatta Mishra (Indian Institute of Technology Kanpur, India), Biswabandan Panda (Indian Institute of Technology, Bombay, India)

JACC: An OpenACC Runtime Framework with Kernel-Level and Multi-GPU Parallelization

Kazuaki Matsumura (Barcelona Supercomputing Center (BSC-CNS), Spain), Simon Garcia De Gonzalo (Barcelona Supercomputing Center (BSC-CNS), Spain), Antonio J. Peña (Barcelona Supercomputing Center (BSC-CNS), Spain)

Technical Session 5: HPC Algorithms and Architecture

Anti-Section Transitive Closure

Oded Green (NVIDIA/Georgia Institute of Technology, USA), Zhihui Du (New Jersey Institute of Technology, USA), Sanyamee Patel (New Jersey Institute of Technology, USA), Zehui Xie (Stevens Institute of Technology, USA), Hang Liu (Stevens Institute of Technology, USA), David A. Bader (New Jersey Institute of Technology, USA)

Column-Segmented Sparse Matrix-Matrix Multiplication

Xiaojing An (Georgia Institute of Technology, USA), Ümit Çatalyürek (Georgia Institute of Technology, USA)

Multi-Stage Memory Efficient Strassen’s Matrix Multiplication on GPU

Arjun Gopala Krishnan (Concordia University, Montreal, Canada), Dhrubajyoti Goswami (Concordia University, Montreal, Canada)

Optimizing k-path selection for randomized interconnection networks

Md Nahid Newaz (Oakland University, USA), Md Atiqul Mollah (Oakland University, USA)

Dynamic Voltage and Frequency Scaling to Improve Energy-Efficiency of Hardware Accelerators

Siqin Liu (Ohio University, USA), Avinash Karanth (Ohio University, USA)

Technical Session 6: HPC System Software

Adaptive Placement of Data Analysis Tasks For Staging Based In-Situ Processing

Zhe Wang (Rutgers University, USA), Pradeep Subedi (University of Utah, USA), Matthieu Dorier (Argonne National Laboratory, USA), Philip E. Davis (Rutgers University, USA), Manish Parashar (University of Utah, USA)

HEALS: A Parallel eALS Recommendation System on CPU/GPU Heterogeneous Platforms

Qihan Wang (College of William and Mary, USA), Bin Ren (William & Mary, USA), Wei Niu (William & Mary, USA), Ruoming Jin (Kent State University, USA), Chen Li (iLambda, Inc, USA)

Shrinking Sample Search Algorithm for Automatic Tuning of GPU Kernels

Xiang Li (The Ohio State University, United States), Gagan Agrawal (Augusta University, United States)

Towards Architecture-aware Hierarchical Communication Trees on Modern HPC Systems

Bharath Ramesh (The Ohio State University, USA), Jahanzeb Maqbool Hashmi (The Ohio State University, USA), Shulei Xu (The Ohio State University, USA), Aamir Shafi (The Ohio State University, USA), Mahdieh Ghazimirsaeed (The Ohio State University, USA), Mohammadreza Bayatpour (The Ohio State University, USA), Hari Subramoni (The Ohio State University, USA), Dhabaleswar Panda (The Ohio State University, USA)

Technical Session 7: Scalable Algorithms and Systems for Data Science – part 2

DistMILE: A Distributed Multi-Level Framework for Scalable Graph Embedding

Yuntian He (The Ohio State University, USA), Saket Gurukar (The Ohio State University, USA), Pouya Kousha (The Ohio State University, USA), Hari Subramoni (The Ohio State University, USA), Dhabaleswar K. Panda (The Ohio State University, USA), Srinivasan Parthasarathy (The Ohio State University, USA)

Model-based Reinforcement Learning for Elastic Stream Processing in Edge Computing

Jinlai Xu (University of Pittsburgh, USA), Balaji Palanisamy (University of Pittsburgh, USA)

Layout Aware Hardware Assisted Design for Derived Data Types in MPI

Kaushik Suresh (The Ohio State University, USA), Bharath Ramesh (The Ohio State University, USA), Chen Chun Chen (The Ohio State University, USA), Seyedeh Mahdieh Ghazimirsaeed (The Ohio State University, USA), Mohammadreza Bayatpour (The Ohio State University, USA), Aamir Shafi (The Ohio State University, USA), Hari Subramoni (The Ohio State University, USA), Dhabaleswar Panda (The Ohio State University, USA)

Parallel Algorithms for Efficient Computation of High-Order Line Graphs of Hypergraphs

Xu Tony Liu (Washington State University, USA), Jesun Firoz (Pacific Northwest National Laboratory, USA), Andrew Lumsdaine (Pacific Northwest National Lab, USA), Cliff Joslyn (Pacific Northwest National Lab, USA), Sinan Aksoy (Pacific Northwest National Lab, USA), Brenda Praggastis (Pacific Northwest National Lab, USA), Assefaw Gebremedhin (Washington State University, USA)

Technical Session 8: Scalable Algorithms and Systems for Data Science – part 3

Asynchronous I/O Strategy for Large-Scale Deep Learning Applications

Sunwoo Lee (Northwestern University, USA), Qiao Kang (Northwestern University, USA), Kewei Wang (Northwestern University, USA), Jan Balewski (National Energy Research Scientific Computing Center, USA), Alex Sim (Lawrence Berkeley National Laboratory, USA), Ankit Agrawal (Northwestern University, USA), Alok Choudhary (Northwestern University, USA), Peter Nugent (Lawrence Berkeley National Laboratory, USA), Kesheng Wu (Lawrence Berkeley National Laboratory, USA), Wei-Keng Liao (Northwestern University, USA)

SYMBIOMON: A High Performance, Composable Monitoring Service

Srinivasan Ramesh (University of Oregon, USA), Robert Ross (Argonne National Laboratory, USA), Matthieu Dorier (Argonne National Laboratory, USA), Allen Malony (University of Oregon, USA), Philip Carns (Argonne National Laboratory, USA), Kevin Huck (University of Oregon, USA)

Load-balancing Parallel I/O of Compressed Hierarchical Layouts

Ke Fan (The University of Alabama at Birmingham, USA), Duong Hoang (The University of Utah, USA), Steve Petruzza (Utah State University, USA), Thomas Gilray (University of Alabama at Birmingham, USA), Valerio Pascucci (The University of Utah, USA), Sidharth Kumar (University of Alabama at Birmingham, USA)

CUDA-DClust+: Revisiting Early GPU-Accelerated DBSCAN Clustering Designs

Madhav Poudel (Northern Arizona University, USA), Michael Gowanlock (Northern Arizona University, USA)

HiPC 2021 Short Papers

(to be presented in a virtual poster session on Saturday, December 18th)

Static Graphs for Coding Productivity in OpenACC

Leonel Toledo (Barcelona Supercomputing Center (BSC), Spain), Pedro Valero-Lara (Oak Ridge National Laboratory, USA), Jeffrey S. Vetter (Oak Ridge National Laboratory, USA), Antonio J. Peña (Barcelona Supercomputing Center (BSC), Spain)

Performance of Local Push Algorithms for Personalized PageRank on Multi-core Platforms

Madhav Aggarwal (University of Southern California, USA), Bingyi Zhang (University of Southern California, USA), Viktor Prasanna (University of Southern California, USA)

BEE Orchestrator: Running Complex Scientific Workflows on Multiple Systems

Jake Tronge (Kent State University, USA), Patricia Grubel (Los Alamos National Laboratory, USA), Timothy Randles (Los Alamos National Laboratory, USA), Quincy Wofford (Los Alamos National Laboratory, USA), Rusty Davis (Los Alamos National Laboratory, USA), Steven Anaya (Los Alamos National Laboratory, USA), Qiang Guan (Kent State University, USA)

OpenACC Multi-GPU Approach for WSM6 Microphysics

Hércules Cardoso da Silva (UFMS, Brazil), Marco A. Stefanes (UFMS, Brazil), Vinícius Capistrano (UFMS, Brazil)

Large-Message Nonblocking MPI_Iallgather and MPI_Ibcast Offload via BlueField-2 DPU

Nick Sarkauskas (The Ohio State University, USA), Mohammadreza Bayatpour (The Ohio State University, USA), Tu Tran (The Ohio State University, USA), Bharath Ramesh (The Ohio State University, USA), Hari Subramoni (The Ohio State University, USA), Dhabaleswar Panda (The Ohio State University, USA)

Optimizing Multi-Range based Error-Bounded Lossy Compression for Scientific Datasets

Yuanjian Liu (University of Chicago, USA), Sheng Di (Argonne National Laboratory, USA), Kai Zhao (University of California, Riverside, USA), Sian Jin (Washington State University, USA), Cheng Wang (Argonne National Laboratory, USA), Kyle Chard (University of Chicago, USA), Dingwen Tao (Washington State University, USA), Ian Foster (University of Chicago, USA), Franck Cappello (Argonne National Laboratory, USA)

An In-Depth I/O Pattern Analysis in HPC Systems

Jiwoo Bang (Seoul National University, South Korea), Chungyong Kim (Seoul National University, South Korea), Kesheng Wu (Lawrence Berkeley National Laboratory, USA), Alex Sim (Lawrence Berkeley National Laboratory, USA), Suren Byna (Lawrence Berkeley National Laboratory, USA), Hanul Sung (Sangmyung University, South Korea), Hyeonsang Eom (Seoul National University, South Korea)

FaaSter: Accelerated Functions-as-a-Service with Heterogeneous GPUs

Anshuj Garg (Indian Institute of Technology Bombay, India), Sriram Yenamandra (Georgia Institute of Technology, USA), Purushottam Kulkarni (Indian Institute of Technology Bombay, Mumbai, India), Umesh Bellur (IIT Bombay, India)

RSP-Hist: Approximate Histograms for Big Data Exploration on Hadoop Clusters

Salman Salloum (Shenzhen University, China), Joshua Zhexue Huang (Shenzhen University, China)

A Programming API Implementation for Secure Data Analytics Applications with Homomorphic Encryption on GPUs

Shuangsheng Lou (The Ohio State University, USA), Gagan Agrawal (Augusta University, USA)

A Fused Inference Design for Pattern-Based Sparse CNN on Edge Devices

Jia Guo (The Ohio State University, USA), Radu Teodorescu (Ohio State University, USA), Gagan Agrawal (Augusta University, USA)

Cloud-Based Urgent Computing for Forest Fire Spread Prediction under Data Uncertainties

Edigley Fraga (Universitat Autònoma de Barcelona, Spain), Ana Cortes (Departament d’Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Spain), Porfidio Hernández (Universitat Autònoma de Barcelona, Spain), Tomas Margalef (Universitat Autònoma de Barcelona, Spain)

Exploring Thread Coarsening on FPGA

Mostafa Eghbali Zarch (North Carolina State University, USA), Reece Neff (North Carolina State University, USA), Michela Becchi (North Carolina State University, USA)

PILOT: a Runtime System to Manage Multi-tenant GPU Unified Memory Footprint

John Ravi (North Carolina State University, USA), Tri Nguyen (North Carolina State University, USA), Huiyang Zhou (North Carolina State University, USA), Michela Becchi (North Carolina State University, USA)

A computational technique for parallel solution of diagonally dominant banded linear systems

S Chandra Sekhara Rao (Indian Institute of Technology Delhi, India), Rabia Kamra (Punjab Engineering College (Deemed to be University), India)

Rev 8 December 2021

For corrections: [email protected]
