Workshop on Data Fabric for Hybrid Clouds (WDFHC)

Co-located with the 29th IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, & ANALYTICS (HiPC)

Description

Many organizations have adopted the hybrid-cloud paradigm to optimize business processes. Hybrid clouds span public and private clouds, multiple public cloud providers, and edge and cloud resources. A hybrid cloud architecture enables enterprises to scale computing resources, optimize cost and use best-in-class tools. However, it also poses challenges around data discovery, access, governance, usage and quality. The distributed nature of the data makes it difficult for application builders to find, retrieve and use the relevant data. This is especially true for data-rich machine learning (ML) and artificial intelligence (AI) workloads, which are gaining dominance within enterprises.

A data fabric (or data mesh) is a data management design that makes data curation and access transparent in a distributed environment and thereby alleviates these challenges. A data fabric gives authorized users access to the right data in an interoperable format while respecting the regulatory and governance constructs. It continuously links enterprise-wide data through their active and passive metadata from different databases, data lakes and warehouses; orchestrates data placement and integration; and uses semantic inferencing and knowledge graphs to support decision-making by ML and real-time applications.

Gartner places Data Fabrics at the peak of the emerging technologies hype cycle for 2021, and the concept is still evolving. In this workshop, we aim to bring academic and industry researchers together to share their perspectives on the challenges of data management in a hybrid cloud environment and the role of Data Fabrics in addressing these challenges. We propose to explore questions such as: (1) What are the requirements for Data Fabrics to operate in a multi-cloud ecosystem? (2) What are the design principles for Data Fabrics to meet their objectives? (3) What are the core components needed for Data Fabrics to support AI/ML applications? We expect participants to share research ideas and practical use cases and experiences that address these questions.

Accepted Papers

David Demicco, Matthew Cole, Shengdun Wang and Aravind Prakash. A Security Analysis of Labeling-Based Control-Flow Integrity Schemes
Avinash Maurya, Jaiaid Mobin and M. Mustafa Rafique. Towards Data Gravity and Compliance Aware Distributed Deep Learning on Hybrid Clouds
Manu Awasthi. Verifiable and Practical Compliance for Data Privacy Laws

Workshop Schedule

2:00-2:15 PM: Welcome by Chairs

2:15-3:00 PM: Invited Talk

From Container to Multicloud – Meeting the Data Management Challenges!

Speaker: Sanil Kumar D., SODA Foundation Open Source and Huawei Technologies

Abstract: We will explore the current state and trends in containers and multicloud. Based on these trends, what are the key technical areas we are focusing on at SODA Foundation through its open source projects and initiatives? We will also discuss the high-level architecture and plans for these projects. How can developers and academia collaborate with these open source projects and SODA Foundation on data management technology research and development?

Bio: Sanil is Chief Architect at Huawei Technologies India and Co-Chair, Architecture Lead at SODA Foundation. Sanil has 22+ years of product development expertise in Linux, Open Source, ARM Ecosystem, Cloud, Data Management, Edge Computing, Blockchain and Distributed Computing. He has incubated, and maintains and supports, multiple open source projects such as KubeEdge, SODA Projects and Centaurus. He has multiple patents and technical publications and has spoken at international conferences. He mentors MNCs and young developers in their open source journey. He is a governing board member of CCICI and works with industry organizations such as IEEE, Linux Foundation, SNIA, OSI and OTF.

He believes that open source is a great avenue for young developers to enter global technology research.

3:00-3:30 PM: Verifiable and Practical Compliance for Data Privacy Laws, Manu Awasthi

3:30-3:45 PM: BREAK

3:45-4:15 PM: A Security Analysis of Labeling-Based Control-Flow Integrity Schemes, David Demicco, Matthew Cole, Shengdun Wang and Aravind Prakash

4:15-4:45 PM: Towards Data Gravity and Compliance Aware Distributed Deep Learning on Hybrid Clouds, Avinash Maurya, Jaiaid Mobin and M. Mustafa Rafique

4:45-5:45 PM: Closing Keynote (Virtual)

Speaker: Prof. DK Panda, Ohio State University

Title: Designing Middleware for HPC and AI in Hybrid Clouds: Challenges and Opportunities

Abstract: The field of computing has been evolving over the years with the need for High-Performance Computing (HPC), Deep Learning (DL) and Machine Learning (ML) on heterogeneous architectures. Modern computing environments are spread from edges to clouds/HPC centers. This talk will focus on challenges and opportunities in designing HPC and AI middleware for these systems with both scale-up and scale-out strategies. The first part of the talk will focus on two broad approaches/solutions to address these challenges: 1) High-Performance MVAPICH, DL/ML, and Big Data software for on-premise clusters as well as three major cloud providers (Azure, AWS, and Oracle) and 2) an E4S-based framework to run unified versions in a hybrid cloud environment. Next, we will provide an overview of the new NSF-AI Institute (ICICLE, http://icicle.ai) for developing next-generation cyberinfrastructure for the computing continuum (edges to clouds/HPC centers). Finally, we will show examples of how middleware can be designed so that applications from the Digital Agriculture area (one of the use-inspired science cases in the ICICLE institute) run from edges to clouds.

Bio: DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He has published over 500 papers in high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, Omni-Path, Slingshot, iWARP and RoCE) libraries, designed and developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 3,290 organizations worldwide (in 90 countries). More than 1.64M downloads of this software have taken place from the project’s site. This software is empowering several InfiniBand clusters (including the 7th, 19th, 34th, and 46th ranked ones) in the TOP500 list. MPI-driven solutions for providing high-performance and scalable deep learning for TensorFlow and PyTorch frameworks are available from http://hidl.cse.ohio-state.edu. Solutions to accelerate Big Data applications are available from http://hibd.cse.ohio-state.edu. Prof. Panda leads one of the recently funded NSF AI Institutes – ICICLE (https://icicle.osu.edu) to design intelligent cyberinfrastructure for next-generation systems. Prof. Panda is an IEEE Fellow and a recipient of the 2022 IEEE Charles Babbage Award. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.

5:45-6:00 PM: Workshop Closing

Deadlines

Abstract Submission Deadline: September 23, 2022 (optional)
Paper Submission Deadline: ~~September 30, 2022~~ October 15, 2022 (updated)
Paper Notification: November 5, 2022
Camera-ready Deadline: November 15, 2022

Organizers

General Chairs

Yogesh Simmhan, Indian Institute of Science, Bangalore
Sameep Mehta, IBM Research India

Program Chairs

Kalapriya Kannan, HPE
Subhajit Sidhanta, IIT Bhilai

Program Committee Members

Soumajit Pramanik, IIT Bhilai
Sushanta Karmakar, IIT Guwahati
Nirnay Ghosh, IIEST Shibpur
Madhumita Mallick, Huawei Technologies India
U Das, Rubrik, Inc.
Preeti Syal, Amazon Web Services
Pranay Lohia, Microsoft
Chiranjib Sur, Shell
Bivas Mitra, IIT Kharagpur
Arghya Kusum Das, University of Alaska Fairbanks
Sayan Goswami, Ahmedabad University
Vishwesh Jatala, IIT Bhilai