Item – Theses Canada
OCLC number
1033015333
Link(s) to full text
LAC copy
Author
Inozemtsev, Grigori.
Title
Overlapping Computation and Communication through Offloading in MPI over InfiniBand.
Degree
Queen's University, 2014
Publisher
Kingston : Queen's University, 2014.
Description
1 online resource
Notes
Includes bibliographical references.
Abstract
As the demands of computational science and engineering simulations increase, the size and capabilities of High Performance Computing (HPC) clusters are also expected to grow. Consequently, the software providing the application programming abstractions for the clusters must adapt to meet these demands. Specifically, the increased cost of interprocessor synchronization and communication in larger systems must be accommodated. Non-blocking operations that allow communication latency to be hidden by overlapping it with computation have been proposed to mitigate this problem. In this work, we investigate offloading a portion of the communication processing to dedicated hardware in order to support communication/computation overlap efficiently. We work with the Message Passing Interface (MPI), the de facto standard for parallel programming in HPC environments. We investigate both point-to-point non-blocking communication and collective operations; our work with collectives focuses on the allgather operation. We develop designs for both flat and hierarchical cluster topologies and examine both eager and rendezvous communication protocols. We also develop a generalized primitive operation with the aim of simplifying further research into non-blocking collectives. We propose a new algorithm for the non-blocking allgather collective and implement it using this primitive. The algorithm has constant resource usage even when executing multiple operations simultaneously. We implemented these designs using CORE-Direct offloading support in Mellanox InfiniBand adapters. We present an evaluation of the designs using microbenchmarks and an application kernel that shows that offloaded non-blocking communication operations can provide latency that is comparable to that of their blocking counterparts while allowing most of the duration of the communication to be overlapped with computation and remaining resilient to process arrival and scheduling variations.
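For readers unfamiliar with the allgather collective named in the abstract, the following minimal sketch may help. It simulates the well-known ring allgather in plain Python, where each process contributes one block and every process ends up holding all blocks. It is not the algorithm proposed in the thesis, and it uses no MPI, InfiniBand, or offloading. The function name `ring_allgather` and the list-of-lists buffer layout are assumptions made for illustration only.

```python
def ring_allgather(contributions):
    """Simulate a ring allgather among len(contributions) processes.

    Illustrative only: a textbook ring schedule, not the thesis's design.
    Returns one receive buffer per simulated rank.
    """
    p = len(contributions)
    # buffers[r][i] holds the block that originated at rank i,
    # once simulated rank r has received it (None until then).
    buffers = [[None] * p for _ in range(p)]
    for rank, block in enumerate(contributions):
        buffers[rank][rank] = block

    # p - 1 steps: in step s, rank r forwards block (r - s) mod p
    # to its right-hand neighbour (r + 1) mod p.
    for step in range(p - 1):
        # Gather all messages first so every rank "sends" at the same time.
        messages = []
        for rank in range(p):
            index = (rank - step) % p
            messages.append(((rank + 1) % p, index, buffers[rank][index]))
        for dest, index, block in messages:
            buffers[dest][index] = block
    return buffers


if __name__ == "__main__":
    for rank, buf in enumerate(ring_allgather(["a", "b", "c", "d"])):
        print(rank, buf)
```

In a real MPI program the non-blocking form of this collective would be started, the application would compute while the transfer progresses, and the program would wait for completion only when the gathered data is needed. That waiting window is the overlap the abstract refers to.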
Other link(s)
qspace.library.queensu.ca
Subject
MPI.
offloading.
high performance computing.
computer engineering.
InfiniBand.
CORE-Direct.
Date modified:
2022-09-01