Automatic Mapping of Real Time Radio Astronomy
Signal Processing Pipelines onto Heterogeneous
Clusters

Terry Esther Filiba
Electrical Engineering and Computer Sciences
University of California at Berkeley
Technical Report No. UCB/EECS-2013-147
August 15, 2013
Copyright © 2013, by the author(s).
All rights reserved.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.
Automatic Mapping of Real Time Radio Astronomy Signal Processing Pipelines onto Heterogeneous Clusters

by

Terry Esther Filiba

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Engineering – Electrical Engineering and Computer Sciences in the Graduate Division of the University of California, Berkeley
Committee in charge:
Professor John Wawrzynek, Co-chair
Daniel Werthimer, Co-chair
Professor Jan Rabaey
Assistant Professor Aaron Parsons

Fall 2013

Automatic Mapping of Real Time Radio Astronomy Signal Processing Pipelines onto Heterogeneous Clusters

Copyright 2013 by Terry Esther Filiba

Abstract

Automatic Mapping of Real Time Radio Astronomy Signal Processing Pipelines onto Heterogeneous Clusters

by Terry Esther Filiba

Doctor of Philosophy in Engineering – Electrical Engineering and Computer Sciences

University of California, Berkeley

Professor John Wawrzynek, Co-chair
Daniel Werthimer, Co-chair

Traditional radio astronomy instrumentation relies on custom built designs, specialized for each science application. Traditional high performance computing (HPC) uses general purpose clusters and tools to parallelize each algorithm across a cluster. In real time radio astronomy processing, a simple CPU/GPU cluster alone is insufficient to process the data. Instead, digitizing and initial processing of the high bandwidth data received from a single antenna is often done in an FPGA, as it is infeasible to get the data into a single server.
Choosing which platform to use for diﬀerent parts of an instrument is a growing challenge.
With instrument specifications and platforms constantly changing as technology progresses, the design space for these instruments is unstable and often unpredictable. Furthermore, the astronomers designing these instruments may not be technology experts, and assessing the tradeoffs between different computing architectures, such as FPGAs, GPUs, and ASICs, and determining how to partition an instrument can prove difficult. In this work, I present a tool called Optimal Rearrangement of Cluster-based Astronomy Signal Processing, or ORCAS, that automatically determines how to optimally partition a radio astronomy instrument across different types of hardware based on a high level description of the instrument and a set of benchmarks.
In ORCAS, each function in a high level instrument description is profiled on different architectures. The architectural mapping is then done with an optimization technique called integer linear programming (ILP). The ILP takes the function profiles as well as a cost model as input and uses them to determine which architecture is best for every function in the instrument. ORCAS judges optimality by the cost function and generates an instrument design that minimizes total monetary cost, power utilization, or another user-defined cost.
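To make the mapping problem concrete, the following sketch illustrates the kind of assignment ORCAS computes. All numbers, block names, and the brute-force search are illustrative assumptions for this example only; ORCAS itself uses an ILP solver and real benchmark data, not the hypothetical tables shown here.

```python
from itertools import product

# Hypothetical profiling data (illustrative, not from ORCAS): per-unit
# cost in dollars and sustainable bandwidth in MHz for each
# (function, architecture) pair.
COST = {
    ("fft", "fpga"): 900, ("fft", "gpu"): 400, ("fft", "cpu"): 150,
    ("fir", "fpga"): 700, ("fir", "gpu"): 350, ("fir", "cpu"): 120,
}
BANDWIDTH = {
    ("fft", "fpga"): 2000, ("fft", "gpu"): 400, ("fft", "cpu"): 50,
    ("fir", "fpga"): 2000, ("fir", "gpu"): 500, ("fir", "cpu"): 60,
}

def map_instrument(functions, archs, required_mhz):
    """Pick the cheapest architecture for each function that still meets
    the real-time bandwidth target. Exhaustive search stands in for the
    ILP solver used by the real tool."""
    best, best_cost = None, float("inf")
    for assignment in product(archs, repeat=len(functions)):
        pairs = list(zip(functions, assignment))
        # Skip assignments that violate the real-time constraint.
        if any(BANDWIDTH[p] < required_mhz for p in pairs):
            continue
        cost = sum(COST[p] for p in pairs)
        if cost < best_cost:
            best, best_cost = dict(pairs), cost
    return best, best_cost

mapping, cost = map_instrument(["fft", "fir"], ["fpga", "gpu", "cpu"], 400)
# With these example numbers, the CPU cannot keep up with 400 MHz, and
# the GPU is cheaper than the FPGA for both blocks.
```

The same structure carries over to the ILP formulation: binary decision variables select one architecture per function, the bandwidth requirement becomes a constraint, and the cost table becomes the objective.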
Acknowledgments

This work would not have been possible without the help of the people around me. I am very grateful to my co-advisor Dan Werthimer for his encouragement and mentorship throughout this process. I appreciate all the time and patience he put in to help me get to this point. I am also grateful to my co-advisor Professor John Wawrzynek for his guidance, which helped motivate me and helped me develop my own research ideas. I would like to thank the members of my qualifying exam and dissertation committees, Professor Aaron Parsons, Professor Jan Rabaey, and Professor Geoff Bower, for their time and help shaping this work. I also would like to thank Professor Don Backer for his wisdom and support. Although he is no longer with us, he made a lasting impact on my research and I am very grateful for that.
I also need to thank the members of my research group and my lab. Thanks to the members of CASPER and the BWRC, especially Mark Wagner, Jonathon Kocz, Peter McMahon, Henry Chen, Matt Dexter, Dave MacMahon, and Andrew Siemion. I appreciate the time everyone took teaching me, helping me debug things, and discussing my research.
Finally, I would like to thank the people who looked after me and brightened my time at UC Berkeley. Thanks to my parents and my sister Michelle for always being around whenever I needed them. I thank my boyfriend Ben Schrager and his family who welcomed me into their home and treated me like family. Thanks to my friends Sam Ezell, Rhonda Adato, Leslie Nishiyama, Olivia Nolan, Nat Wharton, Orly Perlstein and Ari Rabkin who kept reminding me that I would eventually ﬁnish. And, I am very grateful for the care I received from my allergist, Dr. James Kong, which made it possible for me to complete this work.
Chapter 1

Introduction

Radio astronomers are trying to solve a very diverse set of problems, asking questions such as "Are we alone?" and "When were the first stars and galaxies formed?" and researching galactic structure and formation, the gravitational wave background, the transient universe, black holes, and extrasolar planets.
Naturally, this curiosity leads to the development of larger and higher bandwidth telescopes, creating a flood of data. Keeping up with the data requires constant development of new instrumentation.
The diversity of problems and telescopes creates a number of parameters an engineer needs to worry about while designing an instrument. The instrument must be designed based on the algorithm required to process the data, the bandwidth of the data, the number of channels, and the list goes on. Figure 1.1 shows pictures of two very different telescopes, the Very Large Array, or VLA, in Socorro, New Mexico and the Arecibo Telescope.
Traditionally, observatories dealt with this by designing custom instruments that would run on one telescope and solve one problem. This custom approach was the only way to get the requisite processing power to analyze the radio signals, but it resulted in costly designs, because the boards, backplanes, chips, protocols, and software all needed to be designed from scratch. To make matters worse, this approach resulted in a very long design cycle, requiring 5-10 years of development before an instrument could be deployed at a telescope and by the time the instrument was released, the hardware would be out of date.
Due to their custom implementations, these instruments also lacked ﬂexibility. Each instrument was designed speciﬁcally for a single purpose. A hardware upgrade or algorithm modiﬁcation would require a complete redesign of the instrument, and another long design cycle.
While these older designs needed to trade oﬀ ﬂexibility for performance, newer technology can oﬀer both performance and ﬂexibility. Programmable devices such as FPGAs, GPUs and even CPUs can provide enough processing power to keep up with the data from many new telescopes. These devices make it easy to reprogram existing hardware to support newer algorithms, and, since they are programmed using portable languages, provide a quick path to upgrade hardware without redesigning the entire instrument.
With a huge range of technology available, choosing what hardware to use to build an instrument is a challenge. New technology, optimizations, and designs are constantly being developed and with everything constantly changing it’s diﬃcult to know what is best.
1.2 Technological Challenges

When building a large instrument, there is a wide variety of technology to choose from.
Unlike many applications, where the goal is to provide the fastest possible implementation, in real time radio astronomy instrumentation there is a performance target. Understanding the tradeoﬀs between diﬀerent implementations is key to cost-eﬀective design.
An instrument designer typically has four choices of hardware for their algorithm implementations: CPUs, GPUs, FPGAs, and ASICs. Figure 1.2 shows each of these on a spectrum from general purpose to custom. CPUs and, to a lesser extent, GPUs provide a very general purpose design experience, while FPGAs and ASICs require custom designs. This spectrum can also be used to generalize a number of other properties of the presented hardware. Platforms further right require longer design times, involve greater design complexity, and provide less flexibility in the final design. And because the chips will be deployed in relatively low volume, the platforms on the right will also cost more money. The tradeoff is in power and performance: the platforms on the right provide higher performance and lower power consumption.
To better understand these tradeoffs, consider the two platforms in the center of the spectrum, the GPU and the FPGA. NVIDIA GPUs can be programmed in CUDA. Its C-like structure makes it easy for CPU programmers to pick up and provides a lot of flexibility. CUDA allows the programmer to use conditional and iterative constructs the same way they would in C, easing the transition into GPU programming. FPGAs are programmed using a hardware description language, or HDL, which is less flexible and harder to learn than CUDA. In the FPGA computing model the data is streaming and everything happens at the same time, which requires a specialized programming style. On the other hand, FPGAs offer an order of magnitude improvement in performance and power consumption. Each GPU requires hundreds of watts of power, while an FPGA can be powered with tens of watts. Likewise, a GPU can only process hundreds of megahertz of antenna bandwidth, while an FPGA is capable of processing multiple gigahertz.
Although it may be easy to enumerate the diﬀerences between these platforms, understanding which is best is more diﬃcult. In most cases, a heterogeneous approach is best. The best mix of platforms will ultimately depend on the desired instrument, available technology, as well as the price and power restrictions. This means that a brand new implementation must be designed for every new instrument.
Traditionally, designing an instrument can be a very long process. The astronomer determines what type of instrument they want to build and what the specifications should be. The specification is handed off to a computer expert, who determines what platforms need to be used. The computer expert evaluates potential platforms. At best this may take a few hours, if the computer expert is familiar with similar designs, but when working with newer technology or larger instruments it can take a week or even months, as the engineer needs time to determine how to make the design fit. Any change in the design forces the entire design process to iterate again, as was my experience developing part of a pulsar processor. To make matters worse, this process does not guarantee an optimal result.
The engineer may have a bias towards the technology she or he is familiar with. Benchmarks for different technologies are very difficult to compare, as they represent different things, so the engineer might decide to simply check whether the design works on the preferred technology without testing alternate designs.
1.3 Optimal Rearrangement of Cluster-based Astronomy Signal Processing

This dissertation explores an automated approach to instrument design using a tool called Optimal Rearrangement of Cluster-based Astronomy Signal Processing, or ORCAS. The major contributions of this work are a toolflow that allows radio astronomy experts to design cost-optimal high performance instruments without the aid of a computer expert, the ability to quickly explore different designs in the algorithmic design space, and support for disparate benchmarks to find an optimal instrument design.
This tool provides cost-optimal mappings in a short amount of time. ORCAS allows the developer to define the instrument at a high level, along with a cost function specifying the quantity the developer aims to minimize. The cost function can represent something as simple as the price of the instrument, or it can be used to represent more complex parameters of the instrument such as design time.
ORCAS assesses the performance of diﬀerent types of technologies in two ways. First, by using benchmarks of existing implementations of instrument building blocks such as FFTs and FIR ﬁlters we can directly assess the performance. Second, if a benchmark is not available, the tool can use a performance model instead, giving an estimate of how the block will perform. This makes it easy to keep up with improving technology and library performance without rewriting the entire tool every time a new library version or board is released.
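The benchmark-or-model lookup described above can be sketched as follows. The table entries, block names, and the O(n log n) scaling model are illustrative assumptions for this example, not figures or code from ORCAS itself.

```python
import math

# Hypothetical benchmark table (illustrative numbers, not measured
# data): runtime in ms for known (block, size, architecture) tuples.
BENCHMARKS = {
    ("fft", 1024, "gpu"): 0.8,
    ("fft", 2048, "gpu"): 1.7,
}

def fft_model(size, arch):
    """Analytic fallback: scale the nearest measured point on the same
    architecture by the O(n log n) cost ratio of an FFT."""
    measured = [(s, t) for (b, s, a), t in BENCHMARKS.items()
                if b == "fft" and a == arch]
    ref_size, ref_time = min(measured, key=lambda p: abs(p[0] - size))
    scale = (size * math.log2(size)) / (ref_size * math.log2(ref_size))
    return ref_time * scale

def estimate(block, size, arch):
    """Prefer a measured benchmark; otherwise fall back to a model."""
    key = (block, size, arch)
    if key in BENCHMARKS:
        return BENCHMARKS[key]
    if block == "fft":
        return fft_model(size, arch)
    raise KeyError(f"no benchmark or model for {key}")
```

Because the table and the model are separate, a new library release or a new board only requires adding fresh benchmark entries, which is the property the paragraph above describes.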
We evaluate ORCAS by benchmarking existing libraries, such as CUFFT and the CASPER library, and using those benchmarks to produce mappings for three types of instruments. Each mapping is compared to existing implementations to understand how this automated approach compares to the old approach of hand optimized mapping.
The remainder of the dissertation is organized as follows. Chapter 2 provides a more in-depth look at the algorithms commonly used in radio astronomy and their applications.
Chapter 3 describes related work, done by myself and others, that preceded this work.
Chapter 4 presents a high level description of the tool and explains how the tool goes from a description of the instrument to a fully mapped algorithm. The algorithm used to map the instrument is fully specified in Chapter 5. Chapter 6 describes three instrument case studies and compares them to existing instruments. And finally, in Chapter 7, I present my conclusions and some opportunities for future work.