A New Approach to Radio Astronomy Signal Processing:
Packet Switched, FPGA-based, Upgradeable, Modular Hardware
and Reusable, Platform-Independent Signal Processing Libraries
Don Backer, Chen Chang, Daniel Chapman, Henry Chen2, Pierre Droz2,4, Christina de Jesus,
David MacMahon3, Andrew Siemion, John Wawrzynek5, Dan Werthimer2, Mel Wright3
Space Sciences Laboratory,
Radio Astronomy Laboratory
Berkeley Wireless Research Center
Dept. of Electrical Engineering and Computer Sciences
University of California, Berkeley

ABSTRACT

Our group seeks to revolutionize the development of radio astronomy signal processing instrumentation by designing and demonstrating a scalable, upgradeable, FPGA-based computing platform and software design methodology that targets a range of real-time radio telescope signal processing applications. This project relies on the development of a small number of modular, connectible, upgradeable hardware components and platform-independent signal processing algorithms and libraries which can be reused and scaled as hardware capabilities expand. We have developed such a hardware platform and many of the necessary signal processing libraries for applications in antenna array correlation, wide-band spectroscopy, and pulsar surveys. We present this platform and two applications we have developed for it as demonstrations of the technology. We also identify future directions for the development of this platform, such as packetization, RFI rejection libraries, and real-time imaging.
1. Introduction

Existing radio astronomy instrumentation is highly specialized, with custom, complex, dedicated instruments built for individual applications. Each instrument takes 3-5 years to design, construct, and debug, and by the time it is deployed, it has usually been made obsolete by the Moore's Law growth of the electronics industry. This development cycle could be shortened by taking advantage of commodity hardware and developing signal processing libraries which are device independent. There is a growing trend in radio astronomy toward high-performance real-time DSP applications such as beam forming, spatial correlation, and wideband, fine-resolution spectroscopy. The next generation of radio telescopes (e.g., the Allen Telescope Array (ATA), the Combined Array for Research in Millimeter-wave Astronomy (CARMA), the next-generation Epoch of Reionization (EoR) array, and the Square Kilometer Array (SKA)) is being designed and built using large numbers of small antennas. One of the most computationally demanding problems in radio astronomy instrumentation is a real-time imaging system for very large arrays, with computation time scaling as O(N²) in the number of antennas.
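The quadratic scaling can be made concrete with a short calculation (a Python sketch; the antenna counts below are illustrative, not drawn from any particular instrument): a full correlator for N antennas must form N(N+1)/2 correlation products per frequency channel, so doubling the array roughly quadruples the compute load.

```python
def correlation_products(n_antennas):
    """Number of correlation products a full correlator must form:
    all N*(N-1)/2 cross-correlation baselines plus N autocorrelations."""
    return n_antennas * (n_antennas + 1) // 2

# Illustrative antenna counts (hypothetical, for scaling only):
print(correlation_products(8))    # 36
print(correlation_products(350))  # 61425 -- the O(N^2) term dominates
```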
Applications requiring several gigahertz of continuous RF bandwidth over hundreds of physical antennas require peta-operations per second. Such computational requirements are far beyond the capabilities of the general purpose computing clusters which have traditionally been the commodity solution to radio astronomy signal processing. Because of their iteration-based, inherently non-parallel architectures, CPUs can only process a bandwidth equal to their clock rate divided by the number of operations per sample. For computationally intensive applications, this number is low, even for multi-gigahertz processors.
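A back-of-envelope version of this argument (a sketch only; the clock rate and per-sample operation count below are hypothetical, not measurements):

```python
def cpu_bandwidth_hz(clock_hz, ops_per_sample):
    """Rough upper bound on the sample rate a sequential CPU can sustain:
    clock rate divided by the operations required per sample."""
    return clock_hz / ops_per_sample

# A hypothetical 3 GHz processor spending 1000 operations per sample
# keeps up with only ~3 MHz of bandwidth, far below GHz-scale IF bands:
print(cpu_bandwidth_hz(3e9, 1000))  # 3000000.0
```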
Field-Programmable Gate Arrays (FPGAs) are large-scale configurable logic devices with commercial applications that promote commodity pricing. These devices are the keystone technology for developing flexible signal processing hardware. Their reprogrammability and flexibility place them in a middle ground between custom hardware and flexible software: gateware. The data-flow nature of DSP algorithms matches the stream-based computation model commonly used on FPGAs, with throughput locked to the system clock rate. These devices can provide over 10 times more computing throughput than a DSP-based system of similar power consumption and cost, and over 100 times that of a microprocessor-based system, as a result of the disparity between the inherently sequential execution model of microprocessors and the spatially parallel execution model of a hardware implementation. Furthermore, because of their simple, regular hardware structure, FPGAs scale naturally with each successive generation of silicon process technology, placing them on a faster Moore's Law track than CPUs. Based on current projections for FPGA computing technology, the SKA computational requirement (on the order of 100 peta-operations per second) will be feasible by 2009, and implementable by 2011, at an estimated cost of $20 million USD per 800 MHz IF channel (Chang et al. 2005).
FPGAs may be the answer for creating DSP hardware with the flexibility to be widely adopted in radio astronomy instrumentation, but Moore's Law growth still dictates that hardware will need to be redesigned every few years. A solution is needed that minimizes the effort of redesign: one that minimizes the number of hardware modules which must be redesigned, and abstracts algorithms from hardware so that changing hardware affects only broad-scale implementation choices, not algorithm selection.
Hardware modularity requires that a small number of components with consistent interfaces be connectible with an arbitrary number of identical components to meet the computing needs of an application (“computing by the yard”), and that upgrading/revising a component does not change the way in which components are combined in the system. A modular system architecture can provide orders of magnitude reduction in overall cost and design time, and will closely track the early adoption of state-of-the-art IC fabrication by FPGA vendors. The Berkeley Emulation Engine (BEE2) system is one of the first attempts at providing a scalable, modular, economic solution for high-performance radio telescope DSP applications (Chang et al. 2005). Originally designed for high-end reconfigurable computing applications such as DSP and ASIC design, the BEE2 has been conscripted for radio astronomy applications in a collaboration between the Berkeley Wireless Research Center (BWRC), the UC Berkeley Radio Astronomy Laboratory, and the UC Berkeley SETI group.
The BEE2 system consists of three hardware modules developed by graduate students Chen Chang and Pierre Droz of BWRC: the main BEE2 processing board, a high-speed ADC board for data digitization, and an IBOB for high-speed serial communication between the two boards.
The BEE2 board (see Figure 1), intended to be the primary processing engine, integrates 500 Gops/sec of computational power with high-speed I/O. Its compute power is provided by five Xilinx Virtex-II Pro XC2VP70 FPGAs, each containing 384 18x18-bit multipliers, two PowerPC CPU cores running Linux, and over 74,000 configurable logic cells. In addition, each FPGA can be connected to up to 4 GB of DDR2 SDRAM.
External interfaces are available through 10 Gbps Ethernet connections, as well as 100 Mbps Ethernet and RS-232 serial ports.
IBOB (Internet BreakOut Board, see Figure 2) boards are primarily responsible for packetizing ADC data into the Ethernet protocol. Each board provides two connectors for I/O card attachment, and two 10 Gbps Ethernet connectors for interfacing to BEE2 boards. Data packetization and serialization are performed by a Xilinx XC2VP50 FPGA, which provides 232 18x18-bit multipliers, two PowerPC CPU cores, and over 53,000 logic cells. This FPGA, along with 36 Mbit of on-board ZBT SRAM, allows the IBOB to perform a significant amount of data preprocessing before handing off to the BEE2.
An ADC board (see Figure 3) was designed to mate directly to an IBOB board for high-speed serial data I/O. Analog inputs are digitized by an Atmel AT84AD001B dual 8-bit 1 Gsample/sec ADC chip; the board can digitize two streams at 1 Gsample/sec each or a single stream at 2 Gsample/sec, and may be driven with either single-ended or differential inputs.
Communication between hardware modules takes place over the standard 10 Gbit Ethernet protocol, allowing for the eventual integration of commercial switches and processors. Any of these boards may be upgraded separately to use the latest FPGA chips, and all of them may be upgraded together to take advantage of advancements in inter-board communication. These three boards may be combined to provide ample computational resources for any radio astronomy application: spectral analysis, antenna correlation, band extraction, and back-end analysis. This modular hardware platform provides astronomers with the ability to connect as many boards as necessary to meet the needs of their application.
The full potential of this modular hardware platform cannot be realized without a set of reusable libraries for quickly implementing signal processing algorithms on FPGAs. These libraries and their underlying algorithms must be abstracted from the hardware involved in order to support changes and upgrades in hardware technology, and to be of independent use to the reconfigurable computing community. The viability of developing such libraries has already been demonstrated: several of the original libraries we developed were targeted for the SERENDIP V board, which we designed as a first implementation of a multipurpose, FPGA-based signal processing engine for radio astronomy (http://seti.berkeley.edu/casper).
This board and its associated libraries have proven useful in several applications at Arecibo (Heiles et al., 2005, http://seti.berkeley.edu/galfa) and Nancay (Backer et al., 2005), for prototype antenna arrays (Backer and Bradley, 2005), and elsewhere. Most importantly, the libraries developed for these applications, which include designs for Polyphase Filter Banks (PFBs), Fast Fourier Transforms (FFTs), accumulators, digital mixers, digital oscillators, quadrature baseband down-converters, and FIR filters, were ported to the new BEE2 architecture without modification.
An important tool which has helped make it possible to write reusable gateware libraries is the Xilinx System Generator package for the MathWorks Simulink environment. Much as a C compiler translates platform-invariant source code into processor-specific machine code, System Generator translates designs written using a standard set of FPGA components into chip-specific VHDL or Verilog, which is then synthesized into a final chip configuration.
Using Simulink and Xilinx System Generator, we have abstracted the physical FPGA fabric into a set of parameterizable library blocks for implementing signal processing algorithms, and for interfacing with hardware-specific components such as ADCs, DRAM, and other FPGAs.
In order for the signal processing algorithms we develop to be useful in a variety of applications, it is important that they be parameterized so that they are customizable in size, behavior, and speed. This requirement adds complexity to the initial design of these libraries, but dramatically enhances their applicability and potential for longevity as hardware evolves; it is important that algorithms be expandable to take advantage of the inevitable increase in chip resources. This design principle has the added benefit of decreasing testing time, by allowing one to debug scale models of systems which are behaviorally identical to the larger systems and are derived from the same parameterization code. This feature has been invaluable in speeding the deployment of our various demonstrator projects.
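A software analogue of this parameterization principle may help (illustrative only; the actual libraries are Simulink block designs, not Python, and the generator and parameter names below are hypothetical): one design generator, driven by size parameters, emits both a small scale model for debugging and the behaviorally identical production-scale design.

```python
def make_accumulator_spec(n_channels, acc_len, bit_width=18):
    """Hypothetical generator for a vector-accumulator design:
    n_channels spectral channels, acc_len spectra per accumulation."""
    # Accumulating acc_len values of bit_width bits requires
    # ceil(log2(acc_len)) extra output bits to avoid overflow.
    growth = (acc_len - 1).bit_length()
    return {
        "channels": n_channels,
        "acc_len": acc_len,
        "in_bits": bit_width,
        "out_bits": bit_width + growth,
    }

# The same code yields a debuggable scale model and the full design:
scale_model = make_accumulator_spec(n_channels=64, acc_len=16)
production = make_accumulator_spec(n_channels=2**27, acc_len=2**20)
print(scale_model["out_bits"])  # 22
print(production["out_bits"])   # 38
```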
We have found it useful and convenient to establish a general architectural consistency among the gateware libraries we have designed. Firstly, all modules operate on a vector-warning architecture, whereby a single-bit signal is passed along with a data stream and is active the clock cycle before the first valid data appears on that stream. This enables library components to phase themselves correctly to the data stream, and removes any effect pipeline delays might have on downstream modules, effectively allowing modules to be swapped into and out of designs without affecting other modules. Secondly, data samples are interpreted as two's-complement, fixed-point numbers in the range [-1,1). Modules which allow the magnitude of samples to grow (such as the FFT) have selectable down-shifting or overflow detection to prevent bit growth. In addition to libraries implementing digital mixers, oscillators, baseband down-converters, decimating FIR filters, matrix transpositions, and accumulators, we have written three libraries which will be instrumental in implementing the next generation of spectrometers and correlators.
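The fixed-point convention can be illustrated in software (a sketch; the helper below is not part of the actual gateware libraries): an n-bit two's-complement word is read as a signed integer scaled by 2^(n-1), giving values in [-1, 1).

```python
def fix_to_float(word, n_bits):
    """Interpret an n-bit two's-complement word as a fraction in [-1, 1)."""
    if word >= 2 ** (n_bits - 1):   # high bit set: negative, wraps around
        word -= 2 ** n_bits
    return word / 2 ** (n_bits - 1)

print(fix_to_float(0x7F, 8))  # 0.9921875  (largest positive 8-bit value)
print(fix_to_float(0x80, 8))  # -1.0       (most negative value)
print(fix_to_float(0xC0, 8))  # -0.5
```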
The first parameterized library we developed under Simulink was for an FFT more efficient than those commercially available, in order to gain higher spectral resolution for a SETI spectrometer. We implemented a radix-2 biplex pipelined FFT (Rabiner and Gold, 1975; see Figure 4) capable of analyzing two independent complex data streams or four real data streams simultaneously using one quarter the FPGA resources of commercial designs (Dick, 2000). Beyond its efficiency, this design is superior in its ability to analyze the full input bandwidth at the quiescent clock rate, with no off-line computation period during which input samples are not accepted; the fully pipelined architecture operates at the full clock rate from stage to stage without extra buffering. Additional features we have developed for this library include modules which use in-place buffering to unscramble the standard bit-reversed output order of the spectral data, and modules for extracting two real FFTs from a single complex FFT using Hermitian conjugation and interpolation. Although this library was developed primarily to be coupled with the Polyphase Filter Bank (PFB) library we discuss next, it has been used in stand-alone fine-resolution spectroscopy applications such as the 128-million-channel SETI spectrometer discussed in the next section of this paper.
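The Hermitian-symmetry trick for recovering two real-input transforms from a single complex FFT can be sketched in pure Python (the naive DFT below stands in for the pipelined FFT core; function names are illustrative): pack the two real streams as the real and imaginary parts of one complex stream, transform once, then separate the spectra using A[k] = (Z[k] + conj(Z[N-k]))/2 and B[k] = (Z[k] - conj(Z[N-k]))/(2i).

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT, adequate for demonstrating the algebra."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n))
            for j in range(n)]

def two_real_ffts(a, b):
    """Recover the DFTs of two real signals from one complex transform."""
    n = len(a)
    z = dft([ar + 1j * br for ar, br in zip(a, b)])
    # Real inputs make each spectrum Hermitian, so the packed transform
    # separates into even (a) and odd (b) Hermitian parts:
    A = [(z[j] + z[(n - j) % n].conjugate()) / 2 for j in range(n)]
    B = [(z[j] - z[(n - j) % n].conjugate()) / 2j for j in range(n)]
    return A, B
```

One N-point complex transform thus serves two N-point real streams, which is the source of the resource savings when analyzing four real data streams on the two-stream biplex core.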