WWW.DISSERTATION.XLIBX.INFO
FREE ELECTRONIC LIBRARY - Dissertations, online materials
 
<< HOME
CONTACTS



Pages:   || 2 | 3 | 4 | 5 |   ...   | 25 |

«by Charles Eric LaForest A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of ...»

-- [ Page 1 ] --

High-Speed Soft-Processor Architecture

for FPGA Overlays

by

Charles Eric LaForest

A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy

Graduate Department of Electrical and Computer Engineering

University of Toronto

c Copyright 2015 by Charles Eric LaForest

Abstract

High-Speed Soft-Processor Architecture

for FPGA Overlays

Charles Eric LaForest

Doctor of Philosophy

Graduate Department of Electrical and Computer Engineering

University of Toronto 2015 Field-Programmable Gate Arrays (FPGAs) provide an easier path than Application- Specific Integrated Circuits (ASICs) for implementing computing systems, and generally yield higher performance and lower power than optimized software running on high- end CPUs. However, designing hardware with FPGAs remains a difficult and time- consuming process, requiring specialized skills and hours-long CAD processing times.

An easier design process abstracts away the FPGA via an “overlay architecture”, which implements a computing platform upon which we construct the desired system. Soft- processors represent the base case of overlays, allowing easy software-driven design, but at a large cost in performance and area. This thesis addresses the performance limitations of FPGA soft-processors, as building blocks for overlay architectures.

We first aim to maximize the usage of FPGA structures by designing Octavo, a strict round-robin multi-threaded soft-processor architecture tailored to the underlying FPGA and capable of operating at maximal speed. We then scale Octavo to SIMD and MIMD parallelism by replicating its datapath and connecting Octavo cores in a point-to-point mesh. This scaling creates multi-local logic, which we preserve via logical partitioning to avoid artificial critial paths introduced by unnecessary CAD optimizations.

We plan ahead for larger Octavo systems by adding support for architectural ex- tensions, instruction predication, and variable I/O latency. These features improve the ii efficiency of hardware extensions, eliminate busy-wait loops, and provide a building block for more efficient code execution.

By extracting the flow-control and addressing sub-graphs out of a program’s Control- Data Flow Graph (CDFG), we can execute them in parallel with useful computations using special dedicated hardware units, granting better performance than fully unrolled loops without increasing code size.

Finally, we benchmark Octavo against the MXP soft vector processor, the NiosII/f scalar soft-processor, and equivalent benchmark implementations written in C synthesized with the LegUp High-Level Synthesis (HLS) tool, and direct Verilog Hardware Description Language (HDL) implementations. Octavo’s higher clock frequency offsets its higher cycle count, performing roughly on-par with MXP, and within an order of magnitude of the performance of LegUp and Verilog solutions, but with an order of magnitude area penalty.

–  –  –

Since I was a teenager, I’ve always wanted to design a computer. You would not be reading this without the help and influence of many.

Firstly, my wife Joy. Do I need to expound on the love, support, and patience of a spouse over a decade spent in academia, with her own career, while raising a family?

Secondly, to Greg Steffan. I can credit pretty much my entire academic success to his ability to pull me back to the big picture after fruitfully letting me get lost in the details, arguing for my own direction. Back in 2008, he proposed I do my MASc on a seemingly small little problem, which grew into a sub-field of its own after we published, and won us a significant award. Right from the beginning, while he co-supervised my undergraduate BIS thesis in 2006, before he even had tenure, we talked about what kind of computer architectures would work best on an FPGA. The work you are holding right now stems from those conversations, and I think I can say we figured out a good solution. Neither of us knew how far it would all go, and I am sorry he did not get to see it completed.

Jason Anderson stepped up to carry the supervision torch, and gave his time and effort to make sure the LegUp benchmarks were great. I am lucky to have had his support, unflagging enthusiasm, and invaluable thesis-writing feedback.

Guy Lemieux, of UBC, influenced the later stages of this work enormously. The entire work on execution overhead originates from a single conversation we had. Guy always took the time to help and explain things, criticizing honestly, greatly stimulating my thinking. One of his students, Aaron Severance, gave his time and effort also to provide MXP benchmark data, better than any I could have done myself, and patiently explained how the machinery worked. Any mistakes or omissions are mine alone.

I have had the luck of also working with Vaughn Betz and Tarek Abdelrahman on my thesis committee, both bringing contrasting and broad perspectives on my research.

In fact, every ECE and DCS Faculty member I have spoken with have answered my questions and given help freely.

v I have also had the pleasure of great student colleagues, too numerous to mention all, but especially Ruediger Willenberg, Xander Chin, Braiden Brousseau, Robert Hesse, Ming G. Liu, Zimo Li, Tristan O’Rourke, Mihai Burcea, Andrew Shorten, Jason Lu, Henry Wong, Mark Jeffreys, Jasmina Vasiljevic, Ioana Baldini, and Charles Chiasson.





Some roots of this work came from conversations at MIT with Jack Dennis and some of Arvind’s grad students in the summer of 2010. Joel Emer of Intel also provided some early feedback. Octavo’s initial design improved a lot after speaking with Patrick Doyle, Mark Stoodley, and other people of IBM Markham.

I am also indebted to Blair Fort (as a student), Martin Labrecque, and Peter Yiannacouras. Their earlier soft-processor work laid a lot of the groundwork for my own. I am still learning things from it.

I got nothing but first-class IT tech support from Jaro Pristupa, Eugenia Distefano, and YongJoo Lee, who never blinked at all my software installation requests, and saved my data through RAID failures and my errors. Furthermore, Kelly Chan and Beverly Chu were always available for EECG administrative help, as were Judith Levene and Darlene Gorzo of the ECE Grad Office, and Jayne Leake who managed my TA assignments.

A lot of computational horsepower (over 24 CPU-years of it!), was provided by the GPC supercomputer at the SciNet HPC Consortium [83]. My special thanks to ChingHsing Yu and Dr. Daniel Gruner for their enthusiasm at helping me scale my computations, even when I unintentionally broke their system a little sometimes. (“What do you mean, I used up all the file descriptors?!? ”) The folk at Altera have also provided support in many ways: funding, Quartus licenses, support, and hardware. My special thanks to Blair Fort (as Altera staff) for his freely given help.

Further funding came from the ECE Department, the Queen Elizabeth II World Telecommunication Congress Graduate Scholarship in Science and Technology, the Walter C. Sumner Foundation, and NSERC.

If you read this and feel forgotten, my apologies. Call me up and reminisce!

vi “He was alive, trembling ever so slightly with delight, proud that his fear was under control. Then without ceremony he hugged in his forewings, extended his short, angled wingtips, and plunged directly toward the sea. By the time he passed four thousand feet he had reached terminal velocity, the wind was a solid beating wall of sound against which he could move no faster. He was flying now straight down, at two hundred fourteen miles per hour. He swallowed, knowing that if his wings unfolded at that speed he’d be blown into a million tiny shreds of seagull. But the speed was power, and the speed was joy, and the speed was pure beauty.” – “Jonathan Livingston Seagull”, Richard Bach I was born to Rock n’ Roll.

Everything I need.

I was born with the hammer down.

I was built for speed.

– “Built For Speed”, Mot¨rhead o It is by caffeine alone I set my mind in motion.

It is by the beans of Java that thoughts acquire speed, the hands acquire shakes, the shakes become a warning.

It is by caffeine alone I set my mind in motion.

– variation on the “Mentat Mantra” – attributed to Mark Stein, Arisia SF Convention, Boston, ca. 1993, Sunday morning.

Soundtrack by deadmau5.

–  –  –

4.1 Speed/Area/Density of Partitioned Meshes with 32 Datapaths...... 77

4.2 Speed/Area/Density of Partitioned Meshes with 102 Datapaths..... 78

5.1 Octavo’s Instruction Word Format with Extended Write Address Space. 86

6.1 Benchmark cycle count speedup and efficiency improvements........ 118 7.1 3-Step Array Reverse (Reverse-3) Measurements.............. 140

7.2 Sequential Hailstone (Hailstone-S) Measurements............. 141

7.3 Accelerated Hailstone (Hailstone-A) Measurements............ 142

7.4 Structured FSM (FSM-S) Measurements.................. 143 7.5 “Assembly” FSM (FSM-A) Measurements................. 143

7.6 Array Increment (Increment) Measurements................ 145

7.7 Non-Branching Sequential Hailstone (Hailstone-N) Measurements.... 146 7.8 4-Step Array Reverse (Reverse-4) Measurements.............. 147 7.9 8-Tap FIR Measurements........................... 148

8.1 Octavo Fmax on Various Altera Devices................... 163

8.2 Truth table for Empty/Full bit with back-to-back transfers......... 173

–  –  –

6.1 Control-Data Flow Graph (CDFG) of the Hailstone benchmark, along with Flow-Control, Addressing, and Useful Work sub-graphs........... 105

6.2 Address Offset Module implementation.................... 109

6.3 Branch Trigger Module implementation................... 111

6.4 Octavo Block Diagram with I/O Predication, AOM, and BTM...... 113

6.5 Average Octavo Fmax, varying the number of entries per AOM instance and the number of BTM instances...................... 115

–  –  –

E.1 Branch Trigger Module Altered for I/O Predication............ 192 H.1 Simple Floating-Point Number Recognizer................. 208

–  –  –

Introduction We can think of Field-Programmable Arrays (FPGAs) as “Lego-on-a-chip”: a large number of small building blocks which, although not very capable by themselves, we can

assemble into larger, useful digital systems. In a nutshell, these building blocks include:

• small Look-Up Tables (LUTs), which can compute any Boolean function of a few inputs (e.g.: 4 to 6) or act as small memories;

• flip-flops, which hold state;

• various arithmetic circuits, such as adders and multipliers;

• larger memory blocks, for efficient bulk storage;

• I/O blocks, which implement low-level off-chip interfaces;

• and a very large amount of programmable interconnect, which allows us to connect all of these building blocks together.

Rather than implement digital systems directly in Application-Specific Integrated Circuits (ASICs), a much longer and more costly process, we can trade-off area, power, and performance, and instead implement such systems using an FPGA. Despite the tradeoffs relative to ASICs, FPGAs still enable higher-performance and lower power solutions

12 Chapter 1. Introduction

than even highly-optimized software implementations on high-end CPUs, and unlike ASICs, we can re-configure FPGAs as the application evolves. As power limits and the slowing of Moore’s Law (through scaling limits or increasing costs) causes individual CPU performance to level-out after decades of accelerated growth, FPGAs present an avenue for higher performance without all the complications of large-scale parallel programming.

1.1 Motivation

Designing with FPGAs remains a difficult process, if a less costly one than with ASICs, with hours-long CAD processing times after laborious hardware design. At the most basic level, we can use Hardware Description Languages (HDLs) to describe logic fairly directly, but at a low level akin to assembly programming or a stripped-down C language.

Alternately, we can use High-Level Synthesis (HLS) techniques to allow higher-level system descriptions, which greatly eases design implementation, exploration, and verification. However, HLS must still translate the description down to HDL, with the same CAD processing time problems, and produces machine-generated “black box” implementations hard to relate to the original source code.

Finally, we can instead

Abstract

away the FPGA itself by using it to implement a more conventional computing platform, and then implement our desired system onto this “overlay architecture”. For example, a scalar soft-processor (e.g.: Altera’s NiosII) represents the simplest instance of an overlay.

By using overlays, we pay a further price in performance, power, and area, but gain an even faster development cycle, mostly free from FPGA CAD processing time and amenable to high-level software descriptions. Contrary to conventional CPUs, an overlay still benefits from the FPGA’s re-configurability, allowing us to incrementally customize the overlay to our application and recoup lost performance.

3

1.2. Contributions We propose that overlays already provide a better design process for FPGAs, minimizing the use of HDLs and HLS, and maximizing the return for a given development effort.

Which leaves us with the question: How do we improve the performance of overlays?

1.2 Contributions This dissertation focuses on improving the performance of soft-processors on FPGAs, as

building blocks for FPGA overlay architectures. Specifically:

• We construct Octavo, a soft-processor architecture tailored to the underlying FPGA, capable of operating at maximal speed.



Pages:   || 2 | 3 | 4 | 5 |   ...   | 25 |


Similar works:

«Loughborough University Institutional Repository Chemical kinetics modelling study on fuel autoignition in internal combustion engines This item was submitted to Loughborough University's Institutional Repository by the/an author.Additional Information: • A Doctoral Thesis. Submitted in partial fulllment of the requirements for the award of Doctor of Philosophy of Loughborough University. https://dspace.lboro.ac.uk/2134/6533 Metadata Record: c Zhen Liu Publisher: Please cite the published...»

«Forthcoming in Philosophy East & West 57:4 (2007) Language and Ontology in Early Chinese Thought∗ Chris Fraser Department of Philosophy Chinese University of Hong Kong July 2005 (Revised January 2007) Correspondence: Chris Fraser (方克濤) (Assistant Professor) Department of Philosophy Rm. 430, Fung King Hey Bldg. Chinese University of Hong Kong Shatin, N.T., Hong Kong Telephone: 852-9782-0560 Fax: 852-2603-5323 E-mail: cjfraser@cuhk.edu.hk Copyright © 2005, 2007 Brief Summary for Table of...»

«GAS PHASE LASER SYNTHESIS AND PROCESSING OF CALCIUM PHOSPHATE NANOPARTICLES FOR BIOMEDICAL APPLICATIONS by PARIMAL V. BAPAT RENATO P. CAMATA, COMMITTEE CHAIR SUSAN L. BELLIS S. AARON CATLEDGE DERRICK R. DEAN YOGESH K. VOHRA A DISSERTATION Submitted to the graduate faculty of The University of Alabama at Birmingham, in partial fulfillment of the requirements for the degree of Doctor of Philosophy BIRMINGHAM, ALABAMA 2011 GAS PHASE LASER SYNTHESIS AND PROCESSING OF CALCIUM PHOSPHATE NANOPARTICLES...»

«Piedras votivas de Pampacolca Nuevos datos sobre las lajas pintadas del sur del Perú Votivsteine aus Pampacolca Neue Daten über bemalte Steinplatten aus Südperu Inauguraldissertation zur Erlangung des Grades eines Doktors der Philosophie am Fachbereich Geschichtsund Kulturwissenschaften der Freien Universität Berlin vorgelegt von Renata Faron-Bartels M.A. Berlin 2011 1. Gutachter: Prof. Dr. Jürgen Golte 2. Gutachterin: Prof. Dr. Ursula Thiemer-Sachse Tag der Disputation: 10. 11. 2009 2...»

«INVESTIGATION OF OPTICAL LOSS CHANGES IN SILOXANE POLYMER WAVEGUIDES DURING THERMAL CURING AND AGING A Dissertation Presented to The Academic Faculty by Shashikant G. Hegde In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the G.W. Woodruff School of Mechanical Engineering Georgia Institute of Technology April 2008 COPYRIGHT 2008 BY SHASHIKANT G. HEGDE INVESTIGATION OF OPTICAL LOSS CHANGES IN SILOXANE POLYMER WAVEGUIDES DURING THERMAL CURING AND AGING Approved...»

«HIGH PERFORMANCE COMPUTING FOR IRREGULAR ALGORITHMS AND APPLICATIONS WITH AN EMPHASIS ON BIG DATA ANALYTICS A Thesis Presented to The Academic Faculty by Oded Green In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Computational Science and Engineering Georgia Institute of Technology May 2014 Copyright c 2014 by Oded Green HIGH PERFORMANCE COMPUTING FOR IRREGULAR ALGORITHMS AND APPLICATIONS WITH AN EMPHASIS ON BIG DATA ANALYTICS Approved by:...»

«1 27 August 2009 Mereologies as the Grammars of Chemical Discourses ROM HARRÉ AND JEAN-PIERRE LLORED `If you cut a crumb in half do you have two new crumbs or two halves of a crumb?’ John Palmer, quoted in the Sunday Times, 28 June 2009, News Review, p. 16. Since Robert Boyle’s corpuscularian philosophy, chemistry has been a mereological science. Displacing the metaphysics of `continuous substances’ and `qualities’ as the expression of “principles”, chemistry has been built on a...»

«Making Room: The Geography of Rooming House Regulation in Toronto. by Lisa Marie Freeman A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Department of Geography University of Toronto © Copyright by Lisa Marie Freeman 2013 Making Room: The Geography of Rooming House Regulation in Toronto Lisa Marie Freeman Doctor of Philosophy Department of Geography University of Toronto 2013 Abstract This dissertation addresses the contemporary moment of the...»

«NEGOTIATING THE BOUNDARY: EXPLORING IDENTITIES DURING ISRAEL EXPERIENCE MIFGASHIM INTERCULTURAL ENCOUNTERS ON EDUCATIONAL PROGRAMS IN ISRAEL Thesis submitted for the degree of “Doctor of Philosophy” by Minna F. Wolf Submitted to the Senate of the Hebrew University of Jerusalem July 2007 NEGOTIATING THE BOUNDARY: EXPLORING IDENTITIES DURING ISRAEL EXPERIENCE MIFGASHIM INTERCULTURAL ENCOUNTERS ON EDUCATIONAL PROGRAMS IN ISRAEL Thesis submitted for the degree of “Doctor of Philosophy” by...»

«TIME DEPENDENT FREEZING OF WATER UNDER MULTIPLE SHOCK WAVE COMPRESSION By DANIEL H. DOLAN III A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy WASHINGTON STATE UNIVERSITY Department of Physics MAY 2003 c Copyright by DANIEL H. DOLAN III, 2003 All Rights Reserved c Copyright by DANIEL H. DOLAN III, 2003 All Rights Reserved To the Faculty of Washington State University: The members of the Committee appointed to examine the dissertation of...»

«EXPERIMENTAL CHARACTERIZATION AND MULTIDISCIPLINARY CONCEPTUAL DESIGN OPTIMIZATION OF A BENDABLE LOAD STIFFENED UNMANNED AIR VEHICLE WING By VIJAY NARAYAN JAGDALE A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2010 1 © 2010 Vijay Narayan Jagdale 2 To my idol and inspiring father, Late Narayan L Jagdale; my caring mother, Kaushalya N. Jagdale; my loving wife,...»

«Anatomy of the Transmastoid Endolymphatic Sac Decompression in the Management of Ménière’s Disease Richard Robert Locke Submitted for the degree of Doctor of Philosophy The University of Glasgow Faculty of Biomedical and Life Sciences April 2008 © Richard R Locke, 2008 ABSTRACT Ménière’s disease affects 1 in 1000 people and produces vertigo and hearing loss (Morrison, 1981). Endolymphatic sac decompression has been advocated on the basis that endolymphatic hydrops is the underlying...»





 
<<  HOME   |    CONTACTS
2016 www.dissertation.xlibx.info - Dissertations, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.