WWW.DISSERTATION.XLIBX.INFO
FREE ELECTRONIC LIBRARY - Dissertations, online materials
 
<< HOME
CONTACTS



Pages:   || 2 | 3 | 4 | 5 |   ...   | 21 |

«LOOP OPTIMIZATION TECHNIQUES ON MULTI-ISSUE ARCHITECTURES by Dan Richard Kaiser A dissertation submitted in partial fulfillment of the requirements ...»

-- [ Page 1 ] --

LOOP OPTIMIZATION TECHNIQUES

ON

MULTI-ISSUE ARCHITECTURES

by

Dan Richard Kaiser

A dissertation submitted in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy

(Computer and Communication Sciences)

in The University of Michigan

Doctoral Committee:

Professor Trevor N. Mudge, Chair

Associate Professor Richard B. Brown

Professor Edward S. Davidson Professor Ronald J. Lomax Associate Professor Karem A. Sakallah © Dan Richard Kaiser 1994 All Rights Reserved Dedicated to the memory of Francis Marie Kaiser, 1911-1994.

ii

ACKNOWLEDGMENTS

First and foremost, I would like to thank my advisor Trevor Mudge for his continued support and encouragement. I would also like to thank my committee for their comments and suggestions. Thank you to the students and faculty of the Computer and Communication Sciences Department, where I began my graduate work, and to the students and faculty of the Aurora project. Thanks to my parents for their support during my school years. A special thanks to my family, Pam, Seth and Tadd, for their support and encouragement, and for bearing with me through the long process of finishing my dissertation.

This work was partially supported by the Defense Advanced Research Projects Agency under DARPA/ARO Contract Number DAAL03-90-C-0028, and by Cadence Design Systems, Inc.

iii

TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER I INTRODUCTION

1 Scheduling

2 Methodology.

3 Research Contributions

4 Thesis Organization

CHAPTER II INSTRUCTION LEVEL PARALLELISM

1 Available Parallelism Analysis

2 Machine Architectures

2.1 VLIW Architectures

2.2 DAE Architectures

2.3 Superscalar Architectures

2.4 Memory System Support

3 Similar Studies

CHAPTER III LOOP OPTIMIZATIONS

1 Loop Unrolling

1.1 An Example of Loop Unrolling

1.2 Loop Unrolling Performance Benefits

2 Trace Scheduling

3 Software Pipelining

3.1 An Example of Software Pipelining

3.2 Software Pipelining Scheduling Methods

3.3 The Performance of Software Pipelining

CHAPTER IV THE STRUCTURE OF THE OPTIMIZING COMPILER TORTOISE

1 The Organization of Tortoise

2 Data Flow Analysis and Transformations

2.1 Canonical Loop Formatting

2.2 Block Flow Graph Reconstruction

2.3 Initial Program Dependence Graph Construction

2.4 Initial Data Flow Analysis

iv

2.5 Data Dependency Graph Optimization

2.6 Constant Propagation

2.7 Loop Invariant Detection

2.8 Induction Variable Detection

2.9 Iteration Distance Computation

2.10 Array Reference Refinement

3 Machine Independent Optimizations

3.1 Loop Invariant Hoisting

3.2 Induction Variable Strength Reduction

3.3 Type Propagation

3.4 Dead Code Elimination

3.5 Summary of Machine Independent Transformations

4 Code Generation

4.1 Instruction Selection

4.2 Instruction Scheduling

4.3 Register Allocation

CHAPTER V EXPERIMENTS AND RESULTS

1 Scheduling a Scalar Architecture

1.1 Register Use

1.2 Code Size

2 Scheduling for Long Operation Latencies

3 Scheduling and Issue Policies

3.1 Aurora III

3.2 Decoupled Execution

3.3 Comparisons with VLIW and DAE

3.4 Aurora III Cache Behavior

4 Cache Effects

4.1 Previous Work

4.2 Cache Performance Effects from Software Pipelining................148

4.3 Cache Behavior with Loop Unrolling

4.4 Context Switch Effects

4.5 Summary of Cache Effects

CHAPTER VI CONCLUSIONS

1 Research Contributions

2 Future Directions

APPENDIX

BIBLIOGRAPHY

v LIST OF TABLES

TABLE 1 Machine Configurations

2 Compiler/Technique Performance on a Scalar Architecture

3 Registers Use vs. Scheduling Technique

4 Scheduling Techniques Performance Ratios

5 Percent Dual Issue under Different Scheduling Models

vi LIST OF FIGURES

FIGURE 1 Source for a vector loop

2 Block diagram of a pipelined scalar processor

3 Block diagram of a VLIW processor

4 BLock diagram of a superscalar architecture

5 Block diagram of a DAE architecture

6 Source for a vector loop

7 The loop body without unrolling

8 The loop body with unrolling

9 Unrolled Loop

10 Loop Efficiency vs. Number of Iterations Unrolled

11 Trace Scheduling Example.

12 Sequential loop execution

13 Pipelined Loop Execution

14 Phases of pipelined loop execution

15 Source for a vector loop

16 The loop body without unrolling.

17 Execution of a few iterations of a loop without unrolling.





18 Compressed execution of a few iterations of the loop

19 A Software Pipeline version of the loop body.

20 The Kernel of the loop body.

21 A Software Pipelined loop body with register expansion

22 Organization of Tortoise

23 Tortoise Analysis and Transformation Phases

24 Canonical Loop Format

25 An extraneous flow dependency

26 Dependency Graph Reconstruction - Flow Dependency

27 Dependencies Involved in Removing Assignment

28 Input CSE Dependency Transformation

29 Program fragment with nested induction variables.

30 A Nested Induction Transformation.

31 Program fragment with rewritten inner induction.

32 Program fragment with rewritten nested inductions.

33 A loop containing a recurrence.

34 Array Reference Load CSE Transformation

vii 35 Tortoise Code Generation Phases

36 A Three Stage Pipeline Schedule

37 Formation of Strongly Connected Components

38 Software Pipeline Realization

39 Multiple Live Register Values in a Software Pipeline

40 Compiler/Technique Speedup on Scalar Processor

41 Registers Use vs. Scheduling Technique

42 Code Size vs. Scheduling Technique

43 Execution Time vs. Increasing FPU Latency

44 Execution Time vs. Increasing FPU Latency (FPU not pipelined)

45 Execution Time vs. Increasing FPU Latency (FPU pipelined)

46 Execution Time Pipelined vs. not Pipelined FPU

47 Execution Time vs. Increasing FPU Pipe Stages (Constant Latency)..............128 48 Scalar Aurora III vs. R3000 w. MIPS CC

49 Scalar Aurora III Double vs. Single Load/Stores

50 Aurora III Cycles vs. I-queue Length

51 Aurora III Stalls vs. I-queue Length

52 Livermore Loop 4 - Occasional Data Dependency

53 Dual Issue Aurora III vs. R3000 w. MIPS CC

54 Dual Issue Scheduling (VLIW Model) vs. R3000 w. MIPS CC

55 Dual Issue Scheduling (Latency Doubling Model) vs. MIPS CC

56 Dual v.s Scalar Issue (VLIW Scheduling Model)

57 Register Use vs. Issue Models with Software Pipelining

58 Superscalar Register Definitions

59 VLIW Register Definitions

60 VLIW vs. Static Superscalar vs. Scheduling Technique

61 Register Use vs. Issue Policy with Software Pipelining

62 DAE vs. Static Superscalar vs. Scheduling Technique

63 Aurora III, VLIW, and DAE vs. Scheduling Technique

64 Percent Time Spent in D-Cache Stalls

65 Percent Time Spent in I-Cache Stalls

66 Code Sizes for the First Fourteen Livermore Loops

67 Execution Times for the First Fourteen Livermore Loops

68 Execution Times for LL 1 vs. Primary Cache Size

69 Execution Times for the First Livermore Loop using Gnu-C

70 Code Size for xlisp

71 xlisp: Cycles vs. Primary Cache Size (Long Latency Mem.)

72 xlisp: Cycles vs. Primary Cache Size (Short Latency Mem.)

73 Cycles Executed for xlisp vs. Unroll Size (Long Latency Mem.)

74 Cycles Executed for xlisp vs. Unroll Size (Short Latency Mem.)

–  –  –

Considerable effort has been put into designing computer architectures which exploit instruction level parallelism in an attempt to achieve execution rates of greater than one instruction per cycle. A wide variety of architectures and accompanying compiler algorithms have been proposed and developed. The best examples have shown good performance improvements relative to scalar architectures constructed in similar technologies.

Much of the experimental work on new architectures has focused on just the hardware architecture, with perhaps one scheduling algorithm designed for the architecture. A new architecture is generally compared to a similar scalar architecture as a reference point. Few experiments have compared different architectures to each other or investigated compiler scheduling algorithms across architectures, because of the difficulty of retargeting the compiler.

This work is a first step towards a direct comparison of different architectures in conjunction with different scheduling algorithms. We compare loop scheduling techniques on several architectures, together with accompanying compiler optimization techniques. In particular, loop optimizations as performed by an optimizing compiler are implemented on a set of multi-issue architectures, allowing the interactions between the loop optimizations and the architectures to be studied.

1 Scheduling Instruction scheduling is the process of determining an execution order for a set of operations. The instruction scheduler accepts a directed graph {V,E} of operations (oi ∈ V) and dependencies between the operations (oi,oj∈ E), and produces an ordered list of operations L = o1, o2,..., on. The ordered list L maintains the dependencies E of the original graph, i.e. if the graph contained a dependency oi,oj∈ E, oi appears in the list before oj. The function computed by the scheduler is shown in (1).

–  –  –

After the ordered list of operations is produced, the code generator will transform the list of operations into a list of instructions which can be executed on a particular architecture.

Instruction scheduling must be correct: the order placed on the set of instructions must maintain the semantics of the original list of instructions. The semantics of the original list is called the program order or in-order semantics. The in-order semantics is dictated by the programming language. The ordering between instructions is determined by the control and data dependencies between the instructions and is encoded as the set of dependencies E in the program graph. This is generally a partial ordering, which allows some leeway for the scheduler to reorder the instructions to improve execution efficiency. For instance, in the program segment shown in Figure 1, statement 3 is dependent on statement 1 and statement 1 must be scheduled prior to statement 3. Statements 1 and 2 are independent and can be scheduled in any order. The relationship between statements 2 and 3 depends on the values of i and j and may be dependent.

–  –  –

Control dependencies are dependencies between conditional instructions and any other instructions whose execution depends on the conditional instructions. In-order semantics does not allow dependent instructions to execute until the conditional instruction has been executed and the result of the condition is known. Since control dependencies are often a significant performance limiting factor, some execution models relax the requirement that the result of the condition be known before the dependent instruction begins execution. This is usually referred to as speculative execution. If speculative execution is allowed, some method must be provided to undo the effects of executing the dependent instructions if the eventual resolution of the condition determines that they should not have been executed.

Data dependencies are formed by the sharing of data and memory locations between instructions. There are four types of dependencies. If A and B are two instructions with A preceding B in program order, then the input and output locations of the two instructions can be denoted by the sets {InA, OutA, InB, and OutB}, respectively.

Furthermore, the possible dependencies between A and B can be defined as follows:

1. Flow dependencies are the locations in InB ∩ OutA.

2. Anti-flow dependencies are the locations in InA ∩ OutB.

3. Output dependencies are the locations in OutA ∩ OutB.

4. Input dependencies are the locations in InA ∩ InB.

In some sense, flow dependencies are the only true dependencies because they express the sharing of data between instructions. Flow dependencies must be honored to obtain correct execution semantics. Anti-flow and output dependencies arise due to sharing memory locations between different instructions. These dependencies can sometimes be removed by renaming memory locations. Input dependencies come from the sharing of memory locations between instructions. The discovery of input dependencies is not important for correctness, they do not impose an execution ordering, but they can be used to improve execution efficiency.

There is one other set of constraints that may be imposed on the instruction execution order: exceptions produced by the execution of instructions should be precise, i.e.

the machine state following the handling of an exception should appear as though any instructions following the exceptional instruction had not executed. This requirement can be quite restrictive. Precise exceptions in effect introduce a control dependency between every instruction which can produce an exception and any following instructions. Implementing this in an aggressively scheduled machine requires hardware support to maintain and restore the correct machine state when an exception is encountered.



Pages:   || 2 | 3 | 4 | 5 |   ...   | 21 |


Similar works:

«Advanced Proton Conducting Polymer Electrolytes for Electrochemical Capacitors by Han Gao A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Department of Materials Science & Engineering University of Toronto © Copyright by Han Gao 2015 Advanced Proton Conducting Polymer Electrolytes for Electrochemical Capacitors Han Gao Doctor of Philosophy Department of Materials Science & Engineering University of Toronto Research on solid electrochemical energy...»

«ASKING THE QUESTION: ‘WHAT IS ORGANIZATION?’ Thesis submitted for the degree o f Doctor o f Philosophy at the University o f Leicester by Sverre Spoelstra School of Management University of Leicester December 2006 UMI Number: U219244 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also,...»

«EFFECTS OF CHRONIC METHYLMERCURY EXPOSURE ON REPRODUCTIVE SUCCESS, BEHAVIOR, AND STEROID HORMONES OF THE WHITE IBIS (Eudocimus albus) By NILMINI JAYASENA A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2010 1 © 2010 Nilmini Jayasena 2 To my parents 3 ACKNOWLEDGMENTS First and foremost, I wish to extend my sincere gratitude to my major adviser, Dr. Peter Frederick...»

«Stony Brook University The official electronic file of this thesis or dissertation is maintained by the University Libraries on behalf of The Graduate School at Stony Brook University. © Allll Riightts Reserved by Autthor. © A R gh s Reserved by Au hor Behavioral Relevance of Fine Timing in Different Cortical Areas A Dissertation Presented By Yang Yang To The Graduate School In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy In Neuroscience Stony Brook...»

«KEITH CHIASSON Influence des agonistes dopaminergiques et des facteurs de transcriptions sur la réponse cellulaire face au stress oxydant Thèse présentée à la Faculté des études supérieures de l'Université Laval dans le cadre du programme de doctorat en Neurobiologie pour l'obtention du grade de Philosophiae Doctor (Ph. D.) PROGRAMME DE NEUROBIOLOGIE FACULTÉ DE MÉDECINE UNIVERSITÉ LAVAL QUÉBEC 2008 © Keith Chiasson, 2008 ii AVANT PROPOS Cette thèse vient terminer plusieurs...»

«IMPROVEMENTS FOR CHIP-CHIP INTERCONNECTS AND MEMS PACKAGING THROUGH MATERIALS AND PROCESSING RESEARCH A Dissertation Presented to The Academic Faculty by Erdal Uzunlar In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Chemical & Biomolecular Engineering Georgia Institute of Technology May 2015 COPYRIGHT© 2015 BY ERDAL UZUNLAR IMPROVEMENTS FOR CHIP-CHIP INTERCONNECTS AND MEMS PACKAGING THROUGH MATERIALS AND PROCESSING RESEARCH Approved by: Dr. Paul...»

«The Emergent Holographic Scene Compositions of movement and affect using multiplexed holographic images A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy Martina Mrongovius Bachelor of Applied Science (Applied Physics) School of Architecture + Design Spatial Information Architecture Laboratory RMIT University September 2011 Declaration I certify that except where due acknowledgement has been made, the work is that of the author alone; the work has not...»

«MODELING OF COMPLEX MOLECULES ADSORBED ON COMPLEX COPPER SURFACES A Dissertation Presented to The Academic Faculty by Daniel S. Wei In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Chemical and Biomolecular Engineering Georgia Institute of Technology December 2014 COPYRIGHT 2014 BY DANIEL S. WEI MODELING OF COMPLEX MOLECULES ADSORBED ON COMPLEX COPPER SURFACES Approved by: Dr. David S. Sholl, Advisor Dr. Michael A. Filler School of Chemical &...»

«Model Studies of Proposed Intermediates in Homogeneous Gold(I) Catalysis by Timothy Justin Brown Department of Chemistry Duke University Date:_Approved: _ Ross A. Widenhoefer, Supervisor _ Katherine J. Franz _ Jiyong Hong _ Qiu Wang Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Chemistry in the Graduate School of Duke University i v ABSTRACT Model Studies of Proposed Intermediates in Homogeneous Gold(I) Catalysis by...»

«Symptom Experience Following Lung Cancer Surgery by Kathleen Garrubba Hopkins Bachelors of Science, University of Pittsburgh, 1978 Master of Science, Industrial Engineering, University of Pittsburgh, 1982 Associates Degree, Community College of Allegheny County, 2005 Submitted to the Graduate Faculty of School of Nursing in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Pittsburgh © 2014 UNIVERSITY OF PITTSBURGH School of Nursing This dissertation...»

«Animosity or Alliance? Identifying the Factors that Promote Black and Latino Electoral Coalitions by Andrea Benjamin A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Political Science) in The University of Michigan Doctoral Committee: Professor Vincent L. Hutchings, Chair Professor Nancy Burns Professor Andrei Markovits Assistant Professor Cara Wong, University of Illinois, Urbana Champaign TO MY PARENTS: HERMAN M. AND JOAN M. BENJAMIN...»

«Analyzing Challenges and Opportunities of the Implementation of e-Government Initiatives for Development through the Lens of the Capability Approach: Case Studies from Mozambique By: Gertrudes Adolfo Macueve Submitted as partial fulfilment of the requirements for the degree of Doctor of Philosophy (Ph. D) Faculty of Mathematics and Natural Sciences Department of Informatics University of Oslo Norway August 14, 2008 Dedication To my parents, Deolinda Noé and Adolfo A. Macueve TABLE OF CONTENTS...»





 
<<  HOME   |    CONTACTS
2016 www.dissertation.xlibx.info - Dissertations, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.