
LOOP OPTIMIZATION TECHNIQUES

ON

MULTI-ISSUE ARCHITECTURES

by

Dan Richard Kaiser

A dissertation submitted in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy

(Computer and Communication Sciences)

in The University of Michigan

Doctoral Committee:

Professor Trevor N. Mudge, Chair

Associate Professor Richard B. Brown

Professor Edward S. Davidson

Professor Ronald J. Lomax

Associate Professor Karem A. Sakallah

© Dan Richard Kaiser 1994
All Rights Reserved

Dedicated to the memory of Francis Marie Kaiser, 1911-1994.


ACKNOWLEDGMENTS

First and foremost, I would like to thank my advisor Trevor Mudge for his continued support and encouragement. I would also like to thank my committee for their comments and suggestions. Thank you to the students and faculty of the Computer and Communication Sciences Department, where I began my graduate work, and to the students and faculty of the Aurora project. Thanks to my parents for their support during my school years. A special thanks to my family, Pam, Seth and Tadd, for their support and encouragement, and for bearing with me through the long process of finishing my dissertation.

This work was partially supported by the Defense Advanced Research Projects Agency under DARPA/ARO Contract Number DAAL03-90-C-0028, and by Cadence Design Systems, Inc.


TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER I INTRODUCTION

1 Scheduling

2 Methodology

3 Research Contributions

4 Thesis Organization

CHAPTER II INSTRUCTION LEVEL PARALLELISM

1 Available Parallelism Analysis

2 Machine Architectures

2.1 VLIW Architectures

2.2 DAE Architectures

2.3 Superscalar Architectures

2.4 Memory System Support

3 Similar Studies

CHAPTER III LOOP OPTIMIZATIONS

1 Loop Unrolling

1.1 An Example of Loop Unrolling

1.2 Loop Unrolling Performance Benefits

2 Trace Scheduling

3 Software Pipelining

3.1 An Example of Software Pipelining

3.2 Software Pipelining Scheduling Methods

3.3 The Performance of Software Pipelining

CHAPTER IV THE STRUCTURE OF THE OPTIMIZING COMPILER TORTOISE

1 The Organization of Tortoise

2 Data Flow Analysis and Transformations

2.1 Canonical Loop Formatting

2.2 Block Flow Graph Reconstruction

2.3 Initial Program Dependence Graph Construction

2.4 Initial Data Flow Analysis


2.5 Data Dependency Graph Optimization

2.6 Constant Propagation

2.7 Loop Invariant Detection

2.8 Induction Variable Detection

2.9 Iteration Distance Computation

2.10 Array Reference Refinement

3 Machine Independent Optimizations

3.1 Loop Invariant Hoisting

3.2 Induction Variable Strength Reduction

3.3 Type Propagation

3.4 Dead Code Elimination

3.5 Summary of Machine Independent Transformations

4 Code Generation

4.1 Instruction Selection

4.2 Instruction Scheduling

4.3 Register Allocation

CHAPTER V EXPERIMENTS AND RESULTS

1 Scheduling a Scalar Architecture

1.1 Register Use

1.2 Code Size

2 Scheduling for Long Operation Latencies

3 Scheduling and Issue Policies

3.1 Aurora III

3.2 Decoupled Execution

3.3 Comparisons with VLIW and DAE

3.4 Aurora III Cache Behavior

4 Cache Effects

4.1 Previous Work

4.2 Cache Performance Effects from Software Pipelining

4.3 Cache Behavior with Loop Unrolling

4.4 Context Switch Effects

4.5 Summary of Cache Effects

CHAPTER VI CONCLUSIONS

1 Research Contributions

2 Future Directions

APPENDIX

BIBLIOGRAPHY

LIST OF TABLES

TABLE 1 Machine Configurations

2 Compiler/Technique Performance on a Scalar Architecture

3 Register Use vs. Scheduling Technique

4 Scheduling Techniques Performance Ratios

5 Percent Dual Issue under Different Scheduling Models

LIST OF FIGURES

FIGURE 1 Source for a vector loop

2 Block diagram of a pipelined scalar processor

3 Block diagram of a VLIW processor

4 Block diagram of a superscalar architecture

5 Block diagram of a DAE architecture

6 Source for a vector loop

7 The loop body without unrolling

8 The loop body with unrolling

9 Unrolled Loop

10 Loop Efficiency vs. Number of Iterations Unrolled

11 Trace Scheduling Example

12 Sequential loop execution

13 Pipelined Loop Execution

14 Phases of pipelined loop execution

15 Source for a vector loop

16 The loop body without unrolling

17 Execution of a few iterations of a loop without unrolling

18 Compressed execution of a few iterations of the loop

19 A Software Pipeline version of the loop body

20 The Kernel of the loop body

21 A Software Pipelined loop body with register expansion

22 Organization of Tortoise

23 Tortoise Analysis and Transformation Phases

24 Canonical Loop Format

25 An extraneous flow dependency

26 Dependency Graph Reconstruction - Flow Dependency

27 Dependencies Involved in Removing Assignment

28 Input CSE Dependency Transformation

29 Program fragment with nested induction variables

30 A Nested Induction Transformation

31 Program fragment with rewritten inner induction

32 Program fragment with rewritten nested inductions

33 A loop containing a recurrence

34 Array Reference Load CSE Transformation

35 Tortoise Code Generation Phases

36 A Three Stage Pipeline Schedule

37 Formation of Strongly Connected Components

38 Software Pipeline Realization

39 Multiple Live Register Values in a Software Pipeline

40 Compiler/Technique Speedup on Scalar Processor

41 Register Use vs. Scheduling Technique

42 Code Size vs. Scheduling Technique

43 Execution Time vs. Increasing FPU Latency

44 Execution Time vs. Increasing FPU Latency (FPU not pipelined)

45 Execution Time vs. Increasing FPU Latency (FPU pipelined)

46 Execution Time Pipelined vs. not Pipelined FPU

47 Execution Time vs. Increasing FPU Pipe Stages (Constant Latency)

48 Scalar Aurora III vs. R3000 w. MIPS CC

49 Scalar Aurora III Double vs. Single Load/Stores

50 Aurora III Cycles vs. I-queue Length

51 Aurora III Stalls vs. I-queue Length

52 Livermore Loop 4 - Occasional Data Dependency

53 Dual Issue Aurora III vs. R3000 w. MIPS CC

54 Dual Issue Scheduling (VLIW Model) vs. R3000 w. MIPS CC

55 Dual Issue Scheduling (Latency Doubling Model) vs. MIPS CC

56 Dual vs. Scalar Issue (VLIW Scheduling Model)

57 Register Use vs. Issue Models with Software Pipelining

58 Superscalar Register Definitions

59 VLIW Register Definitions

60 VLIW vs. Static Superscalar vs. Scheduling Technique

61 Register Use vs. Issue Policy with Software Pipelining

62 DAE vs. Static Superscalar vs. Scheduling Technique

63 Aurora III, VLIW, and DAE vs. Scheduling Technique

64 Percent Time Spent in D-Cache Stalls

65 Percent Time Spent in I-Cache Stalls

66 Code Sizes for the First Fourteen Livermore Loops

67 Execution Times for the First Fourteen Livermore Loops

68 Execution Times for LL 1 vs. Primary Cache Size

69 Execution Times for the First Livermore Loop using Gnu-C

70 Code Size for xlisp

71 xlisp: Cycles vs. Primary Cache Size (Long Latency Mem.)

72 xlisp: Cycles vs. Primary Cache Size (Short Latency Mem.)

73 Cycles Executed for xlisp vs. Unroll Size (Long Latency Mem.)

74 Cycles Executed for xlisp vs. Unroll Size (Short Latency Mem.)


Considerable effort has been put into designing computer architectures that exploit instruction level parallelism in an attempt to achieve execution rates greater than one instruction per cycle. A wide variety of architectures and accompanying compiler algorithms have been proposed and developed. The best examples have shown good performance improvements relative to scalar architectures constructed in similar technologies.

Much of the experimental work on new architectures has focused on just the hardware architecture, with perhaps one scheduling algorithm designed for the architecture. A new architecture is generally compared to a similar scalar architecture as a reference point. Few experiments have compared different architectures to each other or investigated compiler scheduling algorithms across architectures, because of the difficulty of retargeting the compiler.

This work is a first step towards a direct comparison of different architectures in conjunction with different scheduling algorithms. We compare loop scheduling techniques on several architectures, together with accompanying compiler optimization techniques. In particular, loop optimizations as performed by an optimizing compiler are implemented on a set of multi-issue architectures, allowing the interactions between the loop optimizations and the architectures to be studied.

1 Scheduling

Instruction scheduling is the process of determining an execution order for a set of operations. The instruction scheduler accepts a directed graph {V, E} of operations (oi ∈ V) and dependencies between the operations ((oi, oj) ∈ E), and produces an ordered list of operations L = o1, o2, ..., on. The ordered list L maintains the dependencies E of the original graph; i.e., if the graph contains a dependency (oi, oj) ∈ E, then oi appears in the list before oj. The function computed by the scheduler is shown in (1).

L = schedule({V, E})                                          (1)

After the ordered list of operations is produced, the code generator will transform the list of operations into a list of instructions which can be executed on a particular architecture.

Instruction scheduling must be correct: the order placed on the set of instructions must maintain the semantics of the original list of instructions. The semantics of the original list is called the program order or in-order semantics. The in-order semantics is dictated by the programming language. The ordering between instructions is determined by the control and data dependencies between the instructions and is encoded as the set of dependencies E in the program graph. This is generally a partial ordering, which gives the scheduler some leeway to reorder the instructions to improve execution efficiency. For instance, in the program segment shown in Figure 1, statement 3 depends on statement 1, so statement 1 must be scheduled prior to statement 3. Statements 1 and 2 are independent and can be scheduled in either order. Whether statements 2 and 3 are dependent depends on the values of i and j.

[Figure 1: Source for a vector loop]
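The scheduling function in (1) can be sketched concretely. The following is a minimal illustration, not the dissertation's actual scheduler: it assumes Kahn's topological-sort algorithm (the text names no particular algorithm) to produce an ordered list L that places oi before oj for every dependency (oi, oj) in E.

```python
from collections import defaultdict, deque

def schedule(V, E):
    """Minimal sketch of (1): map a dependence graph {V, E} to an
    ordered list L that respects every dependency in E."""
    preds = {v: 0 for v in V}        # count of unscheduled predecessors
    succs = defaultdict(list)        # successor lists
    for (a, b) in E:
        preds[b] += 1
        succs[a].append(b)
    ready = deque(v for v in V if preds[v] == 0)   # operations with no
    L = []                                         # pending predecessors
    while ready:
        op = ready.popleft()
        L.append(op)
        for nxt in succs[op]:
            preds[nxt] -= 1
            if preds[nxt] == 0:
                ready.append(nxt)
    return L

# Statements 1, 2, 3 of Figure 1: statement 3 depends on statement 1,
# while statement 2 is independent, so the partial order admits both
# (1, 2, 3) and (2, 1, 3).
order = schedule([1, 2, 3], [(1, 3)])
assert order.index(1) < order.index(3)
```

Because E is only a partial order, any tie-breaking rule among ready operations yields a correct schedule; a real scheduler would break ties to minimize pipeline stalls rather than in queue order.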

Control dependencies are dependencies between conditional instructions and any other instructions whose execution depends on the conditional instructions. In-order semantics does not allow dependent instructions to execute until the conditional instruction has been executed and the result of the condition is known. Since control dependencies are often a significant performance limiting factor, some execution models relax the requirement that the result of the condition be known before the dependent instruction begins execution. This is usually referred to as speculative execution. If speculative execution is allowed, some method must be provided to undo the effects of executing the dependent instructions if the eventual resolution of the condition determines that they should not have been executed.
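The buffer-and-commit idea behind that undo mechanism can be illustrated with a toy model. Everything below (the run_branch function, the register names) is hypothetical and greatly simplified relative to real hardware: speculative writes go into a side buffer and reach architectural state only if the condition resolves as predicted.

```python
def run_branch(state, cond, speculative_writes):
    """Toy model of speculating past a branch predicted taken:
    buffer the dependent instructions' writes, then commit or discard
    once the condition is known."""
    buffer = {}
    # Speculate: execute the dependent instructions into the buffer,
    # leaving architectural state untouched.
    for reg, val in speculative_writes.items():
        buffer[reg] = val
    if cond:
        # Condition resolved as predicted: commit the buffered writes.
        state.update(buffer)
    # Otherwise the buffer is simply dropped, undoing the speculation.
    return state

assert run_branch({"r1": 0}, True,  {"r1": 42}) == {"r1": 42}
assert run_branch({"r1": 0}, False, {"r1": 42}) == {"r1": 0}
```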

Data dependencies are formed by the sharing of data and memory locations between instructions. There are four types of data dependencies. If A and B are two instructions with A preceding B in program order, then the input and output locations of the two instructions can be denoted by the sets InA, OutA, InB, and OutB, respectively.

Furthermore, the possible dependencies between A and B can be defined as follows:

1. Flow dependencies are the locations in InB ∩ OutA.

2. Anti-flow dependencies are the locations in InA ∩ OutB.

3. Output dependencies are the locations in OutA ∩ OutB.

4. Input dependencies are the locations in InA ∩ InB.

In some sense, flow dependencies are the only true dependencies, because they express the actual communication of data between instructions; they must be honored to obtain correct execution semantics. Anti-flow and output dependencies arise from the reuse of memory locations by different instructions, and can sometimes be removed by renaming those locations. Input dependencies arise when instructions merely read the same locations. Discovering input dependencies is not required for correctness, since they do not impose an execution ordering, but they can be used to improve execution efficiency.
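The four definitions above are literal set intersections, so they transcribe directly into code. The classify function and the two-instruction example below are illustrative sketches, not taken from the dissertation:

```python
def classify(InA, OutA, InB, OutB):
    """Given the input/output location sets of instructions A and B
    (A first in program order), compute the four dependence classes."""
    return {
        "flow":   InB & OutA,    # A writes, B reads (true dependence)
        "anti":   InA & OutB,    # A reads, B writes
        "output": OutA & OutB,   # both write the same location
        "input":  InA & InB,     # both read; imposes no ordering
    }

# Hypothetical pair:  A: x = y + z   B: y = x * w
deps = classify(InA={"y", "z"}, OutA={"x"},
                InB={"x", "w"}, OutB={"y"})
assert deps["flow"] == {"x"}      # B reads the x that A writes
assert deps["anti"] == {"y"}      # A reads the y that B overwrites
assert deps["output"] == set()
assert deps["input"] == set()
```

Renaming removes the anti-flow dependency here: if B wrote y' instead of y, only the flow dependency on x would remain, which matches the observation above that only flow dependencies are true dependencies.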

There is one other set of constraints that may be imposed on the instruction execution order: exceptions produced by the execution of instructions should be precise, i.e., the machine state following the handling of an exception should appear as though no instruction following the exceptional instruction had executed. This requirement can be quite restrictive. Precise exceptions in effect introduce a control dependency between every instruction that can produce an exception and all following instructions. Implementing this in an aggressively scheduled machine requires hardware support to maintain and restore the correct machine state when an exception is encountered.


