FREE ELECTRONIC LIBRARY - Dissertations, online materials

Pages:   || 2 | 3 | 4 | 5 |

«The hot-set model, characterizing the buffer requirements of relational queries, is presented. This model allows the system to determine the optimal ...»

-- [ Page 1 ] --

Buffer Management in Relational

Database Systems


Universit& di Torino



IBM T. J. Watson Research Center

The hot-set model, characterizing the buffer requirements of relational queries, is presented. This

model allows the system to determine the optimal buffer space to be allocated to a query; it can also

be used by the query optimizer to derive efficient execution plans accounting for the available buffer space, and by a query scheduler to prevent thrashing. The hot-set model is compared with the working-set model. A simulation study is presented.

Categories and Subject Descriptors: H.2.4 [Database Systems-query processing


General Terms: Algorithms, Performance, Theory Additional Key Words and Phrases: Buffer management, hot points, hot-set model, merging scans, nested scans, scheduling, sequential scans, thrashing, unstable intervals, working set

1. INTRODUCTION Most database systems use a main-memory area as a cache buffer, to reduce accessesto disks. This area is subdivided into frames, and each frame can contain a page of a secondary storage file. A process requesting a page will cause a fault if the page is not in the buffer: The requested page is then read into an available buffer frame (demand paging). When no available frames exist, a frame is made available by a replacement policy. If required, its contents are copied back to the disk.

The most popular replacement policy is LRU (least recently used page), which replaces the page that has not been referenced for the longest time. LRU belongs to the family of stack algorithms [25], having the desirable property that an increase in available buffer space never produces an increase in the fault rate.

Moreover, the LRU strategy is simple and can be very efficiently implemented.

This is especially important as the buffer manager is one of the most heavily Authors’ addresses: C. M. Sacco, Dipartimento di Informatica, Universiti di Torino, Via Valperga Caluso, 37, 10100, Torino, Italy; M. Schkolnick, IBM T. J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

0 1986 ACM 0362~5915/86/1200-0473 $00.75 ACM Transactionson DatabaseSystems, Vol. 11, No. 4, December 1986, Pages 473-498.

474 G. M. Sacco and M. Schkolnick l used system components. Finally, an LRU policy appears to be the best for managing the replacement of shared pages [6].

The problem of managing a buffer has a very close relationship with virtual memory management [ll], and in fact it was suggested that files be mapped into the user process addressing space [27]. However, the philosophy used in existing address-mapping hardware is not compatible with database applications, since it is based on the assumption of a relatively small process space, while it is not uncommon for a file to be several megabytes large [35].

Although most of the results available for virtual memory systems apply to buffer management, it will be shown that, at least for relational database systems, it is possible to relax the assumption that the reference string of a process is unknown, and therefore to obtain a more precise model of buffer access. This model is called the hot-set model and characterizes the buffer requirements of a query before its execution. The main advantage over its counterpart in virtual memory systems, the working-set model [13], is that the hot-set model is a static a priori estimator. It can therefore be used in the cost analysis performed by the query optimizer to discriminate among different access plans, and in a lowoverhead scheduling strategy to prevent thrashing.

In the following, discussions on the hot-set model primarily focus on System R [ 11, a relational database system implemented at IBM, and reviewed in Section 2. The ideas presented here can be easily extended to other relational DBMSs, as well as to ad hoc applications.

Section 3 introduces the hot-set model; scheduling and buffer management strategies to avoid thrashing, based on the hot-set model, are discussed in Section 4. Section 5 discusses the use of the hot-set model by the system query optimizer. The hot-set model is compared with the working-set model in Section 6. Variations on buffering strategies are discussed in Section 7.

The following symbols are used in this paper:

relation i Ri number of pages in Ri Pi cardinality of Ri Ki number of distinct values in range of the joining attribute of NDVi relation Ri denotes the jth page of relation Ri pi(j) faults(x) where x is a buffer size, denotes the number of faults (i.e., disk accesses)when the available buffer contains x frames.

2. SYSTEM R Database management systems based on the relational model [lo] achieve a high degree of data independence by providing high-level nonprocedural interfaces.

The user specifies a description of the data to be retrieved (i.e., “what” the user needs), and not how to access it. Relational systems rely on a system component, called the query optimizer, to determine an efficient accessplan for a given query, using the available access paths to the data.

System R is a multiuser relational database system, supporting the SQL nonprocedural query language [7]. The basic structure of the system is outlined in Figure 1.

ACM Transactions on Database Systems, Vol. 11, No. 4, December 1986.

475 Buffer Management in Relational Database Systems

–  –  –

UFI (User Friendly Interface) 1sthe interactive user interface with the system;

it supports SQL. The optimizer selects an efficient strategy for evaluation, on the basis of system data on the stored relations such as existing access paths, cardinality, number of pages, and selectivity factors [33]. Preoptimized queries are directly passed to the RSS component.

The RSS component provides low-level data management primitives, such as “get tuple.” Inside the RSS, a paged buffer preserves the most recently used pages.

A relation can be accessed either by a sequential scan (i.e., exhaustive reading of the whole relation) or through an index scan. Indices are organized as B trees [12] and can be clustered or unclustered. An index is clustered if both the index and the data pages are sorted on the same attributes; this property does not hold for unclustered indices.

Joins can be evaluated using two different methods: nested scans and merging scans.

(1) Nested scans do not require any particular order on the joining attributes.

For a 2-way join, a nested scan strategy amounts to scanning one of the relations (called the outermost), and for each filtered tuple, to locate (via sequential scan or index access) tuples matching the current joining value in the innermost relation. Extensions to n-way joins are straightforward. The result relation is ordered in the same way as the scan on the outermost relation.

(2) Merging scans require both relations to be ordered in the same way according to their joining attributes. The merging scan method uses a placeholder p to reduce the length of the scans over the inner relation, exploiting the ordering.

In the case of an equijoin over two relations, Rl and R2, p is initially positioned before the first tuple in R2. When a scan is started on R2 to match the current ACM Transactions on Database Systems, Vol. 11, No. 4, December 19%.

476 G. M. Sacco and M. Schkolnick l value tl retrieved from Rl, p is set to the first tuple in R2 such that t2 = tl. The scan is terminated when a tuple with t2 tl is found. The current scan position is retained.

On the next scan, initiated for a value tl’ of Rl, two cases arise: If tl’ = tl, the scan initiates at p; otherwise the scan will continue from the current position.

By this method, scans are performed only on subsets of R2, so that the cost is generally much smaller than the nested-scan cost. If the relations are not appropriately ordered, the cost of presorting may offset these benefits.

3. THE HOT-SET MODEL The cache buffer is used to avoid refetching those pages that are reused. Thus, to minimize the number of faults, the buffer space allocated to a query should be sufficient to hold all the pages to be reused. In the case in which the buffer space is insufficient, frames containing pages to be reused will be “stolen” by other pages, either requested by the same process (internal thrashing) or by another process (external thrashing).

It is shown in the following that, since relational systems use standard evaluation strategies, the required buffer space can be estimated before query execution. If the system insures that a process is never run with an insufficient number of frames, then internal and external thrashing can be minimized (or altogether avoided).

The key idea of the hot-set model is that the number of query faults as a function of the available buffer space is a curve consisting of a number of stable intervals (within each of which the number of faults is a constant), separated by a small number of discontinuities, called unstable intervals. Figure 2 is an example, showing two stable intervals and one discontinuity.

A discontinuity occurs when a set of pages, which are rereferenced, does not fit in the available buffer space. On subsequent rereferencing of a page, the missing page must be read from secondary storage, generating a fault. Depending on the properties of the access pattern, unstable intervals can exhibit sharp or smooth discontinuities. Sharp d&continuities are characterized by a rapid increase in the number of faults, in an interval [B - 1, B] (a single frame increase). Such d&continuities are usually caused by looping reusal (Figure 2). Smooth discontinuities usually occur in connection with indexed access, in which the reusal is “less precise” (Figure 3).

A stable interval can be completely characterized by the value of the fault function for the lower extremum of the interval. This buffer size is called a hot point. The minimum number of frames needed by a query to be run is called the minimum hot point. In most systems it is one frame.

Notice that each hot point (with the exception of the minimum one) is both the lower extremum of a stable interval, and the upper extremum of an unstable one. Therefore the fault curve of any query can be completely characterized in terms of its unstable intervals, with the addition of the minimum hot point.

The fault curve in Figure 2 is relative to a join: Rl join R2, executed by nested loops using sequential scans. The fault-rate graph exhibits a sharp increase in the transition between a buffer holding 1 + P2 pages, and a buffer holding P2 ACM Transactions on Database Systems, Vol. 11, No. 4, December 1986.

477 Buffer Management in Relational Database Systems

–  –  –

pages. This sharp discontinuity is explained as follows: The accesspattern of the join is (1) access the current page of Rl, (2) perform a sequential scan on R2, accessing page pZ(l),..., pZ(P2).

If 1 + P2 pages are available, the entire loop on R2 plus the current page of Rl fits in the buffer, thus requiring the minimum number of faults. When this ACM Transactionson DatabaseSystems,Vol. 11, No. 4, December1986.

478 G. M. Sacco and M. Schkolnick l is no longer possible (e.g., buffer = P2), there is no reusal of pages. In fact, at the end of the first loop instance, the LRU stack will contain pages p2(1) to p2(P2), with page p2(P2) being at the top of the stack. The reference to the current page of Rl will cause the replacement of page p2(1). The subsequent reference to p2(1) will cause the replacement of p2(2), and so on. In fact the fault rate for B 1 + P2 is stable in the interval [l, P2] and is equal to the fault rate at B = 1, that is, Kl * (1 + P2).

A query of this type can then be exactly characterized by the minimum hot point (B = 1) and the unstable interval [PS, 1 + P2]. As a matter of fact, the analysis of fault behavior for a query with only sharp discontinuities can be further restricted to hot points only (in the example, hpl = 1, hp2 = 1 + P2).

The analysis of unstable intervals is useful only in the context of smooth discontinuities. Figure 3 (a join computed by nested scans using a sequential scan on the outer relation and an index scan on the inner) shows an instance of a smooth discontinuity. In these cases, the fault curve inside an unstable interval must be analyzed. In the following, this analysis is performed by means of interpolation between the extremes of the unstable interval. Thus, in addition to the hot point relative to the smooth discontinuity (upper extremum), the lower extremum, called cold point, is needed. The term cold point is used to indicate that this buffer size is never to be chosen for execution, since it is the upper extremum of a stable interval (and therefore consumes more buffer resources, with no fault benefit, than the lower extremum of the stable interval).

Whether discontinuities are smooth or sharp, the basic principle is that buffer sizes inside a stable interval, and different from the hot point relative to that interval, do not produce any benefit in terms of fault reduction, while using more buffer resources.

The following discussion provides tools to identify hot points and unstable intervals for a given query and to estimate the number of faults generated by a query running with a buffer size equal to a given hot point, or inside an unstable interval. For the purpose of discussion, it is useful to distinguish between simple, loop, unclustered, and index reusal. In this section, queries are assumed to run in isolation.

Pages:   || 2 | 3 | 4 | 5 |

Similar works:

«THE COFFIN CORNER: Vol. 26, No. 4 (2004) A CHRONOLOGY OF PRO FOOTBALL ON TELEVISION: Part 2 by Tim Brulia 1970: The merger takes effect. The NFL signs a massive four year $142 million deal with all three networks: The breakdown as follows: CBS: All Sunday NFC games. Interconference games on Sunday: If NFC team plays at AFC team (example: Philadelphia at Pittsburgh), CBS has rights. CBS has one Thanksgiving Day game. CBS has one game each of late season Saturday game. CBS has both NFC divisional...»

«1 In chapter 2 beginning on page 45, the textbook will tell you about the 3 key economic  indicators: Gross Domestic Product, the Unemployment Rate, and the Price Indexes, which  tell you what the rate of inflation is.  The first of the 3 key economic indicators is Gross  Domestic Product. 2 Here is your text’s definition of Gross Domestic Product from page 45.  (Read definition.)  I ...»

«Becoming Simultaneously Thicker and Thinner Skinned: The inherent conflicts arising in the professional development of coaches Abstract Research Paper Purpose There is a hidden paradox inherent in the ideal of continuing professional development (CPD) for executive coaches, stemming from the fact that the coach wishes to retain or preserve the freshness and openness of a ‘beginner’, whilst also acquiring greater robustness and resilience in the face of difficult assignments. The paradox...»

«Eötvös Loránd University Department of Information Systems DISTRIBUTED SURVIVABLE PIPELINE COMPUTATION AND COMMUNICATION PhD Dissertation Zsolt Palotai Supervisor: Dr. habil. András L®rincz Head Senior Researcher ELU, Department of Information Systems PhD School of Informatics Dr. János Demetrovics PhD Program of Information Systems Dr. András Benczúr Budapest, 2007. Abstract The increasing availability of mobile and/or intelligent sensors with computation and wireless communication...»

«Testimony of Jeff C. Wright Director, Office of Energy Projects Federal Energy Regulatory Commission 888 First Street, N.E. Washington, DC, 20426 202-502-8700 Before the Committee on Energy and Commerce Subcommittee on Energy and Power United States House of Representatives Hearing on H.R. 3301 “North American Energy Infrastructure Act” October 29, 2013 Mr. Chairman and Members of the Subcommittee: My name is Jeff Wright and I am the Director of the Office of Energy Projects (OEP) at the...»

«Elementary Episode Guide Episodes 001–102 Last episode aired Sunday November 13, 2016 c www.cbs.com c 2016 www.tv.com c 2016 www.cbs.com c 2016 tvrage.com c 2016 www.imdb.com c 2016 c 2016 c 2016 community.ew.com thecelebritycafe.com celebdirtylaundry.com The summaries and recaps of all the Elementary episodes were downloaded from http://www.tv.com and http:// www.cbs.com and http://tvrage.com and http://www.imdb.com and http://thecelebritycafe.com and http: //celebdirtylaundry.com and...»

«F I C H E P R E V E N T I O N N ° 0 2 « H Y G I E N E S E C U R I T E » LE TRAVAIL EN HAUTEUR STATISTIQUES Avec près d’une centaine d’agent victime de chutes de hauteur durant les 5 dernières années dans les collectivités de la Manche, ce risque représente l’une des principales causes d’accidents. DEFINITION On considère que le risque de chute existe dès lors qu’il n’y a pas d’obstacle suffisamment efficace en bordure du vide. C’est un risque très présent dans les...»

«STATEMENT OF SFAS No. FINANCIAL ACCOUNTING STANDARD 30 INDONESIAN INSTITUTE OF ACCOUNTANTS ACCOUNTING FOR LEASES ACCOUNTING FOR LEASES SFAS No.30 CONTENTS paragraphs 01-11 INTRODUCTION Background Type of Leases Execution of Lease Transaction Syndicated Lease STATEMENT OF FINANCIAL ACCOUNTING STANDARD NO 30 12-46 ACCOUNTING FOR LEASES Basic Considerations Purpose Criteria for Classifying Lease Transactions Accounting Treatment by Lessor Accounting Treatment by Lessee Reporting and Disclosures of...»

«The Montana Mathematics Enthusiast ISSN 1551-3440 VOL. 6, NOS.1&2, January 2009, pp.1-294 Editor-in-Chief Bharath Sriraman, The University of Montana Associate Editors: Lyn D. English, Queensland University of Technology, Australia Claus Michelsen, University of Southern Denmark, Denmark Brian Greer, Portland State University, USA Luis Moreno-Armella, University of Massachusetts-Dartmouth International Editorial Advisory Board Miriam Amit, Ben-Gurion University of the Negev, Israel. Ziya Argun,...»

«A review of the performance of bilge socks proposed for use in Buzzards Bay recreational boats in response to a request for proposals issued by the Buzzards Bay Action Committee and the Town of Dartmouth Prepared by Joseph E. Costa, Ph.D. Buzzards Bay Project National Estuary Program Final May 20, 2000 Introduction Most boats have compartments inside their hull that serve to capture rain and seawater entering the hull of the boat. These compartments also capture fuel and engine oil that may...»

«2013 2015 southern university and A&M CoLLeGe Student Handbook b A t o n r o u G e, L o u i s i A n A 2013-2015 SOUTHERN UNIVERSITY and A & M COLLEGE Board of Supervisors Southern univerSity SyStem Board of SuperviSorS Bridget A. Dinvaut, J.D., Chairwoman Ronald Mason Jr., J.D., System President and Secretary of the Board Mrs. Ann A. Smith 1st Congressional District Mr. Mike A. Small 1st Congressional District Mr. Darren G. Mire 2nd Congressional District Dr. Eamon M. Kelly 2nd Congressional...»

«A Compilation of Resources for the Passover Seder Rabbi Debra Orenstein, project founder & chair Rabbi Erin Hirsh, project manager & editor, First Edition Nila M. Pusin, editor, Revised Edition Gabrielle Kaplan-Mayer, outreach & social media coordinator Contents Introduction: The Message and Opportunity of a Seder page III Multi-Part Supplements and Full-Length Haggadot page 1 Focusing on Modern Slavery Passover Seder Starters, Listed in the Order of the Ritual Items to Place On the Seder Table...»

<<  HOME   |    CONTACTS
2016 www.dissertation.xlibx.info - Dissertations, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.