
Abstract of “Query Performance Prediction for Analytical Workloads” by Jennie Duggan, Ph.D., Brown University, May 2013

Modeling the complex interactions that arise when query workloads share computing resources and data is challenging albeit critical for a number of tasks such as Quality of Service (QoS) management in the emerging cloud-based database platforms, effective resource allocation for time-sensitive processing tasks, and user-experience management for interactive systems. In our work, we develop practical models for query performance prediction (QPP) for heterogeneous, concurrent query workloads in analytical databases.

Specifically, we propose and evaluate several learning-based solutions for QPP. We first address QPP for static workloads that originate from well-known query classes. Then, we propose a more general solution for dynamic, ad hoc workloads. Finally, we address the issue of generalizing QPP for different hardware platforms such as those available from cloud-service providers.

Our solutions use a combination of isolated and concurrent query execution samples, as well as new query workload features and metrics that can capture how different query classes behave for various levels of resource availability and contention. We implemented our solutions on top of PostgreSQL and evaluated them experimentally by quantifying their effectiveness for analytical data and workloads, represented by the established benchmark suites TPC-H and TPC-DS. The results show that learning-based QPP can be both feasible and effective for many static and dynamic workload scenarios.

Query Performance Prediction for Analytical Workloads

by Jennie Duggan

B.Sc., Rensselaer Polytechnic Institute; Troy, NY, 2003
Sci.M., Brown University; Providence, RI, 2009

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Department of Computer Science at Brown University

PROVIDENCE, RHODE ISLAND

May 2013

© Copyright 2013 by Jennie Duggan

This dissertation by Jennie Duggan is accepted in its present form by The Department of Computer Science as satisfying the dissertation requirement for the degree of Doctor of Philosophy.

Date    Uğur Çetintemel, Ph.D., Advisor

Recommended to the Graduate Council

Date    Olga Papaemmanouil, Ph.D., Reader, Brandeis University
Date    Eliezer Upfal, Ph.D., Reader
Date    Stanley Zdonik, Ph.D., Reader

Approved by the Graduate Council

Date


I would first like to acknowledge Ugur Cetintemel. He has been an incredibly insightful and supportive advisor over the years. Ugur has tirelessly helped me navigate the ups and downs of the research process. His gift for explaining data management at both the high and detailed levels is both inspiring and instructive. His inimitable way of identifying compelling and high impact research problems has also made this journey possible.

I would also like to thank Olga Papaemmanouil. She has taught me how to peel the onion of the grad student experience, from our years as officemates to her present post as an excellent professor at Brandeis. I learned how to drill deeply into research problems from her, and she has given me invaluable advice about how to convey my ideas in an accessible paper.

Eli Upfal has also been integral to my PhD odyssey. I am very grateful to him for all that he has done to teach me how to think through difficult problems. Eli routinely demonstrates how to solve problems both elegantly and pragmatically.

Stan Zdonik has also served as an excellent role model for me. I am thankful for all of the times when he has given me feedback on my research. He has a great ability to rapidly understand my initial ideas and then either poke holes in them or dramatically improve them.

Roberto Tamassia has also been an influential mentor to me. In TAing his class I have learned so much about how to instruct students and excite them about my research. Roberto also taught me about how to navigate the department during my tenure as FGL. His can-do attitude is very inspiring.

The Brown Data Management group has also been an integral part of this process. I have been fortunate enough to have a wonderful set of colleagues including Yanif Ahmad, Tingjian Ge, Jeong-Hyon Hwang, Alex Rasin, Mert Akdere, Nathan Backman, Hideaki Kimura, Andy Pavlo and Justin Debrebant. Working with them has enriched my experience in this department and I am grateful for the opportunity.

I have also been lucky enough to have a great circle of friends around Brown CS. The experience would not have been the same without Irina Calciu, Jason Pacheco, Micha Elsner, Steve Gomez and Yossi Lev. They provided a much needed sounding board through this intense time.

I would also like to thank my family for their continued support. My parents, Julie and John, and sisters Sarah and Katherine have spent years listening to me talk about my research and for that I am appreciative.

Finally I thank Matthew Duggan, my husband. His tireless support of this endeavor made so much of it possible. He has been patient through countless weekends of deadlines and very encouraging through the rough parts. I dedicate this dissertation to you, my love!

Introduction

Concurrent query execution facilitates improved resource utilization and aggregate throughput, while making it a challenge to accurately predict individual query performance. Modeling the performance impact of complex interactions that arise when multiple queries share computing resources and data is difficult albeit critical for a number of tasks such as Quality of Service (QoS) management in the emerging cloud-based database platforms, effective resource allocation for time-sensitive processing tasks, and user experience management for interactive database systems.

Consider a cloud-based database-as-a-service platform for data analytics. The service provider would negotiate service level agreements (SLAs) with its users. Such SLAs are often expressed in terms of QoS (e.g., latency, throughput) requirements for various query classes, as well as penalties that kick in if the QoS targets are violated. The service provider has to allocate sufficient resources to user queries to avoid such violations, or else face consequences in the form of lost revenue and damaged reputation due to unhappy customers. Thus, it is important to be able to accurately predict the run-time of an incoming query on the available machines, as well as its impact on the existing queries, so that the scheduling of the query does not lead to any QoS violations. The service provider may have to scale up and allocate more cloud resources if it deems that existing resources are insufficient to accommodate the incoming query.

Concurrent query performance prediction (cQPP) is relevant in many contexts. We start by examining it broadly for analytical queries. We consider two approaches: interaction and resource modeling. Next we extend this work to support distributed OLAP queries. Finally, we explore two approaches to OLTP throughput prediction.

1.1 Modeling Query Performance for Static Workloads

In our first section, we address the performance prediction problem for analytical concurrent query workloads. Specifically, we study the following problem: “Given a collection of queries q1, q2, q3, ..., qn, concurrently executing on the same machine at arbitrary stages of their execution, predict when each query will finish its execution.” We assume that all queries are derived from a set of known query classes (e.g., instances of TPC-H query templates) and that they are mostly I/O bound (e.g., typical TPC-H queries).

We propose a two-phase solution for this problem.

1. (Model building) We build a composite, multivariate regression model that captures the execution behavior of concurrent queries as they go through distinct query mixes. We use this model to predict the execution speed for each query in a given concurrent workload.

2. (Timeline analysis) We analyze the execution timeline of the workload to predict the termination points for individual queries. This timeline analysis starts by predicting the first query to complete and then repeatedly performs prediction for the remaining queries. The analysis can either run in real time or draw on a queue of pending queries for batch-oriented planning.
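The timeline analysis in step 2 can be sketched as a short loop. This is only an illustration under stated assumptions: `timeline_analysis`, `rate_in_mix`, and `toy_rate` are hypothetical names, work is measured in arbitrary units, and the toy rate function (throughput split evenly across the mix) merely stands in for the regression model of step 1.

```python
from typing import Callable, Dict, FrozenSet


def timeline_analysis(
    remaining_work: Dict[str, float],
    rate_in_mix: Callable[[str, FrozenSet[str]], float],
) -> Dict[str, float]:
    """Predict when each query in a concurrent mix will finish.

    remaining_work maps a query id to its outstanding work (arbitrary units).
    rate_in_mix(q, mix) returns the predicted speed of q, in work units per
    second, while the queries in `mix` run together. It is a hypothetical
    stand-in for the model of step 1 and must return a positive value.
    """
    work = dict(remaining_work)
    now = 0.0
    finish: Dict[str, float] = {}
    while work:
        mix = frozenset(work)
        rates = {q: rate_in_mix(q, mix) for q in work}
        # The next query to terminate has the smallest time-to-finish.
        first = min(work, key=lambda q: work[q] / rates[q])
        elapsed = work[first] / rates[first]
        now += elapsed
        # Every query in the mix progresses at its own rate until then.
        for q in work:
            work[q] = max(0.0, work[q] - rates[q] * elapsed)
        del work[first]
        finish[first] = now
    return finish


def toy_rate(query: str, mix: FrozenSet[str]) -> float:
    # Toy assumption: one unit of throughput is shared evenly by the mix.
    return 1.0 / len(mix)


predicted = timeline_analysis({"q1": 10.0, "q2": 20.0, "q3": 30.0}, toy_rate)
```

Each time a query leaves the mix, the rates are recomputed for the smaller mix, which is the repeated prediction step described above.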

One of our key ideas is to use Buffer Access Latency (BAL) as an effective means both to capture the execution speed of a query and to quantify the performance impact of concurrently running queries. Our first regression model uses the average BAL of the query execution to accurately predict runtimes of individual queries when run concurrently. We call this model “BAL to Latency” (B2L).
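A regression from average BAL to latency can be illustrated with an ordinary least-squares line. The sketch below assumes a linear relationship for one query class and uses made-up training samples; `fit_line` and `b2l_predict` are hypothetical names, not part of our implementation.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up samples for one query class: average BAL (ms) and latency (s).
avg_bal_ms = [0.10, 0.20, 0.30, 0.40]
latency_s = [25.0, 45.0, 65.0, 85.0]

slope, intercept = fit_line(avg_bal_ms, latency_s)


def b2l_predict(bal_ms: float) -> float:
    """Predict end-to-end latency (s) from an average BAL (ms)."""
    return slope * bal_ms + intercept
```

With a fit like this, any estimate of a query's average BAL under contention translates directly into a latency estimate.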

While it is not tractable to sample how much BAL is affected for all possible concurrent query combinations, we show that capturing only the first- and second-order effects, which can be obtained by sampling isolated and pairwise-concurrent query runs, is sufficient to yield good predictions. Our second regression model, therefore, uses the base BAL for a query q (obtained from the isolated run of q), other queries in the mix, delta BALs (obtained from pairwise-concurrent runs for the concurrent queries in the mix) and limited higher multiprogramming level sampling to predict the change in average BAL for q. We refer to this multivariate regression model as “BAL to concurrent BAL” (B2cB). As a final step, we compose B2cB and B2L to obtain execution latency predictions for queries in the concurrent mix.
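The composition of the two models can be sketched as follows. Everything numeric here is an assumption for illustration: the coefficients are not fitted values, the feature set is reduced to the base BAL plus the sum of pairwise delta BALs, and `LinearModel` and `predict_latency` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LinearModel:
    intercept: float
    weights: List[float]

    def predict(self, features: List[float]) -> float:
        return self.intercept + sum(w * f for w, f in zip(self.weights, features))


# Hypothetical, already-fitted models (coefficients are illustrative only).
# B2cB features: [base BAL of q, sum of pairwise delta BALs for q].
b2cb = LinearModel(intercept=0.0, weights=[1.0, 0.8])
# B2L features: [average BAL of q in the mix].
b2l = LinearModel(intercept=5.0, weights=[200.0])


def predict_latency(
    query: str,
    mix: List[str],
    base_bal: Dict[str, float],
    delta_bal: Dict[Tuple[str, str], float],
) -> float:
    """Compose B2cB and B2L: features -> concurrent BAL -> latency."""
    others = [q for q in mix if q != query]
    features = [
        base_bal[query],                               # first-order effect
        sum(delta_bal[(query, o)] for o in others),    # second-order effects
    ]
    concurrent_bal = b2cb.predict(features)
    return b2l.predict([concurrent_bal])


# Made-up measurements from isolated and pairwise-concurrent runs (ms).
base_bal = {"q1": 0.10, "q2": 0.15, "q3": 0.20}
delta_bal = {("q1", "q2"): 0.05, ("q1", "q3"): 0.10}

latency_q1 = predict_latency("q1", ["q1", "q2", "q3"], base_bal, delta_bal)
```

Only isolated and pairwise samples feed the features, which is what keeps the sampling cost manageable relative to sampling every possible mix.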

Finally, we adapt this system to support changing workloads by calculating incremental predictions of the execution latency for discrete mixes as they occur. When an incoming query is added to a mix we project how it will impact the currently running queries and estimate the execution latency we can expect for the new addition. This technique also allows us to dynamically correct some of our previous estimates by re-evaluating with each scheduling decision. It can also allow for batch-based resource planning by modeling a larger queue of queries in our scheduling.

1.2 Query Performance Prediction for Dynamic Workloads

In our second section, we explore the idea of a more generalized concurrency model based on allocating specific resources for each query as we schedule it. This approach will allow us to model concurrency with a much lighter training phase.

Concurrent query execution allows users to decrease the time required for a batch of queries [5, 8] in analytical workloads. When several queries execute simultaneously, hardware resources can be better used by exploiting parallelism. At the same time, concurrent execution raises a number of challenges, including predicting how interleaving queries will affect each other’s rate of progress. As multiple queries compete for hardware resources, their interactions may be positive, neutral, or negative [3]. For example, a positive interaction may occur if two queries share a large table scan: one query may pre-fetch data for the other and they both enjoy a modest speedup. In contrast, if two queries access disjoint data and are I/O-bound, they may slow each other down by a factor of two or more. Judicious scheduling of concurrent queries can significantly impact the completion time of individual mix members as well as the entire batch [7].
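One simple way to label an interaction is to compare a query's latency in the mix with its isolated latency. The sketch below is an assumption for illustration only: the function name and the 5% tolerance band for "neutral" are hypothetical choices, not definitions taken from the cited work.

```python
def classify_interaction(
    isolated_s: float, concurrent_s: float, tolerance: float = 0.05
) -> str:
    """Label an interaction by the ratio of concurrent to isolated latency.

    A ratio below 1 means the query ran faster in the mix (positive); a ratio
    above 1 means it was slowed down (negative). The tolerance band is an
    illustrative assumption.
    """
    ratio = concurrent_s / isolated_s
    if ratio < 1.0 - tolerance:
        return "positive"
    if ratio > 1.0 + tolerance:
        return "negative"
    return "neutral"


shared_scan = classify_interaction(isolated_s=100.0, concurrent_s=90.0)
disjoint_io = classify_interaction(isolated_s=100.0, concurrent_s=200.0)
```

Here the shared-scan case runs faster in the mix than in isolation, while the disjoint I/O-bound case shows the factor-of-two slowdown mentioned above.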

Accurate concurrent query performance prediction (cQPP) stands to benefit a variety of applications. This knowledge would allow system administrators to make better scheduling decisions for large batches of queries [7]. With cQPP, cloud-based provisioning would be able to make more informed deployment plans [45, 1]. Performance prediction under concurrency can also create more refined query progress indicators by analyzing in real time how physical resource availability affects a query’s estimated completion time. Moreover, accurate cQPP could also enable query optimizers to create interaction-aware execution plans.

Because of its important applications, there has been significant recent work on cQPP, which has primarily focused on static workloads [5, 23]. These solutions build models on well-defined sets of query templates, where interactions within the workload must be sampled before predictions may be produced. Furthermore, the sampling requirements for these approaches grow exponentially with the complexity of their workloads, limiting their viability in real-world deployments. In this work, we propose a more general solution to target dynamic or ad hoc workloads, where new templates may be executed with queries from a well-known workload. We create predictions for these unseen templates without the requirement to sample their interactions with our workload. In doing so, we retain the benefits of the prior work, while accommodating unseen or changing workloads. Thus, this approach dramatically simplifies the process of supporting unpredictable or evolving user requirements, which are present in many exploration-oriented database applications including science, engineering and business.


