

The Batched Predecessor Problem in External Memory

Michael A. Bender^{1,2}, Martín Farach-Colton^{2,3}, Mayank Goswami^4, Dzejla Medjedovic^5, Pablo Montes^1, and Meng-Tsung Tsai^3

1 Stony Brook University, Stony Brook NY 11794, USA
{bender,pmontes}@cs.stonybrook.edu
2 Tokutek, Inc.
3 Rutgers University, Piscataway NJ 08854, USA
{farach,mtsung.tsai}@cs.rutgers.edu
4 Max-Planck Institute for Informatics, Saarbrücken 66123, Germany
gmayank@mpi-inf.mpg.de
5 Sarajevo School of Science and Technology, Sarajevo 71000, Bosnia-Herzegovina
dzejla.medjedovic@ssst.edu.ba

Abstract. We give lower and upper bounds for the batched predecessor problem in external memory. We study tradeoffs between the I/O budget to preprocess a dictionary S versus the I/O requirement to find the predecessor in S of each element in a query set Q. For Q polynomially smaller than S, we give lower bounds in three external-memory models: the I/O comparison model, the I/O pointer-machine model, and the indexability model.

In the comparison I/O model, we show that the batched predecessor problem needs Ω(log_B n) I/Os per query element (n = |S|) when the preprocessing is bounded by a polynomial. With exponential preprocessing, the problem can be solved faster, in Θ((log_2 n)/B) I/Os per element. We give the tradeoff that quantifies the minimum preprocessing required for a given searching cost.

In the pointer-machine model, we show that with O(n^{4/3−ε}) preprocessing for any constant ε > 0, the optimal algorithm cannot perform asymptotically faster than a B-tree. In the indexability model, we exhibit the tradeoff between the redundancy r and access overhead α of the optimal indexing scheme, showing that to report all query answers in α(x/B) I/Os, log r = Ω((B/α^2) log(n/B)).

Our lower bounds have matching or nearly matching upper bounds.

1 Introduction

A static dictionary is a data structure that represents a set S = {s_1, s_2, ..., s_n} subject to the following operations:

PREPROCESS(S): Prepare a data structure to answer queries.
SEARCH(q, S): Return TRUE if q ∈ S and FALSE otherwise.
PREDECESSOR(q, S): Return max_{s_i ∈ S} {s_i ≤ q}.

The traditional static dictionary can be extended to support batched operations. Let Q = {q_1, ..., q_x}. Then, the batched predecessor problem can be defined as follows:

BATCHEDPRED(Q, S): Return A = {a_1, ..., a_x}, where a_i = PREDECESSOR(q_i, S).
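For concreteness, the operation admits a direct sketch via one independent binary search per query element; this is a minimal Python illustration (not the paper's data structure), with None standing in for an undefined predecessor:

```python
import bisect

def batched_pred(Q, S):
    """Return A = [a_1, ..., a_x] with a_i = PREDECESSOR(q_i, S).

    Both lists are assumed sorted. Each query is answered by an
    independent binary search: O(log n) comparisons per element.
    """
    A = []
    for q in Q:
        i = bisect.bisect_right(S, q)          # number of elements of S that are <= q
        A.append(S[i - 1] if i > 0 else None)  # None: no predecessor exists
    return A
```

For example, `batched_pred([2, 7], [1, 3, 5, 8])` returns `[1, 5]`.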

This research was supported in part by NSF grants CCF 1114809, CCF 1114930, CCF 1217708, IIS 1247726, IIS 1247750, and IIS 1251137.

In this paper we prove lower bounds on the batched predecessor problem in external memory [3], that is, when the dictionary is too large to fit into main memory. We study tradeoffs between the searching cost and the cost to preprocess the underlying set S. We present our results in three models: the comparison-based I/O model [3], the pointer-machine I/O model [18], and the indexability model [10, 11].

We focus on query size x ≤ n^c, for constant c < 1. Thus, the query Q can be large, but is still much smaller than the underlying set S. This query size is interesting because, although there is abundant parallelism in the batched query, common approaches such as linear merges and buffering [4, 6, 7] are suboptimal.

Our results show that the batched predecessor problem in external memory cannot be solved asymptotically faster than Ω(log_B n) I/Os per query element if the preprocessing is bounded by a polynomial; on the other hand, the problem can be solved asymptotically faster, in Θ((log_2 n)/B) I/Os, if we impose no constraints on preprocessing. These bounds stand in marked contrast to single-predecessor queries, where one search costs Ω(log_B n) even if preprocessing is unlimited.

We assume that S and Q are sorted. Without loss of generality, Q is sorted because Q's sort time is subsumed by the query time. Without loss of generality, S is sorted, as long as the preprocessing time is slightly superlinear. We consider sorted S throughout the paper. For notational convenience, we let s_1 < s_2 < · · · < s_n and q_1 < q_2 < · · · < q_x, and therefore a_1 ≤ a_2 ≤ · · · ≤ a_x.

Given that S and Q are sorted, an alternative interpretation of this paper is as follows: how can we optimally merge two sorted lists in external memory? Specifically, what is the optimal algorithm for merging two sorted lists in external memory when one list is some polynomial factor smaller than the other?
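Read as a merge, the baseline strategy answers every query in a single coordinated linear pass over both lists. The following Python sketch (illustrative only) makes the cost of linear-scan merging explicit: every element of S is touched, mirroring the Θ(n/B) I/O cost in external memory:

```python
def batched_pred_by_merge(Q, S):
    """Answer all predecessor queries by one linear merge of two sorted lists.

    Because Q is sorted, the scan over S never restarts; the whole pass
    makes Theta(n + x) comparisons regardless of how small Q is.
    """
    A = []
    j = 0        # current position in S
    last = None  # largest element of S seen so far (i.e., <= current q)
    for q in Q:
        while j < len(S) and S[j] <= q:
            last = S[j]
            j += 1
        A.append(last)
    return A
```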

Observe that the naïve linear-scan merging is suboptimal because it takes Θ(n/B) I/Os, which is greater than the O(n^c log_B n) I/Os of a B-tree-based solution. Buffer trees [4, 6, 7] also take Θ(n/B) I/Os during a terminal flush phase. This paper shows that with polynomial preprocessing, performing independent searches for each element in Q is optimal, but it is possible to do better with higher preprocessing.

Single and batched predecessor problems in RAM. In the comparison model, a single predecessor can be found in Θ(log n) time using binary search. The batched predecessor problem is solved in Θ(x log(n/x) + x) time by combining merging and binary search [13, 14]. The bounds for both problems remain tight for any preprocessing budget.
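One standard way to realize the O(x log(n/x) + x) comparison bound (a sketch of the technique, not necessarily the exact method of [13, 14]) is exponential, or galloping, search: because Q is sorted, each search resumes where the previous one ended, and if the i-th answer lies g_i positions past the previous one, the total cost is O(Σ log g_i), which by concavity is maximized at O(x log(n/x)) when the gaps are equal:

```python
import bisect

def batched_pred_gallop(Q, S):
    """Batched predecessor over sorted lists via galloping search."""
    A, lo = [], 0
    n = len(S)
    for q in Q:
        # Gallop: find hi with S[hi] > q (or hi = n), doubling the step.
        step, hi = 1, lo
        while hi < n and S[hi] <= q:
            hi = lo + step
            step *= 2
        hi = min(hi, n)
        # Binary search inside the bracketed range [lo, hi).
        i = bisect.bisect_right(S, q, lo, hi)
        A.append(S[i - 1] if i > 0 else None)
        lo = max(i - 1, 0)  # the next query is >= q, so resume here
    return A
```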

Pătrașcu and Thorup [15] give tight lower bounds for single predecessor queries in the cell-probe model. We are unaware of prior lower bounds for the batched predecessor problem in the pointer-machine and cell-probe models.





Although batching does not help algorithms that rely on comparisons, Karpinski and Nekrich [12] give an upper bound for this problem in the word-RAM model (bit operations are allowed), which achieves O(x) for all batches of size x = O(√log n) (O(1) per element amortized) with superpolynomial preprocessing.

Batched predecessor problem in external memory. Dittrich et al. [8] consider multisearch problems where queries are simultaneously processed and satisfied by navigating through large data structures on parallel computers. They give a lower bound of Ω(x log_B (n/x) + x/B) under stronger assumptions: no duplicates of nodes are allowed, the ith query has to finish before the (i + 1)st query starts, and x ≤ n^{1/(2+ε)}, for a constant ε > 0.

Buffering is a standard technique for improving the performance of external-memory algorithms [4, 6, 7]. By buffering, partial work on a set of operations can share an I/O, thus reducing the per-operation I/O cost. Queries can similarly be buffered. In this paper, the number of queries, x, is much smaller than the size, n, of the data structure being queried. As a result, as the partial work on the queries progresses, the query paths can diverge within the larger search structure, eliminating the benefit of buffering.

Goodrich et al. [9] present a general method for performing x simultaneous external-memory searches in O((n/B + x/B) log_{M/B} (n/B)) I/Os when x is large. When x is small, this technique achieves O(x log_B (n/B)) I/Os with a modified version of the parallel fractional cascading technique of Tamassia and Vitter [19].

Results

We first consider the comparison-based I/O model [3]. In this model, the problem cannot be solved faster than Ω(log_B n) I/Os per element if preprocessing is polynomial. That is, batching queries is not faster than processing them one by one. With exponential preprocessing, the problem can be solved faster, in Θ((log_2 n)/B) I/Os per element. We generalize to show a query-preprocessing tradeoff.

Next we study the pointer-machine I/O model [18], which is less restrictive than the comparison I/O model in main memory, but more restrictive in external memory.6 We show that with preprocessing at most O(n^{4/3−ε}) for constant ε > 0, the cost per element is again Ω(log_B n).

Finally, we turn to the more general indexability model [10, 11]. This model is frequently used to describe reporting problems, and it focuses on bounding the number of disk blocks that contain the answers to the query subject to the space limit of the data structure; the searching cost is ignored. Here, the redundancy parameter r measures the number of times an element is stored in the data structure, and the access overhead parameter α captures how far the reporting cost is from the optimal.

We show that to report all query answers in α(x/B) I/Os, r = (n/B)^{Ω(B/α^2)}. The lower bounds in this model also hold in the previous two models. This result shows that it is impossible to obtain O(1/B) I/Os per element unless the space used by the data structure is exponential, which corresponds to the situation in RAM, where exponential preprocessing is required to achieve O(1) amortized time per query element [12].

The rest of this section formally outlines our results.

–  –  –

I/Os in the worst-case, no matter the preprocessing. There exists a comparison-based algorithm matching this bound.

Traditional information-theoretic techniques give tight sorting-like lower bounds for this problem in the RAM model. In external memory, the analogous approach yields a lower bound of Ω((x/B) log_{M/B} (n/x) + x/B). On the other hand, repeated finger searching in a B-tree yields an upper bound of O(x log_B n). Theorem 1 shows that both bounds are weak, and that in external memory this problem has a complexity that is between sorting and searching.

We can interpret results in the comparison model through the amount of information that can be learned from each I/O. For searching, a block input reduces the choices for the target position of the element by a factor of B, thus learning log B bits of information. For sorting, a block input learns up to log (M choose B) = Θ(B log(M/B)) bits (obtained by counting the ways that an incoming block can intersperse with elements resident in main memory). Theorem 1 demonstrates that in the batched predecessor problem, the optimal, unbounded-preprocessing algorithm learns B bits per I/O, more than for searching but less than for sorting.
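As a sanity check on these three per-I/O information rates, one can compare them numerically; M = 1024 and B = 64 below are illustrative choices, not parameters from the paper:

```python
import math

# Hypothetical parameters: M = 1024 memory slots, block size B = 64.
M, B = 1024, 64

search_bits = math.log2(B)              # searching: log B bits per block input
sort_bits = math.log2(math.comb(M, B))  # sorting: log C(M, B) bits per block input
batched_bits = B                        # batched predecessor: B bits per block input

# log C(M, B) = Theta(B log(M/B)): the exact count lies between the
# standard bounds B log2(M/B) and B log2(e * M/B).
assert B * math.log2(M / B) <= sort_bits <= B * math.log2(math.e * M / B)

# The batched-predecessor rate sits strictly between searching and sorting.
assert search_bits < batched_bits < sort_bits
```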

The following theorem captures the tradeoff between searching and preprocessing: at one end of the spectrum lies a B-tree (j = 1) with linear construction time and log_B n searching cost per element, and at the other end is the parallel binary search (j = B) with exponential preprocessing cost and (log_2 n)/B searching cost. This tradeoff shows that even to obtain a performance that is only twice as fast as that of a B-tree, quadratic preprocessing is necessary. To learn up to j log(B/j + 1) bits per I/O, the algorithm needs to spend n^{Ω(j)} in preprocessing.

Theorem 2 (Search-preprocessing tradeoff, I/O comparison model). Let S be a set of size n and Q a set of size x ≤ n^c, 0 ≤ c < 1. In the I/O comparison model, computing BATCHEDPRED(Q, S) in O((x log_{B/j+1} n)/j) I/Os requires that PREPROCESSING(S) use n^{Ω(j)} blocks of space and I/Os.
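To see the endpoints of this tradeoff numerically, here is a small check of the per-element cost (log_{B/j+1} n)/j at j = 1 and j = B; n = 2^30 and B = 1024 are hypothetical values chosen only for illustration:

```python
import math

# Hypothetical parameters for illustration only.
n, B = 2**30, 1024

def cost_per_element(j):
    """Per-element search cost (log_{B/j+1} n) / j from the tradeoff."""
    return math.log(n, B / j + 1) / j

btree = cost_per_element(1)     # j = 1: log_{B+1} n, i.e., a B-tree search
parallel = cost_per_element(B)  # j = B: (log_2 n)/B, parallel binary search

assert abs(btree - math.log(n, B + 1)) < 1e-9
assert abs(parallel - math.log2(n) / B) < 1e-9
assert parallel < btree  # unbounded preprocessing buys a faster search
```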

In order to show results in the I/O pointer-machine model, we define a graph whose nodes are the blocks on disk of the data structure and whose edges are the pointers between blocks. Since a block has size B, it can contain at most B pointers, and thus the graph is fairly sparse. We show that any such sparse graph has a large set of nodes that are far apart. If the algorithm must visit those well-separated nodes, then it must perform many I/Os. The crux of the proof is that, as the preprocessing increases, the redundancy of the data structure increases, thus making it hard to pin down specific locations of the data structure that must be visited. We show that if the data structure is reasonable in size (in our case O(n^{4/3−ε})), then we can still find a large, well-dispersed set of nodes that must be visited, thus establishing the following lower bound:

Theorem 3 (Lower bound, I/O pointer-machine model). Let S be a set of size n. In the I/O pointer-machine model, if PREPROCESSING(S) uses O(n^{4/3−ε}) blocks of space and I/Os, for any constant ε > 0, then there exists a constant c and a set Q of size n^c such that computing BATCHEDPRED(Q, S) requires Ω(x log_B (n/x) + x/B) I/Os.

–  –  –

2 Batched Predecessor in the I/O Comparison Model

In this section we give the lower bound for when preprocessing is unrestricted. Then we study the tradeoff between preprocessing and the optimal number of I/Os.

–  –  –

Proof of Theorem 1 (Lower Bound). Consider the following problem instance:

1. For all q_i, |S_i| = n/x. That is, all elements in Q have been given the first log x bits of information about where they belong in S.

2. For all i and j (1 ≤ i ≠ j ≤ x), S_i ∩ S_j = ∅. That is, search intervals are disjoint.

We do not charge the algorithm for transferring elements of Q between main memory and disk. This accounting scheme is equivalent to allowing all elements of Q to reside in main memory at all times while still having the entire memory free for other manipulations. Storing Q in main memory does not provide the algorithm with any additional information, since the sorted order of Q is already known.

Now we only consider I/Os of elements in S. Denote a block being input as b = (b_1, ..., b_B). Observe that every b_i (1 ≤ i ≤ B) belongs to at most one S_j. The element b_i acts as a pivot and helps q_j learn at most one bit of information, by shrinking S_j to its left or its right half.

Since a single pivot gives at most one bit of information, the entire block b can supply at most B bits, during an entire execution of BATCHEDPRED(Q, S).

We require the algorithm to identify the final block in S where each q_i belongs.

Thus, the total number of bits that the algorithm needs to learn to solve the problem is Ω(x log(n/xB)). Along with the scan bound to output the answer, the minimum number of block transfers required to solve the problem is Ω((x/B) log(n/xB) + x/B).

–  –  –

2.2 Preprocessing-Searching Tradeoffs

We give a lower bound on the space required by the batched predecessor problem when the budget for searching is limited. We prove Theorem 2 by proving Theorem 7.

Definition 6. An I/O containing elements of S is a j-parallelization I/O if j distinct elements of Q acquire bits of information during this I/O.


