FREE ELECTRONIC LIBRARY - Dissertations, online materials

Pages:   || 2 | 3 | 4 |

«Minimal Indices for Successor Search This thesis was submitted in partial fulfillment of the requirements for the M.Sc. degree in the School of ...»

-- [ Page 1 ] --





Minimal Indices for Successor Search

This thesis was submitted in partial fulfillment of the requirements for the M.Sc. degree in

the School of Computer Science, Tel-Aviv University by Sarel Cohen The research work for this thesis has been carried out at Tel-Aviv University under the supervision of Prof. Amos Fiat and Prof. Haim Kaplan December 2012 Acknowledgements I would like to thank my advisors Prof. Amos Fiat and Prof. Haim Kaplan for their help, guidance, and numerous contributions to this work. The many talks we had during the writing of this dissertation have inspired me to new and wonderful ideas.

I would like to thank my parents and my sister for their endless love, support and encouragement throughout my life. All the support you give me over the years is the greatest gift anyone has ever given me. Without your love and support this thesis wouldn’t have been possible.

I would like to thank Moshik Hershcovitch who coauthored the extended


of this work, for his many ideas and numerous contributions to this work.

I would like to thank Nir Shavit for introducing us to the problems of contention in multicore environments, for posing the question of multicore efficient data structures, and for many useful discussions. We also wish to thank Mikkel Thorup for his kindness and useful comments.

i Abstract We give a new successor data structure which improves upon the index size of the Pˇtra¸cu-Thorup as data structures, reducing the index size from O(nw4/5 ) bits to O(n log w) bits, with optimal probe complexity. Alternately, our new data structure can be viewed as matching the space complexity of the (probe-suboptimal) z-fast trie of Belazzougui et al. Thus, we get the best of both approaches with respect to both probe count and index size. The penalty we pay is an extra O(log w) inter- register operations. Our data structure can also be used to solve the weak prefix search problem, the index size of O(n log w) bits is known to be optimal for any such data structure.

The technical contributions include highly efficient single word indices, with out-degree w/ log w (compared to the w1/5 out-degree of fusion tree based indices). To construct such high efficiency single word indices we device highly efficient bit extractors which, we believe, are of independent interest.

ii Contents Abstract

–  –  –

Introduction A fundamental problem in data structures is the successor problem: given a RAM with w bit word operations, and n keys (each w bits long), give a data structure that answers successor queries efficiently.

Of particular interest are data structures with small index structures, sublinear in the input size, that can fit within a high speed cache.

The simplest successor data structure is a sorted list, this requires no index, and O(log n) probes for binary search. This is highly inefficient for large data sets. Memory addresses accessed are widely dispersed.

Given a high speed cache, one can improve performance by building a small index via x-fast tries [Wil83]: use a x-fast trie for a set of equally spaced “splitters”, ranks poly(w) apart. Such an “index” requires only O(n/poly(w)) bits. Unfortunately, the number of probes external to the index is still relatively large, O(log w) probes.

Another approach is to employ one or more levels of Fusion tree nodes, such nodes have outdegree B = w1/5. Done appropriately, this gives an index of size

–  –  –

bits, and only O(1) probes external to the index. This motivates searching for alternatives to Fusion tree nodes that have outdegree w1/5. This we do in this paper, obtaining outdegree w/ log w.

Yet another consideration, due to multicore shared cache contention issues, makes it desirable,

–  –  –

not only to reduce the index size, but also to reduce the number of probes to the index.

Pˇtra¸cu and Thorup [PT06] solve the successor problem optimally (to within an O(1) factor) as for any possible point along the probe count/space tradeoff, and for any value of n and w.

However, they do not distinguish between the space required to store the input (which is exactly nw bits) and the extra space required. They consider only the total space. This distinction is of interest when the index size (the extra space) is small, o(nw) bits.

Conversely, the probabilistic z-fast trie of Belazzougui et al. [BBPV09] reduces the extra space requirement to O(n log w) extra bits, but requires a (suboptimal) expected O(log w) probes (and O(log n) probes in the worst case).

We give a (deterministic) data structure that, for any n, w, answers a successor query with an optimal number of probes (within an O(1) factor), and requires only O(n log w) extra bits. We

–  –  –

word fusion-tree [FW93]. Analogously to the use of fusion-nodes, we make use of new single word indices, denoted by α, β, and γ nodes hereinafter. At the heart of our construction are single word indices with outdegree w/ log w (vs. outdegree w1/5 for fusion nodes).

–  –  –

We assume:

• A RAM model of computation with w bits per word.

• Generally, we assume word operations take O(1) time. We give several alternative constructions, our bit extractors and our penultimate γ-node construction does not require multiplication and meets the restrictions of “a Practical RAM”, as defined by Peter Bro Miltersen,

–  –  –

• A key (or query) is one word (w bits long).

Chapter 2 Related Work

2.1 The Successor Problem Previous work on the successor problem include van Emde Boas priority queues [vEB77], Fusion trees, due to Fredman and Willard, ([FW93]), the Beame and Fich successor data structure [PF99], x-fast tries and y-fast tries, due to Willard, ([Wil83]) z-fast tries due to Belazzougui et al.,

–  –  –

See Table 2.1 for a comparison between these data structures with respect to the space and probe parameters under consideration. See Figure 2.1 for a graphical description of the minimal probe complexity as a function of n, w.

2.1.1 Blind Tries and Important Bits Blind tries (such as PATRICIA, [Mor68]) which can be used to answer successor queries, were devised to prevent excessive access to external memory. Blind tries delay making comparisons as long as possible by skipping over “unimportant” segments of the search pattern. See Figure 2.2.

Given some set of binary bit strings, S, consider the blind trie whose leaves are the elements of S. Define the “important” indices to be the indices of bits used to branch (somewhere) in the trie.

The number of important indices for S is no more than |S|.

–  –  –

Table 2.1: Comparison of Probe and Space requirements of various data structures for the successor problem.

Word length is w. Adding layers of γ-nodes to Fusion Trees, {x, y}-fast tries, or any of the probe-optimal Pˇtra¸cu & Thorup constructions, reduces the index space requirements to as O(n log w) bits while preserving the probe complexity, and increasing the number of operations by + log w.

–  –  –

Fusion Trees [FW93] are successor data structures that answer successor queries in time O(log n/ log w).

A fusion node, used in a Fusion tree, occupies one w bit word and can be used to index w1/5 keys in O(1) time. Like blind tries, the search is based on extracting and comparing “important bits” rather than the entire query.

In our terminology (to be defined in section 3.3, Fusion trees apply a (w1/5, w4/5 ) bit-extractor to every key, producing a “sketch” of length w4/5 that contains w1/5 important bits of the key. The sketches of many keys (i.e., w/w4/5 = w1/5 keys) are packed into a single w bit word, this forms a fusion node.

The key observation is that multiple parallel operations can be performed on all these w1/5 sketches by single word operation. Omitting lots of (critical) detail, this allows Fusion trees to search amongst these w1/5 keys in O(1) time, resulting in the O(log n/ log w) bound above.

Note that the shorter the sketch, the more sketches can be packed into a word, the greater the parallelism and the greater the resulting speedup.

We remark that Andersson, Miltersen, and Thorup [AMT99] give an AC(0) implementation

–  –  –

produces a sketch of length k containing k bits of the key). Ignoring other difficulties, computing a [perfect] sketch in AC(0) is easy: just lead wires connecting the source bits to the target bits. With this interpretation, our bit-extractor is a software implementation in O(log w) time that implements the special purpose hardware implementation of [AMT99].

2.2 Implicit and Succinct Data Structures Space efficient data structures have been extensively studied. Implicit data structures are data structures that require no extra space other than the space required for the data itself (see e.g.

–  –  –

Figure 2.2: A Compact Trie and a Blind Trie.

Testing bits 1,3,5,6, and 7 of a query will produce a candidate successor. As not all bits have been considered, this requires verification.

o(1) of the information theoretic lower bound required to represent the data.

A motivation for small extra space is to devise data structures where most of the memory probes are to a small set of addresses, the so-called working set. When this set fits in fast memory then overall performance is improved.

2.3 I/O Efficient Data Structures I/O efficient data structures [Vit06] seek to minimize the number of data transfers between different levels of memory. Cache oblivious data structures [Bro04] are designed to achieve the same goal even when the architecture of the memory hierarchy is unknown.

2.4 Reducing Probe Complexity: New Motivation from Multicore

–  –  –

Multicore architectures pose new performance challenges [Dre07, Sha11]. When multiple cores access shared cache/memory, contention arises. Such contention is deviously problematic because it may cause serialization of memory accesses, making a mockery of multicore parallelism. Multiple memory banks and other hardware are attempts to deal with such problems, to various degrees.

–  –  –

number of probes within the index, become even more critical in the multicore setting.

Chapter 3 Our Results

3.1 A Succinct, Probe-Optimal Data-Structure for Successor Search In this paper we give data structures for successor queries that greatly improve upon the size of the index over previous designs, are optimal with respect to the total number of probes, and perform O(1) probes outside the index (one needs one extraneous probe simply so as to access the successor). The total number of operations required is bounded by the number of probes plus an extra O(log w) (inter-register) operations.

Fundamental to our construction are high outdegree single word indices to reduce space requirements and probes. We give three such variants: α nodes, β nodes, and γ nodes. Each of these index w/ log w keys and answer successor queries using only O(1) w-bit words, O(log w) time, and O(1) extra-index probes (in expectation for α and β nodes, worst case for γ nodes).

Combining such α, β, or γ nodes with space efficient data structures such as Fusion Trees to index the buckets, gives us successor queries requiring only O(log n/ log w) probes and much reduced index size, while paying a minor increase in the number of inter-register operations (O(log w)), see last row of Table 2.1.

If the top levels consist of a y-fast-trie data structure, we get an improvement upon the recently introduced [probabilistic] z-fast-trie, [BBPV09, BBPV08], see Chapter 7 (we omit the “probabilistic” prefix hereinafter). The worst-case query time and probes improves from O(log n) to O(1) by using γ-nodes within the last-level buckets, and the query becomes deterministic.

–  –  –

The α node is simply a z-fast trie applied to w/ log w keys. Given the small number of keys, the zfast trie can be simplified. A major component of the z-fast trie involves translating a prefix match (between a query and a trie node) to the query rank. As there are only w/ log w keys involved, we can discard this part of the z-fast trie and store ranks explicitly in O(1) words.

One drawback of the original z-fast trie was that the worst case number of probes, external to the index, was O(log n). Therefore, when using α nodes, the worst case number of probes drops to O(log w).

Based on a different set of ideas, β nodes are arguably simpler than the z-fast trie, and have the same performance as the α nodes.

Our last variant, γ nodes, has the advantages that it is deterministic and gives worst case O(1) non-index probes, and, furthermore, requires no multiplication.

We introduce highly efficient bit-extractors and use them to build γ-nodes. Essentially, a bitextractor selects a multiset of bits from a binary input string and outputs a rearrangement of these bits within a shorter output string.

3.3 Introducing Bit-Extractors and Building a (k, k)-Bit Extractor A (k, L) bit-extractor, 1 ≤ k ≤ L ≤ w, consists of a preprocessing phase and a query phase, (see

Figure 3.1):

• The preprocessing phase: The input is a sequence of length k (with repetitions),

–  –  –

Figure 3.1: A (k, L) bit-extractor, 1 ≤ k ≤ L ≤ w.

Pages:   || 2 | 3 | 4 |

Similar works:

«International Conference: L’action d’art (The Action of Art) Concept: Paola Yacoub Michel Lasserre In partnership with l’Académie Libanaise des Beauxarts – Université de Balamand Friday 15 and Saturday 16 April, 16:00 to 20:00 Auditorium, Level 2 In English and French French interventions are indicated in the schedule Free admission A century ago, the journal L’action d’art (The Action of Art) was founded by Gérard de LacazeDuthiers, a 1 professor, publisher, curator and...»

«April 1, 2016 Dear Boston AGO Member, April begins the election cycle for the Boston Chapter Executive Committee members for the 2016-2017 year. According to our by-laws, you must be informed 3 weeks before the official election begins of the candidates for election. The slate of BAGO candidates is attached for your consideration. On or before May 1, 2016 you will receive an official electronic ballot. Voting closes May 14, 2016. I would ask that you cast your vote as soon as possible. If you...»


«ACT Government Gazette Gazetted Notices for the week beginning 15 May 2014 Published by Shared Services | 22 May 2014 | © Australian Capital Territory, Canberra, 2014 ACT Government Gazette | 22 May 2014 EXECUTIVE NOTICES Education and Training Contract Cessation Note: The following Executive has been issued with a new contract. This notification is in accordance with the provisions of section 81 of the Public Sector Management Act 1994. Joanne Garrisson – Director Information,...»

«PROTOCOL ON THE STATUTE OF THE AFRICAN COURT OF JUSTICE AND HUMAN RIGHTS TABLE OF CONTENTS PROTOCOL PREAMBLE Chapter I: Merger of The African Court on Human and Peoples’ Rights and The Court of Justice of The African Union Article 1 – Replacement of the 1998 and 2003 Protocols Article 2 – Establishment of a Single Court Article 3 – Reference to the single Court in the Constitutive Act Chapter II: Transitional Provisions Article 4 – Term of Office of the Judges of the African Court on...»

«11 September 2015 DISCUSSION PAPER How to deal with Europe’s British problem Andrew Duff This paper argues that the UK government's renegotiation bid is too feeble to be deserving of concessions by its EU partners, but that the rest of the EU can give the British what they seem to want by pressing on themselves to federal union. Is David Cameron really serious about renegotiating the terms of the British membership? One has to begin to ask the question. For much has been boasted but little...»

«Starlight Ice Dance Club of the Twin Cities July 1, 2016 – June 30, 2017 Membership Application Personal Information U.S Figure Skating Member Number * Skater Last Name * Skater First Name * Date of Birth * Gender: Female Male * U.S. Citizen: Yes No (mm/dd/yyyy) * Address Check here if new address: * City * State * Zip Code Home Phone Work Phone Parent/Guardian Name ** Email (If skater under age 18) ** Club communication is sent via email; check this box if you * Item is required for...»

«3 He a lt h 2 2 Fire 0 3 0 Re a c t iv it y P e rs o n a l P ro t e c t io n Material Safety Data Sheet Formic acid, 85%, F.C.C MSDS Section 1: Chemical Product and Company Identification Product Name: Formic acid, 85%, F.C.C Contact Information: Sciencelab.com, Inc. Catalog Codes: SLF1387 14025 Smith Rd. CAS#: Mixture. Houston, Texas 77396 US Sales: 1-800-901-7247 RTECS: Not applicable. International Sales: 1-281-441-4400 TSCA: TSCA 8(b) inventory: Formic acid; Water Order Online:...»

«MARINE INVESTIGATION REPORT M04L0092 GROUNDING CONTAINER SHIP HORIZON SAINTE-ANNE-DE-SOREL, QUEBEC 24 JULY 2004 The Transportation Safety Board of Canada (TSB) investigated this occurrence for the purpose of advancing transportation safety. It is not the function of the Board to assign fault or determine civil or criminal liability. Marine Investigation Report Grounding Container Ship Horizon Sainte-Anne-de-Sorel, Quebec 24 July 2004 Report Number M04L0092 Summary In the early morning hours of...»

«Adrenal Cancer Overview The information that follows is an overview of this type of cancer. For more detailed information, call 1-800-227-2345. Or visit our Web site at www.cancer.org. What is cancer? The body is made up of hundreds of millions of living cells. Normal body cells grow, divide, and die in an orderly way. During the early years of a person's life, normal cells divide faster to allow the person to grow. After the person becomes an adult, most cells divide only to replace worn-out,...»

«Rare lung cancers There are several different kinds of lung cancer, often referred to as lung cancer subtypes. Some of these occur more often than others. In this factsheet we will specifically look at the subtypes of cancers that do not happen very often and are considered ‘rare’. People’s experiences of these rarer types of lung cancer will generally be similar to those of people with more common forms of the condition. However, there are some differences in treatment and outlook....»

«ASA University Review, Vol. 6 No. 1, January–June, 2012 Disclosure of Financial Reporting and Firm Structure as a Determinant: A Study on the Listed Companies of DSE Alim Al Ayub Ahmed* Abstract The paper examines effect of firms’ structure on the financial reporting quality of Bangladesh quoted manufacturing firms using assets, leverage and share dispersion and residuals of modified EBO model as proxies of firm’s structure and financial reporting quality respectively. The data is...»

<<  HOME   |    CONTACTS
2016 www.dissertation.xlibx.info - Dissertations, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.