# «Minimal Indices for Successor Search This thesis was submitted in partial fulﬁllment of the requirements for the M.Sc. degree in the School of ...»

## TEL-AVIV UNIVERSITY

## RAYMOND AND BEVERLY SACKLER

## FACULTY OF EXACT SCIENCES

## SCHOOL OF COMPUTER SCIENCE

Minimal Indices for Successor Search

This thesis was submitted in partial fulﬁllment of the requirements for the M.Sc. degree in

the School of Computer Science, Tel-Aviv University by Sarel Cohen The research work for this thesis has been carried out at Tel-Aviv University under the supervision of Prof. Amos Fiat and Prof. Haim Kaplan December 2012 Acknowledgements I would like to thank my advisors Prof. Amos Fiat and Prof. Haim Kaplan for their help, guidance, and numerous contributions to this work. The many talks we had during the writing of this dissertation have inspired me to new and wonderful ideas.

I would like to thank my parents and my sister for their endless love, support and encouragement throughout my life. All the support you give me over the years is the greatest gift anyone has ever given me. Without your love and support this thesis wouldn’t have been possible.

I would like to thank Moshik Hershcovitch who coauthored the extended

**Abstract**

of this work, for his many ideas and numerous contributions to this work.

I would like to thank Nir Shavit for introducing us to the problems of contention in multicore environments, for posing the question of multicore eﬃcient data structures, and for many useful discussions. We also wish to thank Mikkel Thorup for his kindness and useful comments.

i Abstract We give a new successor data structure which improves upon the index size of the Pˇtra¸cu-Thorup as data structures, reducing the index size from O(nw4/5 ) bits to O(n log w) bits, with optimal probe complexity. Alternately, our new data structure can be viewed as matching the space complexity of the (probe-suboptimal) z-fast trie of Belazzougui et al. Thus, we get the best of both approaches with respect to both probe count and index size. The penalty we pay is an extra O(log w) inter- register operations. Our data structure can also be used to solve the weak preﬁx search problem, the index size of O(n log w) bits is known to be optimal for any such data structure.

The technical contributions include highly eﬃcient single word indices, with out-degree w/ log w (compared to the w1/5 out-degree of fusion tree based indices). To construct such high eﬃciency single word indices we device highly eﬃcient bit extractors which, we believe, are of independent interest.

ii Contents Abstract

Introduction A fundamental problem in data structures is the successor problem: given a RAM with w bit word operations, and n keys (each w bits long), give a data structure that answers successor queries eﬃciently.

Of particular interest are data structures with small index structures, sublinear in the input size, that can ﬁt within a high speed cache.

The simplest successor data structure is a sorted list, this requires no index, and O(log n) probes for binary search. This is highly ineﬃcient for large data sets. Memory addresses accessed are widely dispersed.

Given a high speed cache, one can improve performance by building a small index via x-fast tries [Wil83]: use a x-fast trie for a set of equally spaced “splitters”, ranks poly(w) apart. Such an “index” requires only O(n/poly(w)) bits. Unfortunately, the number of probes external to the index is still relatively large, O(log w) probes.

Another approach is to employ one or more levels of Fusion tree nodes, such nodes have outdegree B = w1/5. Done appropriately, this gives an index of size

bits, and only O(1) probes external to the index. This motivates searching for alternatives to Fusion tree nodes that have outdegree w1/5. This we do in this paper, obtaining outdegree w/ log w.

Yet another consideration, due to multicore shared cache contention issues, makes it desirable,

not only to reduce the index size, but also to reduce the number of probes to the index.

Pˇtra¸cu and Thorup [PT06] solve the successor problem optimally (to within an O(1) factor) as for any possible point along the probe count/space tradeoﬀ, and for any value of n and w.

However, they do not distinguish between the space required to store the input (which is exactly nw bits) and the extra space required. They consider only the total space. This distinction is of interest when the index size (the extra space) is small, o(nw) bits.

Conversely, the probabilistic z-fast trie of Belazzougui et al. [BBPV09] reduces the extra space requirement to O(n log w) extra bits, but requires a (suboptimal) expected O(log w) probes (and O(log n) probes in the worst case).

We give a (deterministic) data structure that, for any n, w, answers a successor query with an optimal number of probes (within an O(1) factor), and requires only O(n log w) extra bits. We

word fusion-tree [FW93]. Analogously to the use of fusion-nodes, we make use of new single word indices, denoted by α, β, and γ nodes hereinafter. At the heart of our construction are single word indices with outdegree w/ log w (vs. outdegree w1/5 for fusion nodes).

**We assume:**

• A RAM model of computation with w bits per word.

• Generally, we assume word operations take O(1) time. We give several alternative constructions, our bit extractors and our penultimate γ-node construction does not require multiplication and meets the restrictions of “a Practical RAM”, as deﬁned by Peter Bro Miltersen,

• A key (or query) is one word (w bits long).

Chapter 2 Related Work

2.1 The Successor Problem Previous work on the successor problem include van Emde Boas priority queues [vEB77], Fusion trees, due to Fredman and Willard, ([FW93]), the Beame and Fich successor data structure [PF99], x-fast tries and y-fast tries, due to Willard, ([Wil83]) z-fast tries due to Belazzougui et al.,

See Table 2.1 for a comparison between these data structures with respect to the space and probe parameters under consideration. See Figure 2.1 for a graphical description of the minimal probe complexity as a function of n, w.

2.1.1 Blind Tries and Important Bits Blind tries (such as PATRICIA, [Mor68]) which can be used to answer successor queries, were devised to prevent excessive access to external memory. Blind tries delay making comparisons as long as possible by skipping over “unimportant” segments of the search pattern. See Figure 2.2.

Given some set of binary bit strings, S, consider the blind trie whose leaves are the elements of S. Deﬁne the “important” indices to be the indices of bits used to branch (somewhere) in the trie.

The number of important indices for S is no more than |S|.

** Table 2.1: Comparison of Probe and Space requirements of various data structures for the successor problem.**

Word length is w. Adding layers of γ-nodes to Fusion Trees, {x, y}-fast tries, or any of the probe-optimal Pˇtra¸cu & Thorup constructions, reduces the index space requirements to as O(n log w) bits while preserving the probe complexity, and increasing the number of operations by + log w.

Fusion Trees [FW93] are successor data structures that answer successor queries in time O(log n/ log w).

A fusion node, used in a Fusion tree, occupies one w bit word and can be used to index w1/5 keys in O(1) time. Like blind tries, the search is based on extracting and comparing “important bits” rather than the entire query.

In our terminology (to be deﬁned in section 3.3, Fusion trees apply a (w1/5, w4/5 ) bit-extractor to every key, producing a “sketch” of length w4/5 that contains w1/5 important bits of the key. The sketches of many keys (i.e., w/w4/5 = w1/5 keys) are packed into a single w bit word, this forms a fusion node.

The key observation is that multiple parallel operations can be performed on all these w1/5 sketches by single word operation. Omitting lots of (critical) detail, this allows Fusion trees to search amongst these w1/5 keys in O(1) time, resulting in the O(log n/ log w) bound above.

Note that the shorter the sketch, the more sketches can be packed into a word, the greater the parallelism and the greater the resulting speedup.

We remark that Andersson, Miltersen, and Thorup [AMT99] give an AC(0) implementation

produces a sketch of length k containing k bits of the key). Ignoring other diﬃculties, computing a [perfect] sketch in AC(0) is easy: just lead wires connecting the source bits to the target bits. With this interpretation, our bit-extractor is a software implementation in O(log w) time that implements the special purpose hardware implementation of [AMT99].

2.2 Implicit and Succinct Data Structures Space eﬃcient data structures have been extensively studied. Implicit data structures are data structures that require no extra space other than the space required for the data itself (see e.g.

** Figure 2.2: A Compact Trie and a Blind Trie.**

Testing bits 1,3,5,6, and 7 of a query will produce a candidate successor. As not all bits have been considered, this requires veriﬁcation.

o(1) of the information theoretic lower bound required to represent the data.

A motivation for small extra space is to devise data structures where most of the memory probes are to a small set of addresses, the so-called working set. When this set ﬁts in fast memory then overall performance is improved.

2.3 I/O Eﬃcient Data Structures I/O eﬃcient data structures [Vit06] seek to minimize the number of data transfers between diﬀerent levels of memory. Cache oblivious data structures [Bro04] are designed to achieve the same goal even when the architecture of the memory hierarchy is unknown.

2.4 Reducing Probe Complexity: New Motivation from Multicore

Multicore architectures pose new performance challenges [Dre07, Sha11]. When multiple cores access shared cache/memory, contention arises. Such contention is deviously problematic because it may cause serialization of memory accesses, making a mockery of multicore parallelism. Multiple memory banks and other hardware are attempts to deal with such problems, to various degrees.

number of probes within the index, become even more critical in the multicore setting.

Chapter 3 Our Results

3.1 A Succinct, Probe-Optimal Data-Structure for Successor Search In this paper we give data structures for successor queries that greatly improve upon the size of the index over previous designs, are optimal with respect to the total number of probes, and perform O(1) probes outside the index (one needs one extraneous probe simply so as to access the successor). The total number of operations required is bounded by the number of probes plus an extra O(log w) (inter-register) operations.

Fundamental to our construction are high outdegree single word indices to reduce space requirements and probes. We give three such variants: α nodes, β nodes, and γ nodes. Each of these index w/ log w keys and answer successor queries using only O(1) w-bit words, O(log w) time, and O(1) extra-index probes (in expectation for α and β nodes, worst case for γ nodes).

Combining such α, β, or γ nodes with space eﬃcient data structures such as Fusion Trees to index the buckets, gives us successor queries requiring only O(log n/ log w) probes and much reduced index size, while paying a minor increase in the number of inter-register operations (O(log w)), see last row of Table 2.1.

If the top levels consist of a y-fast-trie data structure, we get an improvement upon the recently introduced [probabilistic] z-fast-trie, [BBPV09, BBPV08], see Chapter 7 (we omit the “probabilistic” preﬁx hereinafter). The worst-case query time and probes improves from O(log n) to O(1) by using γ-nodes within the last-level buckets, and the query becomes deterministic.

The α node is simply a z-fast trie applied to w/ log w keys. Given the small number of keys, the zfast trie can be simpliﬁed. A major component of the z-fast trie involves translating a preﬁx match (between a query and a trie node) to the query rank. As there are only w/ log w keys involved, we can discard this part of the z-fast trie and store ranks explicitly in O(1) words.

One drawback of the original z-fast trie was that the worst case number of probes, external to the index, was O(log n). Therefore, when using α nodes, the worst case number of probes drops to O(log w).

Based on a diﬀerent set of ideas, β nodes are arguably simpler than the z-fast trie, and have the same performance as the α nodes.

Our last variant, γ nodes, has the advantages that it is deterministic and gives worst case O(1) non-index probes, and, furthermore, requires no multiplication.

We introduce highly eﬃcient bit-extractors and use them to build γ-nodes. Essentially, a bitextractor selects a multiset of bits from a binary input string and outputs a rearrangement of these bits within a shorter output string.

3.3 Introducing Bit-Extractors and Building a (k, k)-Bit Extractor A (k, L) bit-extractor, 1 ≤ k ≤ L ≤ w, consists of a preprocessing phase and a query phase, (see

**Figure 3.1):**

• The preprocessing phase: The input is a sequence of length k (with repetitions),

** Figure 3.1: A (k, L) bit-extractor, 1 ≤ k ≤ L ≤ w.**