Summary of Indexing and Hashing

By Ahmed Farahat - May 09, 2015

Summary

• Many queries reference only a small proportion of the records in a file. To reduce the overhead in searching for these records, we can construct indices for the files that store the database.

• Index-sequential files are one of the oldest index schemes used in database systems. To permit fast retrieval of records in search-key order, records are stored sequentially, and out-of-order records are chained together. To allow fast random access, we use an index structure.

• There are two types of indices that we can use: dense indices and sparse indices. Dense indices contain entries for every search-key value, whereas sparse indices contain entries only for some search-key values.
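The difference between the two index types can be sketched in a few lines. This is only an illustration under assumptions: the `blocks` layout, the sample keys, and the function names are hypothetical and not taken from any particular system. The sparse index here keeps one entry per block (the first key in that block), which is one common way to choose "some" search-key values.

```python
from bisect import bisect_right

# Hypothetical sorted file: each block holds (search_key, record) pairs.
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Dense index: one entry for every search-key value -> (block number, offset).
dense_index = {
    key: (b, i)
    for b, block in enumerate(blocks)
    for i, (key, _) in enumerate(block)
}

# Sparse index (assumed layout): only the first search key of each block.
sparse_keys = [block[0][0] for block in blocks]


def dense_lookup(key):
    """Go straight to the record through its index entry."""
    pos = dense_index.get(key)
    if pos is None:
        return None
    b, i = pos
    return blocks[b][i][1]


def sparse_lookup(key):
    """Find the last index entry <= key, then scan that block sequentially."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for k, record in blocks[b]:
        if k == key:
            return record
    return None
```

The dense index needs nine entries for this file and the sparse index only three, at the cost of a short sequential scan inside the chosen block.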

• If the sort order of a search key matches the sort order of a relation, an index on the search key is called a primary index. The other indices are called secondary indices. Secondary indices improve the performance of queries that use search keys other than the primary one. However, they impose an overhead on modification of the database.

• The primary disadvantage of the index-sequential file organization is that performance degrades as the file grows. To overcome this deficiency, we can use a B+-tree index.

• A B+-tree index takes the form of a balanced tree, in which every path from the root of the tree to a leaf of the tree is of the same length. The height of a B+-tree is proportional to the logarithm to the base N of the number of records in the relation, where each nonleaf node stores N pointers; the value of N is often around 50 or 100. B+-trees are much shorter than other balanced binary-tree structures such as AVL trees, and therefore require fewer disk accesses to locate records.
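To get a feel for the height claim, the rough calculation below counts how many levels a tree needs to reach a given number of records at a given fanout. It is an idealized sketch, not a B+-tree implementation: the function name `levels_needed` is hypothetical, and it assumes every node uses its full fanout (real B+-tree nodes may be only about half full, so the effective fanout is lower).

```python
def levels_needed(num_records, fanout):
    """Smallest number of levels h such that fanout**h >= num_records.

    Assumes every node has exactly `fanout` children (idealized).
    """
    levels, reach = 0, 1
    while reach < num_records:
        reach *= fanout
        levels += 1
    return levels


# One million records at three different fanouts.
for fanout in (2, 50, 100):
    print(fanout, levels_needed(1_000_000, fanout))
```

Under these assumptions, one million records need 20 levels at fanout 2 (a binary tree), 4 levels at fanout 50, and 3 levels at fanout 100, which is why the wide nodes translate into far fewer disk accesses per lookup.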

• Lookup on B+-trees is straightforward and efficient. Insertion and deletion, however, are somewhat more complicated, but still efficient. The number of operations required for lookup, insertion, and deletion on B+-trees is proportional to the logarithm to the base N of the number of records in the relation, where each nonleaf node stores N pointers.

• We can use B+-trees for indexing a file containing records, as well as to organize records into a file.

• B-tree indices are similar to B+-tree indices. The primary advantage of a B-tree is that the B-tree eliminates the redundant storage of search-key values. The major disadvantages are overall complexity and reduced fanout for a given node size. System designers almost universally prefer B+-tree indices over B-tree indices in practice.

• Sequential file organizations require an index structure to locate data. File organizations based on hashing, by contrast, allow us to find the address of a data item directly by computing a function on the search-key value of the desired record. Since we do not know at design time precisely which search-key values will be stored in the file, a good hash function to choose is one that assigns search-key values to buckets such that the distribution is both uniform and random.
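A minimal sketch of the idea, assuming a fixed number of in-memory buckets: the bucket address is computed directly from the search-key value, so no index has to be consulted. The names `NUM_BUCKETS`, `bucket_of`, `insert`, and `lookup` are hypothetical, and CRC-32 is used only as a convenient deterministic hash for illustration, not as a recommendation.

```python
import zlib

NUM_BUCKETS = 8  # assumed fixed bucket count for this sketch

# Each bucket is a list of (search_key, record) pairs.
buckets = [[] for _ in range(NUM_BUCKETS)]


def bucket_of(search_key):
    """Compute the bucket address directly from the search-key value."""
    return zlib.crc32(search_key.encode("utf-8")) % NUM_BUCKETS


def insert(search_key, record):
    buckets[bucket_of(search_key)].append((search_key, record))


def lookup(search_key):
    """Scan only the one bucket the key hashes to."""
    return [rec for key, rec in buckets[bucket_of(search_key)] if key == search_key]


insert("Perryridge", {"balance": 400})
insert("Brighton", {"balance": 750})
print(lookup("Perryridge"))
```

Different keys may land in the same bucket, which is why `lookup` still compares the stored key before returning a record.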

• Static hashing uses hash functions in which the set of bucket addresses is fixed. Such hash functions cannot easily accommodate databases that grow significantly larger over time. There are several dynamic hashing techniques that allow the hash function to be modified. One example is extendable hashing, which copes with changes in database size by splitting and coalescing buckets as the database grows and shrinks.

• We can also use hashing to create secondary indices; such indices are called hash indices. For notational convenience, we assume hash file organizations have an implicit hash index on the search key used for hashing.

• Ordered indices such as B+-trees and hash indices can be used for selections based on equality conditions involving single attributes. When multiple attributes are involved in a selection condition, we can intersect record identifiers retrieved from multiple indices.
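The intersection step can be sketched as follows. The sample `records`, the attribute names, and the helper `build_index` are all hypothetical; each single-attribute index is modeled as a plain mapping from a value to the set of record identifiers that hold it.

```python
from collections import defaultdict

# Hypothetical relation: record identifier -> record.
records = {
    1: {"city": "Perryridge", "balance": 1000},
    2: {"city": "Brooklyn", "balance": 1000},
    3: {"city": "Perryridge", "balance": 500},
    4: {"city": "Perryridge", "balance": 1000},
}


def build_index(attribute):
    """Single-attribute index: value -> set of record identifiers."""
    index = defaultdict(set)
    for rid, record in records.items():
        index[record[attribute]].add(rid)
    return index


city_index = build_index("city")
balance_index = build_index("balance")


def select(city, balance):
    """Answer city = ? AND balance = ? by intersecting the two RID sets."""
    rids = city_index.get(city, set()) & balance_index.get(balance, set())
    return sorted(rids)


print(select("Perryridge", 1000))  # [1, 4]
```

Only the records whose identifiers survive the intersection need to be fetched from the file.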

• Grid files provide a general means of indexing on multiple attributes.

• Bitmap indices provide a very compact representation for indexing attributes with very few distinct values. Intersection operations are extremely fast on bitmaps, making them ideal for supporting queries on multiple attributes.
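As a rough illustration of why intersection is cheap on bitmaps, the sketch below stores one bitmap per distinct attribute value, using a Python integer as the bit array (bit i is set when row i has that value). The sample `rows`, the column meanings, and the function names are assumptions made for this example.

```python
# Hypothetical relation: (gender, income_level) per row; row number = position.
rows = [("m", "L1"), ("f", "L2"), ("f", "L1"), ("m", "L4"), ("f", "L1")]


def build_bitmaps(column):
    """One bitmap per distinct value of the column; bit i marks row i."""
    bitmaps = {}
    for i, row in enumerate(rows):
        value = row[column]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps


gender_bitmaps = build_bitmaps(0)
level_bitmaps = build_bitmaps(1)


def matching_rows(bitmap):
    """Decode a bitmap back into the list of row numbers it marks."""
    return [i for i in range(len(rows)) if (bitmap >> i) & 1]


# gender = 'f' AND income_level = 'L1' is a single bitwise AND.
both = gender_bitmaps["f"] & level_bitmaps["L1"]
print(matching_rows(both))  # [2, 4]
```

The conjunction of the two conditions costs one bitwise AND over the whole relation, regardless of how many rows match.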
