Query Optimization:Materialized Views
Materialized Views∗∗
When a view is defined, normally the database stores only the query defining the view. In contrast, a materialized view is a view whose contents are computed and stored. Materialized views constitute redundant data, in that their contents can be inferred from the view definition and the rest of the database contents. However, it is much cheaper in many cases to read the contents of a materialized view than to compute the contents of the view by executing the query defining the view.
Materialized views are important for improving performance in some applications. Consider this view, which gives the total loan amount at each branch:
Suppose the total loan amount at the branch is required frequently (before making a new loan, for example). Computing the view requires reading every loan tuple pertaining to the branch, and summing up the loan amounts, which can be time- consuming.
In contrast, if the view definition of the total loan amount were materialized, the total loan amount could be found by looking up a single tuple in the materialized view.
View Maintenance
A problem with materialized views is that they must be kept up-to-date when the data used in the view definition changes. For instance, if the amount value of a loan is updated, the materialized view would become inconsistent with the underlying data, and must be updated. The task of keeping a materialized view up-to-date with the underlying data is known as view maintenance.
Views can be maintained by manually written code: That is, every piece of code that updates the amount value of a loan can be modified to also update the total loan amount for the corresponding branch.
Another option for maintaining materialized views is to define triggers on insert, delete, and update of each relation in the view definition. The triggers must modify the contents of the materialized view, to take into account the change that caused the trigger to fire. A simplistic way of doing so is to completely recompute the materialized view on every update.
A better option is to modify only the affected parts of the materialized view, which is known as incremental view maintenance. We describe how to perform incremental view maintenance in Section 14.5.2.
Modern database systems provide more direct support for incremental view maintenance. Database system programmers no longer need to define triggers for view maintenance. Instead, once a view is declared to be materialized, the database system computes the contents of the view, and incrementally updates the contents when the underlying data changes.
Incremental View Maintenance
To understand how to incrementally maintain materialized views, we start off by considering individual operations, and then see how to handle a complete expression.
The changes to a relation that can cause a materialized view to become out-of-date are inserts, deletes, and updates. To simplify our description, we replace updates to a tuple by deletion of the tuple followed by insertion of the updated tuple. Thus, we need to consider only inserts and deletes. The changes (inserts and deletes) to a relation or expression are referred to as its differential.
Join Operation
Projection is a more difficult operation with which to deal. Consider a materialized view v = ΠA(r). Suppose the relation r is on the schema R = (A, B), and r contains two tuples (a, 2) and (a, 3). Then, ΠA(r) has a single tuple (a). If we delete the tuple (a, 2) from r, we cannot delete the tuple (a) from ΠA(r): If we did so, the result would be an empty relation, whereas in reality ΠA(r) still has a single tuple (a). The reason is that the same tuple (a) is derived in two ways, and deleting one tuple from r removes only one of the ways of deriving (a); the other is still present.
This reason also gives us the intuition for solution: For each tuple in a projection such as ΠA(r), we will keep a count of how many times it was derived.
When a set of tuples dr is deleted from r, for each tuple t in dr we do the following. Let t.A denote the projection of t on the attribute A. We find (t.A) in the materialized view, and decrease the count stored with it by 1. If the count becomes 0, (t.A) is deleted from the materialized view.
Handling insertions is relatively straightforward. When a set of tuples ir is in- serted into r, for each tuple t in ir we do the following. If (t.A) is already present in the materialized view, we increase the count stored with it by 1. If not, we add (t.A) to the materialized view, with the count set to 1.
Aggregation Operations
Aggregation operations proceed somewhat like projections. The aggregate operations in SQL are count, sum, avg, min, and max:
• count: Consider a materialized view v = AGcount(B)(r), which computes the count of the attribute B, after grouping r by attribute A.
When a set of tuples ir is inserted into r, for each tuple t in ir we do the following. We look for the group t.A in the materialized view. If it is not present, we add (t.A, 1) to the materialized view. If the group t.A is present, we add 1 to the count of the group.
When a set of tuples dr is deleted from r, for each tuple t in dr we do the following. We look for the group t.A in the materialized view, and subtract 1 from the count for the group. If the count becomes 0, we delete the tuple for the group t.A from the materialized view.
• sum: Consider a materialized view v = AGsum(B)(r).
When a set of tuples ir is inserted into r, for each tuple t in ir we do the following. We look for the group t.A in the materialized view. If it is not present, we add (t.A, t.B) to the materialized view; in addition, we store a count of 1 associated with (t.A, t.B), just as we did for projection. If the group t.A is present, we add the value of t.B to the aggregate value for the group, and add 1 to the count of the group.
When a set of tuples dr is deleted from r, for each tuple t in dr we do the following. We look for the group t.A in the materialized view, and subtract t.B from the aggregate value for the group. We also subtract 1 from the count for the group, and if the count becomes 0, we delete the tuple for the group t.A from the materialized view.
Without keeping the extra count value, we would not be able to distinguish a case where the sum for a group is 0 from the case where the last tuple in a group is deleted.
• avg: Consider a materialized view v = AGavg(B)(r).
Directly updating the average on an insert or delete is not possible, since it depends not only on the old average and the tuple being inserted/deleted, but also on the number of tuples in the group.
Instead, to handle the case of avg, we maintain the sum and count aggre- gate values as described earlier, and compute the average as the sum divided by the count.
• min, max: Consider a materialized view v = AGmin(B)(r). (The case of max is exactly equivalent.)
Handling insertions on r is straightforward. Maintaining the aggregate values min and max on deletions may be more expensive. For example, if the tuple corresponding to the minimum value for a group is deleted from r, we have to look at the other tuples of r that are in the same group to find the new minimum value.
Other Operations
The set operation intersection is maintained as follows. Given materialized view v = r ∩ s, when a tuple is inserted in r we check if it is present in s, and if so we add it to v. If a tuple is deleted from r, we delete it from the intersection if it is present.
The other set operations, union and set difference, are handled in a similar fashion; we leave details to you.
Outer joins are handled in much the same way as joins, but with some extra work.
In the case of deletion from r we have to handle tuples in s that no longer match any tuple in r. In the case of insertion to r, we have to handle tuples in s that did not match any tuple in r. Again we leave details to you.
Handling Expressions
So far we have seen how to update incrementally the result of a single operation. To handle an entire expression, we can derive expressions for computing the incremental change to the result of each subexpression, starting from the smallest subexpressions.
For example, suppose we wish to incrementally update a materialized view E1 E2 when a set of tuples ir is inserted into relation r. Let us assume r is used in E1 alone. Suppose the set of tuples to be inserted into E1 is given by expression D1. Then the expression D1 E2 gives the set of tuples to be inserted into E1 E2.
See the bibliographical notes for further details on incremental view maintenance with expressions.
Query Optimization and Materialized Views
Query optimization can be performed by treating materialized views just like regular relations. However, materialized views offer further opportunities for optimization:
• Rewriting queries to use materialized views:
query plan than optimizing the query as submitted. Thus, it is the job of the query optimizer to recognize when a materialized view can be used to speed up a query.
• Replacing a use of a materialized view by the view definition:
Suppose a materialized view v = r s is available, but without any index on it, and a user submits a query σA=10(v). Suppose also that s has an index on the common attribute B, and r has an index on attribute A. The best plan for this query may be to replace v by r s, which can lead to the query plan σA=10(r) s; the selection and join can be performed efficiently by using the indices on r.A and s.B, respectively. In contrast, evaluating the selection directly on v may require a full scan of v, which may be more expensive.
The bibliographical notes give pointers to research showing how to efficiently per- form query optimization with materialized views.
Another related optimization problem is that of materialized view selection, namely, “What is the best set of views to materialize?” This decision must be made on the basis of the system workload, which is a sequence of queries and updates that reflects the typical load on the system. One simple criterion would be to select a set of materialized views that minimizes the overall execution time of the workload of queries and updates, including the time taken to maintain the materialized views. Database administrators usually modify this criterion to take into account the importance of different queries and updates: Fast response may be required for some queries and updates, but a slow response may be acceptable for others.
Indices are just like materialized views, in that they too are derived data, can speed up queries, and may slow down updates. Thus, the problem of index selection is closely related, to that of materialized view selection, although it is simpler.
We examine these issues in more detail in Sections 21.2.5 and 21.2.6.
Some database systems, such as Microsoft SQL Server 7.5, and the RedBrick Data Warehouse from Informix, provide tools to help the database administrator with index and materialized view selection. These tools examine the history of queries and updates, and suggest indices and views to be materialized.
Comments
Post a Comment