The Relational Algebra
The Relational Algebra
The relational algebra is a procedural query language. It consists of a set of operations that take one or two relations as input and produce a new relation as their result. The fundamental operations in the relational algebra are select, project, union, set difference, Cartesian product, and rename. In addition to the fundamental operations, there are several other operations — namely, set intersection, natural join, division, and assignment. We will define these operations in terms of the fundamental operations.
Fundamental Operations
The select, project, and rename operations are called unary operations, because they operate on one relation. The other three operations operate on pairs of relations and are, therefore, called binary operations.
The Select Operation
The select operation selects tuples that satisfy a given predicate. We use the lowercase Greek letter sigma (σ) to denote selection. The predicate appears as a subscript to σ. The argument relation is in parentheses after the σ. Thus, to select those tuples of the loan relation where the branch is “Perryridge,” we write
If the loan relation is as shown in Figure 3.6, then the relation that results from the preceding query is as shown in Figure 3.10.
We can find all tuples in which the amount lent is more than $1200 by writing
σamount >1200 (loan)
In general, we allow comparisons using =, /=, <, ≤, >, ≥ in the selection predicate. Furthermore, we can combine several predicates into a larger predicate by using the connectives and (∧), or (∨), and not (¬). Thus, to find those tuples pertaining to loans of more than $1200 made by the Perryridge branch, we write
The selection predicate may include comparisons between two attributes. To illustrate, consider the relation loan-officer that consists of three attributes: customer-name, banker-name, and loan-number, which specifies that a particular banker is the loan officer for a loan that belongs to some customer. To find all customers who have the same name as their loan officer, we can write
The Project Operation
Suppose we want to list all loan numbers and the amount of the loans, but do not care about the branch name. The project operation allows us to produce this relation. The project operation is a unary operation that returns its argument relation, with certain attributes left out. Since a relation is a set, any duplicate rows are eliminated. Projection is denoted by the uppercase Greek letter pi (Π). We list those attributes that we wish to appear in the result as a subscript to Π. The argument relation follows in parentheses. Thus, we write the query to list all loan numbers and the amount of the loan as
Composition of Relational Operations
The fact that the result of a relational operation is itself a relation is important. Con- sider the more complicated query “Find those customers who live in Harrison.” We write:
Πcustomer -name (σcustomer -city = “Harrison” (customer ))
Notice that, instead of giving the name of a relation as the argument of the projection operation, we give an expression that evaluates to a relation.
In general, since the result of a relational-algebra operation is of the same type (relation) as its inputs, relational-algebra operations can be composed together into
a relational-algebra expression. Composing relational-algebra operations into relational-algebra expressions is just like composing arithmetic operations (such as +, −, ∗, and ÷) into arithmetic expressions. We study the formal definition of relational- algebra expressions in Section 3.2.2.
The Union Operation
Consider a query to find the names of all bank customers who have either an account or a loan or both. Note that the customer relation does not contain the information, since a customer does not need to have either an account or a loan at the bank. To answer this query, we need the information in the depositor relation (Figure 3.5) and in the borrower relation (Figure 3.7). We know how to find the names of all customers with a loan in the bank:
To answer the query, we need the union of these two sets; that is, we need all customer names that appear in either or both of the two relations. We find these data by the binary operation union, denoted, as in set theory, by ∪. So the expression needed is
The result relation for this query appears in Figure 3.12. Notice that there are 10 tuples in the result, even though there are seven distinct borrowers and six depositors. This apparent discrepancy occurs because Smith, Jones, and Hayes are borrowers as well as depositors. Since relations are sets, duplicate values are eliminated.
Observe that, in our example, we took the union of two sets, both of which consisted of customer-name values. In general, we must ensure that unions are taken be- tween compatible relations. For example, it would not make sense to take the union of the loan relation and the borrower relation. The former is a relation of three attributes; the latter is a relation of two. Furthermore, consider a union of a set of customer names and a set of cities. Such a union would not make sense in most situations.
Therefore, for a union operation r ∪ s to be valid, we require that two conditions hold:
1. The relations r and s must be of the same arity. That is, they must have the same number of attributes.
2. The domains of the ith attribute of r and the ith attribute of s must be the same, for all i.
Note that r and s can be, in general, temporary relations that are the result of relational- algebra expressions.
The Set Difference Operation
The set-difference operation, denoted by −, allows us to find tuples that are in one relation but are not in another. The expression r − s produces a relation containing
those tuples in r but not in s.
We can find all customers of the bank who have an account but not a loan by writing
The result relation for this query appears in Figure 3.13.
As with the union operation, we must ensure that set differences are taken be- tween compatible relations. Therefore, for a set difference operation r − s to be valid, we require that the relations r and s be of the same arity, and that the domains of the ith attribute of r and the ith attribute of s be the same.
The Cartesian-Product Operation
The Cartesian-product operation, denoted by a cross (×), allows us to combine in- formation from any two relations. We write the Cartesian product of relations r1 and
Recall that a relation is by definition a subset of a Cartesian product of a set of domains. From that definition, we should already have an intuition about the definition of the Cartesian-product operation. However, since the same attribute name may appear in both r1 and r2, we need to devise a naming schema to distinguish between these attributes. We do so here by attaching to an attribute the name of the relation from which the attribute originally came. For example, the relation schema
for r = borrower × loan is
(borrower.customer-name, borrower.loan-number, loan.loan-number, loan.branch-name, loan.amount)
With this schema, we can distinguish borrower.loan-number from loan.loan-number. For those attributes that appear in only one of the two schemas, we shall usually drop the relation-name prefix. This simplification does not lead to any ambiguity. We can then write the relation schema for r as
(customer-name, borrower.loan-number, loan.loan-number, branch-name, amount)
This naming convention requires that the relations that are the arguments of the Cartesian-product operation have distinct names. This requirement causes problems in some cases, such as when the Cartesian product of a relation with itself is desired. A similar problem arises if we use the result of a relational-algebra expression in a Cartesian product, since we shall need a name for the relation so that we can refer to the relation’s attributes. In Section 3.2.1.7, we see how to avoid these problems by using a rename operation.
Now that we know the relation schema for r = borrower × loan, what tuples appear in r? As you may suspect, we construct a tuple of r out of each possible pair of tuples: one from the borrower relation and one from the loan relation. Thus, r is a large relation, as you can see from Figure 3.14, which includes only a portion of the tuples that make up r.
Assume that we have n1 tuples in borrower and n2 tuples in loan. Then, there are n1 ∗ n2 ways of choosing a pair of tuples — one tuple from each relation; so there are n1 ∗ n2 tuples in r. In particular, note that for some tuples t in r, it may be that t[borrower.loan-number] /= t[loan.loan-number].
In general, if we have relations r1(R1) and r2(R2), then r1 × r2 is a relation whose schema is the concatenation of R1 and R2. Relation R contains all tuples t for which there is a tuple t1 in r1 and a tuple t2 in r2 for which t[R1] = t1[R1] and t[R2] = t2[R2].
Suppose that we want to find the names of all customers who have a loan at the Perryridge branch. We need the information in both the loan relation and the borrower relation to do so. If we write
σbranch -name = “Perryridge”(borrower × loan)
then the result is the relation in Figure 3.15. We have a relation that pertains to only the Perryridge branch. However, the customer-name column may contain customers
who do not have a loan at the Perryridge branch. (If you do not see why that is true, recall that the Cartesian product takes all possible pairings of one tuple from borrower with one tuple of loan.)
Since the Cartesian-product operation associates every tuple of loan with every tuple of borrower, we know that, if a customer has a loan in the Perryridge branch, then there is some tuple in borrower × loan that contains his name, and borrower.loan-number = loan.loan-number. So, if we write
we get only those tuples of borrower × loan that pertain to customers who have a loan at the Perryridge branch.
Finally, since we want only customer-name, we do a projection:
The Rename Operation
Unlike relations in the database, the results of relational-algebra expressions do not have a name that we can use to refer to them. It is useful to be able to give them names; the rename operator, denoted by the lowercase Greek letter rho (ρ), lets us do
returns the result of expression E under the name x.
A relation r by itself is considered a (trivial) relational-algebra expression. Thus, we can also apply the rename operation to a relation r to get the same relation under a new name.
A second form of the rename operation is as follows. Assume that a relational-algebra expression E has arity n. Then, the expression
returns the result of expression E under the name x, and with the attributes renamed to A1, A2,... , An.
To illustrate renaming a relation, we consider the query “Find the largest account balance in the bank.” Our strategy is to (1) compute first a temporary relation consisting of those balances that are not the largest and (2) take the set difference between the relation Πbalance (account ) and the temporary relation just computed, to obtain the result.
Step 1: To compute the temporary relation, we need to compare the values of all account balances. We do this comparison by computing the Cartesian product account × account and forming a selection to compare the value of any two balances appearing in one tuple. First, we need to devise a mechanism to distinguish between the two balance attributes. We shall use the rename operation to rename one reference to the account relation; thus we can reference the relation twice without ambiguity.
We can now write the temporary relation that consists of the balances that are not the largest:
This expression gives those balances in the account relation for which a larger balance appears somewhere in the account relation (renamed as d). The result contains all balances except the largest one. Figure 3.17 shows this relation.
Step 2: The query to find the largest account balance in the bank can be written as:
Figure 3.18 shows the result of this query.
As one more example of the rename operation, consider the query “Find the names of all customers who live on the same street and in the same city as Smith.” We can obtain Smith’s street and city by writing
However, in order to find other customers with this street and city, we must reference the customer relation a second time. In the following query, we use the rename operation on the preceding expression to give its result the name smith-addr, and to rename its attributes to street and city, instead of customer-street and customer-city:
The result of this query, when we apply it to the customer relation of Figure 3.4, ap- pears in Figure 3.19.
The rename operation is not strictly required, since it is possible to use a positional notation for attributes. We can name attributes of a relation implicitly by using a positional notation, where $1, $2, ... refer to the first attribute, the second attribute, and so on. The positional notation also applies to results of relational-algebra operations.
The following relational-algebra expression illustrates the use of positional notation with the unary operator σ:
If a binary operation needs to distinguish between its two operand relations, a similar positional notation can be used for relation names as well. For example, $R1 could refer to the first operand, and $R2 could refer to the second operand. However, the positional notation is inconvenient for humans, since the position of the attribute is a number, rather than an easy-to-remember attribute name. Hence, we do not use the positional notation in this textbook.
Formal Definition of the Relational Algebra
The operations in Section 3.2.1 allow us to give a complete definition of an expression in the relational algebra. A basic expression in the relational algebra consists of either one of the following:
• A relation in the database
• A constant relation
A constant relation is written by listing its tuples within { }, for example { (A-101, Downtown, 500) (A-215, Mianus, 700) }.
A general expression in relational algebra is constructed out of smaller subexpressions. Let E1 and E2 be relational-algebra expressions. Then, these are all relational algebra expressions:
Additional Operations
The fundamental operations of the relational algebra are sufficient to express any relational-algebra query.1 However, if we restrict ourselves to just the fundamental operations, certain common queries are lengthy to express. Therefore, we define ad- ditional operations that do not add any power to the algebra, but simplify common queries. For each new operation, we give an equivalent expression that uses only the fundamental operations.
The Set-Intersection Operation
The first additional-relational algebra operation that we shall define is set intersection (∩). Suppose that we wish to find all customers who have both a loan and an account. Using set intersection, we can write
The result relation for this query appears in Figure 3.20.
Note that we can rewrite any relational algebra expression that uses set intersection by replacing the intersection operation with a pair of set-difference operations as:
Thus, set intersection is not a fundamental operation and does not add any power to the relational algebra. It is simply more convenient to write r ∩ s than to write r − (r − s).
The Natural-Join Operation
It is often desirable to simplify certain queries that require a Cartesian product. Usually, a query that involves a Cartesian product includes a selection operation on the result of the Cartesian product. Consider the query “Find the names of all customers who have a loan at the bank, along with the loan number and the loan amount.” We first form the Cartesian product of the borrower and loan relations. Then, we select those tuples that pertain to only the same loan-number, followed by the projection of the resulting customer-name, loan-number, and amount:
The natural join is a binary operation that allows us to combine certain selections and a Cartesian product into one operation. It is denoted by the “join” symbol . The natural-join operation forms a Cartesian product of its two arguments, performs a selection forcing equality on those attributes that appear in both relation schemas, and finally removes duplicate attributes.
Although the definition of natural join is complicated, the operation is easy to apply. As an illustration, consider again the example “Find the names of all customers who have a loan at the bank, and find the amount of the loan.” We express this query
Since the schemas for borrower and loan (that is, Borrower-schema and Loan-schema) have the attribute loan-number in common, the natural-join operation considers only pairs of tuples that have the same value on loan-number. It combines each such pair of tuples into a single tuple on the union of the two schemas (that is, customer-name, branch-name, loan-number, amount). After performing the projection, we obtain the relation in Figure 3.21.
Consider two relation schemas R and S — which are, of course, lists of attribute names. If we consider the schemas to be sets, rather than lists, we can denote those attribute names that appear in both R and S by R ∩ S, and denote those attribute names that appear in R, in S, or in both by R ∪ S. Similarly, those attribute names that appear in R but not S are denoted by R − S, whereas S − R denotes those attribute names that appear in S but not in R. Note that the union, intersection, and difference operations here are on sets of attributes, rather than on relations.
We are now ready for a formal definition of the natural join. Consider two relations r(R) and s(S). The natural join of r and s, denoted by r R ∪ S formally defined as follows:
• Find the names of all branches with customers who have an account in the bank and who live in Harrison.
The result relation for this query appears in Figure 3.22.
Notice that we wrote customer account depositor without inserting parentheses to specify the order in which the natural-join operations on the three relations should be executed. In the preceding case, there are two possibilities:
We did not specify which expression we intended, because the two are equivalent. That is, the natural join is associative.
• Find all customers who have both a loan and an account at the bank.
Note that in Section 3.2.3.1 we wrote an expression for this query by using set intersection. We repeat this expression here.
The result relation for this query appeared earlier in Figure 3.20. This example illustrates a general fact about the relational algebra: It is possible to write several equivalent relational-algebra expressions that are quite different from one another.
• Let r(R) and s(S) be relations without any attributes in common; that is, R ∩ S = ∅. (∅ denotes the empty set.) Then, r s = r × s.
The theta join operation is an extension to the natural-join operation that allows us to combine a selection and a Cartesian product into a single operation. Consider relations r(R) and s(S), and let θ be a predicate on attributes in the schema R ∪ S.
The theta join operation r s is defined as follows:
The Division Operation
The division operation, denoted by ÷, is suited to queries that include the phrase “for all.” Suppose that we wish to find all customers who have an account at all the branches located in Brooklyn. We can obtain all branches in Brooklyn by the expression
The result relation for this expression appears in Figure 3.23.
We can find all (customer-name, branch-name) pairs for which the customer has an account at a branch by writing
Figure 3.24 shows the result relation for this expression.
Now, we need to find customers who appear in r2 with every branch name in r1. The operation that provides exactly those customers is the divide operation. We formulate the query by writing
The result of this expression is a relation that has the schema (customer-name) and that contains the tuple (Johnson).
Formally, let r(R) and s(S) be relations, and let S ⊆ R; that is, every attribute of schema S is also in schema R. The relation r ÷ s is a relation on schema R − S (that is, on the schema containing all attributes of schema R that are not in schema S). A tuple t is in r ÷ s if and only if both of two conditions hold:
It may surprise you to discover that, given a division operation and the schemas of the relations, we can, in fact, define the division operation in terms of the fundamental operations. Let r(R) and s(S) be given, with S ⊆ R:
To see that this expression is true, we observe that ΠR−S (r) gives us all tuples t that satisfy the first condition of the definition of division. The expression on the right side of the set difference operator
serves to eliminate those tuples that fail to satisfy the second condition of the definition of division. Let us see how it does so. Consider ΠR−S (r) × s. This relation is on schema R, and pairs every tuple in ΠR−S (r) with every tuple in s. The expression ΠR−S,S (r) merely reorders the attributes of r.
Thus, (ΠR−S (r) × s) − ΠR−S,S (r) gives us those pairs of tuples from ΠR−S (r) and s that do not appear in r. Ifa tuple tj is in
then there is some tuple ts in s that does not combine with tuple tj to form a tuple in r. Thus, tj holds a value for attributes R − S that does not appear in r ÷ s. It is these values that we eliminate from ΠR−S (r).
The Assignment Operation
It is convenient at times to write a relational-algebra expression by assigning parts of it to temporary relation variables. The assignment operation, denoted by ←, works like assignment in a programming language. To illustrate this operation, consider the definition of division in Section 3.2.3.3. We could write r ÷ s as
The evaluation of an assignment does not result in any relation being displayed to the user. Rather, the result of the expression to the right of the ← is assigned to the relation variable on the left of the ←. This relation variable may be used in subsequent expressions.
With the assignment operation, a query can be written as a sequential program consisting of a series of assignments followed by an expression whose value is displayed as the result of the query. For relational-algebra queries, assignment must always be made to a temporary relation variable. Assignments to permanent relations constitute a database modification. We discuss this issue in Section 3.4. Note that the assignment operation does not provide any additional power to the algebra.
It is, however, a convenient way to express complex queries.
Comments
Post a Comment