The Domain Relational Calculus
The Domain Relational Calculus∗∗
A second form of relational calculus, called domain relational calculus, uses domain variables that take on values from an attributes domain, rather than values for an entire tuple. The domain relational calculus, however, is closely related to the tuple relational calculus.
Domain relational calculus serves as the theoretical basis of the widely used QBE
language, just as relational algebra serves as the basis for the SQL language.
Formal Definition
An expression in the domain relational calculus is of the form
where x1, x2,..., xn represent domain variables. P represents a formula composed of atoms, as was the case in the tuple relational calculus. An atom in the domain relational calculus has one of the following forms:
• < x1, x2,..., xn > ∈ r, where r is a relation on n attributes and x1, x2,... , xn are domain variables or domain constants.
• x Θ y, where x and y are domain variables and Θ is a comparison operator (<, ≤, =, /=, >, ≥). We require that attributes x and y have domains that can be compared by Θ.
• x Θ c, where x is a domain variable, Θ is a comparison operator, and c is a constant in the domain of the attribute for which x is a domain variable.
We build up formulae from atoms by using the following rules:
• An atom is a formula.
• If P1 is a formula, then so are ¬P1 and (P1).
• If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2, and P1 ⇒ P2.
• If P1(x) is a formula in x, where x is a domain variable, then
Example Queries
We now give domain-relational-calculus queries for the examples that we considered earlier. Note the similarity of these expressions and the corresponding tuple- relational-calculus expressions.
• Find the loan number, branch name, and amount for loans of over $1200:
• Find all loan numbers for loans with an amount greater than $1200:
Although the second query appears similar to the one that we wrote for the tuple relational calculus, there is an important difference. In the tuple calculus, when we write ∃ s for some tuple variable s, we bind it immediately to a relation by writing ∃ s ∈ r. However, when we write ∃ b in the domain calculus, b refers not to a tuple, but rather to a domain value. Thus, the domain of variable b is unconstrained until the subformula < l, b,a > ∈ loan constrains b to branch names that appear in the loan relation. For example,
• Find the names of all customers who have a loan from the Perryridge branch and find the loan amount:
• Find the names of all customers who have a loan, an account, or both at the Perryridge branch:
• Find the names of all customers who have an account at all the branches lo- cated in Brooklyn:
In English, we interpret this expression as “The set of all (customer-name) tu- ples c such that, for all (branch-name, branch-city, assets) tuples, x, y, z, if the branch city is Brooklyn, then the following is true”:
- There exists a tuple in the relation account with account number a and branch name x.
- There exists a tuple in the relation depositor with customer c and account number a.”
Safety of Expressions
We noted that, in the tuple relational calculus (Section 3.6), it is possible to write expressions that may generate an infinite relation. That led us to define safety for tuple- relational-calculus expressions. A similar situation arises for the domain relational calculus. An expression such as
is unsafe, because it allows values in the result that are not in the domain of the expression.
For the domain relational calculus, we must be concerned also about the form of formulae within “there exists” and “for all” clauses. Consider the expression
where P is some formula involving x and z. We can test the first part of the formula, ∃ y (< x, y > ∈ r), by considering only the values in r. However, to test the second part of the formula, ∃ z (¬ (< x, z > ∈ r) ∧ P (x, z)), we must consider values for z that do not appear in r. Since all relations are finite, an infinite number of values do not appear in r. Thus, it is not possible, in general, to test the second part of the formula, without considering an infinite number of potential values for z. Instead, we add restrictions to prohibit expressions such as the preceding one.
In the tuple relational calculus, we restricted any existentially quantified variable to range over a specific relation. Since we did not do so in the domain calculus, we add rules to the definition of safety to deal with cases like our example. We say that an expression
is safe if all of the following hold:
1. All values that appear in tuples of the expression are values from dom(P).
2. For every “there exists” subformula of the form ∃ x (P1(x)), the subformula is true if and only if there is a value x in dom(P1) such that P1(x) is true.
3. For every “for all” subformula of the form ∀x (P1(x)), the subformula is true if and only if P1(x) is true for all values x from dom(P1).
The purpose of the additional rules is to ensure that we can test “for all” and “there exists” subformulae without having to test infinitely many possibilities. Consider the second rule in the definition of safety. For ∃ x (P1(x)) to be true, we need to find only one x for which P1(x) is true. In general, there would be infinitely many values to test. However, if the expression is safe, we know that we can restrict our attention to values from dom(P1). This restriction reduces to a finite number the tuples we must consider.
The situation for subformulae of the form ∀x (P1(x)) is similar. To assert that ∀x (P1(x)) is true, we must, in general, test all possible values, so we must examine infinitely many values. As before, if we know that the expression is safe, it is sufficient for us to test P1(x) for those values taken from dom(P1).
All the domain-relational-calculus expressions that we have written in the example queries of this section are safe.
Expressive Power of Languages
When the domain relational calculus is restricted to safe expressions, it is equivalent in expressive power to the tuple relational calculus restricted to safe expressions. Since we noted earlier that the restricted tuple relational calculus is equivalent to the relational algebra, all three of the following are equivalent:
• The basic relational algebra (without the extended relational algebra operations)
• The tuple relational calculus restricted to safe expressions
• The domain relational calculus restricted to safe expressions
We note that the domain relational calculus also does not have any equivalent of the aggregate operation, but it can be extended to support aggregation, and extending it to handle arithmatic expressions is straightforward.
Summary
• The relational data model is based on a collection of tables. The user of the database system may query these tables, insert new tuples, delete tuples, and update (modify) tuples. There are several languages for expressing these operations.
• The relational algebra defines a set of algebraic operations that operate on tables, and output tables as their results. These operations can be combined to get expressions that express desired queries. The algebra defines the basic operations used within relational query languages.
• The operations in relational algebra can be divided into Basic operations
Additional operations that can be expressed in terms of the basic operations
Extended operations, some of which add further expressive power to relational algebra
• Databases can be modified by insertion, deletion, or update of tuples. We used the relational algebra with the assignment operator to express these modifications.
• Different users of a shared database may benefit from individualized views of the database. Views are “virtual relations” defined by a query expression. We evaluate queries involving views by replacing the view with the expression that defines the view.
• Views are useful mechanisms for simplifying database queries, but modification of the database through views may cause problems. Therefore, database systems severely restrict updates through views.
• For reasons of query-processing efficiency, a view may be materialized — that is, the query is evaluated and the result stored physically. When database relations are updated, the materialized view must be correspondingly updated.
• The tuple relational calculus and the domain relational calculus are non- procedural languages that represent the basic power required in a relational query language. The basic relational algebra is a procedural language that is equivalent in power to both forms of the relational calculus when they are restricted to safe expressions.
• The relational algebra and the relational calculi are terse, formal languages that are inappropriate for casual users of a database system. Commercial data- base systems, therefore, use languages with more “syntactic sugar.” In Chapters 4 and 5, we shall consider the three most influential languages: SQL, which is based on relational algebra, and QBE and Datalog, which are based on domain relational calculus.
Review Terms
Exercises
Design a relational database for a university registrar’s office. The office maintains data about each class, including the instructor, the number of students enrolled, and the time and place of the class meetings. For each student – class pair, a grade is recorded.
Describe the differences in meaning between the terms relation and relation schema.
Illustrate your answer by referring to your solution to Exercise 3.1.
Design a relational database corresponding to the E-R diagram of Figure 3.38.
In Chapter 2, we saw how to represent many-to-many, many-to-one, one-to- many, and one-to-one relationship sets. Explain how primary keys help us to represent such relationship sets in the relational model.
Consider the relational database of Figure 3.39, where the primary keys are un- derlined. Give an expression in the relational algebra to express each of the fol- lowing queries:
a. Find the names of all employees who work for First Bank Corporation.
b. Find the names and cities of residence of all employees who work for First
Bank Corporation.
c. Find the names, street address, and cities of residence of all employees who
work for First Bank Corporation and earn more than $10,000 per annum.
d. Find the names of all employees in this database who live in the same city as the company for which they work.
e. Find the names of all employees who live in the same city and on the same street as do their managers.
f. Find the names of all employees in this database who do not work for First Bank Corporation.
g. Find the names of all employees who earn more than every employee of Small Bank Corporation.
h. Assume the companies may be located in several cities. Find all companies located in every city in which Small Bank Corporation is located.
Consider the relation of Figure 3.21, which shows the result of the query “Find the names of all customers who have a loan at the bank.” Rewrite the query to include not only the name, but also the city of residence for each customer. Observe that now customer Jackson no longer appears in the result, even though Jackson does in fact have a loan from the bank.
a. Explain why Jackson does not appear in the result.
b. Suppose that you want Jackson to appear in the result. How would you modify the database to achieve this effect?
c. Again, suppose that you want Jackson to appear in the result. Write a query using an outer join that accomplishes this desire without your having to modify the database.
The outer-join operations extend the natural-join operation so that tuples from the participating relations are not lost in the result of the join. Describe how the theta join operation can be extended so that tuples from the left, right, or both relations are not lost from the result of a theta join.
Consider the relational database of Figure 3.39. Give an expression in the relational algebra for each request:
a. Modify the database so that Jones now lives in Newtown.
b. Give all employees of First Bank Corporation a 10 percent salary raise.
c. Give all managers in this database a 10 percent salary raise.
d. Give all managers in this database a 10 percent salary raise, unless the salary would be greater than $100,000. In such cases, give only a 3 percent raise.
e. Delete all tuples in the works relation for employees of Small Bank Corporation.
Using the bank example, write relational-algebra queries to find the accounts held by more than two customers in the following ways:
a. Using an aggregate function.
b. Without using any aggregate functions.
Consider the relational database of Figure 3.39. Give a relational-algebra expression for each of the following queries:
a. Find the company with the most employees.
b. Find the company with the smallest payroll.
c. Find those companies whose employees earn a higher salary, on average, than the average salary at First Bank Corporation.
List two reasons why we may choose to define a view.
List two major problems with processing update operations expressed in terms of views.
Let the following relation schemas be given:
R = (A, B, C)
S = (D, E, F )
Let relations r(R) and s(S) be given. Give an expression in the tuple relational calculus that is equivalent to each of the following:
Let R = (A, B, C), and let r1 and r2 both be relations on schema R. Give an expression in the domain relational calculus that is equivalent to each of the following:
Repeat Exercise 3.5 using the tuple relational calculus and the domain relational calculus.
Let R = (A, B) and S = (A, C), and let r(R) and s(S) be relations. Write relational-algebra expressions equivalent to the following domain-relational- calculus expressions:
Let R = (A, B) and S = (A, C), and let r(R) and s(S) be relations. Using the special constant null, write tuple-relational-calculus expressions equivalent to each of the following:
List two reasons why null values might be introduced into the database.
Certain systems allow marked nulls. A marked null ⊥i is equal to itself, but if i /= j, then ⊥i /= ⊥j . One application of marked nulls is to allow certain updates through views. Consider the view loan-info (Section 3.5). Show how you can use marked nulls to allow the insertion of the tuple (“Johnson”, 1900) through loan info.
Bibliographical Notes
E. F. Codd of the IBM San Jose Research Laboratory proposed the relational model in the late 1960s; Codd [1970]. This work led to the prestigious ACM Turing Award to Codd in 1981; Codd [1982].
After Codd published his original paper, several research projects were formed with the goal of constructing practical relational database systems, including System R at the IBM San Jose Research Laboratory, Ingres at the University of California at Berkeley, Query-by-Example at the IBM T. J. Watson Research Center, and the Peterlee Relational Test Vehicle (PRTV) at the IBM Scientific Center in Peterlee, United Kingdom. System R is discussed in Astrahan et al. [1976], Astrahan et al. [1979], and Chamberlin et al. [1981]. Ingres is discussed in Stonebraker [1980], Stonebraker [1986b], and Stonebraker et al. [1976]. Query-by-example is described in Zloof [1977]. PRTV is described in Todd [1976].
Many relational-database products are now commercially available. These include IBM’s DB2, Ingres, Oracle, Sybase, Informix, and Microsoft SQL Server. Database products for personal computers include Microsoft Access, dBase, and FoxPro. In- formation about the products can be found in their respective manuals.
General discussion of the relational data model appears in most database texts. Atzeni and Antonellis [1993] and Maier [1983] are texts devoted exclusively to the relational data model. The original definition of relational algebra is in Codd [1970]; that of tuple relational calculus is in Codd [1972]. A formal proof of the equivalence of tuple relational calculus and relational algebra is in Codd [1972].
Several extensions to the relational calculus have been proposed. Klug [1982] and Escobar-Molano et al. [1993] describe extensions to scalar aggregate functions. Ex- tensions to the relational model and discussions of incorporation of null values in the relational algebra (the RM/T model), as well as outer joins, are in Codd [1979]. Codd [1990] is a compendium of E. F. Codd’s papers on the relational model. Outer joins are also discussed in Date [1993b]. The problem of updating relational databases through views is addressed by Bancilhon and Spyratos [1981], Cosmadakis and Pa- padimitriou [1984], Dayal and Bernstein [1978], and Langerak [1990]. Section 14.5 covers materialized view maintenance, and references to literature on view maintenance can be found at the end of that chapter.
Comments
Post a Comment