Relational-Database Design:Functional Dependencies.
Functional Dependencies
Functional dependencies play a key role in differentiating good database designs from bad database designs. A functional dependency is a type of constraint that is a generalization of the notion of key, as discussed in Chapters 2 and 3.
Basic Concepts
Functional dependencies are constraints on the set of legal relations. They allow us to express facts about the enterprise that we are modeling with our database.
In Chapter 2, we defined the notion of a superkey as follows. Let R be a relation schema. A subset K of R is a superkey of R if, in any legal relation r(R), for all pairs t1 and t2 of tuples in r such that t1 j= t2, then t1[K] j= t2[K]. That is, no two tuples in any legal relation r(R) may have the same value on attribute set K.
The notion of functional dependency generalizes the notion of superkey. Consider a relation schema R, and let α ⊆ R and β ⊆ R. The functional dependency
α → β
holds on schema R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], it is also the case that t1[β] = t2[β].
Using the functional-dependency notation, we say that K is a superkey of R if K → R. That is, K is a superkey if, whenever t1[K] = t2[K], it is also the case that t1[R] = t2[R] (that is, t1 = t2).
Functional dependencies allow us to express constraints that we cannot express with superkeys. Consider the schema Loan-info-schema = (loan-number, branch-name, customer-name, amount) which is simplification of the Lending-schema that we saw earlier. The set of functional dependencies that we expect to hold on this relation schema is
loan-number → amount
loan-number → branch-name
We would not, however, expect the functional dependency
loan-number → customer-name
to hold, since, in general, a given loan can be made to more than one customer (for example, to both members of a husband – wife pair).
We shall use functional dependencies in two ways:
1. To test relations to see whether they are legal under a given set of functional dependencies. If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.
2. To specify constraints on the set of legal relations. We shall thus concern our- selves with only those relations that satisfy a given set of functional dependencies. If we wish to constrain ourselves to relations on schema R that satisfy a set F of functional dependencies, we say that F holds on R.
Let us consider the relation r of Figure 7.2, to see which functional dependencies are satisfied. Observe that A → C is satisfied. There are two tuples that have an A value of a1. These tuples have the same C value — namely, c1. Similarly, the two tuples with an A value of a2 have the same C value, c2. There are no other pairs of distinct tuples that have the same A value. The functional dependency C → A is not satisfied, however. To see that it is not, consider the tuples t1 = (a2, b3, c2, d3) and
can have streets with the same name. Thus, it is possible, at some time, to have an instance of the customer relation in which customer-street → customer-city is not satisfied. So, we would not include customer-street → customer-city in the set of functional dependencies that hold on Customer-schema.
In the loan relation (on Loan-schema) of Figure 7.4, we see that the dependency loan- number → amount is satisfied. In contrast to the case of customer-city and customer street in Customer-schema, we do believe that the real-world enterprise that we are modeling requires each loan to have only one amount. Therefore, we want to require that loan-number → amount be satisfied by the loan relation at all times. In other words, we require that the constraint loan-number → amount hold on Loan-schema.
In the branch relation of Figure 7.5, we see that branch-name → assets is satisfied, as is assets → branch-name. We want to require that branch-name → assets hold on Branch-schema. However, we do not wish to require that assets → branch-name hold, since it is possible to have several branches that have the same asset value.
In what follows, we assume that, when we design a relational database, we first list those functional dependencies that must always hold. In the banking example, our list of dependencies includes the following:
Closure of a Set of Functional Dependencies
It is not sufficient to consider the given set of functional dependencies. Rather, we need to consider all functional dependencies that hold. We shall see that, given a set F of functional dependencies, we can prove that certain other functional dependencies hold. We say that such functional dependencies are “logically implied” by F.
More formally, given a relational schema R, a functional dependency f on R is logically implied by a set of functional dependencies F on R if every relation instance r(R) that satisfies F also satisfies f .
Suppose we are given a relation schema R = (A, B, C, G, H, I) and the set of functional dependencies
is logically implied. That is, we can show that, whenever our given set of functional dependencies holds on a relation, A → H must also hold on the relation. Suppose that
Therefore, we have shown that, whenever t1 and t2 are tuples such that t1[A] = t2[A], it must be that t1[H] = t2[H]. But that is exactly the definition of A → H.
Let F be a set of functional dependencies. The closure of F, denoted by F +, is the set of all functional dependencies logically implied by F. Given F, we can compute F + directly from the formal definition of functional dependency. If F were large, this process would be lengthy and difficult. Such a computation of F + requires arguments of the type just used to show that A → H is in the closure of our example set of dependencies.
Axioms, or rules of inference, provide a simpler technique for reasoning about functional dependencies. In the rules that follow, we use Greek letters (α, β, γ, ... ) for sets of attributes, and uppercase Roman letters from the beginning of the alphabet for individual attributes. We use αβ to denote α ∪ β.
We can use the following three rules to find logically implied functional dependencies. By applying these rules repeatedly, we can find all of F +, given F. This collection of rules is called Armstrong’s axioms in honor of the person who first proposed it.
• Reflexivity rule. If α is a set of attributes and β ⊆ α, then α → β holds.
• Augmentation rule. If α → β holds and γ is a set of attributes, then γα → γβ holds.
• Transitivity rule. If α → β holds and β → γ holds, then α → γ holds.
Armstrong’s axioms are sound, because they do not generate any incorrect functional dependencies. They are complete, because, for a given set F of functional dependencies, they allow us to generate all F +. The bibliographical notes provide references for proofs of soundness and completeness.
Although Armstrong’s axioms are complete, it is tiresome to use them directly for the computation of F +. To simplify matters further, we list additional rules. It is possible to use Armstrong’s axioms to prove that these rules are correct (see Exercises 7.8, 7.9, and 7.10).
• Union rule. If α → β holds and α → γ holds, then α → βγ holds.
• Decomposition rule. If α → βγ holds, then α → β holds and α → γ holds.
• Pseudo transitivity rule. If α → β holds and γβ → δ holds, then αγ → δ holds.
Let us apply our rules to the example of schema R = (A, B, C, G, H, I) and the set F of functional dependencies {A → B, A → C, CG → H, CG → I, B → H}. We
list several members of F + here:
Figure 7.6 shows a procedure that demonstrates formally how to use Armstrong’s axioms to compute F +. In this procedure, when a functional dependency is added to F +, it may be already present, and in that case there is no change to F +. We will also see an alternative way of computing F + in Section 7.3.3.
The left-hand and right-hand sides of a functional dependency are both subsets of R. Since a set of size n has 2n subsets, there are a total of 2 × 2n = 2n+1 possible
functional dependencies, where n is the number of attributes in R. Each iteration of the repeat loop of the procedure, except the last iteration, adds at least one functional dependency to F +. Thus, the procedure is guaranteed to terminate.
Closure of Attribute Sets
To test whether a set α is a superkey, we must devise an algorithm for computing the set of attributes functionally determined by α. One way of doing this is to compute F +, take all functional dependencies with α as the left-hand side, and take the union of the right-hand sides of all such dependencies. However, doing so can be expensive, since F + can be large.
An efficient algorithm for computing the set of attributes functionally determined by α is useful not only for testing whether α is a superkey, but also for several other tasks, as we will see later in this section.
Let α be a set of attributes. We call the set of all attributes functionally determined by α under a set F of functional dependencies the closure of α under F; we denote it by α+. Figure 7.7 shows an algorithm, written in pseudocode, to compute α+. The
input is a set F of functional dependencies and the set α of attributes. The output is stored in the variable result.
To illustrate how the algorithm works, we shall use it to compute (AG)+ with the functional dependencies defined in Section 7.3.2. We start with result = AG . The first time that we execute the while loop to test each functional dependency, we find that
The second time that we execute the while loop, no new attributes are added to result, and the algorithm terminates.
Let us see why the algorithm of Figure 7.7 is correct. The first step is correct, since α → α always holds (by the reflexivity rule). We claim that, for any subset β of result, α → β. Since we start the while loop with α → result being true, we can add γ to result only if β ⊆ result and β → γ. But then result → β by the reflexivity rule, so α → β by transitivity. Another application of transitivity shows that α → γ (using α → β and β → γ). The union rule implies that α → result ∪ γ, so α functionally determines any new result generated in the while loop. Thus, any attribute returned by the algorithm is in α+.
It is easy to see that the algorithm finds all α+. If there is an attribute in α+ that is not yet in result, then there must be a functional dependency β → γ for which β ⊆ result, and at least one attribute in γ is not in result.
It turns out that, in the worst case, this algorithm may take an amount of time quadratic in the size of F. There is a faster (although slightly more complex) algorithm that runs in time linear in the size of F; that algorithm is presented as part of Exercise 7.14.
There are several uses of the attribute closure algorithm:
Canonical Cover
Suppose that we have a set of functional dependencies F on a relation schema. When- ever a user performs an update on the relation, the database system must ensure that the update does not violate any functional dependencies, that is, all the functional dependencies in F are satisfied in the new database state.
The system must roll back the update if it violates any functional dependencies in the set F .
We can reduce the effort spent in checking for violations by testing a simplified set of functional dependencies that has the same closure as the given set. Any database that satisfies the simplified set of functional dependencies will also satisfy the original set, and vice versa, since the two sets have the same closure. However, the sim- plified set is easier to test. We shall see how the simplified set can be constructed in a moment. First, we need some definitions.
An attribute of a functional dependency is said to be extraneous if we can remove it without changing the closure of the set of functional dependencies. The formal definition of extraneous attributes is as follows. Consider a set F of functional de- pendencies and the functional dependency α → β in F.
For example, suppose we have the functional dependencies AB → C and A → C in F . Then, B is extraneous in AB → C. As another example, suppose we have the functional dependencies AB → CD and A → C in F . Then C would be extraneous in the right-hand side of AB → CD.
Beware of the direction of the implications when using the definition of extraneous attributes: If you exchange the left-hand side with right-hand side, the implication will always hold. That is, (F − {α → β}) ∪ {(α − A) → β} always logically implies F, and also F always logically implies (F − {α → β}) ∪ {α → (β − A)} Here is how we can test efficiently if an attribute is extraneous. Let R be the relation schema, and let F be the given set of functional dependencies that hold on R.
Consider an attribute A in a dependency α → β.
• No functional dependency in Fc contains an extraneous attribute.
• Each left side of a functional dependency in Fc is unique. That is, there are no two dependencies α1 → β1 and α2 → β2 in Fc such that α1 = α2.
A canonical cover for a set of functional dependencies F can be computed as depicted in Figure 7.8. It is important to note that when checking if an attribute is extraneous, the check uses the dependencies in the current value of Fc, and not the dependencies in F . If a functional dependency contains only one attribute in its right-hand side, for example A → C, and that attribute is found to be extraneous, we would get a functional dependency with an empty right-hand side. Such functional dependencies should be deleted.
The canonical cover of F , Fc, can be shown to have the same closure as F ; hence, testing whether Fc is satisfied is equivalent to testing whether F is satisfied. However, Fc is minimal in a certain sense — it does not contain extraneous attributes, and it
Given a set F of functional dependencies, it may be that an entire functional de- pendency in the set is extraneous, in the sense that dropping it does not change the closure of F . We can show that a canonical cover Fc of F contains no such extraneous functional dependency. Suppose that, to the contrary, there were such an extraneous functional dependency in Fc. The right-side attributes of the dependency would then be extraneous, which is not possible by the definition of canonical covers.
A canonical cover might not be unique. For instance, consider the set of functional dependencies F = {A → BC, B → AC, and C → AB}. If we apply the extra neity test to A → BC, we find that both B and C are extraneous under F . However, it is incorrect to delete both! The algorithm for finding the canonical cover picks one of the two, and deletes it. Then, 1. If C is deleted, we get the set F l = {A → B, B → AC, and C → AB}. Now, B is not extraneous in the right hand side of A → B under F l. Continuing the algorithm, we find A and B are extraneous in the right-hand side of C → AB, leading to two canonical covers Fc = {A → B, B → C, and C → A}, and Fc = {A → B, B → AC, and C → B}.
2. If B is deleted, we get the set {A → C, B → AC, and C → AB}. This case is symmetrical to the previous case, leading to the canonical covers Fc = {A → C, C → B, and B → A}, and Fc = {A → C, B → C, and C → AB}.
As an exercise, can you find one more canonical cover for F ?
Comments
Post a Comment