Relational-Database Design:Functional Dependencies.

Functional Dependencies

Functional dependencies play a key role in differentiating good database designs from bad database designs. A functional dependency is a type of constraint that is a generalization of the notion of key, as discussed in Chapters 2 and 3.

Basic Concepts

Functional dependencies are constraints on the set of legal relations. They allow us to express facts about the enterprise that we are modeling with our database.

In Chapter 2, we deﬁned the notion of a superkey as follows. Let R be a relation schema. A subset K of R is a superkey of R if, in any legal relation r(R), for all pairs t1 and t2 of tuples in r such that t1 j= t2, then t1[K] j= t2[K]. That is, no two tuples in any legal relation r(R) may have the same value on attribute set K.

The notion of functional dependency generalizes the notion of superkey. Consider a relation schema R, and let α ⊆ R and β ⊆ R. The functional dependency

α → β

holds on schema R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], it is also the case that t1[β] = t2[β].

Using the functional-dependency notation, we say that K is a superkey of R if K → R. That is, K is a superkey if, whenever t1[K] = t2[K], it is also the case that t1[R] = t2[R] (that is, t1 = t2).

Functional dependencies allow us to express constraints that we cannot express with superkeys. Consider the schema Loan-info-schema = (loan-number, branch-name, customer-name, amount) which is simpliﬁcation of the Lending-schema that we saw earlier. The set of functional dependencies that we expect to hold on this relation schema is

loan-number → amount

loan-number → branch-name

We would not, however, expect the functional dependency

loan-number → customer-name

to hold, since, in general, a given loan can be made to more than one customer (for example, to both members of a husband – wife pair).

We shall use functional dependencies in two ways:

1. To test relations to see whether they are legal under a given set of functional dependencies. If a relation r is legal under a set F of functional dependencies, we say that r satisﬁes F.

2. To specify constraints on the set of legal relations. We shall thus concern our- selves with only those relations that satisfy a given set of functional dependencies. If we wish to constrain ourselves to relations on schema R that satisfy a set F of functional dependencies, we say that F holds on R.

Let us consider the relation r of Figure 7.2, to see which functional dependencies are satisﬁed. Observe that A → C is satisﬁed. There are two tuples that have an A value of a1. These tuples have the same C value — namely, c1. Similarly, the two tuples with an A value of a2 have the same C value, c2. There are no other pairs of distinct tuples that have the same A value. The functional dependency C → A is not satisﬁed, however. To see that it is not, consider the tuples t1 = (a2, b3, c2, d3) and

can have streets with the same name. Thus, it is possible, at some time, to have an instance of the customer relation in which customer-street → customer-city is not satisﬁed. So, we would not include customer-street → customer-city in the set of functional dependencies that hold on Customer-schema.

In the loan relation (on Loan-schema) of Figure 7.4, we see that the dependency loan- number → amount is satisﬁed. In contrast to the case of customer-city and customer street in Customer-schema, we do believe that the real-world enterprise that we are modeling requires each loan to have only one amount. Therefore, we want to require that loan-number → amount be satisﬁed by the loan relation at all times. In other words, we require that the constraint loan-number → amount hold on Loan-schema.

In the branch relation of Figure 7.5, we see that branch-name → assets is satisﬁed, as is assets → branch-name. We want to require that branch-name → assets hold on Branch-schema. However, we do not wish to require that assets → branch-name hold, since it is possible to have several branches that have the same asset value.

In what follows, we assume that, when we design a relational database, we ﬁrst list those functional dependencies that must always hold. In the banking example, our list of dependencies includes the following:

Closure of a Set of Functional Dependencies

It is not sufﬁcient to consider the given set of functional dependencies. Rather, we need to consider all functional dependencies that hold. We shall see that, given a set F of functional dependencies, we can prove that certain other functional dependencies hold. We say that such functional dependencies are “logically implied” by F.

More formally, given a relational schema R, a functional dependency f on R is logically implied by a set of functional dependencies F on R if every relation instance r(R) that satisﬁes F also satisﬁes f .

Suppose we are given a relation schema R = (A, B, C, G, H, I) and the set of functional dependencies

is logically implied. That is, we can show that, whenever our given set of functional dependencies holds on a relation, A → H must also hold on the relation. Suppose that

Therefore, we have shown that, whenever t1 and t2 are tuples such that t1[A] = t2[A], it must be that t1[H] = t2[H]. But that is exactly the deﬁnition of A → H.

Let F be a set of functional dependencies. The closure of F, denoted by F +, is the set of all functional dependencies logically implied by F. Given F, we can compute F + directly from the formal deﬁnition of functional dependency. If F were large, this process would be lengthy and difﬁcult. Such a computation of F + requires arguments of the type just used to show that A → H is in the closure of our example set of dependencies.

Axioms, or rules of inference, provide a simpler technique for reasoning about functional dependencies. In the rules that follow, we use Greek letters (α, β, γ, ... ) for sets of attributes, and uppercase Roman letters from the beginning of the alphabet for individual attributes. We use αβ to denote α ∪ β.

We can use the following three rules to ﬁnd logically implied functional dependencies. By applying these rules repeatedly, we can ﬁnd all of F +, given F. This collection of rules is called Armstrong’s axioms in honor of the person who ﬁrst proposed it.

• Reﬂexivity rule. If α is a set of attributes and β ⊆ α, then α → β holds.

• Augmentation rule. If α → β holds and γ is a set of attributes, then γα → γβ holds.

• Transitivity rule. If α → β holds and β → γ holds, then α → γ holds.

Armstrong’s axioms are sound, because they do not generate any incorrect functional dependencies. They are complete, because, for a given set F of functional dependencies, they allow us to generate all F +. The bibliographical notes provide references for proofs of soundness and completeness.

Although Armstrong’s axioms are complete, it is tiresome to use them directly for the computation of F +. To simplify matters further, we list additional rules. It is possible to use Armstrong’s axioms to prove that these rules are correct (see Exercises 7.8, 7.9, and 7.10).

• Union rule. If α → β holds and α → γ holds, then α → βγ holds.

• Decomposition rule. If α → βγ holds, then α → β holds and α → γ holds.

• Pseudo transitivity rule. If α → β holds and γβ → δ holds, then αγ → δ holds.

Let us apply our rules to the example of schema R = (A, B, C, G, H, I) and the set F of functional dependencies {A → B, A → C, CG → H, CG → I, B → H}. We

list several members of F + here:

Figure 7.6 shows a procedure that demonstrates formally how to use Armstrong’s axioms to compute F +. In this procedure, when a functional dependency is added to F +, it may be already present, and in that case there is no change to F +. We will also see an alternative way of computing F + in Section 7.3.3.

The left-hand and right-hand sides of a functional dependency are both subsets of R. Since a set of size n has 2n subsets, there are a total of 2 × 2n = 2n+1 possible

functional dependencies, where n is the number of attributes in R. Each iteration of the repeat loop of the procedure, except the last iteration, adds at least one functional dependency to F +. Thus, the procedure is guaranteed to terminate.

Closure of Attribute Sets

To test whether a set α is a superkey, we must devise an algorithm for computing the set of attributes functionally determined by α. One way of doing this is to compute F +, take all functional dependencies with α as the left-hand side, and take the union of the right-hand sides of all such dependencies. However, doing so can be expensive, since F + can be large.

An efﬁcient algorithm for computing the set of attributes functionally determined by α is useful not only for testing whether α is a superkey, but also for several other tasks, as we will see later in this section.

Let α be a set of attributes. We call the set of all attributes functionally determined by α under a set F of functional dependencies the closure of α under F; we denote it by α+. Figure 7.7 shows an algorithm, written in pseudocode, to compute α+. The

input is a set F of functional dependencies and the set α of attributes. The output is stored in the variable result.

To illustrate how the algorithm works, we shall use it to compute (AG)+ with the functional dependencies deﬁned in Section 7.3.2. We start with result = AG . The ﬁrst time that we execute the while loop to test each functional dependency, we ﬁnd that

The second time that we execute the while loop, no new attributes are added to result, and the algorithm terminates.

Let us see why the algorithm of Figure 7.7 is correct. The ﬁrst step is correct, since α → α always holds (by the reﬂexivity rule). We claim that, for any subset β of result, α → β. Since we start the while loop with α → result being true, we can add γ to result only if β ⊆ result and β → γ. But then result → β by the reﬂexivity rule, so α → β by transitivity. Another application of transitivity shows that α → γ (using α → β and β → γ). The union rule implies that α → result ∪ γ, so α functionally determines any new result generated in the while loop. Thus, any attribute returned by the algorithm is in α+.

It is easy to see that the algorithm ﬁnds all α+. If there is an attribute in α+ that is not yet in result, then there must be a functional dependency β → γ for which β ⊆ result, and at least one attribute in γ is not in result.

It turns out that, in the worst case, this algorithm may take an amount of time quadratic in the size of F. There is a faster (although slightly more complex) algorithm that runs in time linear in the size of F; that algorithm is presented as part of Exercise 7.14.

There are several uses of the attribute closure algorithm:

Canonical Cover

Suppose that we have a set of functional dependencies F on a relation schema. When- ever a user performs an update on the relation, the database system must ensure that the update does not violate any functional dependencies, that is, all the functional dependencies in F are satisﬁed in the new database state.

The system must roll back the update if it violates any functional dependencies in the set F .

We can reduce the effort spent in checking for violations by testing a simpliﬁed set of functional dependencies that has the same closure as the given set. Any database that satisﬁes the simpliﬁed set of functional dependencies will also satisfy the original set, and vice versa, since the two sets have the same closure. However, the sim- pliﬁed set is easier to test. We shall see how the simpliﬁed set can be constructed in a moment. First, we need some deﬁnitions.

An attribute of a functional dependency is said to be extraneous if we can remove it without changing the closure of the set of functional dependencies. The formal deﬁnition of extraneous attributes is as follows. Consider a set F of functional de- pendencies and the functional dependency α → β in F.

For example, suppose we have the functional dependencies AB → C and A → C in F . Then, B is extraneous in AB → C. As another example, suppose we have the functional dependencies AB → CD and A → C in F . Then C would be extraneous in the right-hand side of AB → CD.

Beware of the direction of the implications when using the deﬁnition of extraneous attributes: If you exchange the left-hand side with right-hand side, the implication will always hold. That is, (F − {α → β}) ∪ {(α − A) → β} always logically implies F, and also F always logically implies (F − {α → β}) ∪ {α → (β − A)} Here is how we can test efﬁciently if an attribute is extraneous. Let R be the relation schema, and let F be the given set of functional dependencies that hold on R.

Consider an attribute A in a dependency α → β.

• No functional dependency in Fc contains an extraneous attribute.

• Each left side of a functional dependency in Fc is unique. That is, there are no two dependencies α1 → β1 and α2 → β2 in Fc such that α1 = α2.

A canonical cover for a set of functional dependencies F can be computed as depicted in Figure 7.8. It is important to note that when checking if an attribute is extraneous, the check uses the dependencies in the current value of Fc, and not the dependencies in F . If a functional dependency contains only one attribute in its right-hand side, for example A → C, and that attribute is found to be extraneous, we would get a functional dependency with an empty right-hand side. Such functional dependencies should be deleted.

The canonical cover of F , Fc, can be shown to have the same closure as F ; hence, testing whether Fc is satisﬁed is equivalent to testing whether F is satisﬁed. However, Fc is minimal in a certain sense — it does not contain extraneous attributes, and it

Given a set F of functional dependencies, it may be that an entire functional de- pendency in the set is extraneous, in the sense that dropping it does not change the closure of F . We can show that a canonical cover Fc of F contains no such extraneous functional dependency. Suppose that, to the contrary, there were such an extraneous functional dependency in Fc. The right-side attributes of the dependency would then be extraneous, which is not possible by the deﬁnition of canonical covers.

A canonical cover might not be unique. For instance, consider the set of functional dependencies F = {A → BC, B → AC, and C → AB}. If we apply the extra neity test to A → BC, we ﬁnd that both B and C are extraneous under F . However, it is incorrect to delete both! The algorithm for ﬁnding the canonical cover picks one of the two, and deletes it. Then, 1. If C is deleted, we get the set F l = {A → B, B → AC, and C → AB}. Now, B is not extraneous in the right hand side of A → B under F l. Continuing the algorithm, we ﬁnd A and B are extraneous in the right-hand side of C → AB, leading to two canonical covers Fc = {A → B, B → C, and C → A}, and Fc = {A → B, B → AC, and C → B}.

2. If B is deleted, we get the set {A → C, B → AC, and C → AB}. This case is symmetrical to the previous case, leading to the canonical covers Fc = {A → C, C → B, and B → A}, and Fc = {A → C, B → C, and C → AB}.

As an exercise, can you ﬁnd one more canonical cover for F ?

Search This Blog

Database Management System course