Relational-Database Design: Fourth Normal Form and More Normal Forms

Fourth Normal Form

Some relation schemas, even though they are in BCNF, do not seem to be sufﬁciently normalized, in the sense that they still suffer from the problem of repetition of information. Consider again our banking example. Assume that, in an alternative design for the bank database schema, we have the schema BC-schema = (loan-number, customer-name, customer-street, customer-city) The astute reader will recognize this schema as a non-BCNF schema because of the functional dependency customer-name → customer-street customer-city

that we asserted earlier, and because customer-name is not a key for BC-schema. How- ever, assume that our bank is attracting wealthy customers who have several ad- dresses (say, a winter home and a summer home). Then, we no longer wish to enforce the functional dependency customer-name → customer-street customer-city. If we remove this functional dependency, we ﬁnd BC-schema to be in BCNF with respect to our modiﬁed set of functional dependencies. Yet, even though BC-schema is now in BCNF, we still have the problem of repetition of information that we had earlier.

To deal with this problem, we must deﬁne a new form of constraint, called a multivalued dependency. As we did for functional dependencies, we shall use multivalued dependencies to deﬁne a normal form for relation schemas. This normal form, called fourth normal form (4NF), is more restrictive than BCNF. We shall see that every 4NF schema is also in BCNF, but there are BCNF schemas that are not in 4NF.

Multivalued Dependencies

Functional dependencies rule out certain tuples from being in a relation. If A → B, then we cannot have two tuples with the same A value but different B values. Multivalued dependencies, on the other hand, do not rule out the existence of certain tuples. Instead, they require that other tuples of a certain form be present in the rela- tion. For this reason, functional dependencies sometimes are referred to as equality- generating dependencies, and multivalued dependencies are referred to as tuple- generating dependencies.

Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency

This deﬁnition is less complicated than it appears to be. Figure 7.16 gives a tabular picture of t1, t2, t3, and t4. Intuitively, the multivalued dependency α →→ β says that the relationship between α and β is independent of the relationship between α and R − β. If the multivalued dependency α →→ β is satisﬁed by all relations on schema R, then α →→ β is a trivial multivalued dependency on schema R. Thus, α →→ β is trivial if β ⊆ α or β ∪ α = R.

To illustrate the difference between functional and multivalued dependencies, we consider the BC-schema again, and the relation bc (BC-schema) of Figure 7.17. We must repeat the loan number once for each address a customer has, and we must repeat the address for each loan a customer has. This repetition is unnecessary, since the relationship between a customer and his address is independent of the relationship between that customer and a loan. If a customer (say, Smith) has a loan (say, loan number L-23), we want that loan to be associated with all Smith’s addresses. Thus, the relation of Figure 7.18 is illegal. To make this relation legal, we need to add the tuples (L-23, Smith, Main, Manchester) and (L-27, Smith, North, Rye) to the bc relation of Figure 7.18.

Comparing the preceding example with our deﬁnition of multivalued dependency, we see that we want the multivalued dependency

customer-name →→ customer-street customer-city

to hold. (The multivalued dependency customer-name →→ loan-number will do as well.

We shall soon see that they are equivalent.)

As with functional dependencies, we shall use multivalued dependencies in two ways:

1. To test relations to determine whether they are legal under a given set of functional and multivalued dependencies

2. To specify constraints on the set of legal relations; we shall thus concern our- selves with only those relations that satisfy a given set of functional and multivalued dependencies

Note that, if a relation r fails to satisfy a given multivalued dependency, we can construct a relation rl that does satisfy the multivalued dependency by adding tuples to r.

Let D denote a set of functional and multivalued dependencies. The closure D+ of D is the set of all functional and multivalued dependencies logically implied by D. As we did for functional dependencies, we can compute D+ from D, using the formal deﬁnitions of functional dependencies and multivalued dependencies. We can man- age with such reasoning for very simple multivalued dependencies. Luckily, multi- valued dependencies that occur in practice appear to be quite simple. For complex dependencies, it is better to reason about sets of dependencies by using a system of inference rules. (Section C.1.1 of the appendix outlines a system of inference rules for multivalued dependencies.)

From the deﬁnition of multivalued dependency, we can derive the following rule:

• If α → β, then α →→ β.

In other words, every functional dependency is also a multivalued dependency.

Deﬁnition of Fourth Normal Form

Consider again our BC-schema example in which the multivalued dependency customer-name →→ customer-street customer-city holds, but no nontrivial functional dependencies hold. We saw in the opening paragraphs of Section 7.8 that, although BC-schema is in BCNF, the design is not ideal, since we must repeat a customer’s address information for each loan. We shall see that we can use the given multivalued dependency to improve the database design, by decomposing BC-schema into a fourth normal form decomposition.

A relation schema R is in fourth normal form (4NF) with respect to a set D of functional and multivalued dependencies if, for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following holds

• α →→ β is a trivial multivalued dependency.

• α is a superkey for schema R.

A database design is in 4NF if each member of the set of relation schemas that constitutes the design is in 4NF.

Note that the deﬁnition of 4NF differs from the deﬁnition of BCNF in only the use of multivalued dependencies instead of functional dependencies. Every 4NF schema is in BCNF. To see this fact, we note that, if a schema R is not in BCNF, then there is

a nontrivial functional dependency α → β holding on R, where α is not a superkey. Since α → β implies α →→ β, R cannot be in 4NF.

Let R be a relation schema, and let R1, R2,..., Rn be a decomposition of R. To check if each relation schema Ri in the decomposition is in 4NF, we need to ﬁnd what multivalued dependencies hold on each Ri. Recall that, fora set F of functional dependencies, the restriction Fi of F to Ri is all functional dependencies in F + that include only attributes of Ri. Now consider a set D of both functional and multivalued dependencies. The restriction of D to Ri is the set Di consisting of

1. All functional dependencies in D+ that include only attributes of Ri

2. All multivalued dependencies of the form

Decomposition Algorithm

The analogy between 4NF and BCNF applies to the algorithm for decomposing a schema into 4NF. Figure 7.19 shows the 4NF decomposition algorithm. It is identical to the BCNF decomposition algorithm of Figure 7.13, except that it uses multivalued, instead of functional, dependencies and uses the restriction of D+ to Ri.

If we apply the algorithm of Figure 7.19 to BC-schema, we ﬁnd that customer-name →→ loan-number is a nontrivial multivalued dependency, and customer-name is not a superkey for BC-schema. Following the algorithm, we replace BC-schema by two schemas:

This pair of schemas, which is in 4NF, eliminates the problem we encountered earlier with the redundancy of BC-schema.

As was the case when we were dealing solely with functional dependencies, we are interested in decompositions that are lossless-join decompositions and that pre- serve dependencies. The following fact about multivalued dependencies and lossless joins shows that the algorithm of Figure 7.19 generates only lossless-join decompositions:

• Let R be a relation schema, and let D be a set of functional and multivalued dependencies on R. Let R1 and R2 form a decomposition of R. This decomposition is a lossless-join decomposition of R if and only if at least one of the following multivalued dependencies is in D+:

More Normal Forms

The fourth normal form is by no means the “ultimate” normal form. As we saw earlier, multivalued dependencies help us understand and tackle some forms of repetition of information that cannot be understood in terms of functional dependencies. There are types of constraints called join dependencies that generalize multi- valued dependencies, and lead to another normal form called project-join normal form (PJNF) (PJNF is called ﬁfth normal form in some books). There is a class of even more general constraints, which leads to a normal form called domain-key normal form.

A practical problem with the use of these generalized constraints is that they are not only hard to reason with, but there is also no set of sound and complete inference rules for reasoning about the constraints. Hence PJNF and domain-key normal form are used quite rarely. Appendix C provides more details about these normal forms.

Conspicuous by its absence from our discussion of normal forms is second normal form (2NF). We have not discussed it, because it is of historical interest only. We simply deﬁne it, and let you experiment with it in Exercise 7.26.

Search This Blog

Database Management System course