Relational-Database Design:Third Normal Form
Third Normal Form
As we saw earlier, there are relational schemas where a BCNF decomposition cannot be dependency preserving. For such schemas, we have two alternatives if we wish to check if an update violates any functional dependencies:
• Pay the extra cost of computing joins to test for violations.
• Use an alternative decomposition, third normal form (3NF), which we present below, which makes testing of updates cheaper. Unlike BCNF, 3NF decompositions may contain some redundancy in the decomposed schema.
We shall see that it is always possible to find a lossless-join, dependency-preserving decomposition that is in 3NF. Which of the two alternatives to choose is a design decision to be made by the database designer on the basis of the application requirements.
Definition
BCNF requires that all nontrivial dependencies be of the form α → β, where α is a superkey. 3NF relaxes this constraint slightly by allowing nontrivial functional de- pendencies whose left side is not a superkey.
A relation schema R is in third normal form (3NF) with respect to a set F of functional dependencies if, for all functional dependencies in F + of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
• α → β is a trivial functional dependency.
• α is a superkey for R.
• Each attributeq A in β − α is contained in a candidate key for R.
Note that the third condition above does not say that a single candidate key should contain all the attributes in β − α; each attribute A in β − α may be contained in a different candidate key.
The first two alternatives are the same as the two alternatives in the definition of BCNF. The third alternative of the 3NF definition seems rather unintuitive, and it is not obvious why it is useful. It represents, in some sense, a minimal relaxation of the BCNF conditions that helps ensure that every schema has a dependency-preserving decomposition into 3NF. Its purpose will become more clear later, when we study decomposition into 3NF.
Observe that any schema that satisfies BCNF also satisfies 3NF, since each of its functional dependencies would satisfy one of the first two alternatives. BCNF is therefore a more restrictive constraint than is 3NF.
The definition of 3NF allows certain functional dependencies that are not allowed in BCNF. A dependency α → β that satisfies only the third alternative of the 3NF
definition is not allowed in BCNF, but is allowed in 3NF.1
Let us return to our Banker-schema example (Section 7.6). We have shown that this relation schema does not have a dependency-preserving, lossless-join decomposition into BCNF. This schema, however, turns out to be in 3NF. To see that it is, we note that {customer-name, branch-name} is a candidate key for Banker-schema, so the only attribute not contained in a candidate key for Banker-schema is banker-name. The only nontrivial functional dependencies of the form
α → banker-name
include {customer-name, branch-name} as part of α. Since {customer-name, branch-name} is a candidate key, these dependencies do not violate the definition of 3NF.
As an optimization when testing for 3NF, we can consider only functional dependencies in the given set F , rather than in F +. Also, we can decompose the dependencies in F so that their right-hand side consists of only single attributes, and use the resultant set in place of F .
Given a dependency α → β, we can use the same attribute-closure – based technique that we used for BCNF to check if α is a superkey. If α is not a superkey, we have to verify whether each attribute in β is contained in a candidate key of R; this test is rather more expensive, since it involves finding candidate keys. In fact, testing for 3NF has been shown to be NP-hard; thus, it is very unlikely that there is a polynomial time complexity algorithm for the task.
Decomposition Algorithm
Figure 7.14 shows an algorithm for finding a dependency-preserving, lossless-join decomposition into 3NF. The set of dependencies Fc used in the algorithm is a canonical
cover for F. Note that the algorithm considers the set of schemas Rj ,j = 1, 2,... , i; initially i = 0, and in this case the set is empty.
To illustrate the algorithm of Figure 7.14, consider the following extension to the Banker-schema in Section 7.6:
Banker-info-schema = (branch-name, customer-name, banker-name, office-number)
The main difference here is that we include the banker’s office number as part of the information. The functional dependencies for this relation schema are
banker-name → branch-name office-number
customer-name branch-name → banker-name
The for loop in the algorithm causes us to include the following schemas in our decomposition:
Banker-office-schema = (banker-name, branch-name, office-number)
Banker-schema = (customer-name, branch-name, banker-name)
Since Banker-schema contains a candidate key for Banker-info-schema, we are finished with the decomposition process.
The algorithm ensures the preservation of dependencies by explicitly building a schema for each dependency in a canonical cover. It ensures that the decomposition is a lossless-join decomposition by guaranteeing that at least one schema contains a candidate key for the schema being decomposed. Exercise 7.19 provides some insight into the proof that this suffices to guarantee a lossless join.
This algorithm is also called the 3NF synthesis algorithm, since it takes a set of dependencies and adds one schema at a time, instead of decomposing the initial schema repeatedly. The result is not uniquely defined, since a set of functional dependencies can have more than one canonical cover, and, further, in some cases the result of the algorithm depends on the order in which it considers the dependencies in Fc.
If a relation Ri is in the decomposition generated by the synthesis algorithm, then Ri is in 3NF. Recall that when we test for 3NF, it suffices to consider functional dependencies whose right-hand side is a single attribute. Therefore, to see that Ri is in 3NF, you must convince yourself that any functional dependency γ → B that holds on Ri satisfies the definition of 3NF. Assume that the dependency that generated Ri in the synthesis algorithm is α → β. Now, B must be in α or β, since B is in Ri and α → β generated Ri. Let us consider the three possible cases:
• B is in both α and β. In this case, the dependency α → β would not have been in Fc since B would be extraneous in β. Thus, this case cannot hold.
• B is in β but not α. Consider two cases:
Interestingly, the algorithm we described for decomposition into 3NF can be implemented in polynomial time, even though testing a given relation to see if it satisfies 3NF is NP-hard.
Comparison of BCNF and 3NF
Of the two normal forms for relational-database schemas, 3NF and BCNF, there are advantages to 3NF in that we know that it is always possible to obtain a 3NF design without sacrificing a lossless join or dependency preservation. Nevertheless, there are disadvantages to 3NF: If we do not eliminate all transitive relations schema dependencies, we may have to use null values to represent some of the possible meaningful relationships among data items, and there is the problem of repetition of information.
As an illustration of the null value problem, consider again the Banker-schema and its associated functional dependencies. Since banker-name → branch-name, we may want to represent relationships between values for banker-name and values for branch- name in our database. If we are to do so, however, either there must be a corresponding value for customer-name, or we must use a null value for the attribute customer- name.
As an illustration of the repetition of information problem, consider the instance of Banker-schema in Figure 7.15. Notice that the information indicating that Johnson is working at the Perryridge branch is repeated.
Recall that our goals of database design with functional dependencies are:
1. BCNF
2. Lossless join
3. Dependency preservation
Since it is not always possible to satisfy all three, we may be forced to choose between BCNF and dependency preservation with 3NF.
It is worth noting that SQL does not provide a way of specifying functional dependencies, except for the special case of declaring superkeys by using the primary key or unique constraints. It is possible, although a little complicated, to write assertions that enforce a functional dependency (see Exercise 7.15); unfortunately, testing the assertions would be very expensive in most database systems. Thus even if we had a dependency-preserving decomposition, if we use standard SQL we would not be able to efficiently test a functional dependency whose left-hand side is not a key.
Although testing functional dependencies may involve a join if the decomposition is not dependency preserving, we can reduce the cost by using materialized views, which many database systems support. Given a BCNF decomposition that is not dependency preserving, we consider each dependency in a minimum cover Fc that is not preserved in the decomposition. For each such dependency α → β, we define a materialized view that computes a join of all relations in the decomposition, and projects the result on αβ. The functional dependency can be easily tested on the ma- terialized view, by means of a constraint unique (α). On the negative side, there is a space and time overhead due to the materialized view, but on the positive side, the application programmer need not worry about writing code to keep redundant data consistent on updates; it is the job of the database system to maintain the materialized view, that is, keep up up to date when the database is updated. (Later in the book, in Section 14.5, we outline how a database system can perform materialized view maintenance efficiently.)
Thus, in case we are not able to get a dependency-preserving BCNF decomposition, it is generally preferable to opt for BCNF, and use techniques such as materialized views to reduce the cost of checking functional dependencies.
Comments
Post a Comment