Relational-Database Design:Boyce – Codd Normal Form
Boyce – Codd Normal Form
Using functional dependencies, we can define several normal forms that represent “good” database designs. In this section we cover BCNF (defined below), and later, in Section 7.7, we cover 3NF.
Definition
One of the more desirable normal forms that we can obtain is Boyce – Codd normal form (BCNF). A relation schema R is in BCNF with respect to a set F of functional dependencies if, for all functional dependencies in F + of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
• α → β is a trivial functional dependency (that is, β ⊆ α).
• α is a superkey for schema R.
A database design is in BCNF if each member of the set of relation schemas that constitutes the design is in BCNF.
As an illustration, consider the following relation schemas and their respective functional dependencies:
• Customer-schema = (customer-name, customer-street, customer-city) customer-name → customer-street customer-city
• Branch-schema = (branch-name, assets, branch-city) branch-name → assets branch-city
• Loan-info-schema = (branch-name, customer-name, loan-number, amount) loan-number → amount branch-name
We claim that Customer-schema is in BCNF. We note that a candidate key for the schema is customer-name. The only nontrivial functional dependencies that hold on Customer-schema have customer-name on the left side of the arrow. Since customer-name is a candidate key, functional dependencies with customer-name on the left side do not violate the definition of BCNF. Similarly, it can be shown easily that the relation schema Branch-schema is in BCNF.
The schema Loan-info-schema, however, is not in BCNF. First, note that loan-number is not a superkey for Loan-info-schema, since we could have a pair of tuples representing a single loan made to two people — for example, (Downtown, John Bell, L-44, 1000) (Downtown, Jane Bell, L-44, 1000) Because we did not list functional dependencies that rule out the preceding case, loan- number is not a candidate key. However, the functional dependency loan-number → amount is nontrivial. Therefore, Loan-info-schema does not satisfy the definition of BCNF.
We claim that Loan-info-schema is not in a desirable form, since it suffers from the problem of repetition of information that we described in Section 7.2. We observe that, if there are several customer names associated with a loan, in a relation on Loan-info- schema, then we are forced to repeat the branch name and the amount once for each customer. We can eliminate this redundancy by redesigning our database such that all schemas are in BCNF. One approach to this problem is to take the existing non- BCNF design as a starting point, and to decompose those schemas that are not in BCNF. Consider the decomposition of Loan-info-schema into two schemas:
applies to the Loan-schema, and that only trivial functional dependencies apply to Borrower-schema. Although loan-number is not a superkey for Loan-info-schema, it is a candidate key for Loan-schema. Thus, both schemas of our decomposition are in BCNF. It is now possible to avoid redundancy in the case where there are several customers associated with a loan. There is exactly one tuple for each loan in the relation on Loan-schema, and one tuple for each customer of each loan in the relation on Borrower-schema. Thus, we do not have to repeat the branch name and the amount once for each customer associated with a loan.
Often testing of a relation to see if it satisfies BCNF can be simplified:
• To check if a nontrivial dependency α → β causes a violation of BCNF, compute α+ (the attribute closure of α), and verify that it includes all attributes of R; that is, it is a superkey of R.
• To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than check all dependencies in F +.
We can show that if none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F + will cause a violation of BCNF either.
Unfortunately, the latter procedure does not work when a relation is decomposed. That is, it does not suffice to use F when we test a relation Ri, in a decomposition of R, for violation of BCNF. For example, consider relation schema R (A, B, C, D, E), with functional dependencies F containing A → B and BC → D. Suppose this were decomposed into R1(A, B) and R2(A, C, D, E). Now, neither of the dependencies in F contains only attributes from (A, C, D, E) so we might be misled into thinking R2 satisfies BCNF. In fact, there is a dependency AC → D in F + (which can be inferred using the pseudotransitivity rule from the two dependencies in F ), which shows that R2 is not in BCNF. Thus, we may need a dependency that is in F +, but is not in F , to show that a decomposed relation is not in BCNF.
An alternative BCNF test is sometimes easier than computing every dependency in F +. To check if a relation Ri in a decomposition of R is in BCNF, we apply this test:
• For every subset α of attributes in Ri, check that α+ (the attribute closure of α under F ) either includes no attribute of Ri − α, or includes all attributes of Ri.
If the condition is violated by some set of attributes α in Ri, consider the following functional dependency, which can be shown to be present in F +: α → (α+ − α) ∩ Ri.
The above dependency shows that Ri violates BCNF, and is a “witness” for the violation. The BCNF decomposition algorithm, which we shall see in Section 7.6.2, makes use of the witness.
Decomposition Algorithm
We are now able to state a general method to decompose a relation schema so as to satisfy BCNF. Figure 7.13 shows an algorithm for this task. If R is not in BCNF, we can decompose R into a collection of BCNF schemas R1, R2,... , Rn by the algorithm. The algorithm uses dependencies (“witnesses”) that demonstrate violation of BCNF to perform the decomposition.
The decomposition that the algorithm generates is not only in BCNF, but is also a lossless-join decomposition. To see why our algorithm generates only lossless-join decompositions, we note that, when we replace a schema Ri with (Ri − β) and (α, β), the dependency α → β holds, and (Ri − β) ∩ (α, β) = α.
We apply the BCNF decomposition algorithm to the Lending-schema schema that we used in Section 7.2 as an example of a poor database design:
Thus, the decomposition of Lending-schema results in the three relation schemas Branch- schema, Loan-schema, and Borrower-schema, each of which is in BCNF. These relation schemas are the same as those in Section 7.5, where we demonstrated that the resulting decomposition is both a lossless-join decomposition and a dependency-preserving decomposition.
The BCNF decomposition algorithm takes time exponential in the size of the initial schema, since the algorithm for checking if a relation in the decomposition satisfies BCNF can take exponential time. The bibliographical notes provide references to an algorithm that can compute a BCNF decomposition in polynomial time. However, the algorithm may “overnormalize,” that is, decompose a relation unnecessarily.
Dependency Preservation
Not every BCNF decomposition is dependency preserving. As an illustration, con- sider the relation schema
The decomposed schemas preserve only banker-name → branch-name (and trivial dependencies), but the closure of {banker-name → branch-name} does not include customer-name branch-name → banker-name. The violation of this dependency cannot be detected unless a join is computed.
To see why the decomposition of Banker-schema into the schemas Banker-branch-schema and Customer-banker-schema is not dependency preserving, we apply the algorithm of Figure 7.12. We find that the restrictions F1 and F2 of F to each schema are:
(For brevity, we do not show trivial functional dependencies.) It is easy to see that the dependency customer-name branch-name → banker-name is not in (F1 ∪ F2)+ even though it is in F +. Therefore, (F1 ∪ F2)+ j= F +, and the decomposition is not dependency preserving.
This example demonstrates that not every BCNF decomposition is dependency preserving. Moreover, it is easy to see that any BCNF decomposition of Banker-schema must fail to preserve customer-name branch-name → banker-name. Thus, the example shows that we cannot always satisfy all three design goals:
1. Lossless join
2. BCNF
3. Dependency preservation
Recall that lossless join is an essential condition for a decomposition, to avoid loss of information. We are therefore forced to give up either BCNF or dependency preservation. In Section 7.7 we present an alternative normal form, called third normal form, which is a small relaxation of BCNF; the motivation for using third normal form is that there is always a dependency preserving decomposition into third nor- mal form.
There are situations where there is more than one way to decompose a schema into BCNF. Some of these decompositions may be dependency preserving, while others may not. For instance, suppose we have a relation schema R(A, B, C) with the functional dependencies A → B and B → C. From this set we can derive the further dependency A → C. If we used the dependency A → B (or equivalently, A → C) to decompose R, we would end up with two relations R1(A, B) and R2(A, C); the dependency B → C would not be preserved.
If instead we used the dependency B → C to decompose R, we would end up with two relations R1(A, B) and R2(B, C), which are in BCNF, and the decomposition is also dependency preserving. Clearly the decomposition into R1(A, B) and R2(B, C) is preferable. In general, the database designer should therefore look at alternative decompositions, and pick a dependency preserving decomposition where possible.
Comments
Post a Comment