Integrity and Security:Referential Integrity
Referential Integrity
Often, we wish to ensure that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation. This condition is called referential integrity.
Basic Concepts
Consider a pair of relations r(R) and s(S), and the natural join r s. There may beatuple tr in r that does not join with any tuple in s. That is, there is no ts in s such that tr [R ∩ S] = ts[R ∩ S]. Such tuples are called dangling tuples. Depending on the entity set or relationship set being modeled, dangling tuples may or may not be acceptable.
In Section 3.3.3, we considered a modified form of join — the outer join — to operate on relations containing dangling tuples. Here, our concern is not with queries, but rather with when we should permit dangling tuples to exist in the database.
Suppose there is a tuple t1 in the account relation with t1[branch-name] = “Lunartown,” but there is no tuple in the branch relation for the Lunartown branch. This situation would be undesirable. We expect the branch relation to list all bank branches.
Therefore, tuple t1 would refer to an account at a branch that does not exist. Clearly, we would like to have an integrity constraint that prohibits dangling tuples of this sort.
Not all instances of dangling tuples are undesirable, however. Assume that there is a tuple t2 in the branch relation with t2[branch-name] = “Mokan,” but there is no tuple in the account relation for the Mokan branch. In this case, a branch exists that has no accounts. Although this situation is not common, it may arise when a branch is opened or is about to close. Thus, we do not want to prohibit this situation.
The distinction between these two examples arises from two facts:
• The attribute branch-name in Account-schema is a foreign key referencing the primary key of Branch-schema.
• The attribute branch-name in Branch-schema is not a foreign key.
(Recall from Section 3.1.3 that a foreign key is a set of attributes in a relation schema that forms a primary key for another schema.)
In the Lunartown example, tuple t1 in account has a value on the foreign key branch-name that does not appear in branch. In the Mokan-branch example, tuple t2 in branch has a value on branch-name that does not appear in account, but branch-name is not a foreign key. Thus, the distinction between our two examples of dangling tuples is the presence of a foreign key.
Let r1(R1) and r2(R2) be relations with primary keys K1 and K2, respectively. We say that a subset α of R2 is a foreign key referencing K1 in relation r1 if it is required that, for every t2 in r2, there must be a tuple t1 in r1 such that t1[K1] = t2[α]. Re- quirements of this form are called referential integrity constraints, or subset dependencies. The latter term arises because the preceding referential-integrity constraint can be written as Πα (r2) ⊆ ΠK1 (r1). Note that, for a referential-integrity constraint to make sense, either α must be equal to K1, or α and K1 must be compatible sets of attributes.
Referential Integrity and the E-R Model
Referential-integrity constraints arise frequently. If we derive our relational-database schema by constructing tables from E-R diagrams, as we did in Chapter 2, then every
relation arising from a relationship set has referential-integrity constraints. Figure 6.1 shows an n-ary relationship set R, relating entity sets E1, E2,... , En. Let Ki denote the primary key of Ei. The attributes of the relation schema for relationship set R include K1 ∪ K2 ∪ ··· ∪ Kn. The following referential integrity constraints are then present: For each i, Ki in the schema for R is a foreign key referencing Ki in the relation schema generated from entity set Ei Another source of referential-integrity constraints is weak entity sets. Recall from Chapter 2 that the relation schema for a weak entity set must include the primary key of the entity set on which the weak entity set depends. Thus, the relation schema for each weak entity set includes a foreign key that leads to a referential-integrity constraint.
Database Modification
Database modifications can cause violations of referential integrity. We list here the test that we must make for each type of database modification to preserve the follow- ing referential-integrity constraint:
If this set is not empty, either the delete command is rejected as an error, or the tuples that reference t1 must themselves be deleted. The latter solution may lead to cascading deletions, since tuples may reference tuples that reference t1, and so on.
• Update. We must consider two cases for update: updates to the referencing relation (r2), and updates to the referenced relation (r1).
If a tuple t2 is updated in relation r2, and the update modifies values for the foreign key α, then a test similar to the insert case is made. Let t2l denote the new value of tuple t2. The system must ensure that
If a tuple t1 is updated in r1, and the update modifies values for the primary key (K), then a test similar to the delete case is made. The system must compute
using the old value of t1 (the value before the update is applied). If this set is not empty, the update is rejected as an error, or the update is cascaded in a manner similar to delete.
Referential Integrity in SQL
Foreign keys can be specified as part of the SQL create table statement by using the foreign key clause. We illustrate foreign-key declarations by using the SQL DDL definition of part of our bank database, shown in Figure 6.2.
By default, a foreign key references the primary key attributes of the referenced table. SQL also supports a version of the references clause where a list of attributes of the referenced relation can be specified explicitly. The specified list of attributes must be declared as a candidate key of the referenced relation.
We can use the following short form as part of an attribute definition to declare that the attribute forms a foreign key:
When a referential-integrity constraint is violated, the normal procedure is to reject the action that caused the violation. However, a foreign key clause can specify that if a delete or update action on the referenced relation violates the constraint, then, instead of rejecting the action, the system must take steps to change the tuple in the referencing relation to restore the constraint. Consider this definition of an integrity constraint on the relation account:
Because of the clause on delete cascade associated with the foreign-key declaration, if a delete of a tuple in branch results in this referential-integrity constraint being violated, the system does not reject the delete. Instead, the delete “cascades” to the
account relation, deleting the tuple that refers to the branch that was deleted. Similarly, the system does not reject an update to a field referenced by the constraint if it violates the constraint; instead, the system updates the field branch-name in the referencing tuples in account to the new value as well. SQL also allows the foreign key clause to specify actions other than cascade, if the constraint is violated: The referencing field (here, branch-name) can be set to null (by using set null in place of cascade), or to the default value for the domain (by using set default).
If there is a chain of foreign-key dependencies across multiple relations, a deletion or update at one end of the chain can propagate across the entire chain. An interesting case where the foreign key constraint on a relation references the same relation appears in Exercise 6.4. If a cascading update or delete causes a constraint violation that cannot be handled by a further cascading operation, the system aborts the trans- action. As a result, all the changes caused by the transaction and its cascading actions are undone.
Null values complicate the semantics of referential integrity constraints in SQL. Attributes of foreign keys are allowed to be null, provided that they have not other-wise been declared to be non-null. If all the columns of a foreign key are non-null in a given tuple, the usual definition of foreign-key constraints is used for that tuple. If any of the foreign-key columns is null, the tuple is defined automatically to satisfy the constraint.
This definition may not always be the right choice, so SQL also provides constructs that allow you to change the behavior with null values; we do not discuss the constructs here. To avoid such complexity, it is best to ensure that all columns of a foreign key specification are declared to be non-null.
Transactions may consist of several steps, and integrity constraints may be violated temporarily after one step, but a later step may remove the violation. For instance, suppose we have a relation marriedperson with primary key name, and an at- tribute spouse, and suppose that spouse is a foreign key on marriedperson. That is, the constraint says that the spouse attribute must contain a name that is present in the per- son table. Suppose we wish to note the fact that John and Mary are married to each other by inserting two tuples, one for John and one for Mary, in the above relation. The insertion of the first tuple would violate the foreign key constraint, regardless of which of the two tuples is inserted first. After the second tuple is inserted the foreign key constraint would hold again.
To handle such situations, integrity constraints are checked at the end of a trans- action, and not at intermediate steps.1
Comments
Post a Comment