Data-Deﬁnition Language

In most of our discussions of SQL and relational databases, we have accepted a set of relations as given. Of course, the set of relations in a database must be speciﬁed to the system by means of a data deﬁnition language (DDL).

The SQL DDL allows speciﬁcation of not only a set of relations, but also information about each relation, including

• The schema for each relation

• The domain of values associated with each attribute

• The integrity constraints

• The set of indices to be maintained for each relation

• The security and authorization information for each relation

• The physical storage structure of each relation on disk

We discuss here schema deﬁnition and domain values; we defer discussion of the other SQL DDL features to Chapter 6.

Domain Types in SQL

The SQL standard supports a variety of built-in domain types, including:

• char(n): A ﬁxed-length character string with user-speciﬁed length n. The full form, character, can be used instead.

• varchar(n): A variable-length character string with user-speciﬁed maximum length n. The full form, character varying, is equivalent.

• int: An integer (a ﬁnite subset of the integers that is machine dependent). The full form, integer, is equivalent.

• smallint: A small integer (a machine-dependent subset of the integer domain type).

• numeric(p, d): A ﬁxed-point number with user-speciﬁed precision. The number consists of p digits (plus a sign), and d of the p digits are to the right of the decimal point. Thus, numeric(3,1) allows 44.5 to be stored exactly, but neither 444.5 or 0.32 can be stored exactly in a ﬁeld of this type.

• real, double precision: Floating-point and double-precision ﬂoating-point numbers with machine-dependent precision.

• ﬂoat(n): A ﬂoating-point number, with precision of at least n digits.

• date: A calendar date containing a (four-digit) year, month, and day of the month.

• time: The time of day, in hours, minutes, and seconds. A variant, time(p), can be used to specify the number of fractional digits for seconds (the default being 0). It is also possible to store time zone information along with the time.

• timestamp: A combination of date and time. A variant, timestamp(p), can be used to specify the number of fractional digits for seconds (the default here being 6).

Date and time values can be speciﬁed like this:

date ’2001-04-25’

time ’09:30:00’

timestamp ’2001-04-25 10:29:01.45’

Dates must be speciﬁed in the format year followed by month followed by day, as shown. The seconds ﬁeld of time or timestamp can have a fractional part, as in the timestamp above. We can use an expression of the form cast e as t to convert a character string (or string valued expression) e to the type t, where t is one of date, time, or timestamp. The string must be in the appropriate format as illustrated at the be- ginning of this paragraph.

To extract individual ﬁelds of a date or time value d, we can use extract (ﬁeld from d), where ﬁeld can be one of year, month, day, hour, minute, or second.

SQL allows comparison operations on all the domains listed here, and it allows both arithmetic and comparison operations on the various numeric domains. SQL also provides a data type called interval, and it allows computations based on dates and times and on intervals. For example, if x and y are of type date, then x − y is an interval whose value is the number of days from date x to date y. Similarly, adding or subtracting an interval to a date or time gives back a date or time, respectively.

It is often useful to compare values from compatible domains. For example, since every small integer is an integer, a comparison x < y, where x is a small integer and y is an integer (or vice versa), makes sense. We make such a comparison by casting small integer x as an integer. A transformation of this sort is called a type coercion.

Type coercion is used routinely in common programming languages, as well as in database systems.

As an illustration, suppose that the domain of customer-name is a character string of length 20, and the domain of branch-name is a character string of length 15. Although the string lengths might differ, standard SQL will consider the two domains compatible.

As we discussed in Chapter 3, the null value is a member of all domains. For certain attributes, however, null values may be inappropriate. Consider a tuple in the customer relation where customer-name is null. Such a tuple gives a street and city for an anonymous customer; thus, it does not contain useful information. In cases such as this, we wish to forbid null values, and we do so by restricting the domain of customer-name to exclude null values.

SQL allows the domain declaration of an attribute to include the speciﬁcation not null and thus prohibits the insertion of a null value for this attribute. Any database modiﬁcation that would cause a null to be inserted in a not null domain generates an error diagnostic. There are many situations where we want to avoid null values. In particular, it is essential to prohibit null values in the primary key of a relation schema. Thus, in our bank example, in the customer relation, we must prohibit a null value for the attribute customer-name, which is the primary key for customer.

Schema Deﬁnition in SQL

We deﬁne an SQL relation by using the create table command:

where r is the name of the relation, each Ai is the name of an attribute in the schema of relation r, and Di is the domain type of values in the domain of attribute Ai. The allowed integrity constraints include

• primary key (Aj1 , Aj2 ,... , Ajm ): The primary key speciﬁcation says that at- tributes Aj1 , Aj2 ,... , Ajm form the primary key for the relation. The primary key attributes are required to be non-null and unique; that is, no tuple can have a null value for a primary key attribute, and no two tuples in the relation can be equal on all the primary-key attributes.1 Although the primary key speciﬁ- cation is optional, it is generally a good idea to specify a primary key for each relation.

• check(P): The check clause speciﬁes a predicate P that must be satisﬁed by every tuple in the relation.

The create table command also includes other integrity constraints, which we shall discuss in Chapter 6.

Figure 4.8 presents a partial SQL DDL deﬁnition of our bank database. Note that, as in earlier chapters, we do not attempt to model precisely the real world in the bank-database example. In the real world, multiple people may have the same name, so customer-name would not be a primary key customer; a customer-id would more likely be used as a primary key. We use customer-name as a primary key to keep our database schema simple and short.

If a newly inserted or modiﬁed tuple in a relation has null values for any primary key attribute, or if the tuple has the same value on the primary-key attributes as does another tuple in the relation, SQL ﬂags an error and prevents the update. Similarly, it ﬂags an error and prevents the update if the check condition on the tuple fails.

By default null is a legal value for every attribute in SQL, unless the attribute is speciﬁcally stated to be not null. An attribute can be declared to be not null in the following way:

account-number char(10) not null

SQL also supports an integrity constraint

unique (Aj1 , Aj2 ,..., Ajm )

The unique speciﬁcation says that attributes Aj1 , Aj2 ,... , Ajm form a candidate key; that is, no two tuples in the relation can be equal on all the primary-key attributes. However, candidate key attributes are permitted to be null unless they have explicitly been declared to be not null. Recall that a null value does not equal any other value. The treatment of nulls here is the same as that of the unique construct deﬁned in Section 4.6.4.

A common use of the check clause is to ensure that attribute values satisfy speciﬁed conditions, in effect creating a powerful type system. For instance, the check clause in the create table command for relation branch checks that the value of assets is nonnegative. As another example, consider the following:

Here, we use the check clause to simulate an enumerated type, by specifying that degree-level must be one of ’Bachelors’, ’Masters’, or ’Doctorate’. We consider more general forms of check conditions, as well as a class of constraints called referential integrity constraints, in Chapter 6.

A newly created relation is empty initially. We can use the insert command to load data into the relation. Many relational-database products have special bulk loader utilities to load an initial set of tuples into a relation.

To remove a relation from an SQL database, we use the drop table command. The drop table command deletes all information about the dropped relation from the database. The command

drop table r

is a more drastic action than

delete from r

The latter retains relation r, but deletes all tuples in r. The former deletes not only all tuples of r, but also the schema for r. After r is dropped, no tuples can be inserted into r unless it is re-created with the create table command.

We use the alter table command to add attributes to an existing relation. All tuples in the relation are assigned null as the value for the new attribute. The form of the

alter table command is

alter table r add AD

where r is the name of an existing relation, A is the name of the attribute to be added, and D is the domain of the added attribute. We can drop attributes from a relation by the command

alter table r drop A

where r is the name of an existing relation, and A is the name of an attribute of the relation. Many database systems do not support dropping of attributes, although they will allow an entire table to be dropped.

Search This Blog

Database Management System course