The Unified Modeling Language (UML)

Entity-relationship diagrams help model the data representation component of a software system. Data representation, however, forms only one part of an overall system design. Other components include models of user interactions with the system, specification of functional modules of the system and their interaction, etc. The Unified Modeling Language (UML) is a proposed standard for creating specifications of various components of a software system. Some of the parts of UML are:

• Class diagram. A class diagram is similar to an E-R diagram. Later in this section we illustrate a few features of class diagrams and how they relate to E-R diagrams.

• Use case diagram. Use case diagrams show the interaction between users and the system, in particular the steps of tasks that users perform (such as withdrawing money or registering for a course).

• Activity diagram. Activity diagrams depict the flow of tasks between various components of a system.

• Implementation diagram. Implementation diagrams show the system components and their interconnections, both at the software component level and the hardware component level.

We do not attempt to provide detailed coverage of the different parts of UML here. See the bibliographic notes for references on UML. Instead we illustrate some features of UML through examples.

Figure 2.28 shows several E-R diagram constructs and their equivalent UML class diagram constructs. We describe these constructs below. UML shows entity sets as boxes and, unlike E-R, shows attributes within the box rather than as separate ellipses. UML actually models objects, whereas E-R models entities. Objects are like entities, and have attributes, but additionally provide a set of functions (called methods) that can be invoked to compute values on the basis of attributes of the objects, or to update the object itself. Class diagrams can depict methods in addition to attributes. We cover objects in Chapter 8.
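To make the distinction between objects and entities concrete, here is a minimal Python sketch. It is not taken from the text: the class name `Account`, its attributes, and its methods are hypothetical and chosen only for illustration. The attributes correspond to what an E-R diagram would draw as ellipses, while the methods are the additional part that a UML class box can depict.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical class; all names here are illustrative assumptions."""

    # Attributes: listed inside the class box in a UML class diagram.
    account_number: str
    balance: float

    def is_overdrawn(self) -> bool:
        """Method that computes a value on the basis of the attributes."""
        return self.balance < 0

    def deposit(self, amount: float) -> None:
        """Method that updates the object itself."""
        if amount <= 0:
            raise ValueError("deposit amount must be positive")
        self.balance += amount


acct = Account(account_number="A-101", balance=-20.0)
was_overdrawn = acct.is_overdrawn()  # computed from the balance attribute
acct.deposit(50.0)                   # changes the state of the object
print(was_overdrawn, acct.balance)
```

An E-R entity would carry only the two attributes; the two methods are what makes this an object in the UML sense.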

We represent binary relationship sets in UML by just drawing a line connecting the entity sets. We write the relationship set name adjacent to the line. We may also specify the role played by an entity set in a relationship set by writing the role name on the line, adjacent to the entity set. Alternatively, we may write the relationship set name in a box, along with attributes of the relationship set, and connect the box by a dotted line to the line depicting the relationship set. This box can then be treated as an entity set, in the same way as an aggregation in E-R diagrams, and can participate in relationships with other entity sets.

Nonbinary relationships cannot be directly represented in UML; they have to be converted to binary relationships by the technique we have seen earlier in Section 2.4.3.
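The conversion can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive procedure: a ternary relationship set R is modeled as a set of (a, b, c) triples, the function name `ternary_to_binary` is hypothetical, and the identifiers generated for the new entity set E (`e1`, `e2`, ...) are invented by the sketch.

```python
from itertools import count


def ternary_to_binary(r):
    """Replace a ternary relationship set R by an entity set E and three
    binary relationship sets RA, RB, and RC.

    r: set of (a, b, c) triples, one per relationship in R.
    Returns (E, RA, RB, RC). The identifiers placed in E are generated
    here and are an assumption of this sketch.
    """
    ids = count(1)
    e, ra, rb, rc = set(), set(), set(), set()
    for a, b, c in sorted(r):       # sorted only to make the ids repeatable
        ei = f"e{next(ids)}"        # one new entity per relationship in R
        e.add(ei)
        ra.add((ei, a))             # relates the new entity to its A entity
        rb.add((ei, b))             # ... to its B entity
        rc.add((ei, c))             # ... to its C entity
    return e, ra, rb, rc


# Hypothetical instance of R with two relationships.
r = {("a1", "b1", "c1"), ("a1", "b2", "c1")}
e, ra, rb, rc = ternary_to_binary(r)
print(sorted(e))
print(sorted(ra), sorted(rb), sorted(rc))
```

Each of the resulting binary relationship sets can then be drawn in UML as a plain line between two boxes.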

Cardinality constraints are specified in UML in the same way as in E-R diagrams, in the form l..h, where l denotes the minimum and h the maximum number of relationships an entity can participate in. However, you should be aware that the positioning of the constraints is exactly the reverse of the positioning of constraints in E-R diagrams, as shown in Figure 2.28. The constraint 0..∗ on the E2 side and 0..1 on the E1 side means that each E2 entity can participate in at most one relationship, whereas each E1 entity can participate in many relationships; in other words, the relationship is many to one from E2 to E1.

Single values such as 1 or ∗ may be written on edges; the single value 1 on an edge is treated as equivalent to 1..1, while ∗ is equivalent to 0..∗.
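The reading of these constraints can be checked mechanically. The following Python sketch is an illustration only: the function names `parse_multiplicity` and `violations` and the sample entities are hypothetical, and the code writes the star as an ASCII `*`. It parses the forms l..h, 1, and ∗ described above, then tests whether each entity's participation count lies within the bounds. Following the UML positioning described above, the label drawn on the E1 side is checked against the participation of E2 entities, and vice versa.

```python
from collections import Counter


def parse_multiplicity(text):
    """Parse 'l..h', a single number, or '*' into (low, high).

    high is None when there is no upper bound.
    """
    text = text.strip()
    if text == "*":                 # '*' is equivalent to 0..*
        return 0, None
    if ".." not in text:            # a single value n is equivalent to n..n
        n = int(text)
        return n, n
    low, high = text.split("..", 1)
    return int(low), (None if high == "*" else int(high))


def violations(entities, pairs, position, multiplicity):
    """Return the entities whose participation count is outside l..h.

    pairs: set of (e1, e2) relationship instances.
    position: 0 to count participation of E1 entities, 1 for E2 entities.
    """
    low, high = parse_multiplicity(multiplicity)
    counts = Counter(pair[position] for pair in pairs)
    return sorted(
        e for e in entities
        if counts[e] < low or (high is not None and counts[e] > high)
    )


# Hypothetical instance: many to one from E2 to E1.
e1 = {"x1", "x2"}
e2 = {"y1", "y2", "y3"}
pairs = {("x1", "y1"), ("x1", "y2"), ("x2", "y3")}

# UML label 0..1 on the E1 side: each E2 entity is in at most one relationship.
print(violations(e2, pairs, 1, "0..1"))
# UML label 0..* on the E2 side: each E1 entity may be in many relationships.
print(violations(e1, pairs, 0, "0..*"))
# Adding a second relationship for y1 breaks the 0..1 constraint.
print(violations(e2, pairs | {("x2", "y1")}, 1, "0..1"))
```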

We represent generalization and specialization in UML by connecting entity sets by a line with a triangle at the end corresponding to the more general entity set.

For instance, the entity set person is a generalization of customer and employee. UML diagrams can also represent explicitly the constraints of disjoint/overlapping on generalizations. Figure 2.28 shows disjoint and overlapping generalizations of customer and employee to person. Recall that if the customer/employee to person generalization is disjoint, it means that no one can be both a customer and an employee. An overlapping generalization allows a person to be both a customer and an employee.
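The difference between the two kinds of generalization can be expressed as a simple set check. The Python sketch below is illustrative only: the function name `check_generalization` and the identifiers `p1` to `p3` are hypothetical, and entity sets are modeled as plain sets of identifiers.

```python
def check_generalization(higher, lowers, disjoint):
    """Check a generalization given as sets of entity identifiers.

    higher: members of the higher-level entity set (e.g. person).
    lowers: dict mapping each lower-level entity set name to its members.
    disjoint: True for a disjoint generalization, False for overlapping.
    Returns a list of human-readable problems (empty if none).
    """
    problems = []
    # Every lower-level entity must also belong to the higher-level set.
    for name, members in lowers.items():
        extra = members - higher
        if extra:
            problems.append(f"{name} has members outside the higher-level set: {sorted(extra)}")
    # A disjoint generalization forbids membership in two lower-level sets.
    if disjoint:
        names = sorted(lowers)
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                shared = lowers[first] & lowers[second]
                if shared:
                    problems.append(f"{first} and {second} overlap on {sorted(shared)}")
    return problems


person = {"p1", "p2", "p3"}
lowers = {"customer": {"p1", "p2"}, "employee": {"p2", "p3"}}

print(check_generalization(person, lowers, disjoint=True))   # p2 is reported
print(check_generalization(person, lowers, disjoint=False))  # no problems
```

Here `p2` is both a customer and an employee, which is acceptable under an overlapping generalization and a violation under a disjoint one.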

Summary

• The entity-relationship (E-R) data model is based on a perception of a real world that consists of a set of basic objects called entities, and of relationships among these objects.

• The model is intended primarily for the database-design process. It was developed to facilitate database design by allowing the specification of an enterprise schema. Such a schema represents the overall logical structure of the database. This overall structure can be expressed graphically by an E-R diagram.

• An entity is an object that exists in the real world and is distinguishable from other objects. We express the distinction by associating with each entity a set of attributes that describes the object.

• A relationship is an association among several entities. The collection of all entities of the same type is an entity set, and the collection of all relationships of the same type is a relationship set.

• Mapping cardinalities express the number of entities to which another entity can be associated via a relationship set.

• A superkey of an entity set is a set of one or more attributes that, taken collectively, allows us to identify uniquely an entity in the entity set. We choose a minimal superkey for each entity set from among its superkeys; the chosen minimal superkey is termed the entity set’s primary key. Similarly, a superkey of a relationship set is a set of one or more attributes that, taken collectively, allows us to identify uniquely a relationship in the relationship set. Likewise, we choose a minimal superkey for each relationship set from among its superkeys; this is the relationship set’s primary key.

• An entity set that does not have sufficient attributes to form a primary key is termed a weak entity set. An entity set that has a primary key is termed a strong entity set.

• Specialization and generalization define a containment relationship between a higher-level entity set and one or more lower-level entity sets. Specialization is the result of taking a subset of a higher-level entity set to form a lower-level entity set. Generalization is the result of taking the union of two or more disjoint (lower-level) entity sets to produce a higher-level entity set. The attributes of higher-level entity sets are inherited by lower-level entity sets.

• Aggregation is an abstraction in which relationship sets (along with their associated entity sets) are treated as higher-level entity sets, and can participate in relationships.

• The various features of the E-R model offer the database designer numerous choices in how to best represent the enterprise being modeled. Concepts and objects may, in certain cases, be represented by entities, relationships, or attributes. Aspects of the overall structure of the enterprise may be best described by using weak entity sets, generalization, specialization, or aggregation. Often, the designer must weigh the merits of a simple, compact model versus those of a more precise, but more complex, one.

• A database that conforms to an E-R diagram can be represented by a collection of tables. For each entity set and for each relationship set in the database, there is a unique table that is assigned the name of the corresponding entity set or relationship set. Each table has a number of columns, each of which has a unique name. Converting database representation from an E-R diagram to a table format is the basis for deriving a relational-database design from an E-R diagram.

• The Unified Modeling Language (UML) provides a graphical means of modeling various components of a software system. The class diagram component of UML is based on E-R diagrams. However, there are some differences between the two that one must beware of.
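The table representation summarized above can be sketched with Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions, not the schema of any figure in this chapter: the entity sets `customer` and `account`, the relationship set `depositor`, and all column names and sample values are hypothetical. The sketch also assumes the usual reduction in which the table for a many-to-many relationship set holds the primary keys of the participating entity sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One table per entity set, named after the entity set.
    CREATE TABLE customer (
        customer_id   TEXT PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        balance        REAL NOT NULL
    );
    -- One table per relationship set; its columns are the primary keys
    -- of the participating entity sets (assumed many-to-many here).
    CREATE TABLE depositor (
        customer_id    TEXT NOT NULL REFERENCES customer(customer_id),
        account_number TEXT NOT NULL REFERENCES account(account_number),
        PRIMARY KEY (customer_id, account_number)
    );
    """
)

# Hypothetical sample data.
conn.execute("INSERT INTO customer VALUES ('C-1', 'Hayes')")
conn.execute("INSERT INTO account VALUES ('A-101', 500.0)")
conn.execute("INSERT INTO depositor VALUES ('C-1', 'A-101')")

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)

# Following the relationship set from a customer to that customer's accounts.
rows = conn.execute(
    """
    SELECT customer_name, account_number, balance
    FROM depositor
    JOIN customer USING (customer_id)
    JOIN account USING (account_number)
    """
).fetchall()
print(rows)
```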

Review Terms

• Entity-relationship data model

• Entity

• Entity set

• Attributes

• Domain

• Simple and composite attributes

• Single-valued and multivalued attributes

• Null value

• Derived attribute

• Relationship and relationship set

• Role

Exercises

Explain the distinctions among the terms primary key, candidate key, and superkey.

Construct an E-R diagram for a car-insurance company whose customers own one or more cars each. Each car has associated with it zero to any number of recorded accidents.

Construct an E-R diagram for a hospital with a set of patients and a set of medical doctors. Associate with each patient a log of the various tests and examinations conducted.

A university registrar’s office maintains data about the following entities: (a) courses, including number, title, credits, syllabus, and prerequisites; (b) course offerings, including course number, year, semester, section number, instructor(s), timings, and classroom; (c) students, including student-id, name, and program; and (d) instructors, including identification number, name, department, and title. Further, the enrollment of students in courses and grades awarded to students in each course they are enrolled for must be appropriately modeled.

Construct an E-R diagram for the registrar’s office. Document all assumptions that you make about the mapping constraints.

Consider a database used to record the marks that students get in different exams of different course offerings.

a. Construct an E-R diagram that models exams as entities, and uses a ternary relationship, for the above database.

b. Construct an alternative E-R diagram that uses only a binary relationship between students and course-offerings. Make sure that only one relationship exists between a particular student and course-offering pair, yet you can represent the marks that a student gets in different exams of a course offering.

Construct appropriate tables for each of the E-R diagrams in Exercises 2.2 to 2.4.

Design an E-R diagram for keeping track of the exploits of your favourite sports team. You should store the matches played, the scores in each match, the players in each match, and individual player statistics for each match. Summary statistics should be modeled as derived attributes.

Extend the E-R diagram of the previous question to track the same information for all teams in a league.

Explain the difference between a weak and a strong entity set.

We can convert any weak entity set to a strong entity set by simply adding appropriate attributes. Why, then, do we have weak entity sets?

Define the concept of aggregation. Give two examples of where this concept is useful.

Consider the E-R diagram in Figure 2.29, which models an online bookstore.

a. List the entity sets and their primary keys.

b. Suppose the bookstore adds music cassettes and compact disks to its collection. The same music item may be present in cassette or compact disk format, with differing prices. Extend the E-R diagram to model this addition, ignoring the effect on shopping baskets.

c. Now extend the E-R diagram, using generalization, to model the case where a shopping basket may contain any combination of books, music cassettes, or compact disks.

Consider an E-R diagram in which the same entity set appears several times. Why is allowing this redundancy a bad practice that one should avoid whenever possible?

Consider a university database for the scheduling of classrooms for final exams. This database could be modeled as the single entity set exam, with attributes course-name, section-number, room-number, and time. Alternatively, one or more additional entity sets could be defined, along with relationship sets to replace some of the attributes of the exam entity set, as

• course with attributes name, department, and c-number

• section with attributes s-number and enrollment, and dependent as a weak entity set on course

• room with attributes r-number, capacity, and building

a. Show an E-R diagram illustrating the use of all three additional entity sets listed.

b. Explain what application characteristics would influence a decision to include or not to include each of the additional entity sets.

When designing an E-R diagram for a particular enterprise, you have several alternatives from which to choose.

a. What criteria should you consider in making the appropriate choice?

b. Design three alternative E-R diagrams to represent the university registrar’s office of Exercise 2.4. List the merits of each. Argue in favor of one of the alternatives.

An E-R diagram can be viewed as a graph. What do the following mean in terms of the structure of an enterprise schema?

a. The graph is disconnected.

b. The graph is acyclic.

In Section 2.4.3, we represented a ternary relationship (Figure 2.30a) using binary relationships, as shown in Figure 2.30b. Consider the alternative shown in Figure 2.30c. Discuss the relative merits of these two alternative representations of a ternary relationship by binary relationships.

Consider the representation of a ternary relationship using binary relationships as described in Section 2.4.3 (shown in Figure 2.30b).

a. Show a simple instance of E, A, B, C, RA, RB, and RC that cannot correspond to any instance of A, B, C, and R.

b. Modify the E-R diagram of Figure 2.30b to introduce constraints that will guarantee that any instance of E, A, B, C, RA, RB, and RC that satisfies the constraints will correspond to an instance of A, B, C, and R.

c. Modify the translation above to handle total participation constraints on the ternary relationship.

d. The above representation requires that we create a primary key attribute for E. Show how to treat E as a weak entity set so that a primary key attribute is not required.

A weak entity set can always be made into a strong entity set by adding to its attributes the primary key attributes of its identifying entity set. Outline what sort of redundancy will result if we do so.

Design a generalization-specialization hierarchy for a motor-vehicle sales company. The company sells motorcycles, passenger cars, vans, and buses. Justify your placement of attributes at each level of the hierarchy. Explain why they should not be placed at a higher or lower level.

Explain the distinction between condition-defined and user-defined constraints. Which of these constraints can the system check automatically? Explain your answer.

Explain the distinction between disjoint and overlapping constraints.

Explain the distinction between total and partial constraints.

Figure 2.31 shows a lattice structure of generalization and specialization. For entity sets A, B, and C, explain how attributes are inherited from the higher-level entity sets X and Y. Discuss how to handle a case where an attribute of X has the same name as some attribute of Y.

Draw the UML equivalents of the E-R diagrams of Figures 2.9c, 2.10, 2.12, 2.13 and 2.17.

Consider two separate banks that decide to merge. Assume that both banks use exactly the same E-R database schema — the one in Figure 2.22. (This assumption is, of course, highly unrealistic; we consider the more realistic case in Section 19.8.) If the merged bank is to have a single database, there are several potential problems:

• The possibility that the two original banks have branches with the same name

• The possibility that some customers are customers of both original banks

• The possibility that some loan or account numbers were used at both original banks (for different loans or accounts, of course)

For each of these potential problems, describe why there is indeed a potential for difficulties. Propose a solution to the problem. For your solution, explain any changes that would have to be made and describe what their effect would be on the schema and the data.

Reconsider the situation described for Exercise 2.26 under the assumption that one bank is in the United States and the other is in Canada. As before, the banks use the schema of Figure 2.22, except that the Canadian bank uses the social-insurance number assigned by the Canadian government, whereas the U.S. bank uses the social-security number to identify customers. What problems (beyond those identified in Exercise 2.26) might occur in this multinational case? How would you resolve them? Be sure to consider both the schema and the actual data values in constructing your answer.

Bibliographical Notes

The E-R data model was introduced by Chen [1976]. A logical design methodology for relational databases using the extended E-R model is presented by Teorey et al. [1986]. Mapping from extended E-R models to the relational model is discussed by Lyngbaek and Vianu [1987] and Markowitz and Shoshani [1992]. Various data-manipulation languages for the E-R model have been proposed: GERM (Benneworth et al. [1981]), GORDAS (Elmasri and Wiederhold [1981]), and ERROL (Markowitz and Raz [1983]). A graphical query language for the E-R database was proposed by Zhang and Mendelzon [1983] and Elmasri and Larson [1985].

Smith and Smith [1977] introduced the concepts of generalization, specialization, and aggregation, and Hammer and McLeod [1980] expanded them. Lenzerini and Santucci [1983] used the concepts in defining cardinality constraints in the E-R model.

Thalheim [2000] provides detailed textbook coverage of research in E-R modeling. Basic textbook discussions are offered by Batini et al. [1992] and Elmasri and Navathe [2000]. Davis et al. [1983] provide a collection of papers on the E-R model.

Tools

Many database systems provide tools for database design that support E-R diagrams. These tools help a designer create E-R diagrams, and they can automatically create corresponding tables in a database. See the bibliographic notes of Chapter 1 for references to database system vendors’ Web sites. There are also some database-independent data modeling tools that support E-R diagrams and UML class diagrams. These include Rational Rose (www.rational.com/products/rose), Visio Enterprise (see www.visio.com), and ERwin (search for ERwin at the site www.cai.com/products).
