Entity-Relationship Model: Basic Concepts
Entity-Relationship Model
The entity-relationship (E-R) data model perceives the real world as consisting of basic objects, called entities, and relationships among these objects. It was developed to facilitate database design by allowing specification of an enterprise schema, which represents the overall logical structure of a database. The E-R data model is one of several semantic data models; the semantic aspect of the model lies in its representation of the meaning of the data. The E-R model is very useful in mapping the meanings and interactions of real-world enterprises onto a conceptual schema. Because of this usefulness, many database-design tools draw on concepts from the E-R model.
Basic Concepts
The E-R data model employs three basic notions: entity sets, relationship sets, and attributes.
Entity Sets
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. For example, each person in an enterprise is an entity. An entity has a set of properties, and the values for some set of properties may uniquely identify an entity. For instance, a person may have a person-id property whose value uniquely identifies that person. Thus, the value 677-89-9011 for person-id would uniquely identify one particular person in the enterprise. Similarly, loans can be thought of as entities, and loan number L-15 at the Perryridge branch uniquely identifies a loan entity. An entity may be concrete, such as a person or a book, or it may be abstract, such as a loan, or a holiday, or a concept.
An entity set is a set of entities of the same type that share the same properties, or attributes. The set of all persons who are customers at a given bank, for example, can be defined as the entity set customer. Similarly, the entity set loan might represent the set of all loans awarded by a particular bank. The individual entities that constitute a set are said to be the extension of the entity set. Thus, all the individual bank customers are the extension of the entity set customer.
Entity sets do not need to be disjoint. For example, it is possible to define the entity set of all employees of a bank (employee) and the entity set of all customers of the bank (customer). A person entity may be an employee entity, a customer entity, both, or neither.
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of an entity set. The designation of an attribute for an entity set expresses that the database stores similar information concerning each entity in the entity set; however, each entity may have its own value for each attribute. Possible attributes of the customer entity set are customer-id, customer-name, customer-street, and customer-city. In real life, there would be further attributes, such as street number, apartment number, state, postal code, and country, but we omit them to keep our examples simple. Possible attributes of the loan entity set are loan-number and amount.
Each entity has a value for each of its attributes. For instance, a particular customer entity may have the value 321-12-3123 for customer-id, the value Jones for customer-name, the value Main for customer-street, and the value Harrison for customer-city.
The customer-id attribute is used to uniquely identify customers, since there may be more than one customer with the same name, street, and city. In the United States, many enterprises find it convenient to use the social-security number of a person as an attribute whose value uniquely identifies the person. In general, the enterprise would have to create and assign a unique identifier for each customer.
For each attribute, there is a set of permitted values, called the domain, or value set, of that attribute. The domain of attribute customer-name might be the set of all text strings of a certain length. Similarly, the domain of attribute loan-number might be the set of all strings of the form “L-n” where n is a positive integer.
A database thus includes a collection of entity sets, each of which contains any number of entities of the same type. Figure 2.1 shows part of a bank database that consists of two entity sets: customer and loan.
Formally, an attribute of an entity set is a function that maps from the entity set into a domain. Since an entity set may have several attributes, each entity can be described by a set of (attribute, data value) pairs, one pair for each attribute of the entity set. For example, a particular customer entity may be described by the set {(customer-id, 677-89-9011), (customer-name, Hayes), (customer-street, Main), (customer-city, Harrison)}, meaning that the entity describes a person named Hayes whose customer identifier is 677-89-9011 and who resides at Main Street in Harrison. We can see, at this point, an integration of the abstract schema with the actual enterprise being modeled. The attribute values describing an entity will constitute a significant portion of the data stored in the database.
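The view of an attribute as a function into a domain can be sketched in a few lines of Python. This is only an illustration: the 30-character limit on text attributes and the helper names (DOMAINS, in_domains, attribute) are assumptions, not part of the E-R model.

```python
import re

def short_text(v):
    # Assumed domain for text attributes: non-empty strings of at most 30 characters.
    return isinstance(v, str) and 0 < len(v) <= 30

# Hypothetical domains: each is a predicate deciding whether a value is permitted.
DOMAINS = {
    "customer-id": lambda v: re.fullmatch(r"\d{3}-\d{2}-\d{4}", v) is not None,
    "customer-name": short_text,
    "customer-street": short_text,
    "customer-city": short_text,
}

# One customer entity, described by (attribute, value) pairs as in the text.
hayes = {
    ("customer-id", "677-89-9011"),
    ("customer-name", "Hayes"),
    ("customer-street", "Main"),
    ("customer-city", "Harrison"),
}

def in_domains(entity):
    """Return True if every value lies in the domain of its attribute."""
    return all(DOMAINS[attr](value) for attr, value in entity)

def attribute(name, entity):
    """Treat an attribute as a function from an entity to a value in its domain."""
    return dict(entity)[name]
```

Calling attribute("customer-name", hayes) looks up the value paired with customer-name for this one entity, which is what the formal definition describes.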
An attribute, as used in the E-R model, can be characterized by the following attribute types.
• Simple and composite attributes. In our examples thus far, the attributes have been simple; that is, they are not divided into subparts. Composite attributes, on the other hand, can be divided into subparts (that is, other attributes). For example, an attribute name could be structured as a composite attribute consisting of first-name, middle-initial, and last-name. Using composite attributes in a design schema is a good choice if a user will wish to refer to an entire attribute on some occasions, and to only a component of the attribute on other occasions. Suppose we were to substitute for the customer entity-set attributes customer-street and customer-city the composite attribute address with the attributes street, city, state, and zip-code. Composite attributes help us to group together related attributes, making the modeling cleaner.
Note also that a composite attribute may appear as a hierarchy. In the composite attribute address, its component attribute street can be further divided into street-number, street-name, and apartment-number. Figure 2.2 depicts these examples of composite attributes for the customer entity set.
• Single-valued and multivalued attributes. The attributes in our examples all have a single value for a particular entity. For instance, the loan-number attribute for a specific loan entity refers to only one loan number. Such attributes are said to be single valued. There may be instances where an attribute has a set of values for a specific entity. Consider an employee entity set with the attribute phone-number. An employee may have zero, one, or several phone numbers, and different employees may have different numbers of phones. This type of attribute is said to be multivalued. As another example, an attribute dependent-name of the employee entity set would be multivalued, since any particular employee may have zero, one, or more dependent(s).
Where appropriate, upper and lower bounds may be placed on the number of values in a multivalued attribute. For example, a bank may limit the number of phone numbers recorded for a single customer to two. Placing bounds in this case expresses that the phone-number attribute of the customer entity set may have between zero and two values.
• Derived attribute. The value for this type of attribute can be derived from the values of other related attributes or entities. For instance, let us say that the customer entity set has an attribute loans-held, which represents how many loans a customer has from the bank. We can derive the value for this attribute by counting the number of loan entities associated with that customer.
As another example, suppose that the customer entity set has an attribute age, which indicates the customer’s age. If the customer entity set also has an attribute date-of-birth, we can calculate age from date-of-birth and the current date. Thus, age is a derived attribute. In this case, date-of-birth may be referred to as a base attribute, or a stored attribute. The value of a derived attribute is not stored, but is computed when required.
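The attribute types above can be illustrated with a small Python sketch. It is one possible rendering, not a prescribed mapping: the class layout, the constant MAX_PHONES, and all sample values other than the attribute names from the text are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

MAX_PHONES = 2  # upper bound on the multivalued attribute, as in the bank example

@dataclass
class Street:
    # Component of the composite attribute address, itself composite.
    street_number: int
    street_name: str
    apartment_number: Optional[str] = None

@dataclass
class Address:
    # Composite attribute: usable as a whole or component by component.
    street: Street
    city: str
    state: str
    zip_code: str

@dataclass
class Customer:
    customer_id: str
    name: str
    address: Address
    date_of_birth: date  # base (stored) attribute
    phone_numbers: List[str] = field(default_factory=list)  # multivalued, 0..2 values

    def __post_init__(self):
        if len(self.phone_numbers) > MAX_PHONES:
            raise ValueError("at most two phone numbers may be recorded")

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        birthday = (self.date_of_birth.month, self.date_of_birth.day)
        had_birthday = (today.month, today.day) >= birthday
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)
```

Passing the current date into age, rather than reading the clock inside the method, keeps the derivation explicit: the value depends only on the stored base attribute and the date supplied.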
An attribute takes a null value when an entity does not have a value for it. The null value may indicate “not applicable” — that is, that the value does not exist for the entity. For example, one may have no middle name. Null can also designate that an attribute value is unknown. An unknown value may be either missing (the value does exist, but we do not have that information) or not known (we do not know whether or not the value actually exists).
For instance, if the name value for a particular customer is null, we assume that the value is missing, since every customer must have a name. A null value for the apartment-number attribute could mean that the address does not include an apartment number (not applicable), that an apartment number exists but we do not know what it is (missing), or that we do not know whether or not an apartment number is part of the customer’s address (not known).
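A single null marker cannot tell these three readings apart. As a rough illustration, the sketch below gives each reading its own marker; the Null enumeration and the describe helper are hypothetical names introduced only for this example.

```python
from enum import Enum

class Null(Enum):
    # The three readings of a null value distinguished in the text.
    NOT_APPLICABLE = "value does not exist for this entity"
    MISSING = "value exists but is not recorded"
    NOT_KNOWN = "unclear whether a value exists"

def describe(value):
    """Explain what an apartment-number value (or null marker) means."""
    if isinstance(value, Null):
        return value.value
    return f"known value: {value}"
```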
A database for a banking enterprise may include a number of different entity sets. For example, in addition to keeping track of customers and loans, the database records the accounts that the bank provides, which are represented by the entity set account with attributes account-number and balance. Also, if the bank has a number of different branches, then we may keep information about all the branches of the bank. The branch entity set may be described by the attributes branch-name, branch-city, and assets.
Relationship Sets
A relationship is an association among several entities. For example, we can define a relationship that associates customer Hayes with loan L-15. This relationship specifies that Hayes is a customer with loan number L-15.
A relationship set is a set of relationships of the same type. Formally, it is a mathematical relation on n ≥ 2 (possibly nondistinct) entity sets. If E1, E2, ..., En are entity sets, then a relationship set R is a subset of {(e1, e2, ..., en) | e1 ∈ E1, e2 ∈ E2, ..., en ∈ En}, where (e1, e2, ..., en) is a relationship.
Consider the two entity sets customer and loan in Figure 2.1. We define the relationship set borrower to denote the association between customers and the bank loans that the customers have. Figure 2.3 depicts this association.
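The formal definition of a relationship set as a subset of a Cartesian product can be checked directly on a small example. In this sketch, entities are stood in for by their names or loan numbers, and every pairing other than Hayes with L-15 is made up for illustration; is_relationship_set is a hypothetical helper.

```python
from itertools import product

# Entities are represented by their identifying values for brevity.
customer = {"Hayes", "Jones", "Smith"}
loan = {"L-15", "L-17", "L-23"}

# A relationship set is a set of tuples, one component per participating entity set.
borrower = {
    ("Hayes", "L-15"),
    ("Jones", "L-17"),
    ("Smith", "L-23"),
    ("Jones", "L-23"),  # a loan may be associated with more than one customer
}

def is_relationship_set(r, *entity_sets):
    """Return True if r is a subset of the Cartesian product of the entity sets."""
    return set(r) <= set(product(*entity_sets))
```

Enumerating the full product is only practical for toy data; it is used here because it mirrors the set-builder definition term by term.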
As another example, consider the two entity sets loan and branch. We can define the relationship set loan-branch to denote the association between a bank loan and the branch in which that loan is maintained.
The association between entity sets is referred to as participation; that is, the entity sets E1, E2, ..., En participate in relationship set R. A relationship instance in an E-R schema represents an association between the named entities in the real-world enterprise that is being modeled. As an illustration, the individual customer entity Hayes, who has customer identifier 677-89-9011, and the loan entity L-15 participate in a relationship instance of borrower. This relationship instance represents that, in the real-world enterprise, the person called Hayes who holds customer-id 677-89-9011 has taken the loan that is numbered L-15.
The function that an entity plays in a relationship is called that entity’s role. Since entity sets participating in a relationship set are generally distinct, roles are implicit and are not usually specified. However, they are useful when the meaning of a relationship needs clarification. Such is the case when the entity sets of a relationship set are not distinct; that is, the same entity set participates in a relationship set more than once, in different roles. In this type of relationship set, sometimes called a recursive relationship set, explicit role names are necessary to specify how an entity participates in a relationship instance. For example, consider an entity set employee that records information about all the employees of the bank. We may have a relationship set works-for that is modeled by ordered pairs of employee entities. The first employee of a pair takes the role of worker, whereas the second takes the role of manager. In this way, all relationships of works-for are characterized by (worker, manager) pairs; (manager, worker) pairs are excluded.
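Role names for a recursive relationship set can be made explicit by naming the positions of each pair. The following is a minimal sketch; the employee names and the two query helpers are assumptions chosen for the example.

```python
from typing import NamedTuple

class WorksFor(NamedTuple):
    # Both components come from the same entity set, employee;
    # the field names record the role each one plays.
    worker: str
    manager: str

employee = {"Jones", "Smith", "Hayes"}  # hypothetical employees

works_for = {
    WorksFor(worker="Jones", manager="Smith"),
    WorksFor(worker="Hayes", manager="Smith"),
}

def managers_of(name):
    """Employees appearing in the manager role for the given worker."""
    return {r.manager for r in works_for if r.worker == name}

def workers_under(name):
    """Employees appearing in the worker role for the given manager."""
    return {r.worker for r in works_for if r.manager == name}
```

Without the role names, the pair (Jones, Smith) would not say who works for whom.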
A relationship may also have attributes called descriptive attributes. Consider a relationship set depositor with entity sets customer and account. We could associate the attribute access-date with that relationship to specify the most recent date on which a customer accessed an account. The depositor relationship among the entities corresponding to customer Jones and account A-217 has the value “23 May 2001” for attribute access-date, which means that the most recent date that Jones accessed account A-217 was 23 May 2001.
As another example of descriptive attributes for relationships, suppose we have entity sets student and course which participate in a relationship set registered-for. We may wish to store a descriptive attribute for-credit with the relationship, to record whether a student has taken the course for credit, or is auditing (or sitting in on) the course.
A relationship instance in a given relationship set must be uniquely identifiable from its participating entities, without using the descriptive attributes. To understand this point, suppose we want to model all the dates when a customer accessed an account. The single-valued attribute access-date can store a single access date only. We cannot represent multiple access dates by multiple relationship instances between the same customer and account, since the relationship instances would not be uniquely identifiable using only the participating entities. The right way to handle this case is to create a multivalued attribute access-dates, which can store all the access dates.
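One way to see why the participating entities must identify a relationship instance is to key the relationship set by them. In the sketch below, a dictionary key of (customer, account) allows at most one depositor instance per pair, and the multivalued attribute access-dates is a set stored with that instance. The helper names and the access on 20 May 2001 are assumptions added for illustration.

```python
from datetime import date

# depositor maps (customer, account) to the descriptive attributes of that instance.
# Dictionary keys are unique, so each pair identifies exactly one relationship.
depositor = {}

def record_access(customer, account, when):
    """Add a date to the multivalued attribute access-dates of one instance."""
    attrs = depositor.setdefault((customer, account), {"access-dates": set()})
    attrs["access-dates"].add(when)

def last_access(customer, account):
    """Most recent access date, derived from the stored set of dates."""
    return max(depositor[(customer, account)]["access-dates"])

record_access("Jones", "A-217", date(2001, 5, 20))
record_access("Jones", "A-217", date(2001, 5, 23))
```

After both calls there is still a single depositor instance for Jones and A-217, holding two access dates.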
However, there can be more than one relationship set involving the same entity sets. In our example the customer and loan entity sets participate in the relationship set borrower. Additionally, suppose each loan must have another customer who serves as a guarantor for the loan. Then the customer and loan entity sets may participate in another relationship set, guarantor.
The relationship sets borrower and loan-branch provide an example of a binary relationship set — that is, one that involves two entity sets. Most of the relationship sets in a database system are binary. Occasionally, however, relationship sets involve more than two entity sets.
As an example, consider the entity sets employee, branch, and job. Examples of job entities could include manager, teller, auditor, and so on. Job entities may have the attributes title and level. The relationship set works-on among employee, branch, and job is an example of a ternary relationship. A ternary relationship among Jones, Perryridge, and manager indicates that Jones acts as a manager at the Perryridge branch. Jones could also act as auditor at the Downtown branch, which would be represented by another relationship. Yet another relationship could be between Smith, Downtown, and teller, indicating Smith acts as a teller at the Downtown branch.
The number of entity sets that participate in a relationship set is called the degree of the relationship set. A binary relationship set is of degree 2; a ternary relationship set is of degree 3.
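The ternary works-on example and the notion of degree can be written out as follows. This is a sketch only: the degree helper is a hypothetical function that reads the degree off the tuple length, which works because every relationship in a set has one component per participating entity set.

```python
# works-on relates an employee, a branch, and a job (the three examples from the text).
works_on = {
    ("Jones", "Perryridge", "manager"),
    ("Jones", "Downtown", "auditor"),
    ("Smith", "Downtown", "teller"),
}

def degree(relationship_set):
    """Number of participating entity sets, read off the tuple length."""
    lengths = {len(r) for r in relationship_set}
    if len(lengths) != 1:
        raise ValueError("all relationships in a set must have the same arity")
    return lengths.pop()
```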