Database Systems versus File Systems

Consider part of a savings-bank enterprise that keeps information about all customers and savings accounts. One way to keep the information on a computer is to store it in operating-system files. To allow users to manipulate the information, the system has a number of application programs that operate on the files, including

• A program to debit or credit an account

• A program to add a new account

• A program to find the balance of an account

• A program to generate monthly statements

System programmers wrote these application programs to meet the needs of the bank.

New application programs are added to the system as the need arises. For example, suppose that the savings bank decides to offer checking accounts. As a result, the bank creates new permanent files that contain information about all the checking accounts maintained in the bank, and it may have to write new application programs to deal with situations that do not arise in savings accounts, such as overdrafts. Thus, as time goes by, the system acquires more files and more application programs.

This typical file-processing system is supported by a conventional operating system. The system stores permanent records in various files, and it needs different application programs to extract records from, and add records to, the appropriate files. Before database management systems (DBMSs) came along, organizations usually stored information in such systems.

Keeping organizational information in a file-processing system has a number of major disadvantages:

• Data redundancy and inconsistency. Since different programmers create the files and application programs over a long period, the various files are likely to have different formats and the programs may be written in several programming languages. Moreover, the same information may be duplicated in several places (files). For example, the address and telephone number of a particular customer may appear in a file that consists of savings-account records and in a file that consists of checking-account records. This redundancy leads to higher storage and access cost. In addition, it may lead to data inconsistency; that is, the various copies of the same data may no longer agree. For example, a changed customer address may be reflected in savings-account records but not elsewhere in the system.

• Difficulty in accessing data. Suppose that one of the bank officers needs to find out the names of all customers who live within a particular postal-code area. The officer asks the data-processing department to generate such a list. Because the designers of the original system did not anticipate this request, there is no application program on hand to meet it. There is, however, an application program to generate the list of all customers. The bank officer now has two choices: either obtain the list of all customers and extract the needed information manually, or ask a system programmer to write the necessary application program. Both alternatives are obviously unsatisfactory. Suppose that such a program is written, and that, several days later, the same officer needs to trim that list to include only those customers who have an account balance of $10,000 or more. As expected, a program to generate such a list does not exist. Again, the officer has the preceding two options, neither of which is satisfactory.

The point here is that conventional file-processing environments do not allow needed data to be retrieved in a convenient and efficient manner. More responsive data-retrieval systems are required for general use.

• Data isolation. Because data are scattered in various files, and files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.

• Integrity problems. The data values stored in the database must satisfy certain types of consistency constraints. For example, the balance of a bank account may never fall below a prescribed amount (say, $25). Developers enforce these constraints in the system by adding appropriate code in the various application programs. However, when new constraints are added, it is difficult to change the programs to enforce them. The problem is compounded when constraints involve several data items from different files.

• Atomicity problems. A computer system, like any other mechanical or electrical device, is subject to failure. In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed prior to the failure. Consider a program to transfer $50 from account A to account B. If a system failure occurs during the execution of the program, it is possible that the $50 was removed from account A but was not credited to account B, resulting in an inconsistent database state. Clearly, it is essential to database consistency that either both the credit and debit occur, or that neither occur. That is, the funds transfer must be atomic: it must happen in its entirety or not at all. It is difficult to ensure atomicity in a conventional file-processing system.

• Concurrent-access anomalies. For the sake of overall performance of the system and faster response, many systems allow multiple users to update the data simultaneously. In such an environment, interaction of concurrent updates may result in inconsistent data. Consider bank account A, containing $500. If two customers withdraw funds (say $50 and $100 respectively) from account A at about the same time, the result of the concurrent executions may leave the account in an incorrect (or inconsistent) state. Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce that value by the amount being withdrawn, and write the result back. If the two programs run concurrently, they may both read the value $500, and write back $450 and $400, respectively. Depending on which one writes the value last, the account may contain either $450 or $400, rather than the correct value of $350. To guard against this possibility, the system must maintain some form of supervision. But supervision is difficult to provide because data may be accessed by many different application programs that have not been coordinated previously.

• Security problems. Not every user of the database system should be able to access all the data. For example, in a banking system, payroll personnel need to see only that part of the database that has information about the various bank employees. They do not need access to information about customer accounts. But, since application programs are added to the system in an ad hoc manner, enforcing such security constraints is difficult.
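The concurrent-access anomaly described in the list above can be replayed in a few lines. The following is a minimal sketch, not a model of any real banking system: the `Account` class, `unsafe_interleaving`, and `safe_withdraw` are hypothetical names introduced only for illustration. The first function replays the bad schedule deterministically (both withdrawals read $500 before either writes), and the second shows one possible form of "supervision", a lock around the read-modify-write step.

```python
import threading


class Account:
    """Hypothetical in-memory stand-in for a balance kept in a file."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def unsafe_interleaving(account, amounts):
    """Replay the bad schedule: every withdrawal reads before any writes."""
    # Step 1: each program reads the same old balance.
    reads = [account.balance for _ in amounts]
    # Step 2: each program writes back its own result, overwriting the others.
    for old_balance, amount in zip(reads, amounts):
        account.balance = old_balance - amount
    return account.balance


def safe_withdraw(account, amount):
    """Perform the read-modify-write under a lock so no update is lost."""
    with account.lock:
        account.balance = account.balance - amount


# Uncoordinated programs: the $50 withdrawal is overwritten by the $100 one.
lost = unsafe_interleaving(Account(500), [50, 100])

# Coordinated programs: two threads, but the lock serializes the updates.
acct = Account(500)
threads = [threading.Thread(target=safe_withdraw, args=(acct, a)) for a in (50, 100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
safe = acct.balance

print(lost)  # 400 rather than the correct 350
print(safe)  # 350
```

The lock works here only because both withdrawals go through the same `safe_withdraw` function. In a file-processing system, independently written programs would all have to agree on such a protocol, which is the coordination problem the text describes.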

These difficulties, among others, prompted the development of database systems. In what follows, we shall see the concepts and algorithms that enable database systems to solve the problems with file-processing systems. In most of this book, we use a bank enterprise as a running example of a typical data-processing application found in a corporation.
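As a small preview of how a database system can address the atomicity problem described earlier, the sketch below uses Python's built-in `sqlite3` module. The `account` table, its starting balances, and the `transfer` and `balances` helpers are assumptions made for illustration, and the "system failure" is only simulated by raising an exception between the debit and the credit; a real crash would be handled by the database's recovery mechanism instead.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as a single transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back
        # if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            if fail_midway:
                raise RuntimeError("simulated system failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
    except RuntimeError:
        pass  # the debit has already been rolled back at this point


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


# Hypothetical starting state: A holds $500, B holds $200.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

transfer(conn, "A", "B", 50, fail_midway=True)
after_failure = balances(conn)  # neither the debit nor the credit took effect

transfer(conn, "A", "B", 50)
after_success = balances(conn)  # both the debit and the credit took effect

print(after_failure)  # {'A': 500, 'B': 200}
print(after_success)  # {'A': 450, 'B': 250}
```

The application code never has to undo the partial debit itself; grouping the two updates into one transaction is enough for the database to apply both or neither.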
