Object-Relational Databases:Functions and Procedures

Functions and Procedures

SQL:1999 allows the definition of functions, procedures, and methods. These can be defined either by the procedural component of SQL:1999, or by an external programming language such as Java, C, or C++. We look at definitions in SQL:1999 first, and then see how to use definitions in external languages. Several database systems sup- port their own procedural languages, such as PL/SQL in Oracle and Transact SQL in Microsoft SQL Server. These resemble the procedural part of SQL:1999, but there are differences in syntax and semantics; see the respective system manuals for further details.

SQL Functions and Procedures

Suppose that we want a function that, given the title of a book, returns the count of the number of authors, using the 4NF schema. We can define the function this way:

image

Functions are particularly useful with specialized data types such as images and geometric objects. For instance, a polygon data type used in a map database may have an associated function that checks if two polygons overlap, and an image data type may have associated functions to compare two images for similarity. Functions may be written in an external language such as C, as we see in Section 9.6.2. Some database systems also support functions that return relations, that is, multisets of tuples, although such functions are not supported by SQL:1999.

Methods, which we saw in Section 9.2.2, can be viewed as functions associated with structured types. They have an implicit first parameter called self, which is set to the structured type value on which the method is invoked. Thus, the body of the method can refer to an attribute a of the value by using self.a. These attributes can also be updated by the method.

SQL:1999 also supports procedures. The author-count function could instead be written as a procedure:

image

SQL:1999 permits more than one procedure of the same name, so long as the number of arguments of the procedures with the same name is different. The name, along with the number of arguments, is used to identify the procedure. SQL:1999 also permits more than one function with the same name, so long as the different functions with the same name either have different numbers of arguments, or for functions with the same number of arguments, differ in the type of at least one argument.

External Language Routines

SQL:1999 allows us to define functions in a programming language such as C or C++. Functions defined in this fashion can be more efficient than functions defined in SQL, and computations that cannot be carried out in SQL can be executed by these functions. An example of the use of such functions would be to perform a complex arith- metic computation on the data in a tuple.

External procedures and functions can be specified in this way:

image

The external language procedures need to deal with null values and exceptions. They must therefore have several extra parameters: an sqlstate value to indicate failure/success status, a parameter to store the return value of the function, and indicator variables for each parameter/function result to indicate if the value is null. An extra line parameter style general added to the declaration above indicates that the external procedures/functions take only the arguments shown and do not deal with null values or exceptions.

Functions defined in a programming language and compiled outside the database system may be loaded and executed with the database system code. However, doing so carries the risk that a bug in the program can corrupt the database internal structures, and can bypass the access-control functionality of the database system. Database systems that are concerned more about efficient performance than about security may execute procedures in such a fashion.

Database systems that are concerned about security would typically execute such code as part of a separate process, communicate the parameter values to it, and fetch results back, via interprocess communication.

If the code is written in a language such as Java, there is a third possibility: executing the code in a “sandbox” within the database process itself. The sandbox prevents the Java code from carrying out any reads or updates directly on the database.

Procedural Constructs

SQL:1999 supports a variety of procedural constructs, which gives it almost all the power of a general purpose programming language. The part of the SQL:1999 standard that deals with these constructs is called the Persistent Storage Module (PSM).

A compound statement is of the form begin ... end, and it may contain multiple SQL statements between the begin and the end. Local variables can be declared within a compound statement, as we have seen in Section 9.6.1. SQL:1999 supports while statements and repeat statements by this syntax:

image

This code does not do anything useful; it is simply meant to show the syntax of while and repeat loops. We will see more meaningful uses later.

There is also a for loop, which permits iteration over all results of a query:

image

The program implicitly opens a cursor when the for loop begins execution and uses it to fetch the values one row at a time into the for loop variable (r, in the above example). It is possible to give a name to the cursor, by inserting the text cn cursor for just after the keyword as, where cn is the name we wish to give to the cursor. The cursor name can be used to perform update/delete operations on the tuple being pointed to by the cursor. The statement leave can be used to exit the loop, while iterate starts on the next tuple, from the beginning of the loop, skipping the remaining statements.

The conditional statements supported by SQL:1999 include if-then-else statements statements by using this syntax:

image

This code assumes that l, m, and h are integer variables, and r is a row variable. If we replace the line “set n = n+ r.balance” in the for loop of the preceding paragraph by the if-then-else code, the loop would compute the total balances of accounts that fall under the low, medium, and high balance categories respectively.

SQL:1999 also supports a case statement similar to the C/C++ language case statement (in addition to case expressions, which we saw in Chapter 4).

Finally, SQL:1999 includes the concept of signaling exception conditions, and declaring handlers that can handle the exception, as in this code:

image

The statements between the begin and the end can raise an exception by executing signal out-of-stock. The handler says that if the condition arises, the action to be taken is to exit the enclosing begin end statement. Alternative actions would be continue, which continues execution from the next statement following the one that raised the exception. In addition to explicitly defined conditions, there are also predefined conditions such as sqlexception, sqlwarning, and not found.

Figure 9.5 provides a larger example of the use of SQL:1999 procedural constructs. The procedure findEmpl computes the set of all direct and indirect employees of a given manager (specified by the parameter mgr), and stores the resulting employee names in a relation called empl, which is assumed to exist already. The relation man- ager(empname, mgrname), specifying who works directly for which manager, is as- sumed to be available. The set of all direct/indirect employees is basically the transitive closure of the relation manager. We saw how to express such a query by recursion in Chapter 5 (Section 5.2.6).

The procedure uses two temporary tables, newemp and temp. The procedure inserts all employees who directly work for mgr into newemp before the repeat loop. The repeat loop first adds all employees in newemp to empl. Next, it computes employees who work for those in newemp, except those who have already been found to be

image

employees of mgr, and stores them in the temporary table temp. Finally, it replaces the contents of newemp by the contents of temp. The repeat loop terminates when it finds no new (indirect) employees.

We note that the use of the except clause in the procedure ensures that the procedure works even in the (abnormal) case where there is a cycle of management. For example, if a works for b, b works for c, and c works for a, there is a cycle.

While cycles may be unrealistic in management control, cycles are possible in other applications. For instance, suppose we have a relation flights(to, from) that says which cities can be reached from which other cities by a direct flight. We can modify the find Empl procedure to find all cities that are reachable by a sequence of one or more flights from a given city. All we have to do is to replace manager by flight and replace attribute names correspondingly. In this situation there can be cycles of reachability, but the procedure would work correctly since it would eliminate cities that have already been seen.

Object-Oriented versus Object-Relational

We have now studied object-oriented databases built around persistent programming languages, as well as object-relational databases, which are object-oriented data- bases built on top of the relation model. Database systems of both types are on the market, and a database designer needs to choose the kind of system that is appropriate to the needs of the application.

Persistent extensions to programming languages and object-relational systems target different markets. The declarative nature and limited power (compared to a programming language) of the SQL language provides good protection of data from programming errors, and makes high-level optimizations, such as reducing I/O, relatively easy. (We cover optimization of relational expressions in Chapter 13.) Object-relational systems aim at making data modeling and querying easier by using complex data types. Typical applications include storage and querying of complex data, including multimedia data.

A declarative language such as SQL, however, imposes a significant performance penalty for certain kinds of applications that run primarily in main memory, and that perform a large number of accesses to the database. Persistent programming languages target such applications that have high performance requirements. They provide low-overhead access to persistent data, and eliminate the need for data translation if the data are to be manipulated by a programming language. However, they are more susceptible to data corruption by programming errors, and usually do not have a powerful querying capability. Typical applications include CAD databases.

We can summarize the strengths of the various kinds of database systems in this way:

Relational systems: simple data types, powerful query languages, high protection

Persistent-programming-language – based OODBs: complex data types, integration with programming language, high performance

Object-relational systems: complex data types, powerful query languages, high protection

These descriptions hold in general, but keep in mind that some database systems blur the boundaries. For example, some object-oriented database systems built around a persistent programming language are implemented on top of a relational database system. Such systems may provide lower performance than object-oriented database systems built directly on a storage system, but provide some of the stronger protection guarantees of relational systems.

Many object-relational database systems are built on top of existing relational database systems. To do so, the complex data types supported by object-relational systems need to be translated to the simpler type system of relational databases.

To understand how the translation is done, we need only look at how some features of the E-R model are translated into relations. For instance, multivalued at- tributes in the E-R model correspond to set-valued attributes in the object-relational model. Composite attributes roughly correspond to structured types. ISA hierarchies in the E-R model correspond to table inheritance in the object-relational model. The techniques for converting E-R model features to tables, which we saw in Section 2.9, can be used, with some extensions, to translate object-relational data to relational data.

Comments

Popular posts from this blog

XML Document Schema

Extended Relational-Algebra Operations.

Distributed Databases:Concurrency Control in Distributed Databases