Concurrency Control:Log-Based Recovery.

Log-Based Recovery

The most widely used structure for recording database modifications is the log. The log is a sequence of log records, recording all the update activities in the database. There are several types of log records. An update log record describes a single data- base write. It has these fields:

Transaction identifier is the unique identifier of the transaction that performed the write operation.

Data-item identifier is the unique identifier of the data item written. Typically, it is the location on disk of the data item.

Old value is the value of the data item prior to the write.

New value is the value that the data item will have after the write.

Other special log records exist to record significant events during transaction processing, such as the start of a transaction and the commit or abort of a transaction. We denote the various types of log records as:

image

Whenever a transaction performs a write, it is essential that the log record for that write be created before the database is modified. Once a log record exists, we can output the modification to the database if that is desirable. Also, we have the ability to undo a modification that has already been output to the database. We undo it by using the old-value field in log records.

For log records to be useful for recovery from system and disk failures, the log must reside in stable storage. For now, we assume that every log record is written to the end of the log on stable storage as soon as it is created. In Section 17.7, we shall see when it is safe to relax this requirement so as to reduce the overhead imposed by logging. In Sections 17.4.1 and 17.4.2, we shall introduce two techniques for using the log to ensure transaction atomicity despite failures. Observe that the log contains a complete record of all database activity. As a result, the volume of data stored in the log may become unreasonably large. In Section 17.4.3, we shall show when it is safe to erase log information.

Deferred Database Modification

The deferred-modification technique ensures transaction atomicity by recording all database modifications in the log, but deferring the execution of all write operations of a transaction until the transaction partially commits. Recall that a transaction is said to be partially committed once the final action of the transaction has been executed. The version of the deferred-modification technique that we describe in this section assumes that transactions are executed serially.

When a transaction partially commits, the information on the log associated with the transaction is used in executing the deferred writes. If the system crashes before the transaction completes its execution, or if the transaction aborts, then the information on the log is simply ignored.

The execution of transaction Ti proceeds as follows. Before Ti starts its execution, a record <Ti start> is written to the log. A write(X) operation by Ti results in the writing of a new record to the log. Finally, when Ti partially commits, a record <Ti commit> is written to the log.

When transaction Ti partially commits, the records associated with it in the log are used in executing the deferred writes. Since a failure may occur while this updating is taking place, we must ensure that, before the start of these updates, all the log records are written out to stable storage. Once they have been written, the actual updating takes place, and the transaction enters the committed state.

Observe that only the new value of the data item is required by the deferred-modification technique. Thus, we can simplify the general update-log record structure that we saw in the previous section, by omitting the old-value field.

To illustrate, reconsider our simplified banking system. Let T0 be a transaction that transfers $50 from account A to account B:

image

Suppose that these transactions are executed serially, in the order T0 followed by T1, and that the values of accounts A, B, and C before the execution took place were $1000, $2000, and $700, respectively. The portion of the log containing the relevant information on these two transactions appears in Figure 17.2.

There are various orders in which the actual outputs can take place to both the database system and the log as a result of the execution of T0 and T1. One such order

image

appears in Figure 17.3. Note that the value of A is changed in the database only after the record <T0, A, 950> has been placed in the log.

Using the log, the system can handle any failure that results in the loss of information on volatile storage. The recovery scheme uses the following recovery procedure:

• redo(Ti) sets the value of all data items updated by transaction Ti to the new values.

The set of data items updated by Ti and their respective new values can be found in the log.

The redo operation must be idempotent; that is, executing it several times must be equivalent to executing it once. This characteristic is required if we are to guarantee correct behavior even if a failure occurs during the recovery process.

After a failure, the recovery subsystem consults the log to determine which trans- actions need to be redone. Transaction Ti needs to be redone if and only if the log contains both the record <Ti start> and the record <Ti commit>. Thus, if the system crashes after the transaction completes its execution, the recovery scheme uses the information in the log to restore the system to a previous consistent state after the transaction had completed.

As an illustration, let us return to our banking example with transactions T0 and T1 executed one after the other in the order T0 followed by T1. Figure 17.2 shows the log that results from the complete execution of T0 and T1. Let us suppose that the

image

image

system crashes before the completion of the transactions, so that we can see how the recovery technique restores the database to a consistent state. Assume that the crash occurs just after the log record for the step

write(B)

of transaction T0 has been written to stable storage. The log at the time of the crash appears in Figure 17.4a. When the system comes back up, no redo actions need to be taken, since no commit record appears in the log. The values of accounts A and B remain $1000 and $2000, respectively. The log records of the incomplete transaction T0 can be deleted from the log.

Now, let us assume the crash comes just after the log record for the step

write(C)

of transaction T1 has been written to stable storage. In this case, the log at the time of the crash is as in Figure 17.4b. When the system comes back up, the operation redo(T0) is performed, since the record

<T0 commit>

appears in the log on the disk. After this operation is executed, the values of accounts A and B are $950 and $2050, respectively. The value of account C remains $700. As before, the log records of the incomplete transaction T1 can be deleted from the log.

Finally, assume that a crash occurs just after the log record

<T1 commit>

is written to stable storage. The log at the time of this crash is as in Figure 17.4c. When the system comes back up, two commit records are in the log: one for T0 and one for T1. Therefore, the system must perform operations redo(T0) and redo(T1), in the order in which their commit records appear in the log. After the system executes these operations, the values of accounts A, B, and C are $950, $2050, and $600, respectively.

Finally, let us consider a case in which a second system crash occurs during recovery from the first crash. Some changes may have been made to the database as a result of the redo operations, but all changes may not have been made. When the sys- tem comes up after the second crash, recovery proceeds exactly as in the preceding examples. For each commit record

<Ti commit>

found in the log, the the system performs the operation redo(Ti). In other words, it restarts the recovery actions from the beginning. Since redo writes values to the database independent of the values currently in the database, the result of a successful second attempt at redo is the same as though redo had succeeded the first time.

Immediate Database Modification

The immediate-modification technique allows database modifications to be output to the database while the transaction is still in the active state. Data modifications written by active transactions are called uncommitted modifications. In the event of a crash or a transaction failure, the system must use the old-value field of the log records described in Section 17.4 to restore the modified data items to the value they had prior to the start of the transaction. The undo operation, described next, accomplishes this restoration.

Before a transaction Ti starts its execution, the system writes the record <Ti start> to the log. During its execution, any write(X) operation by Ti is preceded by the writing of the appropriate new update record to the log. When Ti partially commits, the system writes the record <Ti commit> to the log.

Since the information in the log is used in reconstructing the state of the database, we cannot allow the actual update to the database to take place before the corresponding log record is written out to stable storage. We therefore require that, before execution of an output(B) operation, the log records corresponding to B be written onto stable storage. We shall return to this issue in Section 17.7.

As an illustration, let us reconsider our simplified banking system, with transactions T0 and T1 executed one after the other in the order T0 followed by T1. The portion of the log containing the relevant information concerning these two transactions appears in Figure 17.5.

Figure 17.6 shows one possible order in which the actual outputs took place in both the database system and the log as a result of the execution of T0 and T1. Notice that

image

image

this order could not be obtained in the deferred-modification technique of Section 17.4.1.

Using the log, the system can handle any failure that does not result in the loss of information in nonvolatile storage. The recovery scheme uses two recovery procedures:

• undo(Ti) restores the value of all data items updated by transaction Ti to the old values.

• redo(Ti) sets the value of all data items updated by transaction Ti to the new values.

The set of data items updated by Ti and their respective old and new values can be found in the log.

The undo and redo operations must be idempotent to guarantee correct behavior even if a failure occurs during the recovery process.

After a failure has occurred, the recovery scheme consults the log to determine which transactions need to be redone, and which need to be undone:

• Transaction Ti needs to be undone if the log contains the record <Ti start>, but does not contain the record <Ti commit>.

• Transaction Ti needs to be redone if the log contains both the record <Ti start> and the record <Ti commit>.

As an illustration, return to our banking example, with transaction T0 and T1 executed one after the other in the order T0 followed by T1. Suppose that the system crashes before the completion of the transactions. We shall consider three cases. The state of the logs for each of these cases appears in Figure 17.7.

First, let us assume that the crash occurs just after the log record for the step

write(B)

image

of transaction T0 has been written to stable storage (Figure 17.7a). When the system comes back up, it finds the record <T0 start> in the log, but no corresponding <T0 commit> record. Thus, transaction T0 must be undone, so an undo(T0) is performed. As a result, the values in accounts A and B (on the disk) are restored to $1000 and $2000, respectively.

Next, let us assume that the crash comes just after the log record for the step

write(C)

of transaction T1 has been written to stable storage (Figure 17.7b). When the system comes back up, two recovery actions need to be taken. The operation undo(T1) must be performed, since the record <T1 start> appears in the log, but there is no record <T1 commit>. The operation redo(T0) must be performed, since the log contains both the record <T0 start> and the record <T0 commit>. At the end of the entire recovery procedure, the values of accounts A, B, and C are $950, $2050, and $700, respectively. Note that the undo(T1) operation is performed before the redo(T0). In this example, the same outcome would result if the order were reversed. However, the order of doing undo operations first, and then redo operations, is important for the recovery algorithm that we shall see in Section 17.6.

Finally, let us assume that the crash occurs just after the log record

<T1 commit>

has been written to stable storage (Figure 17.7c). When the system comes back up, both T0 and T1 need to be redone, since the records <T0 start> and <T0 commit> appear in the log, as do the records <T1 start> and <T1 commit>. After the system performs the recovery procedures redo(T0) and redo(T1), the values in accounts A, B, and C are $950, $2050, and $600, respectively.

Checkpoints

When a system failure occurs, we must consult the log to determine those transactions that need to be redone and those that need to be undone. In principle, we need to search the entire log to determine this information. There are two major difficulties with this approach:

1. The search process is time consuming.

2. Most of the transactions that, according to our algorithm, need to be redone have already written their updates into the database. Although redoing them will cause no harm, it will nevertheless cause recovery to take longer.

To reduce these types of overhead, we introduce checkpoints. During execution, the system maintains the log, using one of the two techniques described in Sections 17.4.1 and 17.4.2. In addition, the system periodically performs checkpoints, which require the following sequence of actions to take place:

1. Output onto stable storage all log records currently residing in main memory.

2. Output to the disk all modified buffer blocks.

3. Output onto stable storage a log record <checkpoint>.

Transactions are not allowed to perform any update actions, such as writing to a buffer block or writing a log record, while a checkpoint is in progress.

The presence of a <checkpoint> record in the log allows the system to streamline its recovery procedure. Consider a transaction Ti that committed prior to the check- point. For such a transaction, the <Ti commit> record appears in the log before the <checkpoint> record. Any database modifications made by Ti must have been writ- ten to the database either prior to the checkpoint or as part of the checkpoint itself.

Thus, at recovery time, there is no need to perform a redo operation on Ti.

This observation allows us to refine our previous recovery schemes. (We continue to assume that transactions are run serially.) After a failure has occurred, the recovery scheme examines the log to determine the most recent transaction Ti that started executing before the most recent checkpoint took place. It can find such a transaction by searching the log backward, from the end of the log, until it finds the first <checkpoint> record (since we are searching backward, the record found is the final <checkpoint> record in the log); then it continues the search backward until it finds the next <Ti start> record. This record identifies a transaction Ti.

Once the system has identified transaction Ti, the redo and undo operations need to be applied to only transaction Ti and all transactions Tj that started executing after transaction Ti. Let us denote these transactions by the set T. The remainder (earlier part) of the log can be ignored, and can be erased whenever desired. The exact recovery operations to be performed depend on the modification technique being used. For the immediate-modification technique, the recovery operations are:

• For all transactions Tk in T that have no <Tk commit> record in the log, exe- cute undo(Tk ).

• For all transactions Tk in T such that the record <Tk commit> appears in the log, execute redo(Tk ).

Obviously, the undo operation does not need to be applied when the deferred-modification technique is being employed.

As an illustration, consider the set of transactions {T0, T1,.. ., T100} executed in the order of the subscripts. Suppose that the most recent checkpoint took place during the execution of transaction T67. Thus, only transactions T67, T68,.. ., T100 need to be considered during the recovery scheme. Each of them needs to be redone if it has committed; otherwise, it needs to be undone.

In Section 17.6.3, we consider an extension of the checkpoint technique for concurrent transaction processing.

Comments

Popular posts from this blog

XML Document Schema

Extended Relational-Algebra Operations.

Distributed Databases:Concurrency Control in Distributed Databases