Wednesday, September 1, 2010

Transaction in DBMS

Transaction

o A collection of operations that forms a single logical unit of work is called a transaction.
o A database system must ensure proper execution of transactions despite failures: either the entire transaction executes or none of it does.
o A transaction is a unit of program execution that accesses and possibly updates various data items.
o To ensure integrity of data, we require that the database system maintain the following properties of the transactions:

• Atomicity: Either all operations of the transaction are reflected properly in the database, or none are.
• Consistency: Execution of a transaction in isolation (i.e. with no other transaction executing concurrently) preserves the consistency of the database.
• Isolation: Even though multiple transactions may execute concurrently, the system guarantees that, for every pair of transactions Ti and Tj , it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently in the system.
• Durability: After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

Transaction State

A transaction must be in one of the following states:

Active: the initial state; the transaction stays in this state while it is executing
Partially committed: after the final statement has been executed
Aborted: after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction
Committed: after successful completion

State diagram of a transaction

Scheduling and Serializability

• The execution sequences of transactions are called schedules. They represent the chronological order in which instructions are executed in the system.
• A schedule must preserve the order in which the instructions appear in each individual transaction.


Serializability is the property that a schedule of concurrently executing transactions has the same effect as if the transactions were carried out serially, i.e. one by one.
In other words, serializability ensures the equivalence of a schedule to a serial schedule.

Conflict Serializability
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

Let us consider a schedule S in which there are two consecutive instructions Ii and Ij of transactions Ti and Tj. If Ii and Ij refer to different data items, then we can swap Ii and Ij without affecting the results of any instruction in the schedule. However, if Ii and Ij refer to the same data item Q, then the order of the two steps may matter. Considering only read and write operations, there are four cases (a sketch of the resulting conflict test in code follows the four cases):

1. Ii = read(Q), Ij = read(Q). The order of Ii and Ij does not matter, since the same value of Q is read by Ti and Tj regardless of the order.

2. Ii = read(Q), Ij = write(Q). If Ii comes before Ij, then Ti does not read the value of Q that is written by Tj in instruction Ij. If Ij comes before Ii, then Ti reads the value of Q that is written by Tj. Thus, the order of Ii and Ij matters.

3. Ii = write(Q), Ij = read(Q). The order of Ii and Ij matters for reasons similar to the previous case.

4. Ii = write(Q), Ij = write(Q). Since both of these are write operations, the order of these instructions does not affect either Ti or Tj. However, the value obtained by the next read(Q) instruction of S is affected, since only the result of the latter of the two write instructions is preserved in the database. If there is no other write(Q) instruction after Ii and Ij in S, then the order of Ii and Ij directly affects the final value of Q in the database state that results from schedule S.
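The four cases reduce to a single rule: two operations conflict exactly when they belong to different transactions, access the same data item, and at least one of them is a write. A minimal sketch in Python (the Op tuple and the conflicts function are illustrative names, not part of any DBMS):

from collections import namedtuple

# An operation: which transaction issued it, read or write, and on which item.
Op = namedtuple("Op", ["txn", "kind", "item"])  # kind is "read" or "write"

def conflicts(a, b):
    """Two operations conflict if they come from different transactions,
    touch the same data item, and at least one of them is a write
    (cases 2, 3 and 4 above). Case 1, read-read, never conflicts."""
    return (a.txn != b.txn
            and a.item == b.item
            and "write" in (a.kind, b.kind))

# Example: write(Q) by T1 followed by read(Q) by T2 -> conflict (case 3)
print(conflicts(Op("T1", "write", "Q"), Op("T2", "read", "Q")))   # True
print(conflicts(Op("T1", "read", "Q"), Op("T2", "read", "Q")))    # False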









Schedule 1
T1              T2
read(A)
write(A)
                read(A)
                write(A)
read(B)
write(B)
                read(B)
                write(B)




Schedule 2 (Schedule 1 with the non-conflicting instructions
write(A) of T2 and read(B) of T1 swapped)
T1              T2
read(A)
write(A)
                read(A)
read(B)
                write(A)
write(B)
                read(B)
                write(B)





Schedule 3, obtained by the following further swaps of non-conflicting instructions:
read(B) of T1 with read(A) of T2
write(B) of T1 with write(A) of T2
write(B) of T1 with read(A) of T2

T1              T2
read(A)
write(A)
read(B)
write(B)
                read(A)
                write(A)
                read(B)
                write(B)



Schedule 3 is a serial schedule. Thus, we have shown that Schedule 1 is conflict equivalent to a serial schedule, i.e. it is conflict serializable.

Consider Schedule 4:
T3              T4
read(Q)
                write(Q)
write(Q)

Schedule 4 is not conflict serializable, since it is not conflict equivalent to either the serial schedule <T3, T4> or the serial schedule <T4, T3>.
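Conflict serializability can be checked mechanically by building a precedence graph: add an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj; the schedule is conflict serializable exactly when the graph has no cycle. A small sketch of this test, applied to Schedules 1 and 4 (all names are illustrative):

def precedence_graph(schedule):
    """schedule is a list of (txn, kind, item) tuples in chronological order.
    Returns a dict mapping each transaction to the set of transactions it
    must precede in any equivalent serial schedule."""
    edges = {txn: set() for txn, _, _ in schedule}
    for i, (ti, ki, qi) in enumerate(schedule):
        for tj, kj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "write" in (ki, kj):
                edges[ti].add(tj)          # Ti must come before Tj
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the precedence graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in edges}
    def visit(v):
        colour[v] = GREY
        for w in edges[v]:
            if colour[w] == GREY or (colour[w] == WHITE and visit(w)):
                return True
        colour[v] = BLACK
        return False
    return any(colour[v] == WHITE and visit(v) for v in edges)

# Schedule 1: conflict serializable (only T1 -> T2, no cycle)
s1 = [("T1", "read", "A"), ("T1", "write", "A"),
      ("T2", "read", "A"), ("T2", "write", "A"),
      ("T1", "read", "B"), ("T1", "write", "B"),
      ("T2", "read", "B"), ("T2", "write", "B")]
# Schedule 4: not conflict serializable (T3 -> T4 and T4 -> T3)
s4 = [("T3", "read", "Q"), ("T4", "write", "Q"), ("T3", "write", "Q")]
print(has_cycle(precedence_graph(s1)))   # False -> conflict serializable
print(has_cycle(precedence_graph(s4)))   # True  -> not conflict serializable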

View Serializability
View serializability is a less stringent form of equivalence than conflict serializability.

Consider schedules S and S’, where the same set of transactions participates in both schedules. The schedules S and S’ are view equivalent if the following three conditions are met:

1) For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S’, also read the initial value of Q.

2) For each data item Q, if transaction Ti executes read(Q) in schedule S, and if that value was produced by a write(Q) operation executed by transaction Tj, then the read(Q) operation of transaction Ti must, in schedule S’, also read the value of Q that was produced by the same write(Q) operation of transaction Tj.

3) For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S’.

Conditions 1 and 2 ensure that each transaction reads the same values in both schedules and therefore, performs the same computation. Condition 3, coupled with conditions 1 and 2, ensures that both schedules result in the same final state.



Consider Schedule 5:
T3              T4              T5
read(Q)
                write(Q)
write(Q)
                                write(Q)

Schedule 5 is view serializable, as it is view equivalent to the serial schedule <T3, T4, T5>: the single read(Q) instruction reads the initial value of Q in both schedules, and T5 performs the final write of Q in both schedules.

Schedule 5 is not conflict serializable, since every pair of consecutive instructions conflicts, and thus no swapping of instructions is possible. Thus, every conflict-serializable schedule is also view serializable, but there are view-serializable schedules that are not conflict serializable.
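The three view-equivalence conditions can be checked directly from the reads-from relationships and the final writes of the two schedules. A sketch of such a check, assuming schedules are lists of (transaction, operation, item) tuples (function names are illustrative):

from collections import defaultdict

def read_sources(schedule):
    """For each transaction, the ordered list of (item, source) pairs, where
    source is the transaction whose write(Q) produced the value read, or
    None if the read sees the initial value of Q (conditions 1 and 2)."""
    last_writer = {}
    sources = defaultdict(list)
    for txn, kind, item in schedule:
        if kind == "read":
            sources[txn].append((item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return dict(sources)

def final_writes(schedule):
    """Map each item to the transaction that performs the final write on it
    (condition 3)."""
    return {item: txn for txn, kind, item in schedule if kind == "write"}

def view_equivalent(s, s_prime):
    # Conditions 1 and 2: every read sees the same source in both schedules.
    # Condition 3: the final write on each item is by the same transaction.
    return (read_sources(s) == read_sources(s_prime)
            and final_writes(s) == final_writes(s_prime))

# Schedule 5 compared with the serial schedule <T3, T4, T5>:
sched5 = [("T3", "read", "Q"), ("T4", "write", "Q"),
          ("T3", "write", "Q"), ("T5", "write", "Q")]
serial = [("T3", "read", "Q"), ("T3", "write", "Q"),
          ("T4", "write", "Q"), ("T5", "write", "Q")]
print(view_equivalent(sched5, serial))   # True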

Concurrency Control

One of the fundamental properties of a transaction is isolation. When several transactions execute concurrently in the database, however, the isolation property may no longer be preserved. To ensure that it is preserved, the system must control the interaction among the concurrent transactions; this control is achieved through one of a variety of mechanisms called concurrency control schemes.

Locking and Lock based protocols
Among the different modes in which a data item may be locked, we discuss the following two:

Shared: If a transaction Ti has obtained a shared-mode lock (denoted by S) on item Q, then Ti can read, but cannot write, Q.

Exclusive: If a transaction Ti has obtained an exclusive-mode lock (denoted by X) on item Q, then Ti can both read and write Q.

Given a set of lock modes, we can define a compatibility function on them as follows:

Let A and B represent arbitrary lock modes. Suppose that a transaction Ti requests a lock of mode A on item Q, on which another transaction Tj currently holds a lock of mode B. If transaction Ti can be granted a lock on Q immediately, in spite of the presence of the mode B lock, then we say mode A is compatible with mode B. Such a function can be represented by a lock-compatibility matrix as follows:


        S       X
S       True    False
X       False   False

To access a data item, transaction Ti must first lock that item. If the data item is already locked by another transaction in an incompatible mode, the concurrency-control manager will not grant the lock until all incompatible locks held by other transactions have been released.
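The lock-compatibility matrix and this granting rule translate directly into code. A minimal sketch (the compatible and can_grant names are illustrative, not any particular DBMS's API):

# Lock-compatibility matrix: S is compatible only with S; X with nothing.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def compatible(requested, held):
    """comp(A, B): can a lock of mode `requested` be granted while another
    transaction holds a lock of mode `held` on the same item?"""
    return COMPATIBLE[(requested, held)]

def can_grant(requested_mode, holders):
    """holders is a list of (transaction, mode) pairs currently holding locks
    on the item. The lock can be granted immediately only if the requested
    mode is compatible with every lock already held."""
    return all(compatible(requested_mode, mode) for _, mode in holders)

print(can_grant("S", [("T2", "S")]))   # True: shared locks can coexist
print(can_grant("X", [("T2", "S")]))   # False: must wait for T2 to unlock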



Let A and B be two accounts that are accessed by transactions T1 and T2. Transaction T1 transfers $50 from account B to account A. Transaction T2 displays the total amount of money in accounts A and B.

T1: lock-X(B);                T2: lock-S(A);
    read(B);                      read(A);
    B := B - 50;                  unlock(A);
    write(B);                     lock-S(B);
    unlock(B);                    read(B);
    lock-X(A);                    unlock(B);
    read(A);                      display(A + B).
    A := A + 50;
    write(A);
    unlock(A).


T1                  T2                  Concurrency-control manager
lock-X(B)
                                        grant-X(B, T1)
read(B)
B := B - 50
write(B)
unlock(B)
                    lock-S(A)
                                        grant-S(A, T2)
                    read(A)
                    unlock(A)
                    lock-S(B)
                                        grant-S(B, T2)
                    read(B)
                    unlock(B)
                    display(A + B)
lock-X(A)
                                        grant-X(A, T1)
read(A)
A := A + 50
write(A)
unlock(A)


Suppose that the values of accounts A and B are $100 and $200, respectively. If these two transactions are executed serially, either in the order T1, T2 or the order T2, T1, then transaction T2 will display the value $300.
But if these transactions are executed concurrently in the schedule shown in the table above, T2 displays $250, which is incorrect.
The reason for this mistake is that transaction T1 unlocked data item B too early, as a result of which T2 saw an inconsistent state.

This problem can be overcome by delaying unlocking until the end of the transaction, so that T1 and T2 are executed as shown below:

T1: lock-X(B);                T2: lock-S(A);
    read(B);                      read(A);
    B := B - 50;                  lock-S(B);
    write(B);                     read(B);
    lock-X(A);                    display(A + B);
    read(A);                      unlock(A);
    A := A + 50;                  unlock(B).
    write(A);
    unlock(B);
    unlock(A).


We shall require that each transaction in the system follow a set of rules, called a locking protocol, indicating when a transaction may lock and unlock each of the data items. Locking protocols restrict the number of possible schedules. The set of all such schedules is a proper subset of all possible serializable schedules.

Let {T0, T1, ..., Tn} be a set of transactions participating in a schedule S. We say that Ti precedes Tj in S, written Ti → Tj, if there exists a data item Q such that Ti has held lock mode A on Q, Tj has held lock mode B on Q later, and comp(A, B) = false. If Ti → Tj, then that precedence implies that in any equivalent serial schedule, Ti must appear before Tj.

A schedule S is legal under a given locking protocol if S is a possible schedule for a set of transactions that follow the rules of the locking protocol.

Granting of Locks

When a transaction Ti never gets a lock on a data item because other transactions that issue their requests after Ti keep getting the lock on that item, the transaction Ti may never make progress and is said to be starved.

We can avoid starvation of transactions by granting locks in the following manner (a sketch of this rule in code follows the two conditions):
When a transaction Ti requests a lock on a data item Q in a particular mode M, the concurrency-control manager grants the lock provided that:
1. There is no other transaction holding a lock on Q in a mode that conflicts with M
2. There is no other transaction that is waiting for a lock on Q, and that made its lock request before Ti.
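One way to realize these two conditions is to keep, for each data item, a FIFO queue of lock requests and to grant a request only when it is compatible with all granted locks and no earlier request is still waiting. A minimal sketch, assuming only S and X modes (the LockQueue class is illustrative):

from collections import deque

class LockQueue:
    """Per-item lock state: granted locks plus a FIFO queue of waiters."""
    def __init__(self):
        self.granted = []            # list of (txn, mode)
        self.waiting = deque()       # FIFO of (txn, mode)

    def _compatible(self, mode):
        # S is compatible only with S; X conflicts with everything.
        return all(m == "S" and mode == "S" for _, m in self.granted)

    def request(self, txn, mode):
        """Grant immediately only if the mode is compatible with all granted
        locks (condition 1) and nobody asked earlier and is still waiting
        (condition 2). Otherwise the request joins the queue."""
        if self._compatible(mode) and not self.waiting:
            self.granted.append((txn, mode))
            return True
        self.waiting.append((txn, mode))
        return False

    def release(self, txn):
        """Release txn's lock and grant queued requests in FIFO order."""
        self.granted = [(t, m) for t, m in self.granted if t != txn]
        while self.waiting:
            t, m = self.waiting[0]
            if self._compatible(m):
                self.granted.append(self.waiting.popleft())
            else:
                break                # head of the queue must be granted first

q = LockQueue()
print(q.request("T1", "S"))   # True
print(q.request("T2", "X"))   # False: conflicts with T1's S lock, waits
print(q.request("T3", "S"))   # False: compatible, but T2 asked first
q.release("T1")               # now T2's X lock is granted; T3 still waits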
The Two-Phase Locking Protocol

This protocol ensures serializability. It requires that each transaction issue lock and unlock requests in two phases:

1. Growing phase: A transaction may obtain locks, but may not release any lock.

2. Shrinking phase: A transaction may release locks, but may not obtain any new locks.

Initially, a transaction is in the growing phase and acquires locks as needed. Once the transaction releases a lock, it enters the shrinking phase, and it can issue no more lock requests.

Strict two-phase locking protocol:
• A modification of two-phase locking
• This protocol requires not only that locking be two phase, but also that all exclusive-mode locks taken by a transaction be held until that transaction commits.
• This protocol avoids cascading rollbacks.

Another variant of two-phase locking is the rigorous two-phase locking protocol, which requires that all locks be held until the transaction commits.
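The two-phase rule itself is easy to enforce mechanically: once a transaction has released any lock, further lock requests must be refused. A minimal sketch of such a guard (the TwoPhaseTransaction class is illustrative; a strict or rigorous variant would simply defer the unlock calls until commit):

class TwoPhaseTransaction:
    """Tracks the locking phase of one transaction and rejects any lock
    request issued after the first unlock (i.e. in the shrinking phase)."""
    def __init__(self, name):
        self.name = name
        self.held = set()            # data items currently locked
        self.shrinking = False       # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(
                f"{self.name} violates 2PL: lock({item}) after an unlock")
        self.held.add(item)

    def unlock(self, item):
        self.held.discard(item)
        self.shrinking = True        # growing phase is over

t = TwoPhaseTransaction("T1")
t.lock("B"); t.lock("A")             # growing phase
t.unlock("B")                        # shrinking phase begins
t.unlock("A")
# t.lock("C") would now raise: no new locks once shrinking has started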



Time-stamp Based Protocols
Locking protocols determine the order between every pair of conflicting transactions at execution time, through the first lock that both members of the pair request that involves incompatible modes. Another method for determining the serializability order is to select an ordering among the transactions in advance. The most common method for doing so is to use a timestamp-ordering scheme.

With each transaction Ti in the system, a unique fixed timestamp is associated, denoted by TS(Ti). This timestamp is assigned before Ti starts execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then TS(Ti) < TS(Tj). There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; i.e. a transaction’s timestamp is equal to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; i.e. a transaction’s timestamp is equal to the value of the counter when the transaction enters the system.

To implement the timestamp scheme, we associate with each data item Q two timestamp values:
W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q) successfully.
R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q) successfully.

Timestamp-Ordering Protocol

The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order. This protocol operates as follows:

1. Suppose that transaction Ti issues read(Q).
a. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.
b. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).

2. Suppose that transaction Ti issues write(Q).
a. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the system rejects the write operation and rolls Ti back.
b. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, the system rejects this write operation and rolls Ti back.
c. Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).

The timestamp-ordering protocol ensures conflict serializability because conflicting operations are processed in timestamp order.
The protocol ensures freedom from deadlock, since no transaction ever waits. However, there is a possibility of starvation of long transactions if a sequence of conflicting short transactions causes repeated restarting of the long transaction.
If a transaction is found to be getting restarted repeatedly, conflicting transactions need to be temporarily blocked to enable the transaction to finish.
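The read and write rules above translate almost directly into code. A sketch, assuming integer timestamps and signalling rollback with an exception (all names are illustrative):

class RollbackError(Exception):
    """Raised when the timestamp-ordering rules force a transaction to roll back."""

class Item:
    def __init__(self):
        self.value = None
        self.r_ts = 0    # R-timestamp(Q): largest TS of a successful read
        self.w_ts = 0    # W-timestamp(Q): largest TS of a successful write

def read(ts, q):
    # Rule 1a: the value Ti needs has already been overwritten.
    if ts < q.w_ts:
        raise RollbackError(f"read too late (TS={ts} < W-timestamp={q.w_ts})")
    # Rule 1b: execute the read and advance R-timestamp(Q).
    q.r_ts = max(q.r_ts, ts)
    return q.value

def write(ts, q, value):
    # Rule 2a: a later transaction has already read the old value of Q.
    if ts < q.r_ts:
        raise RollbackError(f"write too late (TS={ts} < R-timestamp={q.r_ts})")
    # Rule 2b: Ti's write is obsolete; a newer write has already happened.
    if ts < q.w_ts:
        raise RollbackError(f"obsolete write (TS={ts} < W-timestamp={q.w_ts})")
    # Rule 2c: execute the write and advance W-timestamp(Q).
    q.value = value
    q.w_ts = ts

q = Item()
write(ts=1, q=q, value=100)    # T1 (TS=1) writes Q
print(read(ts=2, q=q))         # T2 (TS=2) reads Q -> 100
try:
    write(ts=1, q=q, value=50) # T1 writes again, but T2 has already read Q
except RollbackError as e:
    print("T1 rolled back:", e)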


Deadlock Handling
A system is in a deadlock state if there exists a set of transactions such that every transaction in the set is waiting for another transaction in the set.
More precisely, there exists a set of waiting transactions {T0, T1, ..., Tn} such that T0 is waiting for a data item that T1 holds, T1 is waiting for a data item that T2 holds, and so on, and Tn is waiting for a data item that T0 holds. None of the transactions can make progress in such a situation.
The two principal methods for dealing with the deadlock problem are to use a deadlock-prevention protocol, or to use a deadlock detection and recovery scheme.

Deadlock Prevention
There are two approaches to deadlock prevention.
One approach ensures that no cyclic wait can occur by ordering the requests for locks, or requiring all locks to be acquired together.
The other approach performs transaction rollbacks instead of waiting for a lock, whenever the wait could potentially result in a deadlock.

The simplest scheme under the first approach requires that each transaction lock all its data items before it begins execution. Either all are locked in one step, or none are locked. There are two main disadvantages of this protocol:
• It is often hard to predict, before the transaction begins, what data items need to be locked
• Data item utilization may be very low, since many of the data items may be locked but unused for a long time

Another approach for preventing deadlocks is to impose an ordering of all data items, and to require that a transaction lock data items only in a sequence consistent with the ordering.

The second approach for preventing deadlocks is to use preemption and transaction rollbacks. Two different deadlock prevention schemes using timestamps have been proposed:
• The wait-die scheme is a non-preemptive technique. When a transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp earlier than that of Tj (i.e. Ti is older than Tj). Otherwise, Ti is rolled back (dies).
• The wound-wait scheme is a preemptive technique. When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp larger than that of Tj (i.e. Ti is younger than Tj). Otherwise, Tj is rolled back (Tj is wounded by Ti).
In the wait-die scheme, an older transaction must wait for a younger one to release its data item. So, the older a transaction gets, the more it tends to wait. In the wound-wait scheme, an older transaction never waits for a younger transaction.
The major problem with both of these schemes is that unnecessary rollbacks may occur.
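Both schemes reduce to a single comparison of timestamps when Ti requests an item held by Tj. A small sketch (function names are illustrative; ts maps each transaction to its timestamp):

def wait_die(ts, ti, tj):
    """Non-preemptive: requester Ti waits only if it is older than the
    holder Tj; a younger requester dies (is rolled back)."""
    return "wait" if ts[ti] < ts[tj] else "rollback Ti"

def wound_wait(ts, ti, tj):
    """Preemptive: a younger requester Ti waits; an older requester wounds
    the holder Tj (Tj is rolled back)."""
    return "wait" if ts[ti] > ts[tj] else "rollback Tj"

ts = {"T1": 5, "T2": 10}           # T1 is older (smaller timestamp)
print(wait_die(ts, "T1", "T2"))    # T1 requests from T2: older -> wait
print(wait_die(ts, "T2", "T1"))    # T2 requests from T1: younger -> rollback Ti
print(wound_wait(ts, "T1", "T2"))  # older requester wounds the holder -> rollback Tj
print(wound_wait(ts, "T2", "T1"))  # younger requester -> wait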

Deadlock Detection and Recovery
To detect deadlock and recover from it, a system must:
• Maintain information about the current allocation of data items to transactions, as well as any outstanding data item requests.
• Provide an algorithm that uses this information to determine whether a system has entered a deadlock state.
• Recover from the deadlock when the detection algorithm determines that a deadlock exists.

Deadlock Detection
Deadlocks can be described precisely in terms of a directed graph called a wait-for graph. This graph consists of a pair G = (V, E), where V is a set of vertices and E is a set of edges. The set of vertices consists of all the transactions in the system. Each element of the set E of edges is an ordered pair Ti → Tj. If Ti → Tj is in E, then there is a directed edge from transaction Ti to Tj, implying that transaction Ti is waiting for transaction Tj to release a data item that it needs.
When transaction Ti requests a data item currently being held by transaction Tj, the edge Ti → Tj is inserted into the wait-for graph.
A deadlock exists if and only if the wait-for graph contains a cycle. To detect deadlocks, the system needs to maintain the wait-for graph, and periodically invoke an algorithm that searches for a cycle in the graph.
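Since a deadlock exists if and only if the wait-for graph contains a cycle, the detection algorithm is simply a cycle search over that graph. A sketch using depth-first search, which also returns the transactions on the cycle so that a victim can be chosen (graph representation and names are illustrative):

def find_cycle(wait_for):
    """wait_for maps each transaction to the set of transactions it waits for.
    Returns a list of transactions forming a cycle (a deadlock), or None."""
    visited, stack = set(), []

    def dfs(v):
        if v in stack:                      # back edge: cycle found
            return stack[stack.index(v):]
        if v in visited:
            return None
        visited.add(v)
        stack.append(v)
        for w in wait_for.get(v, ()):
            cycle = dfs(w)
            if cycle:
                return cycle
        stack.pop()
        return None

    for v in wait_for:
        cycle = dfs(v)
        if cycle:
            return cycle
    return None

# T0 waits for T1, T1 waits for T2, T2 waits for T0: a deadlock.
print(find_cycle({"T0": {"T1"}, "T1": {"T2"}, "T2": {"T0"}}))  # ['T0', 'T1', 'T2']
print(find_cycle({"T0": {"T1"}, "T1": set()}))                 # None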

Recovery from Deadlock
When a detection algorithm determines that a deadlock exists, the system must recover from the deadlock. The most common solution is to roll back one or more transactions to break the deadlock. Three actions need to be taken:

1) Selection of a victim: We must determine which transaction to roll back to break the deadlock. We should roll back those transactions that will incur the minimum cost. Many factors may determine the cost of a rollback such as:
• How long the transaction has computed, and how much longer it will compute before it completes its designated task
• How many data items the transaction has used
• How many more data items the transaction needs in order to complete
• How many transactions will be involved in the rollback

2) Rollback: We must determine how far the selected transaction should be rolled back.
The simplest solution is a total rollback: abort the transaction and then restart it. However, it is more effective to perform a partial rollback, rolling the transaction back only as far as necessary to break the deadlock, although this requires some additional overhead.

3) Starvation: The same transaction may always be selected as a victim and never get a chance to proceed, i.e. it starves. We must ensure that a transaction can be picked as a victim only a finite number of times. To do this, the number of rollbacks can be included in the cost factor.
