Transactions and Concurrency: Ensuring Data Integrity and Consistency in a Database Management System
Introduction to Transactions
Transactions are a fundamental concept
in database management systems that allow multiple operations to be treated as
a single unit of work. They are used to ensure the integrity and consistency of
the data in a database by either completing all the operations within the
transaction successfully, or rolling back all the operations if any of them
fail.
A transaction typically consists of a
set of database operations that are executed as a single unit. For example, a
transaction might involve transferring funds from one bank account to another,
updating the balance of the two accounts, and recording the transaction in the
bank's records. All of these operations are treated as a single unit of work,
and either all of them are completed successfully, or none of them are
completed at all.
Transactions are important because
they help ensure the integrity and consistency of the data in a database.
Without transactions, it would be possible for one operation to complete while
another operation fails, resulting in lost or corrupted data. Transactions also
help prevent data corruption by rolling back any operations that violate the
rules or constraints of the database.
Examples of transactions in real-world
scenarios include a bank transferring funds from one account to another, an
online shopping cart adding items to a customer's order and charging their
credit card, and a reservation system checking the availability of a hotel room
and then booking it. In each of these examples, the operations involved are
treated as a single unit of work, ensuring the integrity and consistency of the
data in the database.
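To make the bank-transfer example concrete, here is a minimal sketch in Python using the standard sqlite3 module; the accounts table and its id and balance columns are assumptions chosen for illustration. Both updates belong to one transaction: they are committed together, or rolled back together if anything fails.

import sqlite3

def transfer(conn, from_acct, to_acct, amount):
    # Move `amount` between two accounts as a single unit of work.
    try:
        cur = conn.cursor()
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                    (amount, from_acct))
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                    (amount, to_acct))
        conn.commit()        # both updates become permanent together
    except Exception:
        conn.rollback()      # or neither takes effect at all
        raise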
Properties of a Transaction
There are several properties or
characteristics that are typically associated with a transaction in the context
of computer science and database management. These properties are often
referred to as the "ACID properties," which stands for Atomicity,
Consistency, Isolation, and Durability.
1. Atomicity: This property refers to the all-or-nothing
nature of a transaction. If a transaction consists of multiple actions, either
all of the actions will be completed successfully, or none of them will be
completed at all. This ensures that the database remains in a consistent state,
even if there are errors or failures during the transaction.
2. Consistency: A transaction must leave the database in a
consistent state, meaning that all constraints and rules defined by the
database must be satisfied after the transaction is completed.
3. Isolation: Transactions should be isolated from one
another, meaning that the actions of one transaction should not affect the
actions of another transaction until the first transaction is completed. This
ensures that the database remains in a consistent state even if multiple
transactions are being executed simultaneously.
4. Durability: Once a transaction has been completed
successfully, the changes made by the transaction should be permanent and
should not be lost in the event of a system failure or crash. This ensures that
the database remains consistent even in the face of unexpected events.
CONCURRENT TRANSACTIONS
Concurrent transactions are
transactions that are executed simultaneously or overlap in time. They may
occur in database management systems when multiple users or processes are
accessing and modifying the data simultaneously.
For example, consider a database that
contains a table of customer accounts. If two users want to access and update
the same customer account at the same time, the database management system must
manage the concurrent access and modification of the data to ensure that the
data remains in a consistent state and that the integrity of the data is
maintained.
Conflicting Operations in a Schedule, with Example
Two operations in a schedule are said to conflict when they belong to
different transactions, access the same data item, and at least one of them is
a write. Because they touch the same data, the order in which the operations
are executed can change the final outcome of the schedule.
For example, consider a schedule with two transactions, T1 and T2, that both
modify the same value in a database table. Since their operations conflict,
the order in which they are executed can affect the final value of the data
they are modifying.
Here is an example of a schedule with conflicting operations:
- T1: Read value x from table A
- T2: Read value x from table A
- T1: Modify value x in table A
- T2: Modify value x in table A
In this schedule, the operations of T1 and T2 conflict because they read and
write the same value x in table A. If T1's write is applied before T2's, the
final value of x will generally be different than if T2's write is applied
first, and one transaction's update may even be lost entirely. Such conflicts
can compromise the consistency and integrity of the data in the database, and
must be carefully managed to ensure that the final results are correct.
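A tiny worked example shows why the order of conflicting operations matters; the starting value and the two updates are assumptions chosen for illustration.

x = 100                  # initial value of x in table A
# Serial order 1: T1 adds 10 to x, then T2 doubles the result
order1 = (x + 10) * 2    # final value 220
# Serial order 2: T2 doubles x, then T1 adds 10
order2 = (x * 2) + 10    # final value 210
print(order1, order2)    # 220 210 -- the two orders leave different final states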
Problems of Concurrent Transactions
Concurrent transactions refer to
situations where multiple transactions are executing simultaneously within a
database. While concurrent transactions can often improve the performance and
efficiency of a database, they can also introduce a number of problems if not
carefully managed.
Here are some examples of problems
that can occur with concurrent transactions, along with more detailed
explanations:
1. Deadlocks: A deadlock occurs when two or more
transactions are waiting for each other to release a lock on a piece of data,
causing a standstill. This can lead to performance issues and can cause the
transactions to time out.
For example, consider the following situation:
Transaction T1:
- Acquire a lock on data A
- Acquire a lock on data B
Transaction T2:
- Acquire a lock on data B
- Acquire a lock on data A
If T1 and T2 are executing concurrently, T1
will be waiting for T2 to release the lock on data B, while T2 is waiting for
T1 to release the lock on data A. This creates a deadlock, as neither
transaction can proceed until the other releases its lock. To prevent
deadlocks, transactions must be carefully designed to avoid waiting for locks
that are held by other transactions.
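As a rough illustration of the same circular wait, here is a Python sketch in which thread locks stand in for database locks: each function holds one lock and then blocks on the other, so if both run at once neither can ever finish.

import threading
import time

lock_a = threading.Lock()        # stands in for the lock on data A
lock_b = threading.Lock()        # stands in for the lock on data B

def t1():
    with lock_a:                 # T1 acquires A
        time.sleep(0.1)
        with lock_b:             # ...then waits for B, which T2 holds
            pass

def t2():
    with lock_b:                 # T2 acquires B
        time.sleep(0.1)
        with lock_a:             # ...then waits for A, which T1 holds
            pass

# Starting both threads together deadlocks: neither call ever returns.
# threading.Thread(target=t1).start(); threading.Thread(target=t2).start()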
2. Inconsistency: If conflicting operations are not properly
managed, concurrent transactions can lead to data inconsistencies within the
database. For example, if two transactions are modifying the same data
simultaneously, the final result may not be what was intended.
For example, consider the following situation:
Transaction T1:
- Read value x from table A
- Modify value x in table A
Transaction T2:
- Read value x from table A
- Modify value x in table A
If T1 and T2 are executing concurrently, it is
possible that T2 will overwrite the changes made by T1, leading to data
inconsistency. To prevent this problem, transactions must be properly
synchronized and managed to ensure that conflicting operations are executed in
the correct order.
3. Lost updates: If two transactions are modifying the same
data simultaneously and one of the transactions is not able to see the changes
made by the other transaction, it may overwrite the changes, leading to data
loss.
For example, consider the following situation:
Transaction T1:
- Read value x from table A
- Modify value x in table A
Transaction T2:
- Read value x from table A
- Modify value x in table A
If T1 and T2 both read the original value of x before either of them writes,
T2 will not see the change made by T1. When T2 then writes its result, it
overwrites T1's update, and T1's change is lost. To prevent this problem,
transactions must be properly synchronized, for example by holding a lock on x
across the entire read-modify-write, so that each transaction works with the
most up-to-date data.
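Here is a minimal sketch of the lost-update anomaly using Python threads as a stand-in for concurrent transactions; the balance variable and the deposit amounts are assumptions chosen for illustration.

import threading

balance = 100
balance_lock = threading.Lock()

def deposit_unsafe(amount):
    global balance
    read = balance                 # both transactions may read the same old value
    balance = read + amount        # the later write silently overwrites the earlier one

def deposit_safe(amount):
    global balance
    with balance_lock:             # the whole read-modify-write runs under one lock,
        balance = balance + amount # so no update can be lost

Run many unsynchronized deposits concurrently and the final balance can fall short of the expected total; the locked version always reflects every deposit.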
Serializable Schedules
A serializable schedule is a schedule
of transactions that has the same effect as if the transactions were executed
one at a time, in a serial (single-threaded) manner. This means that the
schedule produces the same final results as if the transactions were executed
sequentially, without any concurrent execution.
Serializable schedules are important
in the context of database management because they ensure that the database
remains in a consistent state even when multiple transactions are executing
concurrently. By using a serializable schedule, the database can guarantee that
the final results of the transactions will be the same as if they were executed
one at a time, without any potential conflicts or inconsistencies.
Here is an example of a serializable schedule:
Transaction T1:
- Read value x from table A
- Modify value x in table A
Transaction T2:
- Read value y from table B
- Modify value y in table B
In this example, T1 and T2 are independent transactions that do not conflict
with each other, because they touch different data items, so they can be
executed concurrently without affecting the final results. This schedule is
serializable because the final results are the same as if the transactions
were executed one at a time, in either order.
To ensure that a schedule is
serializable, transactions must be properly synchronized and managed to avoid
conflicts and inconsistencies. This can be achieved through the use of locking protocols
and other techniques, such as timestamp ordering and conflict serializability.
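One common way to test conflict serializability is to build a precedence graph: draw an edge from Ti to Tj whenever an operation of Ti conflicts with a later operation of Tj, and accept the schedule only if the graph has no cycle. Here is a minimal sketch, reusing the conflicting schedule from the earlier example; the way the schedule is encoded as (transaction, operation, data item) tuples is an assumption for illustration.

# A schedule as a list of (transaction, operation, data item) tuples.
schedule = [
    ("T1", "read",  "x"),
    ("T2", "read",  "x"),
    ("T1", "write", "x"),
    ("T2", "write", "x"),
]

def precedence_edges(schedule):
    # Edge Ti -> Tj for every pair of conflicting operations where Ti's comes first.
    edges = set()
    for i in range(len(schedule)):
        for j in range(i + 1, len(schedule)):
            ti, op_i, item_i = schedule[i]
            tj, op_j, item_j = schedule[j]
            if ti != tj and item_i == item_j and "write" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    # Simple depth-first search for a cycle in the precedence graph.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    def visit(node, path):
        if node in path:
            return True
        return any(visit(n, path | {node}) for n in graph.get(node, ()))
    return any(visit(n, set()) for n in graph)

print(precedence_edges(schedule))              # edges T1->T2 and T2->T1: a cycle
print(has_cycle(precedence_edges(schedule)))   # True: not conflict-serializable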
Locks
A lock is a mechanism that is used to
prevent multiple transactions from accessing or modifying the same data
simultaneously. Locks are used in database management to ensure the consistency
and integrity of data, and to prevent conflicts and inconsistencies that can
arise when transactions are executing concurrently.
1. Binary locks
Binary locks can be used in
transactions to synchronize access to shared data and prevent conflicts and
inconsistencies that can arise when transactions are executing concurrently.
For example, consider a database table
that is accessed by multiple transactions. If a transaction needs to read or
modify data in the table, it can acquire a binary lock on the data to ensure
that no other transactions can access or modify the data simultaneously. This
can help to prevent problems such as deadlocks, data inconsistencies, and lost
updates, which can occur when transactions are executing concurrently.
from threading import Semaphore

lock = Semaphore(1)          # a binary lock: at most one transaction holds it at a time

def transaction():
    lock.acquire()           # wait until the lock is free
    # Read or modify data in the database
    lock.release()           # let the next transaction proceed
In this example, the lock variable is a binary lock that is used to synchronize access to data in the database. The transaction acquires the lock before accessing or modifying the data, and releases the lock after it is finished. This ensures that no other transactions can access or modify the data simultaneously, and helps to maintain the consistency and integrity of the data in the database.
2. Multiple-mode locks
Multiple-mode locks, also known as
reader-writer locks, are a type of lock that allows multiple transactions
(referred to as "readers") to read data simultaneously, while
allowing only a single transaction (referred to as a "writer") to
modify the data at a time.
Multiple-mode locks are used in
transactions to synchronize access to shared data and prevent conflicts and
inconsistencies that can arise when transactions are executing concurrently.
They are often used in databases and other systems where there is a high ratio
of read operations to write operations, as they allow multiple readers to
access the data simultaneously, improving the performance and concurrency of
the system.
Here is an example of using a
multiple-mode lock in a transaction:
# ReaderWriterLock is not a standard Python class; assume a reader-writer lock
# type with the methods used below (for example, the implementation sketched
# after this example).
lock = ReaderWriterLock()

def transaction():
    lock.acquire_read()      # many readers may hold the lock at the same time
    # Read data from the database
    lock.release_read()

def transaction_modify():
    lock.acquire_write()     # a writer gets exclusive access to the data
    # Modify data in the database
    lock.release_write()
In this example, the lock variable is a multiple-mode lock that is used to synchronize access to data in the database. The transaction function acquires a read lock before reading the data and releases it when finished. The transaction_modify function acquires a write lock before modifying the data and releases it when finished, so that no readers or other writers can access the data while it is being changed.
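Since Python's standard library does not provide a reader-writer lock, here is a minimal sketch of one built from threading primitives (a simple readers-preference version, purely for illustration; production implementations also address fairness and writer starvation).

import threading

class ReaderWriterLock:
    # Minimal reader-writer lock: many concurrent readers OR one writer.
    def __init__(self):
        self._readers = 0
        self._readers_lock = threading.Lock()   # protects the reader count
        self._write_lock = threading.Lock()     # held while anyone writes

    def acquire_read(self):
        with self._readers_lock:
            self._readers += 1
            if self._readers == 1:              # first reader blocks writers
                self._write_lock.acquire()

    def release_read(self):
        with self._readers_lock:
            self._readers -= 1
            if self._readers == 0:              # last reader lets writers in
                self._write_lock.release()

    def acquire_write(self):
        self._write_lock.acquire()

    def release_write(self):
        self._write_lock.release()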
Two Phase Locking (2PL)
Two Phase Locking (2PL) is a protocol
that is used to ensure the serializability of transactions in a database
management system. It is based on the idea of dividing the execution of a
transaction into two phases: the growing phase and the shrinking phase.
1. Growing phase: During the growing phase, the transaction acquires locks on the data it needs to
access or modify. Locks can be acquired at any time during this phase, but they
cannot be released until the transaction enters the shrinking phase.
2. Shrinking phase: During the shrinking phase, the transaction releases all of the locks it has
acquired. Once the transaction has entered the shrinking phase, it cannot
acquire any new locks, but it can continue to access and modify data that it
has already locked.
Two Phase Locking is used to ensure
the serializability of transactions by preventing conflicting operations from
occurring simultaneously. By enforcing a strict lock acquisition and release
order, 2PL can prevent problems such as deadlocks, data inconsistencies, and
lost updates, which can occur when transactions are executing concurrently.
2PL can be implemented using various types of locks, such as shared locks, exclusive locks, and update locks. It is a widely used protocol in database management systems, and is an important tool for maintaining the consistency and integrity of data in a concurrent environment.
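Here is a minimal sketch of the two-phase discipline, assuming a simple lock table of per-item thread locks; the item names and the small TwoPhaseTransaction helper are illustrative, not a real DBMS interface.

import threading

class TwoPhaseTransaction:
    # Growing phase: locks may only be acquired. Shrinking phase: only released.
    def __init__(self, lock_table):
        self.lock_table = lock_table     # dict mapping item name -> threading.Lock
        self.held = []
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock after the first unlock")
        self.lock_table[item].acquire()  # growing phase: take locks as needed
        self.held.append(item)

    def unlock_all(self):
        self.shrinking = True            # shrinking phase begins
        for item in reversed(self.held):
            self.lock_table[item].release()
        self.held.clear()

locks = {"A": threading.Lock(), "B": threading.Lock()}
txn = TwoPhaseTransaction(locks)
txn.lock("A")                            # growing phase
txn.lock("B")
# ... read or modify data items A and B here ...
txn.unlock_all()                         # shrinking phase: release everything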
DEADLOCK AND ITS PREVENTION
A deadlock is a situation that occurs
when two or more transactions are waiting for each other to release a lock on a
piece of data, causing a standstill. Deadlocks can cause performance issues and
can cause transactions to time out. They can occur in a database management
system when transactions are executing concurrently and conflicting operations
are not properly managed.
There are several ways to prevent
deadlocks in a database management system:
1. Lock ordering: One approach to preventing deadlocks
is to establish a strict lock acquisition order, such that transactions always
acquire locks on data in a specific order. This can help to prevent deadlocks
by avoiding situations where two transactions are waiting for each other to
release a lock; a minimal sketch of this approach appears after this list.
2. Timeouts: Another approach is to set a timeout for
transactions that are waiting to acquire a lock. If a transaction is unable to
acquire a lock within the specified time limit, it can be terminated and rolled
back, allowing other transactions to proceed.
3. Deadlock detection and resolution: Some database
management systems include built-in mechanisms for detecting and resolving
deadlocks. When a deadlock is detected, the system can choose one of the
transactions involved in the deadlock to roll back and release its locks,
allowing the other transactions to proceed.
4. Acquiring all locks up front: Deadlocks can also be prevented by avoiding
the hold-and-wait pattern, in which a transaction holds a lock on one resource
while waiting for a lock that is held by another transaction. To avoid this
problem, transactions can acquire all the locks they need at the same time,
rather than acquiring them incrementally.
5. Avoiding lock escalation: Lock escalation
refers to the process of converting many small locks on individual rows or
pages into a single, larger lock on a table or database. Lock escalation can
lead to deadlocks, as it can cause transactions to wait for locks that are held
by other transactions. To avoid this problem, transactions should acquire locks
at the lowest possible granularity (e.g., at the row or page level) and avoid
holding locks for an extended period of time.
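Here is a minimal sketch of the lock-ordering idea from item 1, again using thread locks as a stand-in for database locks: every transaction sorts the items it needs by a fixed global key and acquires them in that order, so a circular wait can never form.

import threading

locks = {"A": threading.Lock(), "B": threading.Lock(), "C": threading.Lock()}

def run_transaction(items, work):
    # Acquire every needed lock in one fixed global order, then do the work.
    ordered = sorted(items)              # the global order: the item name
    for item in ordered:
        locks[item].acquire()
    try:
        work()
    finally:
        for item in reversed(ordered):
            locks[item].release()

# Both transactions need A and B, but each locks A before B, so the circular
# wait from the earlier deadlock example cannot occur.
run_transaction({"A", "B"}, lambda: None)
run_transaction({"B", "A"}, lambda: None)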
Wait-die scheme
The wait-die scheme is a deadlock prevention technique based on timestamps.
Each transaction is assigned a timestamp when it starts, and a transaction
with an earlier timestamp is said to be older than one with a later timestamp.
When a transaction requests a lock that is held by another transaction, it
compares its timestamp with that of the holder. If the requesting transaction
is older than the holder, it is allowed to wait. If the requesting transaction
is younger than the holder, it is rolled back (it "dies") and is restarted
later with its original timestamp.
The wait-die scheme prevents deadlocks because only older transactions ever
wait for younger ones, so a cycle of transactions each waiting for the other
can never form. Because a rolled-back transaction keeps its original timestamp
when it restarts, it steadily becomes older relative to new transactions and
is eventually guaranteed to run to completion, which also prevents starvation.
Here is an example of the wait-die scheme in action:
Transaction T1:
- Requests a lock on data A, which is held by T2
- Compares its timestamp with T2's
- If T1 is older than T2, T1 is allowed to wait
- If T1 is younger than T2, T1 is rolled back (it "dies") and restarted later
Transaction T2:
- Requests a lock on data B, which is held by T1
- Compares its timestamp with T1's
- If T2 is older than T1, T2 is allowed to wait
- If T2 is younger than T1, T2 is rolled back and restarted later
In this example, T1 and T2 each request a lock on data held by the other transaction. Whichever of the two is younger is rolled back rather than allowed to wait, so the circular wait that would otherwise produce a deadlock never forms.
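A minimal sketch of the wait-die decision, assuming each transaction carries the timestamp it was given when it started (a lower timestamp means an older transaction):

def wait_die(requester_ts, holder_ts):
    # Wait-die rule: older requesters wait, younger requesters die.
    if requester_ts < holder_ts:       # requester started earlier, so it is older
        return "wait"
    return "die"                       # younger requester is rolled back and later
                                       # restarted with its original timestamp

print(wait_die(requester_ts=5, holder_ts=9))   # 'wait': an older transaction asks a younger one
print(wait_die(requester_ts=9, holder_ts=5))   # 'die': a younger transaction asks an older one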
Wound-wait scheme
The wound-wait scheme is another timestamp-based mechanism for preventing
deadlocks in a database management system. It is preemptive: rather than
waiting, an older transaction can "wound" a younger one, forcing it to roll
back and give up its locks.
In the wound-wait scheme, when a transaction requests a lock that is held by
another transaction, the two timestamps are compared. If the requester is
older than the holder, the holder is wounded: it is rolled back and releases
its locks, so the older transaction can acquire the lock and proceed. If the
requester is younger than the holder, it simply waits. A wounded transaction
is restarted later with its original timestamp, so it eventually becomes old
enough to run to completion.
The wound-wait scheme prevents deadlocks because only younger transactions are
ever made to wait for older ones; a cycle of transactions each waiting for the
other therefore cannot form. By wounding younger lock holders instead of
waiting on them, older transactions are always able to make progress and never
become stuck in a deadlock.
However, the wound-wait scheme has a cost: wounded transactions must be rolled
back and restarted, which can hurt performance when conflicts are frequent. As
a result, it is typically used alongside other techniques, such as lock
ordering and deadlock detection and resolution.
Here is an example of how the wound-wait scheme might work in a database
management system, assuming that T2 started before T1 and is therefore the
older transaction:
Transaction T1 (younger):
- Acquire a lock on data A
- Read value x from table A
- Request a lock on data C (held by T2, so T1 waits)
Transaction T2 (older):
- Acquire a lock on data C
- Read value y from table C
- Request a lock on data A (held by T1, so T2 wounds T1)
- Modify value x in table A
- Release the locks on data A and data C
In this example, T1 is waiting for a lock on data C, which is held by T2,
while T2 requests a lock on data A, which is held by T1. Without intervention
the two transactions would be waiting for each other and a deadlock would form.
Under the wound-wait scheme this cannot happen: because T2 is older, it wounds
T1, causing T1 to be rolled back and to release its locks. T2 then acquires
the lock on data A and proceeds with its operation, modifying value x in table
A. T1 is later restarted with its original timestamp and tries again; if it
conflicts with a still-older transaction it may be wounded once more, but
since it keeps its timestamp it eventually becomes the oldest transaction
involved and is able to complete its operations.
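For contrast with wait-die, here is a minimal sketch of the wound-wait decision under the same timestamp convention (a lower timestamp means an older transaction):

def wound_wait(requester_ts, holder_ts):
    # Wound-wait rule: older requesters wound the holder, younger requesters wait.
    if requester_ts < holder_ts:       # requester is older: preempt the holder
        return "wound holder"          # holder is rolled back and restarted later
    return "wait"                      # younger requester waits for the older holder

print(wound_wait(requester_ts=5, holder_ts=9))  # 'wound holder'
print(wound_wait(requester_ts=9, holder_ts=5))  # 'wait'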
Thank you for reading this GoogleClass post on Transactions and Concurrency.
Related posts:
https://www.googleclass.in/2023/01/an-overview-of-distributed-database.html
https://www.googleclass.in/2023/01/understanding-database-recovery-and.html
https://www.googleclass.in/2022/12/sql-database-objects-views-synonyms.html
https://www.googleclass.in/2022/12/introduction-on-join-query-and-nested.html
https://www.googleclass.in/2022/12/introduction-to-sql-structured-query.html