PostgreSQL / Inserting and modifying data
Understanding and using transactions in PostgreSQL
Transactions are a mechanism that encapsulates multiple statements into a single operation for the database to process. Instead of feeding in individual statements, the database is able to interpret and act on the group of commands as a cohesive unit. This helps ensure the consistency of the dataset over the course of many closely related statements.
In this guide, we'll start by discussing what transactions are and why they are beneficial. Afterwards, we'll take a look at how PostgreSQL implements transactions and the various options you have when using them.
What are transactions?
Transactions are a way to group together and isolate multiple statements for processing as a single operation. Instead of executing each command individually as it is sent to the server, in a transaction, commands are bundled together and executed in a separate context than other requests.
Isolation is an important part of transactions. Within a transaction, the executed statements can only affect the environment within the transaction itself. From inside the transaction, statements can modify data and the results are immediately visible. From the outside, no changes are made until the transaction is committed, at which time all of the actions within the transaction become visible at once.
These features help databases achieve ACID compliance by providing atomicity (actions in a transaction are either all committed or all rolled back) and isolation (outside of the transaction, nothing changes until the commit while inside, the statements have consequences). These together help the database maintain consistency (by guaranteeing that partial data transformations cannot occur). Furthermore, changes in transactions are not returned as successful until they are committed to non-volatile storage, which provides durability.
To achieve these goals, transactions employ a number of different strategies and different database systems use different methods. PostgreSQL uses a system called Multiversion Concurrency Control (MVCC), which allows the database to perform these actions without unnecessary locking using data snapshots. All together, these systems comprise one of the fundamental building blocks of modern relational databases, allowing them to safely process complex data in a crash-resistant manner.
Types of consistency failures
One reason people use transactions is to gain certain guarantees about the consistency of their data and the environment in which it is processed. Consistency can be broken in many different ways, which affects how databases attempt to prevent them.
There are four primary ways that inconsistency can arise depending on the transaction implementation. Your tolerance for scenarios where these scenarios may arise will affect how you use transactions in your applications.
Dirty reads occur when the statements within a transaction are able to read data written by other in-progress transactions. This means that even though the statements of a transaction have not been committed yet, they can be read and thus influence other transactions.
This is often considered a severe breach of consistency, as transactions are not properly isolated form one another. Statements that may never be committed to the database can affect the execution of other transactions, modifying their behavior.
Transactions that allow dirty reads cannot make any reasonable claims about the consistency of the resulting data.
Nonrepeatable reads occur when a commit outside of the transaction alters the data seen within the transaction. You can recognize this type of problem if, within a transaction, the same data is read twice but different values are retrieved in each instance.
As with dirty reads, transactions that allow non-repeatable reads don't offer full isolation between transactions. The difference is that with non-repeatable reads, the statements affecting the transaction have actually been committed outside of the transaction.
A phantom read is a specific type of non-repeatable read that occurs when the rows returned by a query are different the second time it is executed within a transaction.
For instance, if a query within the transaction returns four rows the first time it is executed, but five rows the second time, this is a phantom read. Phantom reads are caused by commits outside of the transaction altering the number of rows that satisfy the query.
Serialization anomalies occur when the results of multiple transactions committed concurrently will result in different outcomes than if they were committed one after another. This can occur any time that a transaction allows two commits to occur that each modify the same table or data without resolving conflicts.
Serialization anomalies are a special type of problem that early types of transactions had no understanding of. This is because early transactions were implemented with locking, where one could not continue if another transaction was reading from or altering the same piece of data.
Transaction isolation levels
Transactions are not a "one size fits all" solution. Different scenarios require different trade-offs between performance and protection. Fortunately, PostgreSQL allows you to specify the type of transaction isolation you need.
The isolation levels offered by most database systems include the following:
Read uncommitted is the isolation level that offers the fewest guarantees about maintaining data consistency and isolation. While transactions using
read uncommitted have certain features frequently associated with transactions, like the ability to commit multiple statements at once or to roll back statements if a mistake occurs, they do allow numerous situations where consistency can be broken.
Transactions configured with the
read uncommitted isolation level allow:
- dirty reads
- non-repeatable reads
- phantom reads
- serialization anomalies
This level of isolation is actually not implemented in PostgreSQL. Although PostgreSQL recognizes the isolation level name, internally, it is not actually supported and "read committed" (described below) will be used instead.
Read committed is an isolation level that specifically protects against dirty reads. When transactions use the
read committed level of consistency, uncommitted data can never affect the internal context of a transaction. This provides a basic level of consistency by ensuring that uncommitted data never influences a transaction.
read committed offers greater protection than
read uncommitted, it does not protect against all types of inconsistency. These problems can still arise:
- non-repeatable reads
- phantom reads
- serialization anomalies
PostgreSQL will use the
read committed level by default if no other isolation level is specified.
The repeatable read isolation level builds off of the guarantee provided by
read committed. It avoids dirty reads as before, but prevents non-repeatable reads as well.
This means that no changes committed outside of the transaction will ever impact the data read within the transaction. A query executed at the start of a transaction will never have a different result at the end of the transaction unless directly caused by statements within the transaction.
While the standard definition of the
repeatable read isolation level requires only that dirty and non-repeatable reads are prevented, PostgreSQL also prevents phantom reads at this level. This means that commits outside of the transaction cannot alter the number of rows that satisfy a query.
Since the state of the data seen within the transaction can deviate from the up-to-date data in the database, transactions can fail on commit if the two datasets cannot be reconciled. Because of this, one drawback of this isolation level is that you may have to retry transactions if there is a serialization failure on commit.
repeatable read isolation level blocks most types of consistency issues but serialization anomalies can still occur.
The serializable isolation level offers the highest level of isolation and consistency. It prevents all of the scenarios that the
repeatable read level does while also removing the possibility of serialization anomalies.
Serializable isolation guarantees that concurrent transactions are committed as if they were executed one after another. If a scenario occurs where a serialization anomaly could be introduced, one of the transactions will have a serialization failure instead of introducing inconsistency to the data set.
Defining a transaction
Now that we've covered the different isolation levels that PostgreSQL can use in transactions, let's demonstrate how to define transactions.
In PostgreSQL, every statement outside of an explicitly marked transaction is actually executed in its own, single-statement transaction. To explicitly start a transaction block, you can use either the
START TRANSACTION commands (they are synonymous). To commit a transaction, issue the
The basic syntax of a transaction therefore looks like this:
As a more concrete example, imagine that we are attempting to transfer $1000 from one account to another. We want to ensure that the money will always be in one of the two accounts but never in both of them.
We can wrap the two statements that together encapsulate this transfer in a transaction that looks like this:
BEGIN;UPDATE accountsSET balance = balance - 1000WHERE id = 1;UPDATE accountsSET balance = balance + 1000WHERE id = 2;COMMIT;
Here, the $1000 will not be taken out of the account with
id = 1 without also putting $1000 into the account with
id = 2. While these two statements are executed sequentially within the transaction, they will be committed, and thus be executed on the underlying data set, simultaneously.
Rolling back transactions
Within a transaction, either all or none of the statements will be committed to the database. Abandoning the statements and modifications made within a transaction instead of applying them to the database is known as "rolling back" the transaction.
Transactions can be rolled back either automatically or manually. PostgreSQL automatically rolls back transactions if one of the statements within results in an error. It also rolls back the transaction if serialization error would occur if the chosen isolation level does not allow them.
To manually roll back statements that have been given during the current transaction, you can use the
ROLLBACK command. This will cancel all of the statements within the transaction, in essence turning back the clock to the start of the transaction.
For instance, supposing we're using the same bank accounts example we were using before, if we find out after issuing the
UPDATE statements that we accidentally transferred the wrong amount or used the wrong accounts, we could rollback the changes instead of committing them:
BEGIN;UPDATE accountsSET balance = balance - 1500WHERE id = 1;UPDATE accountsSET balance = balance + 1500WHERE id = 3; -- Wrong account number here! Must rollback/* Gets us back to where we were before the transaction started */ROLLBACK;
ROLLBACK, the $1500 will still be in the account with
id = 1.
Using save points when rolling back
By default, the
ROLLBACK command resets the transaction to where it was when the
START TRANSACTION commands were first called. But what if we only want to revert some of the statements within the transaction?
While you cannot specify arbitrary places to roll back to when issuing
ROLLBACK command, you can roll back to any "save points" that you've set throughout the transaction. You can mark places in your transaction ahead of time with the
SAVEPOINT command and then reference those specific locations when you need to roll back.
These save points allow you to create an intermediate roll back point. You can then optionally revert any statements made between where you are currently and the save point and then continue working on your transaction.
To specify a save point, issue the
SAVEPOINT command followed by a name for the save point:
To roll back to that save point, use the
ROLLBACK TO command:
ROLLBACK TO save_1;
Let's continue the account-focused example we've been using:
BEGIN;UPDATE accountsSET balance = balance - 1500WHERE id = 1;/* Set a save point that we can return to */SAVEPOINT save_1;UPDATE accountsSET balance = balance + 1500WHERE id = 3; -- Wrong account number here! We can rollback to the save point though!/* Gets us back to the state of the transaction at `save_1` */ROLLBACK TO save_1;/* Continue the transaction with the correct account number */UPDATE accountsSET balance = balance + 1500WHERE id = 4;COMMIT;
Here, we're able to recover from a mistake we made without losing all of the work we've done in the transaction so far. After rolling back, we continue with the transaction as planned using the correct statements.
Setting the isolation level of transactions
To set the level of isolation you'd like for a transaction, you can add an
ISOLATION LEVEL clause to your
START TRANSACTION or
BEGIN command. The basic syntax looks like this:
BEGIN ISOLATION LEVEL <isolation_level>;statementsCOMMIT;
<isolation_level> can be any of these (described in detail earlier):
READ UNCOMMITTED(will result in
READ COMMITTEDsince this level isn't implemented in PostgreSQL)
SET TRANSACTION command can also used to set the isolation level after a transaction is started. However, you can only use
SET TRANSACTION before any queries or data modifying commands are executed, so it doesn't allow for increased flexibility.
If you have multiple transactions that should be executed sequentially, you can optionally chain them together using the
COMMIT AND CHAIN command.
COMMIT AND CHAIN command completes the current transaction by committing the statements within. After the commit has been processed, it immediately opens a new transaction. This allows you to group another set of statements together in a transaction.
The statement works exactly as if you'd issued
BEGIN;UPDATE accountsSET balance = balance - 1500WHERE id = 1;UPDATE accountsSET balance = balance + 1500WHERE id = 2;/* Commit the data and start a new transaction that will take into account the committed from the last transaction */COMMIT AND CHAIN;UPDATE accountsSET balance = balance - 1000WHERE id = 2;UPDATE accountsSET balance = balance + 1000WHERE id = 3;COMMIT;
Chaining transactions doesn't offer much in terms of new functionality, but it can be helpful for committing data at natural boundaries while continuing to focus on the same type of operations.
Transactions are not a silver bullet. There are a lot of trade offs that come with various isolation levels and understanding what types of consistency you need to protect can take thought and planning. This is especially true with long running transactions where the underlying data may change significantly and the possibility of conflict with other concurrent transactions increases.
That being said, the transaction mechanic offers a lot of flexibility and power. It goes a long way towards ensuring ACID guarantees are maintained even while performing interrelated, concurrent operations. Knowing when and how to properly use transactions to perform complex, safe operations is invaluable.