pg_dirtyread
is a PostgreSQL extension that leverages PostgreSQL's Multi-Version Concurrency Control (MVCC) mechanism to read data affected by uncommitted transactions. This plugin can be used in development and testing environments to quickly view the immediate impact of transactions on data. When data in critical tables is accidentally deleted, business systems may experience service interruptions, data loss, and data inconsistencies, impacting user services.
Overview
pg_dirtyread
is a PostgreSQL extension that utilizes PostgreSQL's MVCC mechanism to read data affected by uncommitted transactions. This tool is useful for quickly checking the immediate impact of transactions on data in development and testing environments. If data is accidentally deleted from critical tables, it could lead to service disruptions, data loss, or data inconsistencies, thereby affecting user services. By using pg_dirtyread
, you can quickly review data from uncommitted delete or update transactions, recover it, restore normal service, and ensure data completeness and consistency.
Recovery Mechanism
In PostgreSQL, even after a transaction is committed, data can sometimes be recovered if it hasn't been cleared by VACUUM. This is because PostgreSQL uses the MVCC mechanism, which allows old tuples (known as dead tuples) to remain in the system after a transaction is committed until they are cleared by VACUUM. Specifically:
MVCC Mechanism: Even after a transaction is committed, deleted or updated rows remain as old versions, allowing recovery before VACUUM runs.
Role of VACUUM: VACUUM cleans up dead tuples in tables, reclaims storage space, and resets row identifiers (TIDs), allowing new rows to reuse the space. Once VACUUM is executed, dead tuples are cleared, making data recovery through standard methods impossible.
Advantages and Disadvantages
The
pg_dirtyread
plugin is very convenient for recovering data with a simple installation process.The plugin returns all data, including non-deleted data.
Due to PostgreSQL's MVCC mechanism,
pg_dirtyread
can only be used for recovering data from DML operations.Point-in-time recovery requires enabling
track_commit_timestamp = on
.It supports PostgreSQL version 16 and earlier.
Installation
# Install dependencies yum install postgresql16-devel clang wget https://github.com/df7cb/pg_dirtyread/archive/refs/tags/2.7.tar.gz tar xvf 2.7.tar.gz mv pg_dirtyread-2.7/ /jesong/pgdata/contrib/ cd /jesong/pgdata/contrib/ make PG_CONFIG=/usr/pgsql-16/bin/pg_config make install PG_CONFIG=/usr/pgsql-16/bin/pg_config # Log in to the database and install the plugin postgres=# CREATE EXTENSION pg_dirtyread; postgres=# SELECT * FROM pg_available_extensions; postgres=# \dx
Prerequisites for Data Recovery
# If accidental operations lead to data deletion, immediately disable VACUUM on the table ALTER TABLE foo SET (autovacuum_enabled = false, toast.autovacuum_enabled = false); # Check that the table has not been vacuumed postgres=# SELECT * FROM pg_stat_all_tables WHERE relname='foo'; -[ RECORD 1 ]-------+----------------------------- last_vacuum | last_autovacuum | vacuum_count | 0 autovacuum_count | 0
Operational Examples
Flashback Based on Deleted Table Data
-- Create a test table CREATE TABLE saas (id bigint, name text, mail text); -- Disable autovacuum for testing ALTER TABLE saas SET ( autovacuum_enabled = false, toast.autovacuum_enabled = false ); -- Insert data INSERT INTO saas VALUES (1, 'Test1','111@qq.com'), (2, 'New Test2','111@qq.com'); -- Delete all data DELETE FROM saas; postgres=# SELECT * FROM saas; -- Retrieve deleted data postgres=# SELECT * FROM pg_dirtyread('saas') AS t(id bigint, name text, mail text); id | name | mail ----+----------+------------ 1 | Test1 | 111@qq.com 2 | New Test2| 111@qq.com (2 rows)
Flashback Based on Deleted Table Columns
CREATE TABLE saas1 (id bigint, name text, mail text); INSERT INTO saas1 VALUES (1, 'Test1','111@qq.com'), (2, 'New Test2','111@qq.com'); ALTER TABLE saas1 DROP COLUMN mail; DELETE FROM saas1; postgres=# SELECT * FROM saas1; -- Retrieve deleted column data postgres=# SELECT * FROM pg_dirtyread('saas1') t(id bigint, name text, dropped_3 text); id | name | dropped_3 ----+----------+------------ 1 | Test1 | 111@qq.com 2 | New Test2| 111@qq.com (2 rows)
Column Specification Rules: Use dropped_N
to access the Nth column, counting from 1.
Limitations:
When PostgreSQL deletes a column, the original metadata is removed, so
pg_dirtyread
requires explicit type specification in column definitions. This includes type length, alignment, and type modifiers.Data must be passed by value.
1. Explicit Type Specification: Columns must be explicitly defined with the correct data type when reading dead tuples.
2. Integrity Check: Ensure accurate type information for effective data recovery and integrity verification.
Point-in-Time Flashback
To restore data to a specific point in time, the pg_xact_commit_timestamp
system function can be used to get the commit time for each transaction.
Procedure:
Get Commit Time: Use
pg_xact_commit_timestamp
to retrieve commit times for transactions, including write transactions (xmin) and delete/update transactions (xmax).Time-Based Flashback Query: Filter data by commit times to restore data to the desired point.
-- Parameter configuration track_commit_timestamp = on; -- Simulate data CREATE TABLE saas2 (id bigint, name text, mail text); INSERT INTO saas2 VALUES (1, 'Test1','111@qq.com'), (2, 'New Test2','111@qq.com'); INSERT INTO saas2 VALUES (3, 'Test3','111@qq.com'), (4, 'New Test4','111@qq.com'); DELETE FROM saas2; -- Query historical versions of data SELECT pg_xact_commit_timestamp(xmin) AS xmin_time, pg_xact_commit_timestamp(CASE xmax WHEN 0 THEN NULL ELSE xmax END) AS xmax_time, * FROM pg_dirtyread('saas2') AS t(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, id bigint, name text, mail text);
Restoring Deleted Columns with a Time Filter:
SELECT pg_xact_commit_timestamp(xmin) AS xmin_time, pg_xact_commit_timestamp(CASE xmax WHEN 0 THEN NULL ELSE xmax END) AS xmax_time, * FROM pg_dirtyread('saas3') AS t(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, id bigint, name text, dropped_3 text, qq text) WHERE pg_xact_commit_timestamp(xmin) <= '2024-09-13 17:28:48';
Precautions:
Applicability:
pg_dirtyread
is primarily applicable for DML operations under MVCC. It is ineffective for data loss due to DDL operations likeDROP TABLE
.After Transaction Commit: Data recovery becomes more challenging post-commit. If VACUUM hasn't been run, dead tuples from uncommitted transactions remain accessible through
pg_dirtyread
. However, once VACUUM clears dead tuples, recovery is impossible.Security Concerns: Reading uncommitted data with
pg_dirtyread
can raise data consistency and security concerns. It should be used cautiously in production, ensuring data safety.
Conclusion
pg_dirtyread
is a PostgreSQL extension mainly used in development and testing to quickly view and recover data lost due to accidental operations. It leverages the MVCC mechanism to read uncommitted data, applicable to DML operations. Before transaction commit, pg_dirtyread
offers a convenient data recovery method. However, once a transaction is committed, timely VACUUM runs are crucial; otherwise, recovery becomes impossible. Thus, pg_dirtyread
should be cautiously employed in production, with reliance on backups and WAL log archiving as more reliable recovery methods.