Re: "OLD." || myColumnNameVar (How to generically access columns in a trigger's OLD or NEW records)

From: Basil Bourque <basil(dot)list(at)me(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: "OLD." || myColumnNameVar (How to generically access columns in a trigger's OLD or NEW records)
Date: 2011-05-01 08:58:53
Message-ID: 78BB7F3D-841C-485C-86E1-7D73EB5974DE@me.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hoorah! I was able to complete my single PL/pgSQL function to create history records tracking individual field value changes generically for all my tables. Some developers call this an "audit trail", though an accountant might say otherwise.

Thanks for all the replies.

Special thanks to John DeSoi for pointing me to a crucial code example to make PL/pgSQL interpret:
"OLD." || myColumnNameVar
as:
OLD.first_name (for example)

The crucial line is:
EXECUTE 'SELECT ($1).' || ri.column_name || '::text' INTO t USING NEW;
found here:
http://wiki.postgresql.org/wiki/PL/pgSQL_Dynamic_Triggers
I'm new to SQL and Postgres, so I may be wrong but my interpretation of this is:
Since there appears no way to make the PL/pgSQL interpreter interpret our desired string, the trick is to look outside of PL/pgSQL and instead use the SQL interpreter. Brilliant! It works, and it seems to be fast enough, at least for my needs.

I'll share my current code & table structure below. Caveat: This code has not yet been thoroughly tested, nor has it been deployed. I only finalized it today.

--Basil Bourque

[A] I'm working on a greenfield project, where:
• I've built every table to have a primary key of type uuid named exactly "pkey_".
• Every table has a TIMESTAMPTZ field named "record_modified_".
My approach below hard-codes these assumptions.

[B] I have this "history_" table:

CREATE TABLE history_
(
pkey_ uuid NOT NULL DEFAULT uuid_generate_v1mc(), -- The primary key for this table, though no primary key constraint was created (for the sake of performance and conservation). This column and timestamp_ column are the only two columns about this table itself. All other columns are about the inserted/modified/deleted record in some other table.
table_name_ character varying(120) NOT NULL, -- Name of table whose row is being affected (inserted, deleted, or modified).
column_name_ character varying(120) NOT NULL, -- Name of the column in some other table whose row value is being modified. This column's value is empty string if the operation was DELETE.
timestamp_ timestamp with time zone NOT NULL DEFAULT clock_timestamp(), -- The moment this record was created. Using the clock_timestamp() function as a default, to capture the actual moment in time rather than moment when transaction began.
db_user_name_ character varying(120) NOT NULL DEFAULT "current_user"(), -- The name of the Postgres user logged in to this database connection/session.
app_name_ character varying(120) NOT NULL DEFAULT current_setting('application_name'::text), -- The name of the application connected to the database. May also include the version number of app, and the name of the human user authenticated within the app.
old_value_ character varying(120) NOT NULL DEFAULT ''::character varying,
new_value_ character varying(120) NOT NULL DEFAULT ''::character varying,
uuid_ uuid NOT NULL, -- The UUID of the row being affected, the row being inserted, updated, or deleted. Assumes every table whose history is being recorded uses the 'uuid' data type as its primary key.
operation_ character varying(120) NOT NULL, -- What database operation resulted in this trigger running: INSERT, UPDATE, DELETE, or TRUNCATE.
table_oid_ oid NOT NULL, -- The oid of the table whose record is being modified. May be helpful if a table name changes over time.
ordinal_position_of_column_ integer NOT NULL, -- The position of the affected column in its table. Every new column gets a number, incremented by one for each. This may be helpful when analyzing changes across a stretch of time during which a column's name was changed. Apparently columns have no oid, so we are recording this number instead.
transaction_began_ timestamp with time zone NOT NULL DEFAULT transaction_timestamp() -- The time when the current transaction began. Can act like a transaction identifier, to group multiple "history_" rows of the same transaction together. This is not foolproof, as multiple transaction could possibly start in the same split second moment. Assuming the computer's clock has a fine resolution, this chance of a coincidence should be quite miniscule.
)

I do not have a primary constraint for this table. The "pkey_" column acts as a primary key, but there is no need for an index or uniqueness testing for this special table.

[C] For every table I want to track field-level value changes, I create a trigger like this:
CREATE TRIGGER XXX_trigger_history_
AFTER INSERT OR UPDATE OR DELETE ON XXX_
FOR EACH ROW EXECUTE PROCEDURE make_history_();
where 'XXX' is the name of the table.

[D] I created this function:

CREATE OR REPLACE FUNCTION make_history_()
RETURNS TRIGGER
LANGUAGE plpgsql
AS
$BODY$

/* Purpose: Make a history of changes to most fields in the table calling this trigger function.
This kind of history tracking is also known as an "audit trail".
This function works by detecting each change in value for important fields in a certain table.
This trigger function then calls another function to create a row in the "history_" table.

This kind of feature is often called an "audit trail" by software developers. I avoid using that term in this context as a real
audit trail in accounting terms involves more than this simple field change tracking.
*/

/* © 2011 Basil Bourque. This source code may be used freely forever by anyone taking full responsibility for doing so, without warranty.

Thanks so very much to John DeSoi of the pgsql-general(at)postgresql(dot)org mailing list for pointing me to this crucial code example:
http://wiki.postgresql.org/wiki/PL/pgSQL_Dynamic_Triggers
Before reading that example, my previous efforts led me to conclude a generic history facility written in PL/pgSQL was impossible.
*/

/* We make these assumptions about any table using this function in its trigger:
• Has a primary key named "pkey_" of type uuid.
• Has a field tracking the datetime the record was last modified, named "record_modified_" of type timestamptz.
• Is in the default/current schema.

While it might be nice to rewrite this function to escape these assumptions, I've spent all my energies to get this far.
I welcome feedback from anyone who want to take this further.
*/

/* Notes:

The 'OLD' and 'NEW' variables represent the entire row whose INSERT/UPDATE/DELETE caused this trigger to run.

The 'TG_xxx' variables are special variables created automatically by Postgres for the trigger function.
For example, TG_OP indicates which modification operation is happening: INSERT, UPDATE, or DELETE.
http://www.postgresql.org/docs/current/static/plpgsql-trigger.html

"clock_timestamp()" gets the actual time at the moment of execution. In contrast, most other timestamp
functions return the time when the current transaction began.
For more information, see: http://www.postgresql.org/docs/current/static/functions-datetime.html

The "history_" table also includes a column "transaction_began_" defaulting to "transaction_timestamp()". This timestamp can act
like a transaction identifier, to group multiple "history_" rows of the same transaction together. This is not foolproof, as
multiple transaction could possibly start in the same split second moment. Assuming the computer's clock has a fine resolution,
this chance of a coincidence should be quite miniscule. If someone knows a way to get a true transaction id, please share.
*/

DECLARE
ri RECORD; -- About this data type "RECORD": http://www.postgresql.org/docs/current/static/plpgsql-declarations.html#PLPGSQL-DECLARATION-RECORDS
oldValue TEXT;
newValue TEXT;
isColumnSignificant BOOLEAN;
isValueModified BOOLEAN;
BEGIN
/*RAISE NOTICE E'\n Running function: make_history_for_row_ ----------------\n\n Operation: %\n Schema: %\n Table: %\n',
TG_OP,
TG_TABLE_SCHEMA,
TG_TABLE_NAME;*/

IF (TG_OP = 'INSERT') OR (TG_OP = 'UPDATE') THEN
NEW.record_modified_ = clock_timestamp(); -- Record the moment this row is being saved.

FOR ri IN
-- Fetch a ResultSet listing columns defined for this trigger's table.
SELECT ordinal_position, column_name, data_type
FROM information_schema.columns
WHERE
table_schema = quote_ident(TG_TABLE_SCHEMA)
AND table_name = quote_ident(TG_TABLE_NAME)
ORDER BY ordinal_position

LOOP
-- For each column in this trigger's table, copy the OLD & NEW values into respective variables.
-- NEW value
EXECUTE 'SELECT ($1).' || ri.column_name || '::text' INTO newValue USING NEW;
-- OLD value
IF (TG_OP = 'INSERT') THEN -- If operation is an INSERT, we have no OLD value, so use an empty string.
oldValue := ''::varchar;
ELSE -- Else operation is an UPDATE, so capture the OLD value.
EXECUTE 'SELECT ($1).' || ri.column_name || '::text' INTO oldValue USING OLD;
END IF;

-- Make noise for debugging.
/*RAISE NOTICE E'\n Column #: %\n Name: %\n Type: %\n Old: %\n New: %\n',
ri.ordinal_position,
ri.column_name,
ri.data_type,
oldValue,
newValue;*/

-- ToDo: Add code to throw an Exception if the primary key value is changing (other than from NULL on an INSERT).

-- ToDo: Add code to ignore columns whose data type does not cast well to TEXT/VARCHAR.

-- Ignore some columns:
-- • Those whose names are marked with a trailing x.
-- • The primary key.
-- • Our timestamp fields recording the row's creation and most recent modification.
isColumnSignificant := (position( '_x_' in ri.column_name ) < 1) AND (ri.column_name <> 'pkey_') AND (ri.column_name <> 'record_created_') AND (ri.column_name <> 'record_modified_');
IF isColumnSignificant THEN
isValueModified := oldValue <> newValue; -- If this nthField in the table was modified, make history.
IF isValueModified THEN
/*RAISE NOTICE E'Inserting history_ row for INSERT or UPDATE.\n';*/
INSERT INTO history_ ( operation_, table_oid_, table_name_, uuid_, column_name_, ordinal_position_of_column_, old_value_, new_value_ )
VALUES ( TG_OP, TG_RELID, TG_TABLE_NAME, NEW.pkey_, ri.column_name::VARCHAR, ri.ordinal_position, oldValue::VARCHAR, newValue::VARCHAR );
END IF;
END IF;
END LOOP;

RETURN NEW;

ELSIF (TG_OP = 'DELETE') THEN
/*RAISE NOTICE E'Inserting history_ row for DELETE.\n';*/
-- Similar to INSERT above, but refers to OLD instead of NEW, and passes empty values for last 4 fields.
INSERT INTO history_ ( operation_, table_oid_, table_name_, uuid_, column_name_, ordinal_position_of_column_, old_value_, new_value_ )
VALUES ( TG_OP, TG_RELID, TG_TABLE_NAME, OLD.pkey_, ''::VARCHAR, 0, ''::VARCHAR, ''::VARCHAR );
RETURN OLD;

END IF;

/* Should never reach this point. Branching in code above should always reach a call to RETURN. */
RAISE EXCEPTION 'Unexpectedly reached the bottom of this function without calling RETURN.';

END;

$BODY$;

<end of code>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jasen Betts 2011-05-01 12:08:37 Re: Postgresql, PSN hack and table limits
Previous Message Mark Morgan Lloyd 2011-05-01 08:43:43 Re: Postgresql, PSN hack and table limits