Still struggling with history tables

From: "Ken Winter" <ken(at)sunward(dot)org>
To: "'Richard Huxton'" <dev(at)archonet(dot)com>
Cc: "'PostgreSQL pg-sql list'" <pgsql-sql(at)postgresql(dot)org>
Subject: Still struggling with history tables
Date: 2006-01-18 03:29:05
Message-ID: 004c01c61bdf$569b8ed0$6603a8c0@kenxp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Friends ~

I'm still trying to implement a solution to the requirement to keep a
complete history of data changes to a "person" table. (See earlier
correspondence below.) I'm trying for a variant of the architecture
suggested by Richard Huxton (also below). In my variant, I have this
"identifier" table, carrying the id and invariant info about each person:

/*==============================================================*/
/* Table: person_i */
/*==============================================================*/
create table person_i (
idi BIGSERIAL not null,
date_of_birth DATE null,
constraint PK_person_i_key_1 primary key (idi)
)
;

And then I have this "history" table, carrying the time-varying info on
which I want to keep a complete history:

/*==============================================================*/
/* Table: person_h */
/*==============================================================*/
create table person_h (
idh INT8 not null,
start_date DATE not null default 'now()',
end_date DATE null,
name VARCHAR(255) null,
constraint PK_person_h_key_1 primary key (idh, start_date),
constraint fk_reference_6 foreign key (idh)
references person_i (idi)
on delete restrict on update restrict
)
;

Triggers are in place on the "person_h" table so that when an app does an
update, the current h record is expired (with its old data) and a new record
(wuth the updated data)is inserted and made effective "now". What I'm now
trying to build is this view:

/*==============================================================*/
/* View: person */
/*==============================================================*/
create view person as
select idi, date_of_birth, start_date, end_date, name
from person_i i, person_h h
where i.idi = h.idh;

I want to enable users (and apps) who need to deal only with current data to
be able to treat "person" as a real table (i.e. to write to it as well as
read from it). Specifically, the requirements are that when a user does:

. Insert - The system inserts a record into the i table and the first
record in the h table.
. Select - The system returns attributes of i and h tables (not
duplicating the identifier columns).
. Update - The system allows updating of i attributes
(update-in-place, not creating a new history record) and h attributes
(creating a new history record).
. Delete - The system deletes the i record and all of its h records.

I'm stuck on how to implement the "insert" action, which I thought would be
simple. The problem is this: The i table id is of type BIGSERIAL, i.e.
sequence-assigned. I've tried writing the following rule to get both the i
record and the first h record inserted:

CREATE RULE ru AS
ON INSERT TO person
DO INSTEAD (
INSERT INTO person_i DEFAULT VALUES;
INSERT INTO person_h (idh) VALUES (NEW.idi)
);

I thought this would grab the sequence-assigned value of person_i.idi to put
into person_h.idh (this trick works in trigger functions), but instead it
just assigns Null to person_h.idh, and the transaction fails with a "not
null" violation on person_h.idh. And now that I look at the documentation
(http://www.postgresql.org/docs/7.4/static/rules-update.html), it says that
a column not assigned in the invoking query "is replaced by a null value
(for an INSERT)" in the NEW pseudo-record. Bummer. Triggers behave nicely,
but rules don't.

I'd be willing to do it with a trigger function instead, but I can't attach
a trigger to a view.

I considered doing it with a trigger function on the person_i table, but I
don't know how that could be made to cause an insert of the person_h table
record - and the assignment of h table values such as "name" from the app's
query.

Suggestions?

~ TIA
~ Ken

> -----Original Message-----
> From: Richard Huxton [mailto:dev(at)archonet(dot)com]
> Sent: Tuesday, December 20, 2005 4:16 AM
> To: Ken Winter
> Cc: 'PostgreSQL pg-sql list'
> Subject: Re: [SQL] Rule causes baffling error
>
> Ken Winter wrote:
> > Richard ~
> >
> > Let me zoom out for a moment, for the bigger picture.
> >
> > As you have inferred, what I'm trying to do is develop a history-
> preserving
> > table ("my_data" in the example that started this thread). *Most* user
> > programs would see and manipulate this table as if it contained only the
> > current rows (marked by effective_date_and_time <= 'now' and
> > expiration_date_and_time = 'infinity').
> >
> > When these programs do an INSERT, I need automatic actions that set the
> > expiration and date timestamps to 'now' and 'infinity'; when they do an
> > UPDATE, I need automatic actions that save the old data in a history
> record
> > and expire it as of 'now' and the new data in a record that's effective
> > 'now' and expires at 'infinity'; when they do a DELETE, I need an
> automatic
> > action to expire the target record as of 'now' rather than actually
> deleting
> > it.
...
>
> > However, I also need certain maintenance programs, designed to enable
> > certain users to correct inaccurately entered data. These need to be
> able
> > to "rewrite history" by doing actions against "my_data" without these
> > automatic actions occurring. It may prove advisable to provide some
> > automatic actions for these programs too, but they definitely won't be
> the
> > actions described above. If the above actions were implemented as
> triggers,
> > all the ways I could think of to conditionally disable them (and
> possibly
> > replace them with other actions) seemed architecturally very klunky.
> That's
> > when I decided I needed the "my_data_now" view, and from that I inferred
> > (apparently correctly) that the actions would have to be implemented as
> > rewrite rules.
>
> > The cascading problem was solvable. But the solution was a bit hard to
> > reach because the user-invoked UPDATE action triggered both an INSERT
> and an
> > UPDATE on the same table (and user DELETE triggered an UPDATE), and so
> one
> > had to take into account that all of these triggered actions would cause
> > their triggers to fire again. Not a deal-killer, but the solution felt
> > brittle.
> >
> > Yes, I did consider having a "live" table and a separate "history"
> table.
> > The killer of that idea was my inability to find a way to implement
> foreign
> > keys that could refer to both tables and that could follow a record when
> it
> > was moved from "live" to "history". Much of the history I'm trying to
> > preserve is not in the "my_data" table; it's in related tables that
> refer to
> > it. I presumably could do this by not declaring the FKs to PostgreSQL,
> and
> > implementing the necessary referential integrity with triggers, but -
> well,
> > in a word, yuck.
>
> If you're going to do this with multiple tables you actually need (at
> least) three. For example, if you had different versions of e.g.
> documents being stored you would want:
> document - invariants: the id, perhaps document-type.
> FKeys link to this.
> A row is only deleted from here if all live+history
> is also deleted.
> document_live - the one that gets edited.
> 1:1 relationship with document if still live
> document_hist - with timestamps. N:1 with document
>
> Have a google for Temporal Databases too - there's a lot of thinking
> been done about this.
>
> >
> > As it happens, I have found a rewrite of my UPDATE rule that works, so
> my
> > immediate need is past. FYI, the old update rule was:
> >
> [snip]
>
> > The relevant change is that I'm now expiring the record with the old
> data
> > and inserting the one with the new data, rather than vice versa. I
> still
> > don't know why the old rule didn't work and this one does, but hey,
> > whatever. Another advantage of the new one is that I don't have to re-
> point
> > foreign keys that were already pointed to the record containing the old
> > data, because that record stays in place.
> >
> > (The other change, adding the lines
> > AND effective_date_and_time <= CURRENT_TIMESTAMP
> > AND expiration_date_and_time >= CURRENT_TIMESTAMP;
> > to the UPDATE, was necessary to keep updates to the "my_data_now" from
> > updating the expired rows as well.)
>
> Make sure you test it with inserts/updates of multiple rows too.
>
> --
> Richard Huxton
> Archonet Ltd

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Ivan Steganov 2006-01-18 04:43:18 Matching several rows
Previous Message Christian Paul B. Cosinas 2006-01-17 22:21:25 Characters that needs escape characters when inserting to database