Re: Still struggling with history tables

From: Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: Ken Winter <ken(at)sunward(dot)org>
Cc: "'Richard Huxton'" <dev(at)archonet(dot)com>, "'PostgreSQL pg-sql list'" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Still struggling with history tables
Date: 2006-01-18 10:23:28
Message-ID: Pine.LNX.4.44.0601181145380.14453-100000@matrix.gatewaynet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

O Ken Winter έγραψε στις Jan 17, 2006 :

> Friends ~
>
> I'm still trying to implement a solution to the requirement to keep a
> complete history of data changes to a "person" table. (See earlier
> correspondence below.) I'm trying for a variant of the architecture
> suggested by Richard Huxton (also below). In my variant, I have this
> "identifier" table, carrying the id and invariant info about each person:
>
> /*==============================================================*/
> /* Table: person_i */
> /*==============================================================*/
> create table person_i (
> idi BIGSERIAL not null,
> date_of_birth DATE null,
> constraint PK_person_i_key_1 primary key (idi)
> )
> ;
>
> And then I have this "history" table, carrying the time-varying info on
> which I want to keep a complete history:
>
> /*==============================================================*/
> /* Table: person_h */
> /*==============================================================*/
> create table person_h (
> idh INT8 not null,
> start_date DATE not null default 'now()',
> end_date DATE null,
> name VARCHAR(255) null,
> constraint PK_person_h_key_1 primary key (idh, start_date),
> constraint fk_reference_6 foreign key (idh)
> references person_i (idi)
> on delete restrict on update restrict
> )
> ;
>
> Triggers are in place on the "person_h" table so that when an app does an
> update, the current h record is expired (with its old data) and a new record
> (wuth the updated data)is inserted and made effective "now". What I'm now
> trying to build is this view:
>
> /*==============================================================*/
> /* View: person */
> /*==============================================================*/
> create view person as
> select idi, date_of_birth, start_date, end_date, name
> from person_i i, person_h h
> where i.idi = h.idh;
>
> I want to enable users (and apps) who need to deal only with current data to
> be able to treat "person" as a real table (i.e. to write to it as well as
> read from it). Specifically, the requirements are that when a user does:
>
> . Insert - The system inserts a record into the i table and the first
> record in the h table.
> . Select - The system returns attributes of i and h tables (not
> duplicating the identifier columns).
> . Update - The system allows updating of i attributes
> (update-in-place, not creating a new history record) and h attributes
> (creating a new history record).
> . Delete - The system deletes the i record and all of its h records.
>
> I'm stuck on how to implement the "insert" action, which I thought would be
> simple. The problem is this: The i table id is of type BIGSERIAL, i.e.
> sequence-assigned. I've tried writing the following rule to get both the i
> record and the first h record inserted:
>
> CREATE RULE ru AS
> ON INSERT TO person
> DO INSTEAD (
> INSERT INTO person_i DEFAULT VALUES;
> INSERT INTO person_h (idh) VALUES (NEW.idi)
> );

How about

INSERT INTO person_i (idi,dateofbirth)
VALUES(nextval('public.person_i_idi_seq'::text),new.dateofbirth);
INSERT INTO person_h(idh)
VALUES(currval('public.person_i_idi_seq'::text));

However, i have treated similar problems with two different approaches:

Generally it is very hard to distinguish between two kind of UPDATES:

a) UPDATEs that mean real data updates and they should be recorded
to the history system.
b) UPDATEs that are just false data entry, and they should mean
just plain correction UPDATES, with no recording.

This distinguishability is very important, otherwise someone would
define a way to store historic data of changes to the historic data
themselves, and so on.

1) Is the history data frequently needed? Are there a lot of apps
hitting these historic data? Do we need the freedom to query current live
data as well as past data in a uniform manner?

Then i just use one table, with endtimestamp is null meaning
this is a current (alive) record.
Then i write triggers to enforce interval wise integrity to the table,
(e.g. No records A,B exist with A<>B,A,B for the same person, so that
(A.starttimestamp,coalesce(A.endtimestamp,now())) overlaps with
(B.starttimestamp,coalesce(B.endtimestamp,now()))

This way i give people the ability to do what they want with the table.
The triggers do the job of enforcing integrity.

In this case what we mean as historic is
"what users define and input as historic".
Users are in charge here, not the DB.

2) If on the other hand, historic data are just a convinient way
of accessing history data, instead of going to find the backup of this
past day
in the computer room,
then I keep one and only live table, and one trigger managed
history table.

The trigger as in your case does the "blind" job of creating
history records, and the users view only the real table.
Here the drawback is that even just erroneous data entry
creates historic data.

A variation of 2) is what Richard suggested.
I think when we are dealing with computer generated data,
solution 2) is best.
When we are dealing with human data, (when the human knows how to
differentiate a real update from a wrong input) i think full freedom
to the user must be given with solution 1).

If you ask me, (altho i havent implemented that), i would do it
ALL from the application, and define 2 kind of updates operations:

UPDATE = Real Data Update, (creates historic data)
CORRECTION = Correction to either the live or historic data (creates no
historic data)

If you write in a modern language (java) it is very easy to create
modules for these kind of things.

>
> I thought this would grab the sequence-assigned value of person_i.idi to put
> into person_h.idh (this trick works in trigger functions), but instead it
> just assigns Null to person_h.idh, and the transaction fails with a "not
> null" violation on person_h.idh. And now that I look at the documentation
> (http://www.postgresql.org/docs/7.4/static/rules-update.html), it says that
> a column not assigned in the invoking query "is replaced by a null value
> (for an INSERT)" in the NEW pseudo-record. Bummer. Triggers behave nicely,
> but rules don't.
>
> I'd be willing to do it with a trigger function instead, but I can't attach
> a trigger to a view.
>
> I considered doing it with a trigger function on the person_i table, but I
> don't know how that could be made to cause an insert of the person_h table
> record - and the assignment of h table values such as "name" from the app's
> query.
>
> Suggestions?
>
> ~ TIA
> ~ Ken
>
> > -----Original Message-----
> > From: Richard Huxton [mailto:dev(at)archonet(dot)com]
> > Sent: Tuesday, December 20, 2005 4:16 AM
> > To: Ken Winter
> > Cc: 'PostgreSQL pg-sql list'
> > Subject: Re: [SQL] Rule causes baffling error
> >
> > Ken Winter wrote:
> > > Richard ~
> > >
> > > Let me zoom out for a moment, for the bigger picture.
> > >
> > > As you have inferred, what I'm trying to do is develop a history-
> > preserving
> > > table ("my_data" in the example that started this thread). *Most* user
> > > programs would see and manipulate this table as if it contained only the
> > > current rows (marked by effective_date_and_time <= 'now' and
> > > expiration_date_and_time = 'infinity').
> > >
> > > When these programs do an INSERT, I need automatic actions that set the
> > > expiration and date timestamps to 'now' and 'infinity'; when they do an
> > > UPDATE, I need automatic actions that save the old data in a history
> > record
> > > and expire it as of 'now' and the new data in a record that's effective
> > > 'now' and expires at 'infinity'; when they do a DELETE, I need an
> > automatic
> > > action to expire the target record as of 'now' rather than actually
> > deleting
> > > it.
> ...
> >
> > > However, I also need certain maintenance programs, designed to enable
> > > certain users to correct inaccurately entered data. These need to be
> > able
> > > to "rewrite history" by doing actions against "my_data" without these
> > > automatic actions occurring. It may prove advisable to provide some
> > > automatic actions for these programs too, but they definitely won't be
> > the
> > > actions described above. If the above actions were implemented as
> > triggers,
> > > all the ways I could think of to conditionally disable them (and
> > possibly
> > > replace them with other actions) seemed architecturally very klunky.
> > That's
> > > when I decided I needed the "my_data_now" view, and from that I inferred
> > > (apparently correctly) that the actions would have to be implemented as
> > > rewrite rules.
> >
> > > The cascading problem was solvable. But the solution was a bit hard to
> > > reach because the user-invoked UPDATE action triggered both an INSERT
> > and an
> > > UPDATE on the same table (and user DELETE triggered an UPDATE), and so
> > one
> > > had to take into account that all of these triggered actions would cause
> > > their triggers to fire again. Not a deal-killer, but the solution felt
> > > brittle.
> > >
> > > Yes, I did consider having a "live" table and a separate "history"
> > table.
> > > The killer of that idea was my inability to find a way to implement
> > foreign
> > > keys that could refer to both tables and that could follow a record when
> > it
> > > was moved from "live" to "history". Much of the history I'm trying to
> > > preserve is not in the "my_data" table; it's in related tables that
> > refer to
> > > it. I presumably could do this by not declaring the FKs to PostgreSQL,
> > and
> > > implementing the necessary referential integrity with triggers, but -
> > well,
> > > in a word, yuck.
> >
> > If you're going to do this with multiple tables you actually need (at
> > least) three. For example, if you had different versions of e.g.
> > documents being stored you would want:
> > document - invariants: the id, perhaps document-type.
> > FKeys link to this.
> > A row is only deleted from here if all live+history
> > is also deleted.
> > document_live - the one that gets edited.
> > 1:1 relationship with document if still live
> > document_hist - with timestamps. N:1 with document
> >
> > Have a google for Temporal Databases too - there's a lot of thinking
> > been done about this.
> >
> > >
> > > As it happens, I have found a rewrite of my UPDATE rule that works, so
> > my
> > > immediate need is past. FYI, the old update rule was:
> > >
> > [snip]
> >
> > > The relevant change is that I'm now expiring the record with the old
> > data
> > > and inserting the one with the new data, rather than vice versa. I
> > still
> > > don't know why the old rule didn't work and this one does, but hey,
> > > whatever. Another advantage of the new one is that I don't have to re-
> > point
> > > foreign keys that were already pointed to the record containing the old
> > > data, because that record stays in place.
> > >
> > > (The other change, adding the lines
> > > AND effective_date_and_time <= CURRENT_TIMESTAMP
> > > AND expiration_date_and_time >= CURRENT_TIMESTAMP;
> > > to the UPDATE, was necessary to keep updates to the "my_data_now" from
> > > updating the expired rows as well.)
> >
> > Make sure you test it with inserts/updates of multiple rows too.
> >
> > --
> > Richard Huxton
> > Archonet Ltd
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

--
-Achilleus

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Volkan YAZICI 2006-01-18 11:55:17 Re: Matching several rows
Previous Message Michael Glaesemann 2006-01-18 09:59:28 Re: Still struggling with history tables