Re: Temporal extensions

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Dave Jones <dave(at)waveform(dot)org(dot)uk>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Temporal extensions
Date: 2015-04-28 02:49:34
Message-ID: 553EF53E.2070007@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/27/15 6:08 PM, Dave Jones wrote:
>> (Though, I dislike using timestamps to do change/history tracking, but
>> >that's just me...)
> I've been playing around with history tracking (in the context of BI,
> typically with batch loaded reporting databases) for about 7-8 years now
> and always found timestamps perfect for the purpose, but are you perhaps
> referring to using it for audit purposes? If that's the case I'd agree
> entirely - this is absolutely the wrong tool for such things (which is
> something I need to put a bit more prominently in the docs - it's buried
> in the design section at the moment).

Most warehouses dumb things down to a day level, so it's probably OK there.

What I specifically don't like is that using a timestamp to try and
determine the order in which something happened is just fraught with
gotchas. For starters, now() is locked in when you do a BEGIN, but maybe
a newer transaction modifies a table before an older one does. Now the
ordering is *backwards*. You have the same problem with using an XID.
The only way I've thought of to make this guaranteed safe is to somehow
serialize the logging with something like

CREATE TABLE customer_history(
customer_hid serial primary key -- hid == history_id
, previous_customer_hid int references customer_history
, customer_id int NOT NULL references customer
...
);
CREATE UNIQUE INDEX ... ON customer_history(previous_customer_hid) WHERE
previous_customer_hid IS NOT NULL;
CREATE UNIQUE INDEX ... ON customer_history(customer_hid) WHERE
previous_customer_hid IS NULL;

and then have a before trigger enforce
NEW.previous_customer_hid := customer_history__get_latest(customer_id)

where customer_history__get_latest() will 'walk the chain' starting with
the first link customer_id = blah AND previous_customer_id = NULL

Because of the indexes that will serialize inserts on a per-customer
basis. You could still run into problems with a newer snapshot creating
a history record before a transaction with an older snapshot does
though. :( Though, if you included txid_current_snapshot() with each
record you could probably detect when that happens.

> Or did you mean ranges would be better? They certainly looked intriguing
> when I started moving this stuff to postgres, and I'd like to re-visit
> them in the near future as they offer capabilities I don't have with
> timestamps (such as guaranteeing no overlapping ranges via exclusion
> constraints) but my initial tests suggested some rather major
> performance degradation so I put it on the back-burner at first.

If you're going to keep both a start and end for each record you'd
definitely want to do it with a range. If you're only keeping the change
time then you can handle it differently.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2015-04-28 03:06:17 Re: Allow SQL/plpgsql functions to accept record
Previous Message Sawada Masahiko 2015-04-28 02:37:23 Re: Proposal: knowing detail of config files via SQL