Re: temporal support patch

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Gavin Flower" <GavinFlower(at)archidevsys(dot)co(dot)nz>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: temporal support patch
Date: 2012-08-21 22:07:53
Message-ID: 5033C06902000025000499AE@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz> wrote:

> So if I understand correctly...
>
> If there is a very long running transaction, say 1 hour, then all
> (or just some? - depending) transactions that nominally start and
> finish within that time, can not have definitive start times until
> the very long running transaction finishes, even if they are
> successfully committed?

That's not correct. Any transaction which started can certainly
have a start time. Any transaction which completed can certainly
have a commit or rollback time. What they *can't* have is a known
position in the apparent order of execution for serializable
transactions, which might be different from the order of start and
commit times.

The fact that it has an unknown sequence number or timestamp for
purposes of ordering visibility of transactions doesn't mean you
can't show that it completed in an audit log. In other words, I
think the needs for a temporal database are significantly different
from the needs of an auditing system.

And keep in mind, we are only talking about seeing read-write
serializable transactions which might yet conflict with other
read-write serializable transactions when choosing to look at the
prior state within a temporal serializable transaction. That's easy
enough to avoid if you want to do so.

> So if someone looks at the audit log they might not see all the
> transactions they expect to see.

I would assume an audit log would have very different needs from
tracking changes for a temporal database view. It even seems
possible that you might want to see what people *looked* at, versus
just changes. You might want to see transactions which were rolled
back, which are irrelevant for a temporal view. If we're talking
about an auditing system, we're talking about an almost completely
different animal from a temporal view of the database.

> So, if I had an automatic query A which updated statistics based
> on on transactions committed over the last 10 minutes, then many
> (all?) transactions starting and successfully completing during
> the time of the very long running transaction will never show up!

A statistics capture process like that doesn't seem like a place
where you care about the apparent order of execution of serializable
transactions.

> Here I am envisioning a query fired off every ten minutes looking
> for audit records with timestamps within the previous ten minutes.

Which timestamp would make sense for that?

> However, if I ran a query B looking at audit record numbers with
> in 10 minute intervals for a week, but kicked off 24 hours after
> the week finished -- then I would see the records I did not see in
> query A.
>
> Hmm... if I am at all right, then probably best to have some
> suitably worded 'government health warning' prominent in the
> documentation!

We're clearly talking at cross purposes. I'm discussing what is
needed to be able to see a past state of the database in a
transaction which would only see states of the database which are
consistent with some serial execution of serializable transactions
which modified the database, and you're talking about an audit
table. If we're actually talking about an audit system, I have a
whole different set of concerns, and I would not be bringing this
one up.

The whole point of my concern is that if you have a business rule
enforced by database triggers that rows in a certain table contain
some batch identifier and no rows can be added to a batch after some
database change flags that batch as closed, then *without* what I
suggest, you could view a closed batch and see one set of rows, and
view the batch at a later point in time and magically see rows
appear in violation of the enforced business rules. I'm talking
about *preventing* surprising appearances of data "after the fact".
You would need a big warning if you *don't* have what I suggest.

Now, it is quite possible that one table (or set of tables) could do
double-duty for both temporal queries and auditing, but the fact
that something is not needed there for one purpose doesn't mean it
isn't needed for the other.

There are still some fuzzy areas around how things would look with a
*mix* of serializable and other transactions updating the database;
but I think in the long run we will find that people either want to
do all of their modifications through SSI, or none of them. The
guarantees get pretty weak if you don't know that all transactions
were part of the review of "dangerous structures" which can cause
anomalies.

If anyone is still unclear about serializable transactions in
PostgreSQL and wants to invest the time needed to read 12 pages
(including footnotes and pretty colored performance graphs) to
learn about it -- the paper which is going to be presented at the
VLDB conference next week goes at it from a different angle than I
usually approach it, and it may "click" with many people where my
discussions have fallen short.

http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2012-08-21 22:16:44 Re: temporal support patch
Previous Message Alvaro Herrera 2012-08-21 22:05:09 Re: temporal support patch