Re: Data archiving/warehousing idea

From: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
To: Chris Dunlop <chris(at)onthe(dot)net(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Data archiving/warehousing idea
Date: 2007-02-01 02:24:41
Message-ID: Pine.LNX.4.58.0702011307380.12097@linuxworld.com.au
Lists: pgsql-hackers

On Thu, 1 Feb 2007, Chris Dunlop wrote:

> G'day hackers,

G'Day Chris,

> already - I couldn't find anything in the mail archives, but
> that doesn't mean it's not there...)

There has been a lot of discussion about this kind of thing over the
years.

> The main idea is that there might be space utilisation and
> performance advantages if postgres had "hard" read-only tables,
> i.e. tables which were guaranteed (by postgres) to never have
> their data changed (insert/update/delete).
>
> This could potentially save disk space by allowing "bookkeeping"
> elements in the page/tuple headers to be removed, e.g.
> visibility information etc. Also, some indexes could
> potentially be packed tighter if we know the data will never
> change (of course this is already available using the fillfactor
> control).

Well, there is also the CPU overhead of doing MVCC, but there are a few
fundamental problems that must be overcome. The most significant is that
no useful table is always read only; otherwise you could never load it.
What do we do in the presence of a failure during the load, or a
user-issued ABORT? I guess we'd truncate the table... What about replay
after a crash?

Another way of looking at it: we use the 'bookkeeping' information in
the tuple header for concurrency and for handling transaction abort.
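
To make that overhead concrete, here is a simplified sketch of the
per-tuple fields we're talking about. This is not the real
HeapTupleHeaderData (see src/include/access/htup.h), which differs in
layout and detail, but it shows the kind of bookkeeping every row
carries:

    #include <stdint.h>

    /* Simplified sketch of per-tuple MVCC bookkeeping; the real
     * structure is HeapTupleHeaderData and differs in detail.  It is
     * this per-row overhead (plus the CPU spent checking it) that a
     * "hard" read-only table could in principle shed. */
    typedef uint32_t TransactionId;
    typedef uint32_t CommandId;

    typedef struct
    {
        TransactionId xmin;     /* transaction that inserted the tuple */
        TransactionId xmax;     /* transaction that deleted/updated it */
        CommandId     cid;      /* command within inserting transaction */
        uint16_t      ctid[3];  /* block/offset of this or newer version */
        uint16_t      infomask; /* hint bits: xmin/xmax committed etc. */
        uint8_t       hoff;     /* offset to the start of user data */
    } TupleBookkeeping;         /* roughly 20 bytes before any user data */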

> The idea would be to introduce a statement something like:
>
> ALTER TABLE foo SET ARCHIVE;

I'd not thought of that approach. There are two problems: first, some
archive tables are so large that loading them and then reprocessing them
isn't appealing; secondly, we'd be rewriting the binary structure of the
table, which does not suit the non-overwriting nature of Postgres's
storage system.

A different approach discussed earlier involves greatly restricting the
way in which the table is used: it could only be written to while an
exclusive lock is held, and on error or ABORT the table would be
truncated.
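
To sketch what a load might look like under that rule (the table names
archive_foo and staging_foo are made up, and the truncate-on-abort step
would be the server's job, not the client's):

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    /* Run one statement, bailing out on failure.  Under the
     * hypothetical rule above, bailing out mid-load would leave the
     * server to truncate archive_foo rather than do per-tuple undo. */
    static void exec_or_die(PGconn *conn, const char *sql)
    {
        PGresult *res = PQexec(conn, sql);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
            PQclear(res);
            PQfinish(conn);
            exit(1);
        }
        PQclear(res);
    }

    int main(void)
    {
        PGconn *conn = PQconnectdb("dbname=archive");

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        exec_or_die(conn, "BEGIN");
        /* Writes would only be legal while holding the exclusive lock. */
        exec_or_die(conn, "LOCK TABLE archive_foo IN ACCESS EXCLUSIVE MODE");
        exec_or_die(conn, "INSERT INTO archive_foo SELECT * FROM staging_foo");
        exec_or_die(conn, "COMMIT");

        PQfinish(conn);
        return 0;
    }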

The problem is that a lot of this looks like a hack, and I haven't seen a
clean approach that has gone beyond a basic brain dump.

Thanks,

Gavin
