A more general approach (Re: Data archiving/warehousing idea)

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
Cc: Chris Dunlop <chris(at)onthe(dot)net(dot)au>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: A more general approach (Re: Data archiving/warehousing idea)
Date: 2007-02-01 12:38:14
Message-ID: 1170333495.3226.9.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ühel kenal päeval, N, 2007-02-01 kell 13:24, kirjutas Gavin Sherry:

> A different approach discussed earlier involves greatly restricting the
> way in which the table is used. This table could only be written to if an
> exclusive lock is held; on error or ABORT, the table is truncated.
>
> The problem is that a lot of this looks like a hack and I haven't seen a
> very clean approach which has gone beyond basic brain dump.

A more radical variation of the "restricted-use archive table" approach
is storing all tuple visibility info in a separate file.

At first it seems to just add overhead, but for lots (most ? ) usecases
the separately stored visibility should be highly compressible, so for
example for bulk-loaded tables you could end up with one bit per page
saying that all tuples on this page are visible.

Also this could be used to speed up vacuums, as only the visibility
table needs to be scanned duting phase 1 of vacuum, and so tables with
localised/moving hotspots can be vacuumed withoutd scanning lots of
static data.

Also, storing the whole visibility info, but in a separate heap, lifts
all restrictions of the "restricted-use archive table" variant.

And the compression of visibility info (mostly replacing per-tuple info
with per-page info) can be carried out by a separate vacuum-like
process.

And it has many of the benefits of static/RO tables, like space saving
and index-only queries. Index-only will of course need to get the
visibility info from visibility heap, but if it is mostly heavily
compressed, it will be a lot cheaper than random access to data heap.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2007-02-01 12:51:51 Re: A more general approach (Re: Data archiving/warehousing idea)
Previous Message Simon Riggs 2007-02-01 11:49:55 Re: Improving NOT IN