Re: A more general approach (Re: Data archiving/warehousing idea)

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
Cc: Chris Dunlop <chris(at)onthe(dot)net(dot)au>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: A more general approach (Re: Data archiving/warehousing idea)
Date: 2007-02-01 12:51:51
Message-ID: 1170334311.3226.12.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ühel kenal päeval, N, 2007-02-01 kell 14:38, kirjutas Hannu Krosing:
> Ühel kenal päeval, N, 2007-02-01 kell 13:24, kirjutas Gavin Sherry:
>
> > A different approach discussed earlier involves greatly restricting the
> > way in which the table is used. This table could only be written to if an
> > exclusive lock is held; on error or ABORT, the table is truncated.
> >
> > The problem is that a lot of this looks like a hack and I haven't seen a
> > very clean approach which has gone beyond basic brain dump.
>
> A more radical variation of the "restricted-use archive table" approach
> is storing all tuple visibility info in a separate file.
>
> At first it seems to just add overhead, but for lots (most ? ) usecases
> the separately stored visibility should be highly compressible, so for
> example for bulk-loaded tables you could end up with one bit per page
> saying that all tuples on this page are visible.
>
> Also this could be used to speed up vacuums, as only the visibility
> table needs to be scanned duting phase 1 of vacuum, and so tables with
> localised/moving hotspots can be vacuumed withoutd scanning lots of
> static data.
>
> Also, storing the whole visibility info, but in a separate heap, lifts
> all restrictions of the "restricted-use archive table" variant.
>
> And the compression of visibility info (mostly replacing per-tuple info
> with per-page info) can be carried out by a separate vacuum-like
> process.
>
> And it has many of the benefits of static/RO tables, like space saving
> and index-only queries. Index-only will of course need to get the
> visibility info from visibility heap, but if it is mostly heavily
> compressed, it will be a lot cheaper than random access to data heap.

For tables with fixed-width tuples it can probably be extended to
support vertical fragmentation as well, to get DWH benefits similar to
http://monetdb.cwi.nl/ .

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2007-02-01 13:09:16 Re: A more general approach (Re: Dataarchiving/warehousing idea)
Previous Message Hannu Krosing 2007-02-01 12:38:14 A more general approach (Re: Data archiving/warehousing idea)