Re: A more general approach (Re: Dataarchiving/warehousing idea)

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Hannu Krosing" <hannu(at)skype(dot)net>
Cc: "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "Chris Dunlop" <chris(at)onthe(dot)net(dot)au>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A more general approach (Re: Dataarchiving/warehousing idea)
Date: 2007-02-01 13:09:16
Message-ID: 1170335357.3681.574.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2007-02-01 at 14:38 +0200, Hannu Krosing wrote:
> Ühel kenal päeval, N, 2007-02-01 kell 13:24, kirjutas Gavin Sherry:
>
> > A different approach discussed earlier involves greatly restricting the
> > way in which the table is used. This table could only be written to if an
> > exclusive lock is held; on error or ABORT, the table is truncated.
> >
> > The problem is that a lot of this looks like a hack and I haven't seen a
> > very clean approach which has gone beyond basic brain dump.
>
> A more radical variation of the "restricted-use archive table" approach
> is storing all tuple visibility info in a separate file.
>
> At first it seems to just add overhead, but for lots (most ? ) usecases
> the separately stored visibility should be highly compressible, so for
> example for bulk-loaded tables you could end up with one bit per page
> saying that all tuples on this page are visible.
>
> Also this could be used to speed up vacuums, as only the visibility
> table needs to be scanned duting phase 1 of vacuum, and so tables with
> localised/moving hotspots can be vacuumed withoutd scanning lots of
> static data.
>
> Also, storing the whole visibility info, but in a separate heap, lifts
> all restrictions of the "restricted-use archive table" variant.
>
> And the compression of visibility info (mostly replacing per-tuple info
> with per-page info) can be carried out by a separate vacuum-like
> process.
>
> And it has many of the benefits of static/RO tables, like space saving
> and index-only queries. Index-only will of course need to get the
> visibility info from visibility heap, but if it is mostly heavily
> compressed, it will be a lot cheaper than random access to data heap.

I like that idea, as a non-default option, since in-line visibility is
important for OLTP applications.

This idea is sufficiently flexible to allow a range of use cases without
requiring a complete removal of functionality. Read-mostly is an
important use case for very large databases.

This is essentially the same thing as PhantomCommandId, just for the
whole tuple header. It's something that would go on pg_class easily,
just as WITH/WITHOUT OIDS has done.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2007-02-01 16:34:06 Re: "May", "can", "might"
Previous Message Hannu Krosing 2007-02-01 12:51:51 Re: A more general approach (Re: Data archiving/warehousing idea)