Re: possible TODO: read-only tables, select from indexes

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: possible TODO: read-only tables, select from indexes
Date: 2005-04-25 10:54:20
Message-ID: 1114426460.5848.1.camel@fuji.krosing.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I send it now the 3rd time because I'm not sure my mail works, as it has not appeared on pgsql-hackers

On L, 2005-04-23 at 18:27 -0400, Tom Lane wrote:
> Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> writes:
> > Is this a fair summary of the potential benefits of READ-ONLY
> > tables? (from both this thread and the archives):
>
> > 1. Index-only scans are made possible fairly easily because
> > you wouldn't need to check the heap for visibility.
>
> > 2. Simple tables can be much smaller since you don't need
> > most of the HeapTupleHeaderData.
>
> What you are talking about is not a "read only" table, it is a
> "non-MVCC" table. This is a much greater assault on the fundamental
> semantics of Postgres than it's being painted to be in this thread.

That's why I proposed a much less ambitious, and much more "low-hanging-
fruit-first" compliant thing when I satrted the thread - a simple two-
flag system to mark a relation as safe to use for index-only queries.

Main use of that would be in partiotioned table setups in data
warehouses, where older partitions can be switched to read-only state.

and such all-or-nothing scheme would also make it much easier to
estimate cost of index(only)scan.

> In particular, how is such a table going to come into being? You'd
> not be able to just flip the READ-ONLY flag on and off.

original idea (with HeapTupleHeader intact) was to first mark the table
as READ-ONLY, and then either run VACUUM, or preferrably VACUUM-FULL +
REINDEX TABLE on it, so that all index tuples point to valid and visible
tuples.
A check must be made to make sure, that all transactions started before
setting the READ-ONLY flag have finished before starting VACUUM or
REINDEX.

> (The notion of having tuples in the system that don't have the standard
> HeapTupleHeader is not as easy to implement as you might think, either,
> because that data structure is *everywhere*.)

My impression was, that HeapTupleHeader is usually not carried with
fields after doing the initial visibility checks ?

But if it is needed, then it should be added when generating tuples from
index scan, preferrably in such a way, that non-SELECT queries get these
tuples with Xids set in a way which prevent them from being modified.

> While I don't say it's impossible to do, I do think that the work and
> semantic ugliness involved would outweigh the possible benefits. In
> particular, there are other, more transparent ways of doing #1.

If HeapTupleHeader is essential for pg, than #1 should generate fake
HeapTupleHeader with some constant values (xmin=FrozenTransactionId,
xmax=MAXINT).

BTW, do we really store tableoid column in heap tuples or is it added
somewhere on the way from heap ?

--
Hannu Krosing <hannu(at)skype(dot)net>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2005-04-25 10:55:01 How to make lazy VACUUM of one table run in several transactions ?
Previous Message Simon Riggs 2005-04-25 08:57:47 Re: Bad n_distinct estimation; hacks suggested?