Quick Links

Re: Rewriting Free Space Map

From:	Hannu Krosing <hannu(at)krosing(dot)net>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Luke Lonergan <llonergan(at)greenplum(dot)com>
Subject:	Re: Rewriting Free Space Map
Date:	2008-03-17 10:28:45
Message-ID:	1205749726.7381.23.camel@huvostro
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sun, 2008-03-16 at 21:33 -0300, Alvaro Herrera wrote:
> Tom Lane wrote:
>
> > The idea that's becoming attractive to me while contemplating the
> > multiple-maps problem is that we should adopt something similar to
> > the old Mac OS idea of multiple "forks" in a relation. In addition
> > to the main data fork which contains the same info as now, there could
> > be one or more map forks which are separate files in the filesystem.

Are'nt we in a way doing this for indexes ?

> I think something similar could be used to store tuple visibility bits
> separately from heap tuple data itself, so +1 to this idea.

Not just "bits", but whole visibility info (xmin,xmax,tmin,tmax, plus
bits) should be stored separately.

A separate "fork" for visibility should be organized as a b-tree index
(as we already have well-honed mechanisms for dealing with those
effectively) but visibility fork is stored in a compressed form by
storing ranges of all-visible or all-deleted tuples as two endpoints
only and also the tree is reorganized when possible similar to what we
currently do for HOT updates.

This will keep the visibility index really small for cases with little
updates, most likely one or two pages regardless of table size.

One important difference from indexes is that visibility info should be
stored first, before writing data to heap and creating ordinary index
entries.

> (The rough idea in my head was that you can do an indexscan and look
> up visibility bits without having to pull the whole heap along; and
> visibility updates are also cheaper, whether they come from indexscans
> or heap scans. Of course, the implicit cost is that a seqscan needs to
> fetch the visibility pages, too; and the locking is more complex.)

another cost is heavy inserting/updating where there will probably be
more lock contention as visibility info for new tuples will more often
land on the same visibility pages due to visibility info being generally
smaller.

Of course, with visibility info in a separate fork, very narrow tables
will have the ratios reversed - for one byte wide table visibility info
will be a few times bigger than actual data, at least initially before
compression has kicked in.

--------------------
Hannu

In response to

Re: Rewriting Free Space Map at 2008-03-17 00:33:02 from Alvaro Herrera

Responses

Re: Rewriting Free Space Map at 2008-03-17 13:29:31 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Zeugswetter Andreas OSB SD	2008-03-17 10:55:58	Re: Remove hacks for old bad qsort() implementations?
Previous Message	Magnus Hagander	2008-03-17 08:17:29	Re: Commit fest?